Welcome!

Serious play describes the act of pursuing serious topics playfully. Scot Osterweil's Four Freedoms of Play, for example, provide one lens into how an attitude of playfulness and not taking oneself too seriously can promote motivation and flow, even on the challenges that matter most.

Spinning plates is a reference to one of the classic Richard Feynman stories in which Richard Feynman becomes burned out after accomplishing so much in physics, then rediscovers what it's like to enjoy physics by delighting playfully in solving a seemingly-unimportant physics problem.

This is my personal website. Here, I write about topics that feel like spinning plates and serious play to me: whatever inspires curiosity and investigation.



𓁧

What is that scarf?

If you meet me in person, you might notice that I’m carrying around one of these knitting projects. Both projects bear patterns that are imbued with deeper mathematical stories.

Rule 110 Scarf

My first knitting project is a scarf that depicts Wolfram’s Rule 110 elementary cellular automaton.

Hang on, what’s an elementary cellular automaton?

Maybe you’ve heard of Conway’s Game of Life? Conway’s Game of Life is a two-dimensional cellular automaton; it features a grid of pixels, and the grid evolves according to a particular ruleset, timestep by timestep.

Wolfram’s elementary cellular automata are similar, but they’re one-dimensional; a single line of pixels evolves according to a particular ruleset, timestep by timestep. Typically, the time dimension is shown going downwards, as in the above depiction of rule 110.

Furthermore, elementary cellular automata are constrained so that each pixel in the next generation can only depend on the three pixels immediately above it.

(image credit)

There are many ways to create rules for elementary cellular automata. Each rule is uniquely defined by the set of eight pixels that it prescribes as the descendant for each of the eight possible three-pixel combinations of ancestor pixels. Therefore, there are 2^8 = 256 possible elementary cellular automaton rules, of which rule 110 is merely one.

Why choose Rule 110?

Rule 110 after 250 iterations

Rule 110 after 250 iterations

It just so happens that Rule 110 is Turing-complete, as proven in Universality in Elementary Cellular Automata (Cook 2004)!

(Conway’s Game of Life, incidentally, is also Turing-complete, as shown by such lovely constructions as Universal Turing Machine in Life or even Life in Life)

The proof is a cute little construction. First, it identifies several gliders which move predictably through a specially-defined “ether” (a repeating background pattern), and then it classifies the ways in which the gliders can collide and interact depending on their spacing at the time of their collision.

Three gliders

Three gliders (image credit)

Two types of collision

Two types of collision (image credit)

By carefully handling the ways in which gliders are initialized and are set up to collide, it is possible to carry out enough computation in order to emulate cyclic tag systems, a formalism which was already known to be Turing-complete.

This particular knitting project is close to my heart, as Rule 110’s Turing-completeness gives it a certain numinous beauty and mystery. Why, of Wolfram’s 256 elementary cellular automata, should it be the case that that there is even one that is Turing-complete? Amidst the unruly chaos of Rule 30’s pseudorandom number generation and the ordered austerity of Rule 90’s perfect Sierpinski triangles, what deep mathematical forces conspired to give us this – this simple rule which, by itself alone, is capable of just enough complex behavior to be capable of carrying out computations? It balances on the knife’s edge between the dynamic and the static, exhibiting a kind of critical behavior of the sort that can also be seen in discussions of neuroscience and the physics of phase transitions.

Rule 30, sometimes used as a pseudorandom number generator

Rule 90, which generates Sierpinski triangles when initialized with a single pixel

The scarf also serves for me as a physical symbol of the following cluster of philosophical problems: What ethics govern the simulation of minds? If I were to create a scarf with a simulation of a mind in a video game environment (such as the one in The Talos Principle), is it at all meaningful – ethically – for me to sit there and knit out the entire scarf so that the simulation’s output is legible to us, or is it equally meaningful for me to merely cast on the start state of the scarf, and note how I might allow the computation to proceed if I were to knit out the entire scarf? If the two situations are the same, is it also equally meaningful for me to merely think about creating the scarf, designing its start state in my head, but never casting it onto physical yarn? And if the full spectrum of situations are not the same, then why?

I used YouTube videos to learn the technique of double-knitting in order to be able to construct a scarf with a two-color pixelated pattern. Due to the double-knitting constraint, the pixels on one side of the scarf are inverted on the other side, so technically it’s a scarf bearing Rule 110 on one side and Rule 193 on the other.

Rule 105 Scarf

Unlike the rule 110 scarf, this knitting project is my current work in progress.

The premise: Let’s say you want to create a Möbius strip scarf… but you also want it to be a cellular automaton.

Simply diving in to knitting any old cellular automaton won’t do, because in order to make it be Möbius, you must flip it around and join it to itself as the end of the process. For example, I clearly cannot do this with my rule 110 scarf. As shown, the final row does not proceed smoothly into the starting row, as would have been required!

What’s more, even deeper than this requirement about the final row’s behavior is the requirement that the whole cellular automaton rule must continue to be the same, even when we flip the scarf around. As we can see in the above example, leftward-leaning white triangles on blue (Rule 110) is a completely different pattern texture from rightward-leaning blue triangles on white (Rule 193).

This imposes the following two constraints on any cellular automaton we choose to Möbiusify:

  1. The rule must be left-right symmetric.
  2. The rule must be zero-one symmetric.

Alternatively put, using Wolfram’s terminology, the rule must be equal to its mirrored complement.

This requirement limits us to a small handful of valid rules, many of which are so simple as to be blindingly, boringly unaesthetic. (Rule 51 is one example of a blindingly boring rule, not that I have any problem with stripes.)

I decided to choose Rule 105 for its aesthetics. Rule 150 is another nice one that I considered.

I also needed to choose how to deal with the boundary conditions. I decided to treat the edges as if the scarf wraps around on itself. In a certain sense, this makes it perhaps more like a Klein bottle than a Möbius strip, spiritually if not literally.

Given all of the above parameters, the only thing that remained to do was to find a starting state that produces a pattern that repeats on itself in an interesting, pretty way. Most starting states only repeat on themselves after a small number of timesteps – boring! I wanted to maximize the cycle length of my scarf, for maximum aesthetic appeal.

I sent over all of the above parameters to Anders as a small puzzle, and three lines of Mathematica later, received a characterization of the complete 22-dimensional optimal solution space on scarves of 38 columns, from which I chose the following start state. Using this start state, the pixels at row 511 are exactly what they need to be such that the row at 512 is a left-right-flipped and zero-one-flipped copy of row 1 – the mirrored complement!

Row 1 and the first several rows.

Row 512 is the mirrored complement of row 1.

I’m practicing the provisional cast-on knitting technique in order to make sure that once I reach row 511, I will be able to seamlessly knit the scarf to itself with nobody the wiser about where the scarf ever began or ended. That’s the red yarn in the below picture.

Right now I’m around 30 rows in. It’s going to be a long one!

I was inspired to start knitting by some friends who use knitting as a way to keep their hands busy while still attending to verbal inputs, such as having conversations or listening to others. The knitting projects themselves were inspired by the work of Fabienne Serriere (KnitYak), who creates scarves with cellular automata and other beautiful mathematical patterns using her industrial knitting machine. (She’s even got some of Rule 110 and Rule 105!)

Future work

Once I finish the Möbius strip scarf, the ultimate knitting project for me would be to create another Rule 110 project and have it actually encode a real Turing machine this time, unpacking the lessons from Cook’s proof of Rule 110’s Turing completeness in order to do so. I would pick a small Turing machine, convert it to a universal cyclic tag system using the polynomial reduction of Neary and Woods (2006), and then convert that into a Rule 110 start state.

A Concrete View of Rule 110 Computation (Cook 2009) demonstrates by example how to perform these reductions in order to compile a Turing machine into a Rule 110 cellular automaton. The resulting pattern would be wide enough that it would need be a blanket instead of a scarf!

If I wanted to truly satisfy my artistic preferences over such a project, I’d make sure to pick a Turing machine that’s a nice meaningful one. It’s too bad that A Relatively Small Turing Machine Whose Behavior Is Independent of Set Theory (Aaronson and Yedidia 2016) has 7,918 states, otherwise I would love to encode it!

The best-known five-state Busy Beaver Turing machine might be a nice, simple Turing machine to use for such a project. The best candidate currently known for BB(5) takes 47,176,870 steps before finally halting.

State transitions for a candidate BB(5), as given in Attacking the Busy Beaver 5.

A space-time history of the activity of Rule 110, started at the top with a row of randomly set cells, from Universality in Cellular Automata. Perhaps you see now what I mean by “blanket”.

𓍤

Reflections on Pilot’s Series B

Pilot, the startup I work at, announced our Series B today.

This is really exciting for me. I’ve been here for a little over a year and a half now. I joined right before the Series A, and since then we’ve outgrown offices twice and hired to more than 4x what our size was when I joined. This funding round raises our valuation to $355M (!) and also includes a strategic investment from Stripe, aligning their interests with ours (!!).

I joined Pilot in order to learn more stuff about engineering and also to learn more stuff about organizational efficiency and effectiveness, and I’ve been really satisfied on both fronts. Right after I joined I sometimes described the decision to friends as “Well, this is the founders’ third startup together, and their first two were successful exits, so… [eyebrow wiggle suggesting probable future success].” Waseem, Jessica, and Jeff really do know what they’re doing, and I haven’t been disappointed.

Optimize processes, not just tech

At Pilot I’ve learned a bunch about the optimization of human-centered processes and why it’s important. As someone who came from a hard computer science background, this lesson is one that I’ve only really absorbed during the last year or two, and it came both from observing Pilot and from observing other highly functional institutions within my broader community.

Pilot works on making bookkeeping less painful for humans to do, so that we can provide it at scale. This means not only that the engineering team’s automation tools have to be really good, but also that our human-centered processes for onboarding new team members and for spreading knowledge around have to be really good. As Catherine Olsson summarized my own description to me the other day, Pilot is “actually in the process optimization business.”

Checklists, common knowledge generation, and open lines of interpersonal communication – these and other tools are all really important to designing well-functioning processes for organizations made out of people.

Aim at real problems, and propagate incentives

I’ve also learned that the powerful engine of process optimization absolutely must be aimed at real problems in the world. I think this is a crucial puzzle piece that distinguishes startups that succeed from startups that don’t. I’ve liked getting to observe how the real-world problems faced by Pilot’s clients get propagated all the way down to the level of engineers’ day to day efforts.

Our organization is actually set up such that the incentives are really well aligned throughout. Potential clients (the source of real-world problems) come in via the sales team. Fulfilling our service to our clients becomes the problem of our operations team, who is tasked with closing the books. We record metrics about what’s taking our operations team the most time, as well as free-form feedback, and then the product team distills that data into goals for engineering and design.

The entire product process is legible and transparent to people on the engineering team, so I can audit what I’m working on and why, and question decisions at the appropriate level if I feel something’s off. Team members are also empowered to propose product directions directly.

If anything, the one thing I’d worry about with our setup is a hypothetical world where the most stress to deliver our service to our clients is felt by the operations team, and where the engineering team doesn’t feel the fire as much as the operations team does. We have been addressing this by specifying specific goals for the engineering team to meet in order to properly support the operations team, so I can trust that we will succeed inasmuch as we craft those goals carefully. And since we’re using data and metrics to motivate our choices, plus the management team sometimes ends up doing a little bookkeeping if we find ourselves under-capacity, I can trust that the organization’s incentives are aligned to craft those goals well.

Culture can be engineered

I’ve learned by example what it looks like for a community to deliberately engineer itself to have a good culture. On day 1, I was impressed when Jessica replied to a member of the team pointing out something to fix about part of our infrastructure by responding with “Good callout,” encouraging honest feedback. Just last month, a member of the engineering team took on the task of writing up a doc about our code review standards, making implicit social norms explicit. From various other members of the team I’ve learned about blameless retrospectives, about the New York Times rule, and about two kinds of feedback.

Pilot also has some employees with roots in the Recurse Center, whose social rules and attention to inclusiveness are another example of an organization achieving a well-engineered culture.

While we don’t currently have an explicit set of company values, as an individual participant in company culture I can tell from everyday communications that we think of ourselves as valuing work-life balance, diversity, and open communication, that we take actions that promote those values, and that we talk about those values often, thereby elevating them to common shared culture.

Short feedback loops can be engineered

Although I already knew that short feedback loops are important for getting good engineering velocity (write unit tests! use debugging tools!), one thing I’ve learned at Pilot is that it’s possible to structurally engineer an organizational system such that it has short feedback loops built in. In particular, our engineering team writes software for our in-house users, which means I have the luxury of not needing to regularly set up A/B tests in order to infer what’s going on in users’ minds; instead, we can ask our users directly.

Processes can become stale – change them

I’ve also seen firsthand that as organizations scale, the old processes and ways of doing things don’t necessarily work anymore and you have to find new ones. I’ve been pleased that as we’ve been growing, the management team has been attentive to whether there are any growing pains showing up in our processes, and I’ve been pleased that we’ve been actively working on moving to newer ways of doing things that work better for our current situation, rather than sticking with old solutions because they’re the way things have always been done.

Since I have joined, we have actively improved our systems for triaging issues, for tracking projects, and for conducting engineering team meetings, each time preventing ourselves from getting stuck in old ways that weren’t serving us as well anymore.

This seems like a pretty hard category of problem in general, and I’m interested in learning more heuristics for how to notice it and address it. It seems that as organizations grow larger, oftentimes problems like this can slip through the cracks.

Conclusions

There are countless other things I’ve learned that I could mention as well. I’ve learned a lot of stuff about good habits for frontend development in Vue.js from Pilot’s first engineer. I’ve learned what good management feels like as an employee from our engineering manager’s care and diligence. I’ve learned new things about computer security directly from the founder of the Twisted project, including how SSL certificate issuance works and how to do formal threat modeling.

Overall, I’ve been really interested in institutioncraft lately, and Pilot feels like a success case in institutioncraft from which I’ve learned a whole bunch. Of course, the things I’ve learned so far, like all things I learn, are a work in progress that is continued every day. I’m looking forward to building more stuff with these amazing people and to learning more as we grow together.

𓈵

Reversible computing, and a puzzle

If you’ve seen my other posts, you probably already know that I am a sucker for good visual notations. Some of my favorites include circuitry for lambda calculus and Feynman diagrams.

So when I heard about a graphical notation for linear algebra, I really wanted to learn how it works. I decided to learn enough about Graphical Linear Algebra to be able to use the notation to express and solve one of my favorite puzzles from quantum computing. I was curious whether graphical linear algebra would make the problem more intuitive, and I think the notation actually succeeds!

In this post, I’ll pose the puzzle, along with some relevant background about logic gates in reversible computing. In a future post, I’ll give the puzzle’s solution, both with and without the aid of graphical linear algebra.

Reversible computing

We usually think of quantum computing (as exemplified by quantum circuits) as being more powerful than classical computing (as exemplified by the Turing machine), because it operates over qubits and qubits can exist in superposition, unlike normal bits. However, it’s also true that programs written for quantum computers must obey some constraints that programs written for Turing machines need not obey! In particular, all operations in a quantum circuit must be reversible.

For example, the classical XOR gate consumes two inputs and produces a value that’s true if and only if the two inputs are different. XOR is an example of an irreversible operation; knowing the value of A XOR B does not always give you enough information to derive what the values of the two inputs had been beforehand. An XOR gate has no inverse.

In contrast, the quantum equivalent of an XOR gate must produce at least one more output bit in order to preserve reversibility. Not only that, it must also use that output bit in a way that ensures that the two outputs could in principle be reversed in order to recover the original two inputs. The CNOT gate is one way of implementing these constraints in order to create a component that can be used to calculate XOR.

X cnot Y

Unlike XOR, a CNOT gate has a logic gate that acts as its inverse. (All reversible gates must.)

The requirement for reversibility in quantum computing is, incidentally, an unavoidable consequence of the fact that the laws of physics require that information be conserved. In quantum physics, this concept is known as unitarity.

For more on this subject, I recommend Scott Aaronson’s excellent book Quantum Computing Since Democritus for a tour of the theory behind both classical and quantum computing. I also recommend the paper The Classification of Reversible Bit Operations by Aaronson, Grier, and Schaeffer if you want to learn more about logic gates in reversible computing.

Tangent. Real-world computers, like quantum computers (but unlike Turing machines), are actually built out of real physical stuff and must also use only reversible operations, because they work under the same laws of physics as quantum computers. Hidden under all of our careful abstractions, there are still trash bits and there is still physical entropy at play. It’s just that we programmers typically prefer to elide over those details and ignore them when thinking about algorithm design.

The details are still there, though; for example, it’s not possible to wipe a laptop’s hard drive, overwriting all of its data as 0s, without the entropy from that information being radiated out as some small amount of heat to the rest of the universe outside the laptop. This is a consequence of Landauer’s principle, first described by Rolf Landauer in a 1961 paper, Irreversibility and Heat Generation in the Computing Process. I would love to know more about the implications of Landauer’s principle on the energy efficiency of various algorithms when reversibility and entropy are taken into account – what low-hanging fruit are algorithm designers missing out on? – but haven’t yet found any good source material that combines the physics concepts with the theoretical computer science.

The puzzle

Consider the Fredkin gate, also known as CSWAP, a reversible gate which swaps bits 2 and 3 (below) only when bit 1 is true.

CSWAP

Consider also the Toffoli gate, also known as CCNOT, a reversible gate which inverts bit 3 if and only if bits 1 and 2 are both true.

CCNOT

Note that Toffoli gates are complete gates in reversible computing: that is, it’s possible to form any other reversible logic operation using only some combination of Toffoli gates. This sort of completeness is the same completeness that NAND has for Boolean logic.

(The Toffoli gate does not by itself form a complete gate set in quantum computing, but it does when it’s combined with only one other gate, the Hadamard gate. Here’s one proof of that fact.)

The puzzle: Construct a Fredkin gate using only Toffoli gates.

𓅰

Programming Languages as Notations, Deconstruct 2017

Last April, I attended GaryConf WATCON Deconstruct 2017, got to listen to some excellent speakers, and enjoyed the opportunity to give a talk of my own: Programming Languages as Notations. Here are the slides.

When I was deciding what to talk about, I had been reading a bunch about the history of notations in math and physics. It’s really fascinating how different people throughout history have designed different languages for representing fundamentally identical concepts – for example, we have different notations for arithmetic, different notations for solving problems in quantum electrodynamics, and different notations for manipulating vectors. Some thoughts on each of these I’d like to share:

  • Arithmetic: Compare Western civilization’s arithmetic notation and algorithms (for addition, multiplication, division, and fractions) to the notation and algorithms used in Ancient Egyptian mathematics. I didn’t end up mentioning Ancient Egyptian arithmetic in my talk, but it’s neat. Count Like An Egyptian by David Reimer is a fun resource for learning more.

  • Quantum electrodynamics: Although two techniques – brute-force algebra and Feynman diagrams – each target the same kinds of problem (quantum electrodynamics calculations), Feynman diagrams are an interesting innovation both because they create both a new abstraction for dealing with those problems at a higher level and because they represent that abstraction in a beautifully visual way. Read David Kaiser’s article Physics and Feynman’s Diagrams for some neat history here, or check out his book Drawing Theories Apart for a more in-depth look. I love finding examples of mathematical notations that provide a new abstraction over the problems they solve, and I love highly visual notations, so Feynman diagrams really hit my aesthetic buttons.

  • Vector notation is another example of a notation that creates a new abstraction: when we represent doubling some three-dimensional vector by writing down the notation 2x instead of 〈2x₁, 2x₂, 2x₃〉, for example, the vector abstraction has saved us from something that’s much like code duplication, and in the process has reduced the number of opportunities we have to accidentally introduce an error. Also, vector notation provides a hilarious example of Standardization Wars happening over one hundred years ago. Florian Cajori in his work A History of Mathematical Notation describes some of the vitriol that got thrown back and forth; some choice quotes from that book are in my slides. It’s nice to know (I suppose) that the modern impulse to fight over standards and their implementation details was shared by our ancestors as well.

Because notation had been on my mind, my talk centers around some parallels I see between mathematical notation design and programming language design.

One of those parallels: I like it when I find new programming paradigms that introduce new abstractions for the problems that we as programmers often solve. Here are some things in this general concept-space that have recently piqued my interest:

  • Dafny is a research programming language that has syntax built in for writing down a function’s preconditions (requires) and postconditions (ensures); the verifier then checks that those conditions are true.
  • Computational biology has a need for programming tools that model at many different levels of abstraction – from the molecular level (e.g. protein folding: tools – projects like Folding@Home and Foldit) to the cellular level (e.g. protein signalling networks: tools – programming languages like Kappa) to the whole-organism level (tools – we need them!).
  • The pi-calculus is a model of computation in the same way that lambda calculus or Turing machines are models of computation, but is unusual in that it allows both parallel composition and sequential composition of code. Normally we write code sequentially, without the ability to specify when two operations or sequences of operations are independent and could very well have happened in parallel. Instead, the pi-calculus formalizes the ability to specify code as running sequentially or in parallel, opening up the possibility that the compiler could optimize code to run concurrently both more easily and with less thought required from the programmer. Pict is a concurrent programming language which is built upon the pi-calculus.

Another of those parallels: I like it when I find programming tools that enable visual representation of code. For example, snakefood is pretty nifty. Towards giving an example of what a completely visual programming language could look like, I discuss a visual circuitry-like notation I designed for lambda calculus, giving examples that dive into lambda calculus and combinatory logic. This blog post has more detail on that project. I don’t claim that we should be using visual representations all the time – user interface design is tricky, and our text-based systems have a lot going for them – but I think there still exist areas in which to innovate visually.

𓍝

A circuit-like notation for lambda calculus

Lately, I’ve been playing around with inventing a visual writing system for lambda calculus.

Lambda calculus (λ-calculus) is a sort of proto-functional-programming, originally invented by Alonzo Church while he was trying to solve the same problem that led Turing to invent Turing machines. It’s another way of reasoning about computation.

Python’s lambda is an idea that was borrowed from λ-calculus. In Python, you can use a lambda expression like the following in order to define a function that returns the square of a number:

square = lambda x: x * x

In λ-calculus, the idea is the same: we create a function by using λ to specify which arguments a function takes in, then we give an expression for the function’s return value. Pure lambda calculus doesn’t include operators of any sort – just functions being applied to other functions – so if we try to write a square function, we have to suppose that multiply is a function of two variables that has already been defined:

square = λx. multiply x x

The square function, once defined, can be applied to arguments and evaluated into something simpler.

square 4 = (λx. multiply x x) 4
         = multiply 4 4
         = 16

One of the cool things about lambda calculus is that we can represent most common programming abstractions using λ-calculus, even though it’s nothing but functions: numbers, arithmetic, booleans, lists, if statements, loops, recursion… the list goes on. Before I introduce the visual writing system I’ve been using, let’s take a detour and discuss how we can represent numbers and arithmetic using lambda calculus.

Church numerals, in lambda calculus

Alonzo Church figured out how to represent numbers as lambda functions; these numbers are referred to as Church numerals.

We can represent any nonnegative integer as long as we have two things: (1) a value for zero, and (2) a successor function, which returns n + 1 for any number n. To represent numbers as functions, then, we require that z (zero) and s (successor) be passed in as arguments, and go from there. Each number is actually secretly a function of those two inputs.

zero = λs. λz. z
one = λs. λz. s z
two = λs. λz. s (s z)
three = λs. λz. s (s (s z))

The actual details of how to implement zero and successor should be implemented as are left as someone else’s problem — we can survive without them. All we care about is that our numbers do the right thing, given whatever zero and successor someone may provide.

What about addition? Addition is a function that takes in two numbers (let’s call them x and y), and produces a number representing their sum. To sum them together, we’ll want to produce a number that applies s, the successor function, a total of x + y times. For example, we could first apply it y times to the zero, then apply it x more times to that result.

plus = λx. λy. (λs. λz. x s (y s z))

Let’s try proving that one plus one equals two. In λ-calculus, this proof looks like the following:

one = λs. λz. s z
two = λs. λz. s (s z)

plus = λx. λy. (λs. λz. x s (y s z))

plus one one = (λx. λy. (λs. λz. x s (y s z))) one one
             = λs. λz. one s (one s z)
             = λs. λz. (λs. λz. s z) s (one s z)
             = λs. λz. s (one s z)
             = λs. λz. s ((λs. λz. s z) s z)
             = λs. λz. s (s z)
             = two

(Long, but at least conciser than Bertrand Russell’s.)

Lambda circuitry

There are a lot of lambdas, parentheses, and arguments being pushed around in that proof. Mentally matching up parentheses is annoying. Scope is especially annoying: which s am I looking at again in λs. λz. (λs. λz. s z) s (one s z), the inner one or the outer one?

A linear string of lambdas and parentheses is an ineffective way to provide intuition for the computations that are taking place. This problem isn’t unique to lambda calculus, either; consider trying to represent a binary tree using a linear string:

Node(2, Node(7, Leaf(2), Node(6, Leaf(5), Leaf(11))), Node(5, None, Node(9, Leaf(4), None)))

Unambiguous, but not very intuitive. Contrast that representation with the diagram we use when we’re trying to explain that same binary tree at a chalkboard, a more visual notation:

Binary tree diagram, from Wikipedia

Image from Wikipedia.

I remember programming constructs better when I can reason about them visually like this: when I imagine cutting an array in half for binary search, when I imagine pointers in a linked list being shuffled around to insert a new element, and when I imagine traversing up and down the branches of a binary tree.

Why can’t lambda calculus get some visual intuitions, in the same way? Lambda calculus is a dance of variables flowing through and being manipulated by functions, and I want a writing system for lambda calculus that will visually display this dance. It shouldn’t look like strings of parentheses and symbols: it should create visual intuition.

After some trial and error, here is the system I came up with. I aimed for something that would resemble circuitry.

Values flow along wires, where they may be passed in as arguments to functions or applied as functions themselves. Some are inputs, some are outputs.

Functions are represented as boxes which are applied to their inputs on one side and produce a single output on the other. The notation must indicate which function is applied; this may either be drawn within the box itself, or wired in to the middle of the box from some other value.

Arguments are represented as inputs, coming in from the right side of the diagram; these arguments might pass through functions, or they might be functions-to-apply themselves. If an argument has not been passed in yet, it’s an empty arrow beginning a wire; if an argument has been passed in, its value is attached to the wire. Arguments are always passed in from top to bottom, in order.

As an example, here’s a function which takes in two functions, f and g, then a value x, and returns f (g x):

lambda f. lambda g. lambda x. f (g x)

As another example, here’s the M combinator M = λx. x x (the “mockingbird” in To Mock a Mockingbird):

lambda x. x x

Church numerals, in lambda circuitry

Here’s the Church numeral four = λs. λz. s (s (s (s z))), drawn out in lambda circuitry:

Four, in lambda circuitry

Let’s take that proof from earlier that one plus one is two. What does it look like to draw that proof in lambda circuitry, instead?

Proof that one plus one is two, in lambda circuitry

We could also consider multiplication. A multiply function would take in two numbers, m and n, and computes a new number which is their product. In lambda calculus, we’d write:

multiply = λm. λn. λs. λz. m (n s) z

In the notation of lambda circuitry, this looks like this:

Multiplication function, in lambda circuitry

Using this function, we can check that multiply 2 3 evaluates to 6:

Multiply(2, 3), step 1

Multiply(2, 3), step 2

Multiply(2, 3), step 3

Multiply(2, 3), step 4

Multiply(2, 3), step 5

Sidenote: De Bruijn indices

One of the nice things about lambda circuitry is that it completely removes the need for variable names.

There’s another notation for lambda calculus that does this too: De Bruijn indices. A lambda expression written with De Bruijn indices indicates which variables are used where with a positive integer; the smaller the integer, the more recently the argument it refers to was passed in.

For example, the identity function λx. x may be written with De Bruijn indices like so:

identity = λ 1

The Church numeral for two, λs. λz. s (s z), may be written like so:

two = λ λ 2 (2 1)

The addition function, λx. λy. (λs. λz. x s (y s z)), may be written like so:

plus = λ λ (λ λ 4 2 (3 2 1))

An evaluation of plus one one looks like this:

plus one one = (λ λ (λ λ 4 2 (3 2 1))) (λ λ 2 1) (λ λ 2 1)
             = (λ (λ λ (λ λ 2 1) 2 (3 2 1))) (λ λ 2 1)
             = λ λ (λ λ 2 1) 2 ((λ λ 2 1) 2 1)
             = λ λ (λ λ 2 1) 2 (2 1)
             = λ λ 2 (2 1)

One of the tricky things about writing a lambda calculus interpreter is getting the renaming rules right; De Bruijn indices are convenient because they remove the need for this. Lambda circuitry is similar in spirit to De Bruijn indices in that it doesn’t require variable names at all, but instead indicates which variables are passed where by connecting values directly to an arrow indicating when they were passed in.

Argument-switching function, in lambda circuitry

I’ll provide more examples just to further demonstrate how the notation works in different situations. Let’s consider the “argument-switching” function C, where C f x y returns f y x. (This is actually the C combinator.)

C = λf. λx. λy. f y x

C combinator

Suppose we try applying this to a silly function f where f x y discards y and just returns x. Then, C f should switch around f‘s arguments and create a function which returns y instead. Let’s check:

f = λx. λy. x

C f = λf. λx. λy. (f y x) f
    = λx. λy. f y x
    = λx. λy. (λx. λy. x) y x
    = λx. λy. y

f = lambda x. lambda y. x

C(f), step 1

C(f), step 2

C(f), step 3

We could also try a function g where g x y returns x y. Then C g x y should return y x. Let’s check:

g = λx. λy. x y

C g x y = λf. λx. λy. (f y x) g x y
        = g y x
        = (λx. λy. x y) y x
        = y x

g = lambda x. lambda y. x(y)

C(g), step 1

C(g), step 2

C(g), step 3

C(g), step 4

Exercise: Show that applying C twice reverses it. That is, show that C (C f) returns f, for any f. (Note that C f is a function which takes in two arguments, x and y, and returns f y x. Applying C only to f like this is partial application.)

Prior work

There are some other systems that give visual intuition to lambda calculus.

To Dissect a Mockingbird describes a notation that is actually very similar to the one I’ve described, and demonstrates it on various problems from To Mock a Mockingbird. I like the way this looks, especially how every function is enclosed by two halves of a circle which make it obvious how that function might be applied. My notation doesn’t have this feature, but requires drawing fewer enclosing boxes as a result.

Visual Lambda (code, basics, paper) represents lambda expressions as colored bubbles, and provides an interface for manipulating them.

Alligator Eggs is a description of a puzzle game based on lambda calculus, which also happens to provide a visual way of working with and evaluating lambda expressions.

These last two don’t happen to satisfy the aesthetic that I personally was aiming for: they use color to represent variable names, whereas I wanted something that would be closer in spirit to De Bruijn indices, providing computational meaning by the careful placement of symbols or wires – but they are nifty nonetheless.

Further reading

An explanation of some of the nittier, grittier details of lambda calculus can be found at Wikipedia: Lambda calculus as well as in the textbook Types and Programming Languages.

To Mock a Mockingbird is a great puzzle book, and an introduction to combinator calculus; I had a lot of fun reading it and writing out some of the proofs for answers to some of the problems in lambda circuitry notation.

𓁁

Schrödinger's deploys no more: how we update translations

Cross-posted from the Khan Academy engineering blog.

If you’re trying to bring the best learning experience to people around the world, it’s important to, well, think about the world.

Khan Academy is translated into Spanish and Turkish and Polish and more – and this includes not only text, but also the articles, exercises, and videos. Thanks to the efforts of translators, learners around the world can use Khan Academy to learn in their language.

"You can learn anything" in several languages

Internationalization is important. Internationalization is also an engineering challenge: it requires infrastructure to mark which strings in a codebase need to exist in multiple different languages, to store and look up the translated versions of those strings, and to show the user a different website accordingly. Additionally, since our translations are crowdsourced, we need infrastructure to allow translators to translate strings, to show translators where their effort is most needed, and to show these translations once they’re ready. There are many moving parts.

When I arrived at Khan Academy at the beginning of this summer, some of these moving parts in our internationalization infrastructure were responsible for most of the time our deploys took to finish. One of the things I accomplished this summer during my internship here was to banish this slowness from our deploy times.

The problem

Whenever we download the latest translation data from Crowdin, which hosts our crowdsourced translations, we rebuild the translation files – files which the Khan Academy webapp can read in order to show translated webpages. The next time an engineer deploys a new version of the webapp, these new translation files are then deployed as well.

Uploading files to Google App Engine, which hosts the Khan Academy website, is usually the slowest part of our deploys; the translation files are big, so translation files in particular are a major contributor to this. So, whenever the latest translations are downloaded and rebuilt, the next deploy would be quite a bit slower while it uploaded the changed files.

Furthermore, since it’s not always the case that translation files have been rebuilt recently, as an engineer it’s hard to tell whether the deploy you’re about to make will be hit with Translations Upload Duty or not. Sometimes deploys would take around 30 minutes, sometimes they would take around 75 minutes or more:

Graph of deploy times, before translation server

The previous state of affairs – 30-minute deploys punctuated by 75-minute deploys. There are a couple of lulls in this graph where deploys are consistently near 30 minutes: these reflect times when our download from Crowdin was not working.

The fix

We decided to rearrange the infrastructure around this so that instead of uploading translation files to Google App Engine (GAE) along with the rest of the webapp, we would upload the translation files to Google Cloud Storage (GCS) in a separate process and then modify the webapp to read the files from there.

Implementing this required making a few different changes, and the changes had to be coordinated in such a way as to keep internationalized sites up and running throughout the entire process:

  • Upload the translation files to GCS whenever they’re updated.

  • Change the webapp to read translations from GCS instead of from GAE.

  • Stop uploading the now-unnecessary translation files to GAE.

These steps by themselves are enough to implement the change to make deploys faster, but we also want to make sure that this project won’t break anything – so instead, the steps look something like:

  • Upload the translation files to GCS whenever they’re updated. Measure everything. (How much will this cost? How fast will the upload be?)

  • Change the webapp to read translations from GCS instead of from GAE. Measure everything. (How much slower is this than reading from disk? Will requests be slower?)

  • Stop uploading the now-unnecessary translation files to GAE. Measure everything. (Do translated sites still work?)

One thing I experimented with while working on this project was keeping a lab notebook of sorts in a Google Doc; this was where I went to record everything I learned, from little commands that might be useful later, to dependencies I had to install, to all of the measurements I ended up making. This was a good decision. This habit and these notes did in fact turn out to be useful, frequently.

Mythbusters' Adam Savage: "Remember kids, the only difference between screwing around and science is writing it down."

The consequences

Deploys are faster!

I deployed the last piece of this project on August 20, 2015. Deploy times have been more consistent since: the graph of deploy times is free of the spikes that previously indicated translation uploads.

Graph of deploy times, after translation server

Before and after; the change happened on 8/20.

Translations can be updated independently!

Now we don’t require an engineer to deploy new code in order to change the translations that appear on Khan Academy – translations are updated by a separate job. Also, this opens up new possibilities – we can now do exciting things like updating our languages independently of each other, and make it so that the time between when a translator makes a translation and when that translation shows up on the main site becomes even shorter. Our internationalization efforts will be able to push forward even faster!

More posts