Deriving the Theta Combinator
This post is based on a lovely article on “Deriving the \(Y\) Combinator”.
In a course on programming language fundamentals, our lecturer proved the Turing completeness of untyped \(\lambda\) calculus by encoding Gödel’s recursive functions into the calculus. This proof included a translation of the primitive recursion operation into a parametrized extension of the much simpler function composition operation. Sadly, the rigid nature of this translation meant that he needed new ideas to encode the \(\mu\)-operator. These were precisely the \(Y\) and \(\Theta\) combinators, two magical operators that exploit the structure of the reduction rules to simulate \(\mu\).
In this entry, I’d like to illustrate a more natural way to represent recursion that leads quite neatly onto one (the other one) of these combinators. This way, I hope to de-mystify them a little.
Consider the following function. \[ f(x) = \begin{cases} 1, & \text{$x = 0$} \\ f(x - 1) + 1 & \text{otherwise} \end{cases} \] It’s easy to see that \(f\) is primitive recursive. Consider the following (incorrect) \(\lambda\) translation: \[ f \triangleq \lambda x.\; [\mathsf{if\_zero}]\, x\, [1]\, ([\mathsf{sum}]\, [1]\, (f\, ([\mathsf{pred}]\, x))) \] Here, \([\mathsf{if\_zero}]\) is a \(\lambda\) function of three variables that reduces to the second if the first is zero, and to the third otherwise. \([\mathsf{sum}]\) and \([\mathsf{pred}]\) can be similarly understood as translations of the addition and predecessor operations in \(\lambda\) calculus.
The above translation is syntactically incorrect because \(f\) is a free variable on the right hand side. It is, however, an almost mechanical translation of the function definition in \(\lambda\)-calculus. This makes it easy to generalize.
A natural fix for this syntactic error is to simply declare \(f\) a parameter in the function definition. This gives \[ M \triangleq \lambda fx.\; [\mathsf{if\_zero}]\, x\, [1] ([\mathsf{sum}]\, [1]\, (f\, ([\mathsf{pred}]\, x))) \] One might hope that \(MM\) computes \(f\) correctly. Sadly, this won’t work because \(f\) (in the body) is designed to take in a single parameter (whereas \(M\) takes 2). A creative solution suggests \[ M \triangleq \lambda fx.\; [\mathsf{if\_zero}]\, x\, [1] ([\mathsf{sum}]\, [1]\, (f\, f\, ([\mathsf{pred}]\, x)) \] \(MM\) will now properly simulate \(f\): \[ MM \to_\beta \lambda x.\; [\mathsf{if\_zero}]\, x\, [1] ([\mathsf{sum}]\, [1]\, (M\, M\, ([\mathsf{pred}]\, x))) \]
Let’s take a step back and observe what we’ve accomplished. We began by realizing that to encode recursion, we needed to reference a function in its definition. While we weren’t able to do that directly, we were able to abstract that reference away as a variable that theoretically, any function could replace. But when we replaced that variable with the right function (which turned out to be the same function), we were left with an expression for the input (recursive) function.
(Notice the potential for an infinite sequence of \(\beta\) reductions at \(MM\), characteristic of Turing completeness!)
It turns out that this translation scheme is sufficient to encode all \(\mu\)-recursive functions. Recall that the objective was to produce \(C\) such that \[ C\, x\, [n_1]\, [n_2] \ldots [n_k] \xrightarrow{*}_\beta [\mathsf{if\_then\_else}]\, ([\mathsf{if\_zero}]\, g\, x\, [n_1] \ldots [n_k])\, x\, (C\, ([\mathsf{succ}]\, x)\, [n_1] \ldots [n_k]) \] with \(C(x) = \mu_{x}(g(x, n_1, \ldots n_k))\) (this is just the definition of the \(\mu\)-operator). Consider \[ D \triangleq \lambda f x n_1 n_2 \ldots n_k .\; [\mathsf{if\_then\_else}]\, ([\mathsf{if\_zero}]\, g\, x\, n_1 \ldots n_k)\, x\, (f\, f\, ([\mathsf{succ}]\, x)\, n_1 \ldots n_k) \] Our discussion shows that taking \(C = DD\) captures general recursion, proving the Turing completeness of untyped \(\lambda\) calculus! Notice that \[ DD \to_\beta \lambda x n_1 \ldots n_k .\; [\mathsf{if\_then\_else}]\, ([\mathsf{if\_zero}]\, g\, x\, n_0 \ldots n_k]\, x\, (D\, D\, ([\mathsf{succ}]\, x)\, n_1 \ldots n_k) \]
Deriving the \(\Theta\) Combinator
The first step in the translation scheme discussed so far is to regard the self-references as a parameter. In our toy example, this caused \(f'\): \[ f'(g, x) = \begin{cases} 1, & \text{$x = 0$}\\ 1 + g(x - 1) & \text{otherwise} \end{cases} \] We can easily write this interim function in \(\lambda\) calculus: \[ \begin{aligned} F \triangleq \lambda gx.\; [\mathsf{if\_zero}]\, x\, [1]\, ([\mathsf{sum}]\, [1]\, (g\, ([\mathsf{pred}]\, x))) \end{aligned} \]
Our goal now is to construct an expression \(C\) that always reduces to \(F\, C\). The usefulness of \(C\) is captured by the following reduction sequence: \[ \begin{aligned} F \, C \, x &\to_\beta [\mathsf{if\_zero}]\, x\, [1]\, ([\mathsf{sum}]\, [1]\, (C\, ([\mathsf{pred}]\, x)))\\ &\to_\beta C (\mathsf{[pred]}\, x) && \text{\textit{(assuming $x > 0$)}}\\ &\to_\beta F\, C\, (\mathsf{[pred]}\, x) \end{aligned} \] Using \(C\), we can trick a reference to \(F\) inside \(F\). All that remains now is to solve for \(C\).
Begin by observing that we want \[ \begin{aligned} C \to_\beta F\, C \end{aligned} \] \(C\) must thus be a reducible expression. For simplicity, suppose that \(C = (\lambda x. \; A)B\) for some \(A\) and \(B\). This gives \[\begin{aligned} A[x \coloneqq B] \equiv F(C) \end{aligned} \] The simplest solution here is to set \(A = x\, C'\) and \(B = F\) for some \(C'\). \[\begin{aligned} \begin{split} F(C'[x \coloneqq F]) &= F(C) \\ \implies C'[x \coloneqq F] &= C = (\lambda x. \; x \, C') F \end{split} \end{aligned} \] Again, the simplest substitution is \(C' = D\, x\). \[ \begin{aligned} D[x \coloneqq F]\, F &= (\lambda x. \; x (D\, x)) F \\ \implies D[x \coloneqq F] &= \lambda x. \; x (D\, x)) \\ \implies D &= \lambda x. \; x (D\, x) && \text{\textit{(replacing free variables has no effect)}} \end{aligned} \] This is a recursive equation of the form we’ve solved in the earlier section! Hence, set \[ \begin{aligned} M \triangleq \lambda Dx. \; x (D \, D \, x) \end{aligned} \] where \(N = MM\) is the solution. Write this explicitly. \[ \begin{aligned} N = (\lambda Dx.\; x (D\, D\, x)) (\lambda Dx. \; x(D\, D\, x)) = \Theta \end{aligned} \] It appears that we’ve stumbled onto the \(\Theta\) Combinator!
To summarize, we have \[ \begin{aligned} C = (\lambda x. \; x (\Theta \, x)) F \end{aligned} \] Such a \(C\) can be written for any \(F\), and is guaranteed to reduce to \(F(CF)\): \[ CF \to_\beta F(CF) \to_\beta F(F(CF)) \] Expanding \[ \begin{split} (\lambda x. \; x (\Theta \, x)) F &\to_\beta F(\Theta \, F) \\ &\to_\beta F(F(\Theta F)) \end{split} \] Indicating that \(\Theta F \to_\beta F(\Theta F)\), which makes \(C = \Theta F\) an even simpler solution.