The Devil of the Excluded Middle

February 14, 2022

The tale of the “Devil of the Excluded Middle” appears in [Wadl03] as a humorous sketch depicting the execution of a program typed by the law of the excluded middle. With apologies to Wadler, I reproduce it here:

Once upon a time, the devil approached a man and made an offer: “Either (a) I will give you one billion dollars, or (b) I will grant you any wish if you pay me one billion dollars. Of course, I get to choose whether I offer (a) or (b).”

The man was wary. Did he need to sign over his soul? “No,” said the devil, “all the man need do is accept the offer.”

The man pondered. If he was offered (b) it was unlikely that he would ever be able to buy the wish, but what was the harm in having the opportunity available?

“I accept,” said the man at last. “Do I get (a) or (b)?”

The devil paused. “I choose (b).”

The man was disappointed but not surprised. That was that, he thought. But the offer gnawed at him. Imagine what he could do with his wish! Many years passed, and the man began to accumulate money. To get the money he sometimes did bad things, and dimly he realized that this must be what the devil had in mind. Eventually he had his billion dollars, and the devil appeared again.

“Here is a billion dollars,” said the man, handing over a valise containing the money. “Grant me my wish!”

The devil took possession of the valise. Then he said, “Oh, did I say (b) before? I’m so sorry. I meant (a). It is my great pleasure to give you one billion dollars.”

And the devil handed back to the man the same valise that the man had just handed to him.

It seems obvious that the devil, being the spirit of evil, has cheated the man. We can interpret the devil’s choice of (b) as a change in the nature of his offer. Accordingly, the man acts on the belief that the offer is transactional, i.e., in exchange for a billion dollars, the devil would grant him anything he desired. It’s almost as though the devil merely created an illusion at the beginning by presenting two possibilities.

A more amenable (and more confusing) version of this tale (by Peter Selinger) appears as exercise (6.19) in [SøUr06]. In it, an evil king gives a shepherd the following orders:

“You must bring me the philosopher’s stone, or you have to find a way to turn the philosopher’s stone into gold. If you don’t, your head will be taken off tomorrow!”

[SøUr06] presents the following solution to the shepherd’s quandary:

The next day the poor shepherd brings to the king’s palace a huge machine. The machine has two openings. One is marked “Put the philosopher’s stone here!” and on the other it reads “The gold will fall out from here”. That will perfectly work as long as the king cannot put the philosopher’s stone into the machine. But what if, somehow, the king comes into the possession of the philosopher’s stone? Then the shepherd’s brother, hidden inside the machine, will grab the stone, and hand it discretely to the shepherd. The shepherd now can say: “Oops, Your Majesty, I’ve been mistaken. Here is the philosopher’s stone!”

Again, this seems wrong. The shepherd’s machine cannot produce gold from the philosopher’s stone, and it isn’t the philosopher’s stone (by virtue of not being a stone). The machine seems only capable of producing the philosopher’s stone from itself.

An infuriating (and therefore fascinating?) aspect of the computational interpretation of classical logic is that both these stories mirror the behaviour of a program of type $α \lor \neg α$ somewhat accurately. (To be pedantic, the type here is $α \lor (α \to β)$ , a classical tautology that behaves nearly identically to $α \lor \neg α$ ). Studying this interpretation can add important details to these tales, such as the wonderful hint hidden in Martens’ presentation.

This post assumes some knowledge of typed $λ$ calculus, natural-deduction-style proofs, and a basic understanding of the Curry-Howard Isomorphism. Some knowledge of the Call-by-Value evaluation strategy is helpful (look here for a readable explanation). To keep this post short, I won’t be presenting the proofs for the claims made in [Grif89], but I believe I’ve provided enough details in the next section to explain the devil’s behaviour.

Preliminaries

It’s well known that the Law of the Excluded Middle (LEM) is not accepted as truth in constructive logics. This is unfortunate, as simply typed $λ$ terms correspond to proofs over (the propositional fragment of) these logics and not their classical counterparts. This means that, in order to understand the devil, we need a new set of computational rules that can be typed with LEM.

In [Grif89], Griffin shows how control operators can be typed with the Double Negation Elimination (DNE) rule. These operators are based on the call/cc subroutine in the programming language Scheme. [FFKD87] presents a syntax and a set of rewrite rules for Idealized Scheme, a language that captures this control operator. Griffin types terms in this language with DNE.

The syntax of Idealized Scheme is: $M ::= x ∣ M M ∣ λ x . M ∣ C (M) ∣ A (M)$ The elements $C$ and $A$ are the control operators.

The operational semantics of Idealized Scheme uses a Call-By-Value (CBV) style evaluation strategy. The principle of CBV is to fully reduce all parameters of a function before beginning its evaluation. It’s also deterministic; the first valid reducible expression (redex) encountered in a left-to-right sweep of the term (i.e., the leftmost-outermost one) is the one to get reduced. Felleisen et al. use the notion of Evaluation Contexts (or simply contexts) to capture this strategy. Their syntax is: $E ::= [] ∣ V E ∣ E M$ Here, the hole $[]$ is meant to contain a redex. $V$ is a value: i.e., it contains no valid redexes. Notably, in a CBV execution, functions whose bodies contain redexes are considered values; this is because those redexes are considered invalid. As a consequence, badly written functions remain badly written until absolutely required. In terms of evaluation contexts, this means $E \not::= λ x . E$ , i.e., the hole never sits under a $λ$ .

Any term $M$ can be viewed as $E [M^{'}]$ where $M^{'}$ is the leftmost-outermost valid redex inside $M$ . It’s expected that this context will change in the course of the evaluation of a term as the leftmost-outermost redex changes. The rewrite rules are specified over these contexts. They are: $\begin{aligned} E [(λ x . M) V] & \to_{β_{v}} E [M [x := V]] \\ E [A (M)] & \to_{A} M \\ E [C (M)] & \to_{C} M (λ z . A (E [z])) \end{aligned}$ These rules inspire an understanding of $A$ as an aborting or an escaping operation that drops the context surrounding the term it contains. $C$ is similar to call/cc: the execution of $C (M)$ passes to $M$ an abstraction of the continuation of $C (M)$ such that, if this continuation is invoked, the evaluating machine jumps to the subterm requiring it. The following is a representative use-case of these operators: $\begin{aligned} E [C (M)] & \to_{C} M (λ z . A (E [z])) \\ & \overset{*}{\to} E^{'} [(λ x . M^{'}) (λ z . A (E [z]))] \\ & \to_{β_{v}} E^{'} [M^{'} [x := λ z . A (E [z])]] \\ & \overset{*}{\to} E^{″} [(λ z . A (E [z])) N] \\ & \to_{β_{v}} E^{″} [A (E [N])] \\ & \to_{A} E [N] \end{aligned}$
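The three rewrite rules can be tried out with a small evaluator. The following Python sketch is my own illustration, not code from [FFKD87] or [Grif89]: the tuple encoding of terms and the names `decompose`, `step` and `evaluate` are assumptions, free variables stand in for constants, and substitution assumes that bound names never clash with the free names of the substituted value.

```python
from itertools import count

# Terms: ('var', x) | ('lam', x, body) | ('app', f, a) | ('C', m) | ('A', m)
def var(x): return ('var', x)
def lam(x, body): return ('lam', x, body)
def app(f, a): return ('app', f, a)
def C(m): return ('C', m)
def A(m): return ('A', m)

_fresh = count()  # supplies fresh names for the z in λz. A(E[z])

def is_value(t):
    # Variables and abstractions are values; lambda bodies are never inspected.
    return t[0] in ('var', 'lam')

def subst(t, x, v):
    """t[x := v], assuming bound names in t differ from the free names of v."""
    tag = t[0]
    if tag == 'var':
        return v if t[1] == x else t
    if tag == 'lam':
        return t if t[1] == x else lam(t[1], subst(t[2], x, v))
    if tag == 'app':
        return app(subst(t[1], x, v), subst(t[2], x, v))
    return (tag, subst(t[1], x, v))  # 'C' or 'A'

def decompose(t):
    """Split t into (E, redex) following E ::= [] | V E | E M.
    E is represented as a function that plugs a term into the hole."""
    tag = t[0]
    if tag in ('C', 'A'):
        return (lambda hole: hole), t
    if tag == 'app':
        f, a = t[1], t[2]
        if not is_value(f):                      # E M
            found = decompose(f)
            if found is None:
                return None
            inner, redex = found
            return (lambda hole: app(inner(hole), a)), redex
        if not is_value(a):                      # V E
            found = decompose(a)
            if found is None:
                return None
            inner, redex = found
            return (lambda hole: app(f, inner(hole))), redex
        if f[0] == 'lam':                        # (λx.M) V
            return (lambda hole: hole), t
    return None  # a value, or a stuck application such as f a with f free

def step(t):
    found = decompose(t)
    if found is None:
        return None
    E, redex = found
    if redex[0] == 'app':                        # beta_v
        _, (_, x, body), v = redex
        return E(subst(body, x, v))
    if redex[0] == 'A':                          # A: drop the context E
        return redex[1]
    z = f"z{next(_fresh)}"                       # C: reify E as λz. A(E[z])
    return app(redex[1], lam(z, A(E(var(z)))))

def evaluate(t):
    while (nxt := step(t)) is not None:
        t = nxt
    return t

# f (C (λk. g (k a))): invoking k escapes from the context g [] back to f [].
term = app(var('f'), C(lam('k', app(var('g'), app(var('k'), var('a'))))))
result = evaluate(term)
```

Here `result` is the term `f a`: the pending application of `g` is discarded when the captured continuation is invoked, which mirrors the final $\to_{A}$ step of the use-case above.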

In [Grif89], Griffin argues for the consistency of the typing rules $\dfrac{Γ ⊢ M : (φ \to ⊥) \to ⊥}{Γ ⊢ C (M) : φ}$ and $\dfrac{Γ ⊢ M : ⊥}{Γ ⊢ A^{φ} (M) : φ}$ . He also imports the typing rules of Church’s simply-typed calculus for typing $λ$ abstractions and applications. It’s clear that the new rules correspond to the double negation elimination and $⊥$ elimination rules respectively.

Unfortunately, Griffin’s analysis requires the overall term $E [C (M)]$ to be typed by $⊥$ to use the $\to_{C}$ rule. The typed rule now looks like: $E^{⊥} [C^{φ} (M^{\neg \neg φ})] \to_{C} M^{\neg \neg φ} (λ z^{φ} . A^{⊥} (E^{⊥} [z]))$ This is a problem, as no closed term can be typed $⊥$ . Griffin gets around this by mandating a continuation variable on each typed term. Formally, transform $M^{α} ⟹ C (λ k^{\neg α} . k M^{α})$ where $k$ is not free in $M$ . Griffin’s typing rules give $\dfrac{\dfrac{\dfrac{⊢ M : α}{k : \neg α ⊢ k M : ⊥}}{⊢ λ k^{\neg α} . k M : \neg \neg α}}{⊢ C (λ k^{\neg α} . k M) : α}$ He then restricts the scope of evaluation contexts to be within the scope of this continuation variable. This allows us to use the $C$ rewrite rule, by changing it to $\to_{t C}$ : $C (λ k^{\neg α} . E^{⊥} [C^{φ} (M^{\neg \neg φ})]) \to_{t C} C (λ k^{\neg α} . M^{\neg \neg φ} (λ z^{φ} . A^{⊥} (E^{⊥} [z])))$ Griffin removes the $C (λ k . \dots)$ for brevity. I will do the same.

Typing and Evaluating LEM

As per Glivenko’s Theorem, $\neg \neg (α \lor \neg α)$ is true in constructive logics. Its proof is: $\dfrac{\dfrac{\neg (α \lor \neg α) ⊢ \neg (α \lor \neg α) \qquad \dfrac{\dfrac{\dfrac{\neg (α \lor \neg α), α ⊢ \neg (α \lor \neg α) \qquad \dfrac{\neg (α \lor \neg α), α ⊢ α}{\neg (α \lor \neg α), α ⊢ α \lor \neg α}}{\neg (α \lor \neg α), α ⊢ ⊥}}{\neg (α \lor \neg α) ⊢ α \to ⊥}}{\neg (α \lor \neg α) ⊢ α \lor \neg α}}{\neg (α \lor \neg α) ⊢ ⊥}}{⊢ \neg \neg (α \lor \neg α)}$ This proof corresponds to the term $λ x^{\neg (α \lor \neg α)} . x {inj}_{2}^{α \lor \neg α} (λ y^{α} . x {inj}_{1}^{α \lor \neg α} (y))$ Applying the $C$ operator gives us a term typed by LEM $C (λ x^{\neg (α \lor \neg α)} . x {inj}_{2}^{α \lor \neg α} (λ y^{α} . x {inj}_{1}^{α \lor \neg α} (y)))$ The typed execution of this term in a context $E$ evaluated under the scope of a continuation variable $k$ is: $\begin{aligned} E [C (λ x^{\neg (α \lor \neg α)} . x {inj}_{2}^{α \lor \neg α} (λ y^{α} . x {inj}_{1}^{α \lor \neg α} (y)))] \\ \to_{t C} (λ x^{\neg (α \lor \neg α)} . x {inj}_{2}^{α \lor \neg α} (λ y^{α} . x {inj}_{1}^{α \lor \neg α} (y))) (λ z^{α \lor \neg α} . A^{⊥} (E [z])) \\ \to_{β_{v}} (λ z^{α \lor \neg α} . A^{⊥} (E [z])) ({inj}_{2}^{α \lor \neg α} (λ y^{α} . (λ z^{α \lor \neg α} . A^{⊥} (E [z])) ({inj}_{1}^{α \lor \neg α} (y)))) \\ \Rightarrow_{β_{v}} A^{⊥} (E [{inj}_{2}^{α \lor \neg α} (λ y^{α} . A^{⊥} (E [{inj}_{1}^{α \lor \neg α} (y)]))]) \\ \to_{t A} E [{inj}_{2}^{α \lor \neg α} (λ y^{α} . A^{⊥} (E [{inj}_{1}^{α \lor \neg α} (y)]))] \end{aligned}$ As previously mentioned, I omit the $C (λ k^{\neg φ} . \dots)$ to avoid (even more) clutter.
The resulting term produces the type $α \lor \neg α$ from the subterm $λ y^{α} . A^{⊥} (\dots)$ of type $\neg α$ . Importantly, a copy of the context $E$ is inside the aborting $A$ operator. At both points, its hole contains a value of type $α \lor \neg α$ .

The evaluation must now proceed through a different redex. Suppose the only way this term can appear in a redex is in a $case$ expression. Let’s fast-forward to this point, and abstract away the new context to $E^{'}$ : $\begin{aligned} E [{inj}_{2}^{α \lor \neg α} (λ y^{α} . A^{⊥} (E [{inj}_{1}^{α \lor \neg α} (y)]))] \\ \overset{*}{\to} E^{'} [case ({inj}_{2}^{α \lor \neg α} (λ y^{α} . A^{⊥} (E [{inj}_{1}^{α \lor \neg α} (y)])), [x_{α}] F_{α}, [x_{\neg α}] F_{\neg α})] \end{aligned}$ Semantically, $case (N^{α \lor β}, [x_{α}] F_{α}, [x_{β}] F_{β})$ is understood as passing the contents of $N$ to the variable $x_{α}$ in $F_{α}$ or $x_{β}$ in $F_{β}$ based on its inner type. Hence, $\begin{aligned} E^{'} [case ({inj}_{2}^{α \lor \neg α} (λ y^{α} . A^{⊥} (E [{inj}_{1}^{α \lor \neg α} (y)])), [x_{α}] F_{α}, [x_{\neg α}] F_{\neg α})] \\ \to_{c a s e} E^{'} [F_{\neg α} [x_{\neg α} := λ y^{α} . A^{⊥} (E [{inj}_{1}^{α \lor \neg α} (y)])]] \end{aligned}$ Evaluation now proceeds at a different redex. Again, for our term to appear in one, it will need a value (say $V^{α}$ ) of type $α$ . Let’s suppose this happens under the context $E_{F_{\neg α}}$ . This causes the following cascade: $\begin{aligned} E^{'} [F_{\neg α} [x_{\neg α} := λ y^{α} . A^{⊥} (E [{inj}_{1}^{α \lor \neg α} (y)])]] \\ \overset{*}{\to} E_{F_{\neg α}} [(λ y^{α} . A^{⊥} (E [{inj}_{1}^{α \lor \neg α} (y)])) V^{α}] \\ \to_{β_{v}} E_{F_{\neg α}} [A^{⊥} (E [{inj}_{1}^{α \lor \neg α} (V^{α})])] \\ \to_{t A} E [{inj}_{1}^{α \lor \neg α} (V^{α})] \end{aligned}$ Remarkably, this final context jump transports us back to the original context $E$ with one key difference: its hole is constructed from a value of type $α$ ! Remember, this value was generated after the initial evaluation of the $case$ expression. The context $E$ had proceeded under the assumption that it had a term that could produce $⊥$ from $α$ and generated $V^{α}$ accordingly. 
This is akin to the valise containing a billion dollars that the man earns (steals?) for the devil. After receiving the value, the execution jumps to the copy of the initial context $E$ surrounding an injection of this value. This jump indicates that the devil, now a billion dollars richer, “turns back the clock” to the point where his offer was made. This is the detail Martens adds to the tale.

All this causes a different behaviour at the (eventual) $case$ expression: $\begin{aligned} E [{inj}_{1}^{α \lor \neg α} (V^{α})] \\ \overset{*}{\to} E^{'} [case ({inj}_{1}^{α \lor \neg α} (V^{α}), [x_{α}] F_{α}, [x_{\neg α}] F_{\neg α})] \\ \to_{c a s e} E^{'} [F_{α} [x_{α} := V^{α}]] \end{aligned}$ This produces what very well might be the final term.
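The whole trace can be imitated in continuation-passing style. The sketch below is an illustration under my own assumptions, not Griffin’s construction: `lem`, `case` and `man` are hypothetical names, the parameter `k` plays the role of $λ z . A (E [z])$ , and the abort is only modelled faithfully because the captured continuation is invoked in tail position.

```python
def inj1(value):
    return ("inj1", value)

def inj2(refutation):
    return ("inj2", refutation)

def case(scrutinee, on_inj1, on_inj2):
    # Dispatch on the tag, passing the payload to the matching branch.
    tag, payload = scrutinee
    return on_inj1(payload) if tag == "inj1" else on_inj2(payload)

def lem(k):
    # CPS reading of C(λx. x inj2(λy. x inj1(y))): k is first fed inj2, and the
    # function inside inj2 re-enters the same k with inj1(y) once a y is supplied.
    return k(inj2(lambda y: k(inj1(y))))

log = []

def man(offer):
    # The man's context E: it ends in a case analysis on the devil's offer.
    def offered_a(money):
        log.append("(a): the devil hands over " + money)
        return money

    def offered_b(grant_wish):
        log.append("(b): the man raises a billion and pays")
        return grant_wish("one billion dollars")  # tail call: nothing resumes here

    return case(offer, offered_a, offered_b)

outcome = lem(man)
```

Running this leaves two entries in `log`, first the (b) branch and then the (a) branch, and `outcome` is the very value the man handed over. The context `man` is entered twice, just as $E$ is in the reduction sequence.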

Issues

The earlier section assumes that a $case$ expression is the only way to interact with a value of type $α \lor β$ . However, [Grif89] doesn’t provide a separate syntax for disjunctive types. Instead, they’re coded into the language using implication and negation: $α \lor β ≜ \neg α \to \neg \neg β$ . To simulate $\lor$ -elimination in this encoding, a $C$ expression is required. This is because the encoding is only classically correct; $α ⊢ γ$ and $β ⊢ γ$ do not constructively imply $\neg α \to \neg \neg β ⊢ γ$ .
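The two directions of this encoding can be checked in a proof assistant. The following Lean 4 sketch is my own (the theorem names `or_to_enc` and `enc_to_or` are hypothetical, and it assumes the core library lemmas `Classical.byCases` and `Classical.byContradiction`). It shows that a disjunction yields the encoded type constructively, while my proof of the converse goes through classical axioms. It does not by itself show that no constructive proof of the converse exists.

```lean
-- Constructive direction: a genuine disjunction inhabits the encoding.
theorem or_to_enc {α β : Prop} (h : α ∨ β) : ¬α → ¬¬β :=
  fun na nb => h.elim na nb

-- Converse direction, proved here with classical reasoning.
theorem enc_to_or {α β : Prop} (h : ¬α → ¬¬β) : α ∨ β :=
  Classical.byCases
    (fun a : α => Or.inl a)
    (fun na : ¬α => Or.inr (Classical.byContradiction (h na)))
```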

The $case$ expression corresponds to this $\lor$ -elimination. For the Call-by-Name variant, it’s defined as: ${case}^{δ} (M, F_{1}, F_{2}) ≜ C (λ j^{δ} . M (λ x . j (F_{1} x)) (λ x . j (F_{2} x)))$ The expression simulating $case$ in Call-by-Value is more complicated, and its presentation isn’t necessary.

All this gives the impression that a $case$ expression isn’t the only way to use a term of a disjunctive type. One could supply a value of type $\neg α$ to it and get a value of type $\neg \neg β$ in return.

Unfortunately, I can’t counter this objection with a technical argument. One possible approach would involve extending the syntax of Idealized Scheme with separate injection, pairing and case categories that play nice with the existing encoding schemes. Perhaps the Dual Calculus detailed in [Wadl03] achieves this.

I’ll only say this: $\neg α \to \neg \neg β$ is provably not a disjunctive type without the applicability of the $case$ expression. In a sense, it’s the application of a $case$ that makes it a disjunctive type. Without it, the conclusions we can arrive at wouldn’t necessarily only hold for these types.

Explaining the tales

The devil’s actions were mostly explained in the previous section. To repeat, the devil, being (probably) omnipotent, has the ability to alter the state of the world at will. Martens gives the devil the simple power to turn back time. To faithfully simulate $C$ and $A$ , I grant the devil these greater powers. The devil remembers the state of the world when he makes his offer to the man. Once the man gives him the billion dollars, he reverts the state of the world to the remembered state, and makes his offer again. This time, he chooses (a). Whether the man remembers his misdeeds is up to the devil.

The shepherd’s tale is less satisfying. His only way out is the approach suggested in [SøUr06]. To pull off this sleight-of-hand, he needs to acquire the philosopher’s stone from the king. Being the crafty man he is, he produces a list of instructions to be followed by the king’s trusted knights. He then informs the king that those instructions would provide the king with what he wants when augmented with the entire sequence of actions the king wishes to perform on what he wants. When the king hesitates, he points out that the instructions would be carried out by his trusted knights: men incapable of disloyalty.

Now convinced, the king begins preparing his instructions. Naturally, he must account for both possibilities: the shepherd creating a machine, or producing the stone. Therefore, his instructions would be something like:

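private KING_STONE;  // the king's own philosopher's stone

switch (shepherd_output.type) {
    case STONE:
        // The shepherd delivered the stone itself.
        printf("It's the stone!\n");
        return shepherd_output.stone;

    case MACHINE:
        // The shepherd delivered a machine: feed it the king's stone.
        printf("It's a machine. Inputting stone...\n");
        if (shepherd_output.run(KING_STONE) != GOLD) {
            kill_shepherd();
        }
}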

After receiving the augmented set of instructions, the knights begin their work. First, they duplicate the king’s instructions and embed the copy inside the shepherd’s list of instructions. Next, the shepherd requires the knights to execute the king’s instructions as though he had produced a machine that would convert the philosopher’s stone to gold. To do this, they retrieve the philosopher’s stone from the king’s quarters. The next step requires them to embed the stone into the list of instructions.

At this point, the knights encounter the copy of the king’s instructions. They are told to forget everything they’ve done so far and execute the king’s instructions with the stone they find (which they placed!) inside. If outputs could be logged in medieval Europe, they would say:

It's a machine. Inputting stone...
It's the stone!

Thus, the knights return to the king with the philosopher’s stone. The king, now satisfied, spares the shepherd.
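The knights’ procedure can be mimicked with an exception standing in for the $A$ operator. This is only a sketch under stated assumptions: `Abort`, `run_with_lem` and `king_instructions` are hypothetical names, strings stand in for the stone and the gold, and a Python exception is a one-shot escape that works only while the king’s instructions are still being executed.

```python
class Abort(Exception):
    """Carries the value with which the saved context is re-entered."""
    def __init__(self, value):
        super().__init__(value)
        self.value = value

def run_with_lem(context):
    # First run the context as if a machine (inj2) had been delivered. If the
    # machine is ever fed a stone, forget the work in progress and re-run the
    # same context with that stone (inj1).
    def machine(stone):
        raise Abort(("STONE", stone))
    try:
        return context(("MACHINE", machine))
    except Abort as jump:
        return context(jump.value)

log = []
KING_STONE = "philosopher's stone"

def king_instructions(shepherd_output):
    kind, payload = shepherd_output
    if kind == "STONE":
        log.append("It's the stone!")
        return payload
    log.append("It's a machine. Inputting stone...")
    if payload(KING_STONE) != "GOLD":
        log.append("kill_shepherd()")  # never reached: the call above escapes
    return None

result = run_with_lem(king_instructions)
```

After the run, `log` holds the two lines of the medieval log in order and `result` is the king’s own stone. The comparison against gold is never evaluated, because the machine escapes before returning.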

References

[FFKD87]

Felleisen, Matthias ; Friedman, Daniel P. ; Kohlbecker, Eugene ; Duba, Bruce: A syntactic theory of sequential control. In: Theoretical Computer Science vol. 52 (1987), no. 3, pp. 205–237

[Grif89]

Griffin, Timothy G.: A Formulae-as-Types Notion of Control. In: Proceedings of the 17th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL ’90. New York, NY, USA : Association for Computing Machinery, 1989, ISBN 0897913434, pp. 47–58

[SøUr06]

Sørensen, Morten Heine ; Urzyczyn, Pawel: Lectures on the Curry-Howard Isomorphism, Volume 149 (Studies in Logic and the Foundations of Mathematics). USA : Elsevier Science Inc., 2006, ISBN 0444520775

[Wadl03]

Wadler, Philip: Call-by-Value is Dual to Call-by-Name. In: SIGPLAN Not. vol. 38. New York, NY, USA, Association for Computing Machinery (2003), no. 9, pp. 189–201