The Y Combinator

A 10-minute introduction for anyone at least a little familiar with Lisp, written in MIT Scheme. If you want to follow along you can grab the MIT Scheme Jupyter Kernel.


Suppose we want write a factorial function. We could do this recursively:

In [2]:
(define (recursive-factorial n)
  (if (> n 0)
      (* n (recursive-factorial (- n 1)))

(recursive-factorial 10)

Or we could do this iteratively:

In [4]:
(define (iterative-factorial n)
  (let loop ((n n) (product 1))
    (if (> n 0)
        (loop (- n 1) (* n product))

(iterative-factorial 10)

The recursive factorial is fewer lines, and is easier to think about, but the iterative factorial is faster, since it doesn't need to push a stack frame every time it multiplies by a new n.

The Problem

But recursion is a strange thing - it may seem ordinary to us, but many beginners struggle to understand it. How can something reference itself in its definition? How can a function exist before it does? What does it really mean to call oneself? Recursion seems like a very unnartural property to expect a language to have, which begs the question of whether it's some irreducible fundamental, or whether we can do we can do with recursion, without recursion.

So suppose we had some mysterious language that doesn't support recursion: there are no named functions - or at least, a function can't be reference from inside itself. Can we still write factorial functions?

"Wait!" you say. "We can just use an iterative factorial!"

Yes; you're very clever. Or we could (fold-left * 1 (cdr (iota n))) or just about anything else. But let's set our sights on replicating specifically the recursive factorial, or at least a factorial that feels recursive without ever actually recursing.

The Foil

Let's give this factorial buisness out best shot:

In [6]:
(define (fact-1 n)
(fact-1 1)

Look! It works! The factorial if 1 is 1! It got the right answer!

In [7]:
(fact-1 10)

Well, maybe it's not as great as we hoped. But it worked for some inputs! We might say that it's a valid factorial for every input 1 or below.

This isn't totally useless: if we know how to find the factorials of numbers 1 or less, we can re-use this ability:

In [8]:
(define (fact-2 n)
  (if (< n 2)
      (* n (fact-1 (- n 1)))))
In [9]:
(fact-2 1)
In [10]:
(fact-2 2)

Amazing! Now we have a factorial function that's valid for all numbers 2 or less! It's still not perfect, but we're getting there.

In [11]:
(define (fact-3 n)
  (if (< n 2)
      (* n (fact-2 (- n 1)))))
In [12]:
(fact-3 1)
In [13]:
(fact-3 2)
In [14]:
(fact-3 3)

We get the idea: we could keep going. Let's formalize that:

In [17]:
;; fact-i+1 takes any valid fact-i and "extends" it by one
(define (fact-i+1 fact-i)
  (lambda (n)
      (if (< n 2)
          (* n (fact-i (- n 1))))))

(define fact-4 (fact-i+1 fact-3))

(fact-4 4)

Huh. This is interesting.

In [19]:
(define fact-8 (fact-i+1 (fact-i+1 (fact-i+1 (fact-i+1 fact-4)))))

(fact-8 8)

Let's formalize this too:

In [22]:
;; this generates an arbitrarily deep fact-i by folding through a list of i fact-i+1s
;; don't worry about the details: it's a "factory" for automating how we made fact-8
(define (fact-i i)
  (fold-right (lambda (f g) (f g)) fact-1 (make-list i fact-i+1)))

(define fact-10 (fact-i 10))
(fact-10 10)
In [24]:
(define (fact n)
  ((fact-i n) n))

(fact 100)

This is incredible! It turns out that we never actually need truly recursive factorial function: only one that's just recursive enough.

Fixed Points in Function Space

So it turns out that this little trick - generating pseudo-recursive functions only as deep as you actually need - is a instance of a really deep idea. To see this, this weird result:

In [25]:
(fact-i+1 fact)
#[compound-procedure 13]

What could this possibly mean? We know that fact-i+1 extends the domain of its input-factorial by one - but fact already has infinite domain! That was the whole point!

So unlike fact-3, where (fact-i+1 fact-3) was fact-4, or fact-1000, where (fact-i+1 fact-1000) would be fact-1001, this new (fact-i+1 fact) function is the exact same function that we started with. It has identical outputs for every possible input. This is another way of saying that fact is a fixed point of the fact-i+1 function.

A fixed point of a function is an input that produces itself as an output. For example, 0 is a fixed point of sine because


Are all the same number*.

Similarily, we can say that fact is a fixed point of fact-i+1 beacuse

(fact-n+1 fact)
(fact-n+1 (fact-n+1 fact))
(fact-n+1 (fact-n+1 (fact-n+1 fact))) 

Are all the same function (which is to say they all produce the same output for any input).

*fun fact: one fixed point of cosine is approximately 0.739085 and has no known closed form expression. So cos(0.739085) = 0.739085, but nobody really knows what 0.739085 is.

The Y Combinator

So here it is: the Y Combinator generates fixed points for functions. If you have a function that operates on other functions, like fact-n+1, you can pass that function into the Y Combinator and it'll return a new function that is a fixed point of your original one. Magic!

Surely the Y Combinator must be a huge, complicated beast, right?

In [28]:
(define (Y F)
  ((lambda (X) (F (lambda (n) ((X X) n))))
   (lambda (X) (F (lambda (n) ((X X) n))))))

(define fact (Y fact-i+1))

(fact 100)

So what's actually happening here? We've got this terribly unsightly mess of parenthesis:

(define (Y F)
  ((lambda (X) (F (lambda (n) ((X X) n))))
   (lambda (X) (F (lambda (n) ((X X) n))))))

One oberservation that pops out is its symmetry: The Y Combinator has two copies of the same lambda, and feeds one into the other. So

(lambda (X) (F (lambda (n) ((X X) n))))

gets evaluated immediately, with an X that is an exact copy of that function (lambda (X) ...) itself. This is a way of side-stepping our recursion issue: a function can't reference itself, but it might have been given an argument that it knows is its own clone.

So now that we a copy of our own clone, what do we do with ourselves? Let's indent better and walk through step by step:

(define (Y F)        ; F is a "generator" that makes functions out of functions, like fact-n+1
  ((lambda (X)       ; X is a copy of (lambda (X) ...) itself
     (F              ; so we return the result of passing F some thing
      (lambda (n)    ; and that thing is this function of n
        ((X X) n)))) ; which passes n into (X X), which *generates another copy of itself*
   (lambda (X) (F (lambda (n) ((X X) n))))))

The ((X X) n) is the real magic here. If (lambda (n) ((X X) n)) - the thing that's passed into the F that we're trying to find a fixed point of - if that lambda is never called (like if n is 1 and we hit the base case of our factorial), then (X X) will never be evaluated. It will just sit there, and nobody will care what it would have done. But if that lambda is evaluated, it'll pass F a brand new copy of the whole (lambda (X) ...) deal.

But (F ...) is what gets returned from (lambda (X) ...), and (lambda (X) ...) gets passed into (F ...)! Thus the magical property that

(Y F)
(F (Y F))
(F (F (Y F)))
(F (F (F (Y F))))

are all the same function! Which explains part of the MIT Scheme logo:

(Y F) = (F (Y F))