Abstract Binding Trees (Part 1)

I’ve been thinking a bit about abstract binding trees lately because of a slackbot I wrote to as a prank. (I might be bad at pranks.)

Abstract binding trees (ABTs) are a fundamental idea in Robert Harper’s Practical Foundations of Programming Languages book. One of the most common things I’ve heard from people that try to do self-study with PFPL is that they struggle with the first chapter on ABTs, which unapologetically focuses on being rigorous and complete rather than beginner-friendly. Neel has a good introduction for people who want to understand how to program with ABTs. This is more aimed at people who want to read PFPL, but could use a gentler presentation of Chapter 1.

This is part 1 in a 2-or-3 part series.

Abstract Syntax Trees

As background, I’m assuming you’ve seen picture of a tree that looks something like this before:

Furthermore, I’m assuming you have some notion of how this kind of tree might be described in a Prolog-like or ML-like language like this:

node(node(leaf(),node(leaf(),leaf())),leaf())

…more or less. If it’s not, then I would recommend my tutoring sequence on Trees and the Structure of Data!

(Some nitpicks: In an ML-like language you’d probably write Node and Leaf instead of node and leaf. In Prolog you’d write leaf instead of leaf(). In Haskell you’d write Node (Node Leaf (Node Leaf Leaf)) Leaf. All totally the same idea. I’m using this notation because it matches PFPL, and PFPL picks its notation for good reasons.)

Holey Abstract Syntax Trees

Often times, we want to think about tree-structured data with some bits missing. Here’s a tree with some holes in it:

We want to be able to talk about specific holes, so we give these holes names, just to keep track of which one we’re talking about:

The fundamental interesting thing we get to do with trees-that-have-bits-missing is we plug things into the holes:

(We can also have new named holes in the things we plug in for other holes!)

Now, let’s go back to the “flat” notation from before. We’d represent our first “before” tree as node(node(x,y),leaf()). Remember when I made a point about how leaving the parentheses after leaf() before? In the notation used by PFPL, the parentheses are there to make it clear that a leaf is a concrete thing, and not a hole with a name “leaf.” This brings us to The First Big Idea.

The First Big Idea

I’m going start calling the holes-with-names (x, y, …) variables, and calling the names-of-concrete-things (node, leaf, …) operators.

Plugging in a tree for a variable is called substitution. Substitution always plugs in all the holes with a given name:

We write this in flat notation like this:

[leaf() / x] node(node(x,x),leaf()) = node(node(leaf(),leaf()),leaf())

This is the first big idea that we need in order to get from abstract syntax trees to abstract binding trees: that you can augment your tree structures with variables and get the ability to have holes in your data structure that can then be filled in with other things later.