Formal Introduction to Unification
$$ \def\por{\ \ | \ \ } \def\dom{\textsf{dom}} \def\tsf#1{\textsf{#1}} \def\unifyj#1#2#3{#1 \sim #2 \Downarrow #3} \def\dict#1#2{\textsf{Dict}\left( #1, #2 \right)} \def\pf{\hookrightarrow}%\rightharpoonup \def\tone{{t_1}} \def\ttwo{{t_2}} \def\rule#1#2#3{\dfrac{#3}{#2}\ {#1}} \def\eqdef{\overset{\textsf{def}}{=}} \def\freevars{\textsf{free}} $$
Unification is probably one of the most important concepts in computer science. If you are not familiar with it, unification is the process of figuring out how two syntactic objects like $a+b$ and $a+2c$ can be made the same. In this case, you replace $b$ with $2c$. Note that you don’t replace $2c$ with $b$: only variables get replaced, and besides, $2\cdot c$ is less general than $b$, since $2\cdot c$ contains the literal $2$.
Daniel created a language called Bergamot that lets you write expressions and unify them, while showing you a trace of the unification process. Prolog is a very popular language based on unification, and belongs to the class of Logic Programming Languages. There are several projects that implement unification algorithms and give some sort of trace of the process. Bergamot is distinguished among these, though, as it produces a specific kind of trace called a proof tree.
Proof trees can come in different forms. The kind that Bergamot outputs is widely used in Programming Language Theory, and is the kind that I am interested in. These proof trees are generated by recursive application of inference rules (see the Bergamot post or my post) of the form $$ \dfrac{P_1\quad \ldots \quad P_n}{Q}\ \texttt{RuleName}. $$
Proof trees are great explanations. But how do you make them? By hand, it is usually not so hard, since humans are generally pretty good at unification. But how does a computer do it? More precisely, how do you make a program that looks at a bunch of rules, along with a conclusion, and produces a proof tree?
While the unification process has been formalized in many places, the process of proof tree construction has not. I really want to formally describe the proof tree generation process in its entirety, but in order for it to make any sense, I need to tell you about how unification works. Here I will try to give a formal(ish) intro to unification. Then later, I will tell you about how you can make proof trees.
Unification
Two terms $t_1$ and $t_2$ are unifiable if we can replace the variables of $t_1$ and $t_2$ with certain terms, resulting in two equal terms. This replacement information is encoded in an object called a substitution, which is denoted by $\sigma$ (and sometimes $\tau$). You can think of a substitution as a dictionary that maps variables to terms. The result of replacing $t$’s variables with their corresponding terms in $\sigma$ is denoted by $t[\sigma]$. If $t_1[\sigma]=t_2[\sigma]$, then $\sigma$ is called a unifier for $t_1$ and $t_2$. In other words, $\sigma$ unifies $t_1$ and $t_2$. Now, to improve the initial definition, two terms $t_1$ and $t_2$ are said to be unifiable if they have a unifier.
The formalization starts with the language of terms that we are unifying. A term can be one of two things: (i) a variable $x\in \textsf{Var}$, or (ii) a construct $f(t_1,\ldots,t_n)$ of zero or more terms $t_1,\ldots,t_n\in\tsf{Term}$, with the constructor $f\in \tsf{Name}$. The $\tsf{Term}$ language is defined by the grammar $$ t\in\tsf{Term} ::= x \por f(t_1,\ldots,t_n) $$ where $x\in \tsf{Var}$, $f\in \tsf{Name}$, and $n\in\mathbb{N}$. It may seem odd that constructors are just names, but this is what allows us to represent an expression from any language as a term.
Consider our $a+b$ and $a+2c$ example from above. We can write $a+b$ as $+(a,b)$, where $a,b\in \tsf{Var}$ and $+\in \tsf{Name}$. Then for $a+2c=a+2\times c$, we have $+(a,\times(2(),c))$, where $a,c\in \tsf{Var}$ and the constructors $+,\times,2\in \tsf{Name}$. For the sake of aesthetics, constructed terms with zero arguments will be written without parentheses. That is, if $f\in \tsf{Name}$, then $f()\in\tsf{Term}$ is written as $f\in\tsf{Term}$. In summary, $$ \begin{align*} a+b & \to +(a,b)\\[8pt] a+2c=a+2\times c & \to +(a,\times(2(),c))=+(a,\times(2,c)) \end{align*} $$ where $a,b,c\in \tsf{Var}$ and $+,\times,2\in \tsf{Name}$.
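If you want to follow along in code, here is one way you might encode terms in Python. This is a minimal sketch under one possible representation (nothing here fixes a canonical one): a variable is a plain string, and a constructed term $f(t_1,\ldots,t_n)$ is a pair of a name and a list of arguments.

```python
# A variable is a plain string; a constructed term f(t1, ..., tn) is the
# pair (f, [t1, ..., tn]).  A zero-argument constructor like 2 is ("2", []).

def is_var(t):
    """A term is a variable exactly when it is a string."""
    return isinstance(t, str)

# a + b   becomes  +(a, b)
t1 = ("+", ["a", "b"])

# a + 2c  becomes  +(a, *(2, c)), writing * for the times constructor
t2 = ("+", ["a", ("*", [("2", []), "c"])])
```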
When a variable $x\in\tsf{Var}$ is used in a term $t\in \tsf{Term}$, it represents an abstraction over the terms that can exist in place of $x$. Another view that I like is that terms with one or more variables act as templates for all the terms they may be unified with. The set of variables of a term $t$ is called the free variables of $t$, and is denoted by $\freevars(t)$. The function $\freevars$ from terms to sets of variables can easily be defined, $$ \freevars : \tsf{Term}\to \mathcal{P}\left(\tsf{Var}\right) \qquad\qquad \freevars(t)\eqdef\begin{cases}\lbrace x \rbrace & \text{if $t=x\in\tsf{Var}$}\\ \bigcup\limits_{i=1}^n \freevars(t_i) & \text{if $t=f(t_1,\ldots,t_n)$}\end{cases} $$ where the empty union (when $n=0$) is $\varnothing$. A term $t$ is said to be closed if $t$ does not have any variables, that is, $\freevars(t)=\varnothing$.
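With the encoding sketched above, $\freevars$ translates directly into Python:

```python
def free_vars(t):
    """free(t): the set of variables occurring in the term t."""
    if is_var(t):
        return {t}
    _, args = t
    # The union over zero arguments is the empty set, matching the math.
    return set().union(*map(free_vars, args))

# free(+(a, *(2, c))) = {a, c}
assert free_vars(("+", ["a", ("*", [("2", []), "c"])])) == {"a", "c"}
```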
Substitutions
Substitutions are objects that carry information about replacing variables with terms. Substitutions can be represented in several different ways (we will pick one later), but we can describe them generally by certain rules they must follow.
The set of substitutions is denoted by $\tsf{Subst}$. Along with the set $\tsf{Subst}$, we must have two operations,
- a method for applying a substitution $\sigma \in \tsf{Subst}$ to a term $t$, by replacing every occurrence of a free variable of $t$ with its corresponding term mapped by $\sigma$, resulting in the term $t[\sigma]\in\tsf{Term}$,
- and a function $\odot : \tsf{Subst}\times \tsf{Subst} \to \tsf{Subst}$, which is a way to compose any two substitutions $\sigma,\tau\in\tsf{Subst}$, resulting in a new substitution $\sigma\odot\tau=\tau\sigma\in\tsf{Subst}$ that applies $\tau$ first, then $\sigma$ (it is conventional to write $\tau\sigma$ to mean $\sigma \odot \tau$).
Composing substitutions must obey certain rules, that is, for all $t\in\tsf{Term}$ and $\sigma,\tau\in\tsf{Subst}$, $$ (t[\tau])[\sigma]=t[\tau\sigma]=t[\sigma\odot\tau], $$ meaning that applying $\tau$ to $t$, then applying $\sigma$ is the same as applying $\tau\sigma=\sigma\odot\tau$ to $t$. Substitutions must also distribute through constructed terms, that is, $$ f(t_1,\ldots,t_n)[\sigma]=f(t_1[\sigma],\ldots,t_n[\sigma]). $$
Partial Functions
Suppose $A$ and $B$ are sets and $f\subseteq A\times B$ is a relation on $A$ and $B$. The domain of $f$ is the set $$ \dom(f)=\lbrace a\in A \ | \ \exists b\in B:(a,b) \in f\rbrace \subseteq A. $$ A relation $f$ is a partial function from $A$ to $B$ if $f$ is a function from $A'=\dom(f)\subseteq A$ to $B$, that is, $f:A' \to B$. When $f$ is a partial function, we write $f: A\pf B$.
Partial functions are one such representation for substitutions. We will consider substitutions to be partial functions from variables to terms, $$ \tsf{Subst} = \left\lbrace \sigma \subseteq \tsf{Var}\times \tsf{Term} \ | \ \sigma : \tsf{Var} \pf \tsf{Term} \right\rbrace. $$ The operation for applying a substitution (partial function) $\sigma \in \tsf{Subst}$ to a term $t\in \tsf{Term}$, is defined in a pretty straightforward way, $$ t[\sigma]\eqdef \begin{cases}\sigma(x) & \text{if $t=x\in \tsf{Var}$ and $x\in\dom(\sigma)$}\\ f(t_1[\sigma],\ldots,t_n[\sigma]) & \text{if $t=f(t_1,\ldots,t_n)$}\\ t & \text{otherwise}\end{cases} $$ which means every variable $x\in\freevars(t)$ will be replaced by the corresponding term $\sigma(x)$, provided $x$ is a key in $\sigma$, that is $x\in\dom(\sigma)$. Otherwise, $x$ will remain unchanged in $t$ if $x\not\in\dom(\sigma)$.
As an example, consider the substitution $\sigma=\lbrace (b,\times(2,c))\rbrace\in\tsf{Subst}$, and then applying $\sigma$ to the term $+(a,b)$, $$ +(a,b)[\sigma]=+(a,b)[\lbrace (b,\times(2,c))\rbrace]=+(a[\lbrace (b,\times(2,c))\rbrace],b[\lbrace (b,\times(2,c))\rbrace])=+(a,\times(2,c)). $$
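Continuing the Python sketch, a substitution becomes a plain dict from variable names to terms, and application follows the case analysis above:

```python
def apply(t, sigma):
    """t[sigma]: replace each free variable of t by its image under sigma."""
    if is_var(t):
        return sigma.get(t, t)  # variables not in dom(sigma) are unchanged
    f, args = t
    # Substitution distributes through constructed terms.
    return (f, [apply(a, sigma) for a in args])

# The example above: +(a, b)[{b -> *(2, c)}] = +(a, *(2, c))
assert apply(("+", ["a", "b"]), {"b": ("*", [("2", []), "c"])}) \
       == ("+", ["a", ("*", [("2", []), "c"])])
```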
Now we must define the operation $\odot : \tsf{Subst}\times \tsf{Subst} \to \tsf{Subst}$ that composes substitutions $\sigma, \tau \in \tsf{Subst}$. The substitution $\sigma \odot \tau\in \tsf{Subst}$ is defined by $$ \sigma\odot \tau\eqdef x\mapsto \begin{cases}\tau(x)[\sigma] & \text{if $x\in \dom(\tau)$}\\ \sigma(x) & \text{if $x\in \dom(\sigma)\setminus \dom(\tau)$} \end{cases} $$ for all $x\in \dom(\tau)\cup \dom(\sigma)$. Alternatively, we could have used an equivalent definition, $$ \sigma\odot \tau\eqdef\lbrace x \mapsto (x[\tau])[\sigma]\ | \ x\in \dom(\tau)\cup \dom(\sigma)\rbrace. $$ To see how this works, consider the two substitutions, $$ \sigma = \lbrace a \mapsto c, c\mapsto d\rbrace \qquad\qquad \tau=\lbrace b\mapsto \times(2,c)\rbrace, $$ where $(x,y)$ is written as $x\mapsto y$. We then have $$ \tau\sigma=\sigma \odot \tau =\lbrace a \mapsto c, c\mapsto d\rbrace \odot \lbrace b\mapsto \times(2,c)\rbrace = \lbrace a \mapsto c, b\mapsto \times (2,d), c\mapsto d\rbrace $$ and so $$ \begin{gather*} +(a,b)\overset{\tau}{\longrightarrow}+(a,\times(2,c))\overset{\sigma}{\longrightarrow}+(c,\times(2,d))\\[4pt] +(a,b)\overset{\tau\sigma}{\longrightarrow}+(c,\times(2,d)). \end{gather*} $$ If you like, here is an implementation of $\odot$ in Python,
```python
def compose(sigma, tau):
    """sigma . tau: the substitution that applies tau first, then sigma."""
    domain = sigma.keys() | tau.keys()
    return {x: apply(tau[x], sigma) if x in tau else sigma[x] for x in domain}
```
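Running this on the example above (with the term encoding from earlier) reproduces the composed substitution, and also lets us spot-check the law $(t[\tau])[\sigma]=t[\tau\sigma]$:

```python
sigma = {"a": "c", "c": "d"}
tau = {"b": ("*", [("2", []), "c"])}

t = ("+", ["a", "b"])
assert apply(t, compose(sigma, tau)) == apply(apply(t, tau), sigma)
assert apply(t, compose(sigma, tau)) == ("+", ["c", ("*", [("2", []), "d"])])
```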
The Rules for Unification
Unification in general is a nondeterministic process. Two terms $t_1$ and $t_2$ can be unified in many different ways; that is, there may be more than one unifier $\sigma$ such that $t_1[\sigma]=t_2[\sigma]$. Thankfully, there is a deterministic process that finds a most general unifier, one that makes no unnecessary replacements.
The statement that $t_1,t_2\in \tsf{Term}$ are unified by $\sigma \in \tsf{Subst}$ is written $\unifyj {t_1} {t_2} \sigma$. This can be formalized as a relation between pairs of terms and substitutions, $\unifyj \cdot \cdot \cdot \subseteq \tsf{Term} \times \tsf{Term} \times \tsf{Subst}$, and can be described using inference rules.
Let’s start off with the most basic rule, $$ \rule{\tsf{UnifySame}}{\unifyj \tone \ttwo \varnothing}{\tone = \ttwo} $$ which says that the empty substitution $\varnothing\in \tsf{Subst}$ unifies $\tone$ and $\ttwo$ if $\tone=\ttwo$. This is obvious since if two terms are the same, there is nothing to substitute to make them the same. The next two rules are applied when one of the terms in question is a variable, $$ \rule{\tsf{UnifyVar}_L}{\unifyj x t {\lbrace x\mapsto t\rbrace}}{x\not\in \freevars(t)}\qquad \rule{\tsf{UnifyVar}_R}{\unifyj t x {\lbrace x\mapsto t\rbrace}}{t\not\in \tsf{Var} \qquad x\not\in \freevars(t)}. $$ The $\tsf{UnifyVar}_L$ rule says that a variable $x$ is unified with any term $t$ by the substitution that maps $x$ to $t$, provided that $x$ is not used somewhere in $t$. The right version of this rule is the same, except that $t$ cannot be a variable.
The $x\not\in\freevars(t)$ restriction, known as the occurs check, prevents circularity. With this, unification for variables is covered. For constructed terms, we have the rule $$ \dfrac{\unifyj{t_n}{s_n}{\sigma_1} \qquad \unifyj{f_1(t_1,\ldots,t_{n-1})[\sigma_1]}{f_2(s_1,\ldots,s_{n-1})[\sigma_1]}{\sigma_2}}{\unifyj{f_1(t_1,\ldots,t_n)}{f_2(s_1,\ldots,s_n)}{\sigma_1\sigma_2}}\ \textsf{UnifyCons}_n $$ which unifies the last pair of arguments, applies the resulting substitution $\sigma_1$ to the remaining arguments, unifies those, and composes the two substitutions (recall that $\sigma_1\sigma_2=\sigma_2\odot\sigma_1$ applies $\sigma_1$ first). With this, we need the rule for constructed terms with no arguments, $$ \rule{\tsf{UnifyCons}_0}{\unifyj {f_1()} {f_2()} \varnothing}{f_1 = f_2} $$ which acts as the base case for unifying constructed terms.
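To see these rules in action, here is how they derive a unifier for the running example, $+(a,b)$ and $+(a,\times(2,c))$: $$ \dfrac{\dfrac{b\not\in\freevars(\times(2,c))}{\unifyj{b}{\times(2,c)}{\lbrace b\mapsto \times(2,c)\rbrace}}\ \tsf{UnifyVar}_L \qquad \dfrac{+(a)=+(a)}{\unifyj{+(a)}{+(a)}{\varnothing}}\ \tsf{UnifySame}}{\unifyj{+(a,b)}{+(a,\times(2,c))}{\lbrace b\mapsto \times(2,c)\rbrace}}\ \tsf{UnifyCons}_2 $$ The last arguments unify via $\tsf{UnifyVar}_L$ with $\sigma_1 = \lbrace b\mapsto\times(2,c)\rbrace$, the remaining arguments $+(a)[\sigma_1]=+(a)$ are already equal so $\sigma_2 = \varnothing$, and the final unifier is $\sigma_1\sigma_2 = \sigma_1$, exactly the replacement of $b$ by $2c$ from the introduction.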
These rules describe a process for finding a most general unifier for two terms. The relation $\unifyj \cdot \cdot \cdot$ described is in fact a partial function from $\tsf{Term}\times \tsf{Term}$ to $\tsf{Subst}$. To see it this way, we will define the $\sim$ operation to mean $$ \tone \sim \ttwo =\sigma \iff \unifyj \tone \ttwo \sigma $$ and thus, $$ \sim \ : \tsf{Term}\times \tsf{Term} \pf \tsf{Subst}. $$ $\sim$ is partial because two terms may not be unifiable. By adding an error value $\bot$ to the codomain $\tsf{Subst}$, we can add rules that turn $\unifyj \cdot \cdot \cdot\subseteq \tsf{Term}\times \tsf{Term} \times (\tsf{Subst} \cup \lbrace\bot\rbrace)$ into a total function, which means $$ \sim \ : \tsf{Term}\times \tsf{Term} \to (\tsf{Subst} \cup \lbrace\bot\rbrace) $$
and we can detect when two terms cannot be unified.
If $x\in \freevars(t)$ and $t\neq x$, then $x$ cannot be unified with $t$: any substitution applied to both would leave $t[\sigma]$ properly containing $x[\sigma]$, so the two can never be equal. For this, we have the rules, $$ \rule{\tsf{FailCircular}_L}{\unifyj x t \bot}{x\in\freevars(t) \qquad t\neq x}\qquad \rule{\tsf{FailCircular}_R}{\unifyj t x \bot}{x\in\freevars(t) \qquad t\neq x}. $$ (The $t\neq x$ condition keeps these rules from overlapping with $\tsf{UnifySame}$.)
Unification should also not be possible for constructed terms that either have two different constructors $f_1,f_2\in \tsf{Name}$, or have a different number of arguments, $$ \rule{\tsf{FailDiffCons}}{\unifyj {f_1()} {f_2()} \bot }{f_1 \neq f_2}\qquad \rule{\tsf{FailDiffArgs}}{\unifyj {f_1(t_1,\ldots,t_n)} {f_2(s_1,\ldots,s_k)} \bot}{n\neq k}. $$ If an argument fails to unify, the whole unification fails, $$ \rule{\tsf{FailArg}}{\unifyj {f_1(t_1,\ldots,t_n)} {f_2(s_1,\ldots,s_n)} \bot}{\unifyj {t_n}{s_n} \bot}. $$ Lastly, failure should propagate up through constructed terms: if the last arguments unify but the remaining arguments (with that unifier applied) do not, the whole unification fails, $$ \rule{\tsf{FailProp}}{\unifyj {f_1(t_1,\ldots,t_n)} {f_2(s_1,\ldots,s_n)} \bot}{\unifyj{t_n}{s_n}{\sigma_1} \qquad \unifyj {f_1(t_1,\ldots,t_{n-1})[\sigma_1]} {f_2(s_1,\ldots,s_{n-1})[\sigma_1]} \bot}. $$ This last rule mirrors $\tsf{UnifyCons}_n$, eventually bottoming out in $\tsf{FailDiffCons}$ or $\tsf{FailDiffArgs}$, and ties error checking for constructed terms all together.
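Putting the whole section together, here is a minimal Python sketch of these rules, building on the helpers (`is_var`, `free_vars`, `apply`, `compose`) sketched earlier; `None` plays the role of $\bot$:

```python
def unify(t1, t2):
    """Return a unifier for t1 and t2, or None (standing in for bottom)."""
    if t1 == t2:                                   # UnifySame / UnifyCons_0
        return {}
    if is_var(t1):                                 # UnifyVar_L / FailCircular_L
        return {t1: t2} if t1 not in free_vars(t2) else None
    if is_var(t2):                                 # UnifyVar_R / FailCircular_R
        return {t2: t1} if t2 not in free_vars(t1) else None
    (f1, args1), (f2, args2) = t1, t2
    if f1 != f2 or len(args1) != len(args2):       # FailDiffCons / FailDiffArgs
        return None
    sigma = {}
    # UnifyCons_n peels arguments off from the right, threading the
    # substitution built so far through the remaining arguments.
    for a1, a2 in zip(reversed(args1), reversed(args2)):
        s = unify(apply(a1, sigma), apply(a2, sigma))
        if s is None:                              # FailArg / FailProp
            return None
        sigma = compose(s, sigma)                  # sigma first, then s
    return sigma

# The running example: unify +(a, b) with +(a, *(2, c)).
assert unify(("+", ["a", "b"]),
             ("+", ["a", ("*", [("2", []), "c"])])) \
       == {"b": ("*", [("2", []), "c"])}
```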