Anti-unification is the process of constructing a generalization common to two given symbolic expressions.
As in unification, several frameworks are distinguished depending on which expressions (also called terms) are allowed, and which expressions are considered equal.
If variables representing functions are allowed in an expression, the process is called "higher-order anti-unification", otherwise "first-order anti-unification".
If the generalization is required to have an instance literally equal to each input expression, the process is called "syntactical anti-unification", otherwise "E-anti-unification", or "anti-unification modulo theory".
An anti-unification algorithm should compute for given expressions a complete and minimal generalization set, that is, a set covering all generalizations and containing no redundant members, respectively.
Depending on the framework, a complete and minimal generalization set may have one, finitely many, or possibly infinitely many members, or may not exist at all;[note 1] it cannot be empty, since a trivial generalization exists in any case.
For first-order syntactical anti-unification, Gordon Plotkin[1][2] gave an algorithm that computes a complete and minimal singleton generalization set containing the so-called "least general generalization" (lgg).
Anti-unification should not be confused with dis-unification. The latter means the process of solving systems of inequations, that is of finding values for the variables such that all given inequations are satisfied.[note 2] This task is quite different from finding generalizations.
Formally, an anti-unification approach presupposes an infinite set V of variables, a set T of terms with V ⊆ T, and an equivalence relation ≡ on T indicating which terms are considered equal.

Given a set V of variable symbols, a set C of constant symbols and sets Fn of n-ary function symbols for each natural number n ≥ 1, the set of (unsorted first-order) terms T is recursively defined to be the smallest set with the following properties:[3]

- every variable symbol is a term: V ⊆ T,
- every constant symbol is a term: C ⊆ T,
- from every n terms t1, …, tn and every n-ary function symbol f ∈ Fn, a larger term f(t1, …, tn) can be built.

For example, if x ∈ V is a variable symbol, 1 ∈ C is a constant symbol, and add ∈ F2 is a binary function symbol, then x ∈ T, 1 ∈ T, and (hence) add(x,1) ∈ T by the first, second, and third term building rule, respectively. The latter term is usually written as x+1, using infix notation and the more common operator symbol + for convenience.
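As an illustration, the recursive term definition can be modeled in a few lines of Python; the encoding below (variables as plain strings, constants and function applications as tuples) is an assumption of this sketch, not part of the formal definition:

```python
# Illustrative encoding of unsorted first-order terms (an assumption
# of this sketch): a variable is a plain string, and an application
# f(t1,...,tn) is a tuple (f, t1, ..., tn); a constant is the 0-ary case.

def is_var(t):
    """Rule 1: every variable symbol (here: a plain string) is a term."""
    return isinstance(t, str)

def const(c):
    """Rule 2: every constant symbol is a term (a 0-argument tuple)."""
    return (c,)

def app(f, *args):
    """Rule 3: an n-ary symbol f applied to n terms is again a term."""
    return (f, *args)

# x ∈ T, 1 ∈ T, and hence add(x,1) ∈ T:
x = "x"
one = const("1")
t = app("add", x, one)
print(t)   # ('add', 'x', ('1',))
```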
A substitution is a mapping σ: V ⟶ T from variables to terms; the notation {x1 ↦ t1, …, xk ↦ tk} refers to a substitution mapping each variable xi to the term ti, for i = 1, …, k, and every other variable to itself. Applying that substitution to a term t is written in postfix notation as t {x1 ↦ t1, …, xk ↦ tk}; it means to (simultaneously) replace every occurrence of each variable xi in the term t by ti. The result tσ of applying a substitution σ to a term t is called an instance of that term t. As a first-order example, applying the substitution {x ↦ h(a,y), z ↦ b} to the term f(x, a, g(z), y) yields f(h(a,y), a, g(b), y).
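Simultaneous substitution application can be sketched as follows, assuming a tuple encoding of terms (variables as strings, f(t1,…,tn) as a tuple (f, t1, …, tn)); the function name `apply_subst` is likewise an assumption of this sketch:

```python
# Applying a substitution tσ by simultaneous replacement; a variable
# is a string, a function application f(t1,...,tn) is (f, t1, ..., tn).

def apply_subst(t, sigma):
    """Return the instance tσ: replace every variable occurrence
    simultaneously; variables not in sigma map to themselves."""
    if isinstance(t, str):                      # variable
        return sigma.get(t, t)
    f, *args = t                                # f(t1, ..., tn)
    return (f, *(apply_subst(a, sigma) for a in args))

# Applying {x ↦ h(a,y), z ↦ b} to f(x, a, g(z), y):
t = ("f", "x", ("a",), ("g", "z"), "y")
sigma = {"x": ("h", ("a",), "y"), "z": ("b",)}
print(apply_subst(t, sigma))
# ('f', ('h', ('a',), 'y'), ('a',), ('g', ('b',)), 'y')
```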
If a term t has an instance equivalent to a term u, that is, if tσ ≡ u for some substitution σ, then t is called more general than u, and u is called more special than, or subsumed by, t. If ≡ is literal (syntactic) identity of terms, a term may be both more general and more special than another one only if both terms differ just in their variable names, not in their syntactic structure; such terms are called variants, or renamings of each other.
An anti-unification problem is a pair ⟨t1, t2⟩ of terms. A term t is a common generalization, or anti-unifier, of t1 and t2 if tσ1 ≡ t1 and tσ2 ≡ t2 for some substitutions σ1, σ2. For a given anti-unification problem, a set S of anti-unifiers is called complete if each generalization subsumes some term t ∈ S; the set S is called minimal if none of its members subsumes another one.
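In the syntactic case, whether a term is a common generalization can be checked by first-order matching, i.e. by searching for a substitution σ with tσ equal to the input term. A minimal sketch, assuming a tuple encoding of terms (variables as strings, f(t1,…,tn) as (f, t1, …, tn)) and an illustrative helper `match`:

```python
# First-order matching: find σ with tσ = target (syntactic equality),
# or report failure with None.

def match(t, target, sigma=None):
    """Return a substitution sigma with t·sigma == target, else None."""
    if sigma is None:
        sigma = {}
    if isinstance(t, str):                       # variable: bind or check binding
        if t in sigma:
            return sigma if sigma[t] == target else None
        sigma[t] = target
        return sigma
    if (isinstance(target, tuple) and t[0] == target[0]
            and len(t) == len(target)):          # same symbol, same arity
        for a, b in zip(t[1:], target[1:]):
            if match(a, b, sigma) is None:
                return None
        return sigma
    return None

# x*x is a common generalization of 0*0 and 4*4:
g = ("*", "x", "x")
print(match(g, ("*", ("0",), ("0",))))   # {'x': ('0',)}
print(match(g, ("*", ("4",), ("4",))))   # {'x': ('4',)}
```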
The framework of first-order syntactical anti-unification is based on T being the set of first-order terms (over some given set V of variables, C of constants and Fn of n-ary function symbols) and on ≡ being syntactic equality. In this framework, each anti-unification problem ⟨t1, t2⟩ has a complete, and obviously minimal, singleton solution set {t}. Its member t is called the least general generalization (lgg) of the problem; it has an instance syntactically equal to t1 and another one syntactically equal to t2, and any common generalization of t1 and t2 subsumes t. The lgg is unique up to variants: if S1 and S2 are both complete and minimal solution sets of the same syntactical anti-unification problem, then S1 = {s1} and S2 = {s2} for some terms s1 and s2 that are renamings of each other.
Plotkin[1][2] has given an algorithm to compute the lgg of two given terms. It presupposes an injective mapping φ: T × T ⟶ V, that is, a mapping assigning each pair s, t of terms its own variable φ(s,t), such that no two pairs share the same variable.[note 4] The algorithm consists of two rules:

f(s1, …, sn) ⊔ f(t1, …, tn) ⇝ f(s1 ⊔ t1, …, sn ⊔ tn)
s ⊔ t ⇝ φ(s,t), if the previous rule is not applicable

For example, (0 * 0) ⊔ (4 * 4) ⇝ (0 ⊔ 4) * (0 ⊔ 4) ⇝ x * x; this least general generalization reflects the common property of both inputs of being square numbers.
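Plotkin's two rules can be transcribed into a short recursive function. In the sketch below, the tuple encoding of terms (variables as strings, f(t1,…,tn) as (f, t1, …, tn)) and the use of a dictionary to play the role of the injective mapping φ are assumptions, not part of the original algorithm's presentation:

```python
# Sketch of the two lgg rules: descend through equal head symbols,
# otherwise map the pair of subterms to its own fresh variable.

def lgg(s, t, phi=None):
    """Least general generalization of first-order terms s and t."""
    if phi is None:
        phi = {}             # plays the role of the injective mapping φ
    # Rule 1: same function symbol and arity -> recurse into arguments.
    if (isinstance(s, tuple) and isinstance(t, tuple)
            and s[0] == t[0] and len(s) == len(t)):
        return (s[0], *(lgg(a, b, phi) for a, b in zip(s[1:], t[1:])))
    # Rule 2: each distinct pair (s, t) gets its own variable φ(s,t).
    if (s, t) not in phi:
        phi[(s, t)] = f"x{len(phi)}"
    return phi[(s, t)]

# lgg(0*0, 4*4) = x*x, reflecting that both inputs are squares:
zero, four = ("0",), ("4",)
print(lgg(("*", zero, zero), ("*", four, four)))   # ('*', 'x0', 'x0')
```

Sharing the dictionary across recursive calls is what makes the pair (0, 4) reuse the same variable in both argument positions, yielding x*x rather than the less general x*y.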
Plotkin used his algorithm to compute the "relative least general generalization (rlgg)" of two clause sets in first-order logic, which was the basis of the Golem approach to inductive logic programming.