E-graph

In computer science, an e-graph is a data structure that stores an equivalence relation over terms of some language.

be a set of uninterpreted functions, where

consisting of functions of arity

be a countable set of opaque identifiers that may be compared for equality, called e-class IDs.

The e-graph then represents equivalence classes of e-nodes, using the following data structures:[1] In addition to the above structure, a valid e-graph conforms to several data structure invariants.

[2] Two e-nodes are equivalent if they are in the same e-class.

The congruence invariant states that an e-graph must ensure that equivalence is closed under congruence, where two e-nodes

The hashcons invariant states that the hashcons maps canonical e-nodes to their e-class ID.

operations from the union-find that preserve the e-graph invariants.

An e-graph can also be formulated as a bipartite graph

where There is a directed edge from each e-class to each of its members, and from each e-node to each of its children.

be the smallest set that includes the 0-arity function symbols (also called constants), includes the variables, and is closed under application of the function symbols.

A term containing variables is called a pattern, a term without variables is called ground.

represents a ground term

e-matching is an operation that takes a pattern

is an e-class ID such that the term

There are several known algorithms for e-matching,[4][5] the relational e-matching algorithm is based on worst-case optimal joins and is worst-case optimal.

to a natural number, the extraction problem is to find a ground term with minimal total cost that is represented by the given e-class.

[7] There is also no constant-factor approximation algorithm for this problem, which can be shown by reduction from the set cover problem.

However, for graphs with bounded treewidth, there is a linear-time, fixed-parameter tractable algorithm.

[8] Equality saturation is a technique for building optimizing compilers using e-graphs.

[10] It operates by applying a set of rewrites using e-matching until the e-graph is saturated, a timeout is reached, an e-graph size limit is reached, a fixed number of iterations is exceeded, or some other halting condition is reached.

After rewriting, an optimal term is extracted from the e-graph according to some cost function, usually related to AST size or performance considerations.

E-graphs are used in automated theorem proving.

They are a crucial part of modern SMT solvers such as Z3[11] and CVC4, where they are used to decide the empty theory by computing the congruence closure of a set of equalities, and e-matching is used to instantiate quantifiers.

[12] In DPLL(T)-based solvers that use conflict-driven clause learning (also known as non-chronological backtracking), e-graphs are extended to produce proof certificates.

[13] E-graphs are also used in the Simplify theorem prover of ESC/Java.

[14] Equality saturation is used in specialized optimizing compilers,[15] e.g. for deep learning[16] and linear algebra.

[17] Equality saturation has also been used for translation validation applied to the LLVM toolchain.

[18] E-graphs have been applied to several problems in program analysis, including fuzzing,[19] abstract interpretation,[20] and library learning.