Abstract interpretation

In computer science, abstract interpretation is a theory of sound approximation of the semantics of computer programs, based on monotonic functions over ordered sets, especially lattices.

It can be viewed as a partial execution of a computer program which gains information about its semantics (e.g., control-flow, data-flow) without performing all the calculations.

Its main concrete application is formal static analysis, the automatic extraction of information about the possible executions of computer programs; such analyses have two main usages: Abstract interpretation was formalized by the French computer scientist working couple Patrick Cousot and Radhia Cousot in the late 1970s.

[1][2] This section illustrates abstract interpretation by means of real-world, non-computing examples.

Assume a unique identifier for each person in the room, like a social security number in the United States.

To prove that someone is not present, all one needs to do is see if their social security number is not on the list.

If the name of a person is not found in the list, we may safely conclude that that person was not present; but if it is, we cannot conclude definitely without further inquiries, due to the possibility of homonyms (for example, two people named John Smith).

Note that this imprecise information will still be adequate for most purposes, because homonyms are rare in practice.

", keeping a list of all names and dates of births is unnecessary.

We may safely and without loss of precision restrict ourselves to keeping a list of the participants' ages.

In the case of computing, concrete, precise information is in general not computable within finite time and memory (see Rice's theorem and the halting problem).

One crucial requirement is to add enough vagueness so as to make problems manageable while still retaining enough precision for answering the important questions (such as "might the program crash?").

A semantics is a mathematical characterization of a possible behavior of the program.

For instance, the concrete semantics of an imperative programming language may associate to each program the set of execution traces it may produce – an execution trace being a sequence of possible consecutive states of the execution of the program; a state typically consists of the value of the program counter and the memory locations (globals, stack and heap).

The goal of static analysis is to derive a computable semantic interpretation at some point.

For some other operations, the abstraction may lose precision: for instance, it is impossible to know the sign of a sum whose operands are respectively positive and negative.

Sometimes a loss of precision is necessary to make the semantics decidable (see Rice's theorem and the halting problem).

The first large scale automated analysis of computer programs with abstract interpretation was motivated by the accident that resulted in the destruction of the first flight of the Ariane 5 rocket in 1996.

These two sets are related to each other by defining total functions that map elements from one to the other.

Program semantics are generally described using fixed points in the presence of loops or recursive procedures.

which satisfies the following conditions: In some cases, it is possible to define abstractions using Galois connections

of real numbers by enclosing convex polyhedra, there is no optimal abstraction to the disc defined by

); note that these are exact abstractions, since the set of possible outcomes for, say,

More complex formulas can be derived for multiplication, division, etc., yielding so-called interval arithmetics.

[5] Let us now consider the following very simple program: With reasonable arithmetic types, the result for z should be zero.

While each of the operations taken individually was exactly abstracted, their composition isn't.

The problem is evident: we did not keep track of the equality relationship between x and y; actually, this domain of intervals does not take into account any relationships between variables, and is thus a non-relational domain.

Non-relational domains tend to be fast and simple to implement, but imprecise.

Some examples of relational numerical abstract domains are: and combinations thereof (such as the reduced product,[2] cf.

When one chooses an abstract domain, one typically has to strike a balance between keeping fine-grained relationships, and high computational costs.

Example: abstraction of integer sets (red) to sign sets (green)
Combination of interval arithmetic ( green ) and congruence mod 2 on integers ( cyan ) as abstract domains to analyze a simple piece of C code ( red : concrete sets of possible values at runtime). Using the congruence information ( 0 =even, 1 =odd), a zero division can be excluded. (Since only one variable is involved, relational vs. non-relational domains is not an issue here.)
A 3-dimensional convex example polyhedron describing the possible values of 3 variables at some program point. Each of the variables may be zero, but all three can't be zero simultaneously. The latter property cannot be described in the interval arithmetics domain.