Information projection

In information theory, the information projection or I-projection of a probability distribution q onto a set of distributions P is

$$p^* = \arg\min_{p \in P} D_{\mathrm{KL}}(p \parallel q),$$

where $D_{\mathrm{KL}}(p \parallel q)$ denotes the Kullback–Leibler divergence from q to p. Viewing the KL divergence as a measure of distance, the I-projection $p^*$ is the "closest" distribution to q of all the distributions in P. The I-projection is useful in setting up information geometry, notably because of the following inequality, valid when P is convex:[1]

$$D_{\mathrm{KL}}(p \parallel q) \geq D_{\mathrm{KL}}(p \parallel p^*) + D_{\mathrm{KL}}(p^* \parallel q) \quad \text{for all } p \in P.$$

This inequality can be interpreted as an information-geometric version of Pythagoras' triangle-inequality theorem, where KL divergence is viewed as squared distance in a Euclidean space.
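As a concrete numerical illustration (the alphabet, statistic, and numbers below are assumptions for the sketch, not taken from the article), the following Python code computes the I-projection of a distribution q on a three-letter alphabet onto the convex linear family P = {p : E_p[f] = t}, using the well-known exponential-tilting form of the minimizer for such linear families, and then checks the inequality above for another member of P.

```python
import numpy as np

# Minimal sketch (illustrative assumptions): the I-projection of q onto the
# convex linear family P = {p : E_p[f] = t} has the exponential-tilting form
# p*(x) ∝ q(x) * exp(lam * f(x)); the tilting parameter lam is found by bisection.

def i_projection_linear_family(q, f, t, lo=-50.0, hi=50.0, iters=200):
    """I-projection of q onto {p : sum_x p(x) f(x) = t} (t assumed attainable)."""
    def tilted(lam):
        w = q * np.exp(lam * f)
        return w / w.sum()
    for _ in range(iters):              # E_p[f] is increasing in lam, so bisect
        mid = 0.5 * (lo + hi)
        if tilted(mid) @ f < t:
            lo = mid
        else:
            hi = mid
    return tilted(0.5 * (lo + hi))

def kl(p, q):
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

q = np.array([0.7, 0.2, 0.1])           # reference distribution
f = np.array([0.0, 1.0, 2.0])           # constraint statistic
t = 1.0                                 # required mean of f under p
p_star = i_projection_linear_family(q, f, t)

p = np.array([0.25, 0.5, 0.25])         # some other member of P (E_p[f] = 1)
lhs = kl(p, q)
rhs = kl(p, p_star) + kl(p_star, q)
print(lhs, rhs)                          # lhs >= rhs, as the inequality states
```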

Since the KL divergence $D_{\mathrm{KL}}(p \parallel q)$ is non-negative and continuous in p, if P is closed and non-empty, then there exists at least one minimizer to the optimization problem framed above. Furthermore, if P is convex, then the optimum distribution is unique.

The reverse I-projection, also known as moment projection or M-projection, is

$$p^* = \arg\min_{p \in P} D_{\mathrm{KL}}(q \parallel p).$$

Since the KL divergence is not symmetric in its arguments, the I-projection and the M-projection will exhibit different behavior.

For the I-projection, the approximating distribution $p^*$ will typically under-estimate the support of q and lock onto one of its modes, because $p^*(x)$ must be zero wherever $q(x)$ is zero to make sure the KL divergence stays finite. For the M-projection, $p^*$ will typically over-estimate the support of q, because $p^*(x)$ must be positive wherever $q(x)$ is positive to make sure the KL divergence stays finite.
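The following sketch illustrates this difference numerically; the target distribution, grid, and Gaussian family are illustrative assumptions. A bimodal mixture q on a discretized real line is approximated by a single Gaussian, once by minimizing $D_{\mathrm{KL}}(p \parallel q)$ (I-projection, mode-seeking) and once by minimizing $D_{\mathrm{KL}}(q \parallel p)$ (M-projection, mass-covering), using a crude grid search over the Gaussian's mean and standard deviation in place of a proper optimizer.

```python
import numpy as np

# Illustrative sketch: mode-seeking vs. mass-covering behavior of the two projections.
x = np.linspace(-6, 6, 601)

def normalize(w):
    return w / w.sum()

def gaussian(mu, sigma):
    return normalize(np.exp(-0.5 * ((x - mu) / sigma) ** 2))

def kl(p, q):
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

# Bimodal target: mixture of two well-separated Gaussians
q = normalize(0.5 * gaussian(-2.0, 0.5) + 0.5 * gaussian(2.0, 0.5))

best_i, best_m = (np.inf, None), (np.inf, None)
for mu in np.linspace(-3, 3, 61):
    for sigma in np.linspace(0.3, 3.0, 55):
        p = gaussian(mu, sigma)
        d_i, d_m = kl(p, q), kl(q, p)
        if d_i < best_i[0]:
            best_i = (d_i, (mu, sigma))
        if d_m < best_m[0]:
            best_m = (d_m, (mu, sigma))

print("I-projection (mode-seeking):", best_i[1])   # narrow Gaussian on one mode
print("M-projection (mass-covering):", best_m[1])  # wide Gaussian spanning both modes
```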

The reverse I-projection plays a fundamental role in the construction of optimal e-variables.

The concept of information projection can be extended to arbitrary f-divergences and other divergences.
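As a hedged illustration of this extension, the sketch below replaces the KL divergence with the chi-squared divergence $D_f(p \parallel q) = \sum_x q(x)\, f\!\left(p(x)/q(x)\right)$ with $f(t) = (t-1)^2$, and computes the corresponding projection onto the same linear family used in the earlier sketch; SciPy's generic constrained optimizer stands in for any special-purpose method, and the numbers are illustrative.

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative sketch: projection under a generic f-divergence (here chi-squared).
def f_divergence(p, q, f):
    return float(np.sum(q * f(p / q)))

q = np.array([0.7, 0.2, 0.1])
f_stat = np.array([0.0, 1.0, 2.0])
t = 1.0

res = minimize(
    lambda p: f_divergence(p, q, lambda r: (r - 1.0) ** 2),   # chi-squared divergence
    x0=np.full(3, 1.0 / 3.0),
    method="SLSQP",
    bounds=[(1e-9, 1.0)] * 3,
    constraints=[{"type": "eq", "fun": lambda p: p.sum() - 1.0},
                 {"type": "eq", "fun": lambda p: p @ f_stat - t}],
)
print("chi-squared projection:", res.x)
```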
