Inequalities in information theory

Inequalities are very important in the study of information theory.

The basic inequalities of information theory concern a tuple X1, X2, …, Xn of n finitely (or at most countably) supported random variables on the same probability space. Consider first the case of two such random variables, X and Y.

They satisfy the following inequalities (which together characterize the range of the marginal and joint entropies of two random variables): H(X) ≥ 0, H(Y) ≥ 0, H(X) ≤ H(X, Y), H(Y) ≤ H(X, Y), and H(X, Y) ≤ H(X) + H(Y). In fact, these can all be expressed as special cases of a single inequality involving the conditional mutual information, namely I(A; B | C) ≥ 0, where A, B, and C each denote the joint distribution of some arbitrary (possibly empty) subset of our collection of random variables. Inequalities that can be derived as linear combinations of instances of this single inequality are known as Shannon-type inequalities.
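As a quick numerical illustration (a minimal sketch, not part of the original article; it assumes NumPy, and the joint probability table is an arbitrary example), the following Python snippet computes the marginal and joint entropies of two random variables and checks the inequalities above:

import numpy as np

def entropy(p):
    """Shannon entropy in bits of a probability array (zero entries are ignored)."""
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

# Arbitrary example joint distribution of (X, Y); rows index X, columns index Y.
p_xy = np.array([[0.30, 0.10],
                 [0.05, 0.55]])

H_XY = entropy(p_xy.ravel())
H_X = entropy(p_xy.sum(axis=1))
H_Y = entropy(p_xy.sum(axis=0))

# The inequalities characterizing (H(X), H(Y), H(X, Y)) for two variables.
assert H_X >= 0 and H_Y >= 0
assert H_X <= H_XY and H_Y <= H_XY            # joint entropy dominates marginals
assert H_XY <= H_X + H_Y + 1e-12              # subadditivity, i.e. I(X; Y) >= 0

# The subadditivity case written as an instance of I(A; B | C) >= 0 with C empty:
I_XY = H_X + H_Y - H_XY
print(f"H(X)={H_X:.3f}  H(Y)={H_Y:.3f}  H(X,Y)={H_XY:.3f}  I(X;Y)={I_XY:.3f}")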

A vector h in R^(2^n − 1), indexed by the nonempty subsets of {1, …, n}, is said to be entropic if there is a joint, discrete distribution of n random variables X1, …, Xn such that each coordinate of h equals the joint entropy of the corresponding subset of those variables. The closure of the set of entropic vectors is known to be convex and hence it can be characterized by the (infinitely many) linear inequalities satisfied by all entropic vectors, called entropic inequalities.
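For concreteness, the following sketch (an illustrative addition, not from the original; NumPy is assumed and the distribution is randomly generated) builds the entropic vector in R^(2^n − 1) of a joint distribution of n = 3 binary random variables, with one joint-entropy coordinate per nonempty subset:

import itertools
import numpy as np

def entropy(p):
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

rng = np.random.default_rng(0)
n = 3
# Random joint distribution of n binary variables, stored as an array of shape (2, 2, 2).
p = rng.random((2,) * n)
p /= p.sum()

# Entropic vector: one coordinate per nonempty subset I of {0, ..., n-1},
# equal to the joint entropy of the variables indexed by I.
h = {}
for r in range(1, n + 1):
    for subset in itertools.combinations(range(n), r):
        marginal = p.sum(axis=tuple(i for i in range(n) if i not in subset))
        h[subset] = entropy(marginal.ravel())

for subset, value in sorted(h.items(), key=lambda kv: (len(kv[0]), kv[0])):
    print(subset, round(value, 4))
# Any vector obtained this way is entropic by definition; the entropic
# inequalities are exactly the linear inequalities satisfied by all such vectors.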

Matus[3] proved that no finite set of inequalities can characterize (by linear combinations) all entropic inequalities.

A great many important inequalities in information theory are actually lower bounds for the Kullback–Leibler divergence.

Even the Shannon-type inequalities can be considered part of this category, since the interaction information can be expressed as the Kullback–Leibler divergence of the joint distribution with respect to the product of the marginals, and thus these inequalities can be seen as a special case of Gibbs' inequality.

On the other hand, it seems to be much more difficult to derive useful upper bounds for the Kullback–Leibler divergence.

This is because the Kullback–Leibler divergence D_KL(P||Q) depends very sensitively on events that are very rare in the reference distribution Q. D_KL(P||Q) increases without bound as an event of finite non-zero probability in the distribution P becomes exceedingly rare in the reference distribution Q, and in fact D_KL(P||Q) is not even defined if an event of non-zero probability in P has zero probability in Q.
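The following short computation (an illustrative sketch; NumPy is assumed and the distributions are chosen arbitrarily) shows this behaviour numerically: the divergence stays non-negative but grows without bound as the probability that Q assigns to an outcome that P considers non-negligible shrinks toward zero.

import numpy as np

def kl_divergence(p, q):
    """Kullback-Leibler divergence D_KL(P||Q) in nats for discrete distributions."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

p = [0.5, 0.5]
for eps in [1e-1, 1e-3, 1e-6, 1e-9]:
    q = [1.0 - eps, eps]          # the second outcome becomes ever rarer under Q
    print(f"eps={eps:g}  D_KL(P||Q)={kl_divergence(p, q):.2f} nats")
# The divergence is always >= 0 and diverges to infinity as eps -> 0;
# if eps were exactly 0, the divergence would not be defined (infinite).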

Gibbs' inequality is the fundamental inequality stating that the Kullback–Leibler divergence is non-negative.
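In symbols, Gibbs' inequality and its standard one-line proof via Jensen's inequality read as follows (a worked equation added for completeness, with the sums running over the support of P):

D_{\mathrm{KL}}(P \| Q)
  = \sum_i p_i \log\frac{p_i}{q_i}
  = -\sum_i p_i \log\frac{q_i}{p_i}
  \;\ge\; -\log\Big(\sum_i p_i \, \frac{q_i}{p_i}\Big)
  = -\log\Big(\sum_i q_i\Big)
  \;\ge\; 0,

with equality if and only if P = Q. The first inequality is Jensen's inequality applied to the convex function −log; the last step uses the fact that the q_i sum to at most 1 over the support of P.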

Another important inequality concerning the Kullback–Leibler divergence is Kullback's inequality.[4] If P and Q are probability distributions on the real line with P absolutely continuous with respect to Q, and whose first moments exist, then D_KL(P||Q) ≥ Ψ*_Q(μ′1(P)), where Ψ*_Q is the rate function of Q, i.e. the convex conjugate of its cumulant-generating function, and μ′1(P) is the first moment of P.
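As a rough numerical sanity check (an illustrative sketch only; NumPy is assumed, the two discrete distributions are arbitrary, and the convex conjugate is approximated by a brute-force grid search), the snippet below evaluates both sides of Kullback's inequality:

import numpy as np

# Two discrete distributions P and Q on the same real support points.
x = np.array([-1.0, 0.0, 1.0, 2.0])
p = np.array([0.10, 0.20, 0.40, 0.30])
q = np.array([0.25, 0.25, 0.25, 0.25])

kl = float(np.sum(p * np.log(p / q)))                 # D_KL(P||Q) in nats
mean_p = float(np.sum(p * x))                         # first moment of P

# Cumulant-generating function of Q on a grid of lambda values, and its convex
# conjugate (the rate function) evaluated at E_P[X] by brute-force maximization.
lambdas = np.linspace(-20.0, 20.0, 40001)
cgf = np.log(np.exp(np.outer(lambdas, x)) @ q)        # Psi_Q(lambda)
rate_at_mean = float(np.max(lambdas * mean_p - cgf))  # Psi_Q*(mu'_1(P)), approximately

print(f"D_KL(P||Q)     = {kl:.4f} nats")
print(f"Psi_Q*(E_P[X]) = {rate_at_mean:.4f} nats")
assert rate_at_mean <= kl + 1e-6                      # Kullback's inequality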

Pinsker's inequality relates Kullback–Leibler divergence and total variation distance.

It states that if P and Q are two probability distributions, then √(½ D_KL(P||Q)) ≥ δ(P, Q), where D_KL(P||Q) is the Kullback–Leibler divergence in nats and δ(P, Q) is the total variation distance.
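A quick numerical check of Pinsker's inequality (an illustrative sketch; NumPy is assumed and the distributions are randomly generated):

import numpy as np

def kl_nats(p, q):
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def total_variation(p, q):
    return 0.5 * float(np.sum(np.abs(p - q)))

rng = np.random.default_rng(1)
for _ in range(5):
    p = rng.random(6); p /= p.sum()
    q = rng.random(6); q /= q.sum()
    lhs = np.sqrt(0.5 * kl_nats(p, q))   # sqrt((1/2) D_KL(P||Q))
    rhs = total_variation(p, q)          # delta(P, Q)
    print(f"sqrt(D_KL/2) = {lhs:.4f}  >=  TV = {rhs:.4f}")
    assert lhs >= rhs                    # Pinsker's inequality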

In 1957, Hirschman showed that for a (reasonably well-behaved) function f: R → C with ∫ |f(x)|² dx = 1 and its Fourier transform g(y) = ∫ f(x) e^(−2πixy) dx, the sum of the differential entropies of |f|² and |g|² is non-negative, i.e. −∫ |f(x)|² log |f(x)|² dx − ∫ |g(y)|² log |g(y)|² dy ≥ 0. Hirschman conjectured, and it was later proved,[6] that a sharper bound of log(e/2), which is attained in the case of a Gaussian distribution, could replace the right-hand side of this inequality.

This is especially significant since it implies, and is stronger than, Weyl's formulation of Heisenberg's uncertainty principle.
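The following sketch (an illustrative addition; NumPy is assumed, and the Fourier convention g(y) = ∫ f(x) e^(−2πixy) dx is the one used in the statement above) checks numerically that the normalized Gaussian f(x) = 2^(1/4) e^(−πx²), which is its own Fourier transform under this convention, attains the bound log(e/2):

import numpy as np

# Fine, wide grid; a Riemann sum is accurate enough for this check.
x = np.linspace(-10.0, 10.0, 200001)
dx = x[1] - x[0]
f2 = np.sqrt(2.0) * np.exp(-2.0 * np.pi * x**2)   # |f(x)|^2 for f(x) = 2**0.25 * exp(-pi*x**2)

# Differential entropy h(|f|^2) = -integral |f|^2 log |f|^2 dx (natural log).
h_f = float(np.sum(-f2 * np.log(f2)) * dx)

# Under the convention g(y) = integral f(x) exp(-2*pi*i*x*y) dx, this Gaussian
# is its own Fourier transform, so h(|g|^2) = h(|f|^2).
print(f"h(|f|^2) + h(|g|^2) = {2.0 * h_f:.6f}")
print(f"log(e/2)            = {np.log(np.e / 2.0):.6f}")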

Given discrete random variables X, Y, and Y′, such that X takes values only in the interval [−1, 1] and Y′ is determined by Y (so that Y′ = f(Y)), we have E(|E(X | Y′) − E(X | Y)|) ≤ √(2 log 2 · I(X; Y | Y′)), relating the conditional expectation of X to the conditional mutual information. This is a simple consequence of Pinsker's inequality.

(Note: the correction factor log 2 inside the radical arises because we are measuring the conditional mutual information in bits rather than nats.)
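The inequality can be checked numerically on small examples. The sketch below (an illustrative addition; NumPy is assumed, and the joint distribution and the function f are arbitrary choices) evaluates both sides for discrete X, Y with Y′ = f(Y):

import itertools
import numpy as np

rng = np.random.default_rng(2)

# X takes values in [-1, 1]; Y takes 4 values; Y' = f(Y) merges them into 2 values.
x_vals = np.array([-1.0, -0.25, 0.5, 1.0])
y_vals = [0, 1, 2, 3]
f = {0: 0, 1: 0, 2: 1, 3: 1}                  # Y' = f(Y)

# Arbitrary joint distribution p(x, y); rows index X, columns index Y.
p = rng.random((len(x_vals), len(y_vals)))
p /= p.sum()

p_y = p.sum(axis=0)                                          # p(y)
e_x_given_y = (x_vals[:, None] * p).sum(axis=0) / p_y        # E[X | Y=y]

# Group y's by y' to get p(y') and E[X | Y'=y'].
yprime_vals = sorted(set(f.values()))
p_yp = np.array([sum(p_y[y] for y in y_vals if f[y] == yp) for yp in yprime_vals])
e_x_given_yp = np.array([
    sum(e_x_given_y[y] * p_y[y] for y in y_vals if f[y] == yp) / p_yp[i]
    for i, yp in enumerate(yprime_vals)
])

# Left-hand side: E | E(X|Y') - E(X|Y) |.
lhs = sum(p_y[y] * abs(e_x_given_yp[f[y]] - e_x_given_y[y]) for y in y_vals)

# Conditional mutual information I(X; Y | Y') in bits.  Because Y' is a function
# of Y, conditioning on Y' = y' just restricts (X, Y) to the y's with f(y) = y'.
i_bits = 0.0
for i, yp in enumerate(yprime_vals):
    ys = [y for y in y_vals if f[y] == yp]
    block = p[:, ys] / p_yp[i]                # p(x, y | y')
    px = block.sum(axis=1)                    # p(x | y')
    py = block.sum(axis=0)                    # p(y | y')
    for a, b in itertools.product(range(len(x_vals)), range(len(ys))):
        if block[a, b] > 0:
            i_bits += p_yp[i] * block[a, b] * np.log2(block[a, b] / (px[a] * py[b]))

rhs = np.sqrt(2.0 * np.log(2.0) * i_bits)     # sqrt(2 * log 2 * I(X;Y|Y')), I in bits
print(f"E|E(X|Y') - E(X|Y)| = {lhs:.4f}  <=  {rhs:.4f}")
assert lhs <= rhs + 1e-9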

Several machine-based proof checker algorithms are now available.

Proof checker algorithms typically verify the inequalities as either true or false.[9]
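The tools listed below cast this verification as a linear program over the 2^n − 1 joint-entropy coordinates (this is the role of the GLPK and Gurobi optimizers mentioned further down): a candidate linear inequality is Shannon-type provable exactly when it is a non-negative combination of the elemental inequalities H(Xi | rest) ≥ 0 and I(Xi; Xj | XK) ≥ 0. The following sketch (an illustrative simplification, not the actual ITIP or Xitip code; NumPy and SciPy are assumed, n is hard-coded to 3, and the helper names are invented) shows the idea:

import itertools
import numpy as np
from scipy.optimize import linprog

N = 3                                   # number of random variables, indexed 0..N-1
GROUND = frozenset(range(N))
SUBSETS = [frozenset(s) for r in range(1, N + 1)
           for s in itertools.combinations(range(N), r)]
INDEX = {s: i for i, s in enumerate(SUBSETS)}      # one coordinate per nonempty subset

def coeff(terms):
    """Vector in R^(2^N - 1) for a linear combination {subset: coefficient} of joint entropies."""
    v = np.zeros(len(SUBSETS))
    for subset, c in terms.items():
        if subset:                       # H(empty set) = 0, so empty subsets drop out
            v[INDEX[frozenset(subset)]] += c
    return v

# Elemental Shannon inequalities, collected as rows of A so that A h >= 0.
rows = []
for i in range(N):                                       # H(X_i | rest) >= 0
    rows.append(coeff({GROUND: 1, GROUND - {i}: -1}))
for i, j in itertools.combinations(range(N), 2):         # I(X_i; X_j | X_K) >= 0
    for r in range(N - 1):
        for K in itertools.combinations(GROUND - {i, j}, r):
            K = frozenset(K)
            rows.append(coeff({K | {i}: 1, K | {j}: 1, K | {i, j}: -1, K: -1}))
A = np.array(rows)

def is_shannon_provable(b):
    """True if b . h >= 0 for every h in the Shannon cone {h : A h >= 0}."""
    res = linprog(b, A_ub=-A, b_ub=np.zeros(len(A)), bounds=(None, None), method="highs")
    return res.status == 0 and abs(res.fun) < 1e-9       # status 3 means unbounded below

# I(X0; X1 | X2) >= 0, i.e. H(X0,X2) + H(X1,X2) - H(X0,X1,X2) - H(X2) >= 0: prints True.
print(is_shannon_provable(coeff({(0, 2): 1, (1, 2): 1, (0, 1, 2): -1, (2,): -1})))
# H(X0,X1) >= H(X0) + H(X1) would mean I(X0; X1) <= 0; not Shannon-type: prints False.
print(is_shannon_provable(coeff({(0, 1): 1, (0,): -1, (1,): -1})))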

ITIP is a MATLAB-based proof checker for all Shannon-type inequalities.

Xitip is an open-source, faster version of the same algorithm, implemented in C with a graphical front end.

Xitip also has a built-in language parsing feature which supports a broader range of random variable descriptions as input.

AITIP and oXitip are cloud-based implementations for validating Shannon-type inequalities.

oXitip uses the GLPK optimizer and has a C++ backend based on Xitip with a web-based user interface.

AITIP uses the Gurobi solver for optimization and a mix of Python and C++ in the backend implementation.

It can also provide the canonical breakdown of the inequalities in terms of basic information measures.[9]

Quantum information-theoretic inequalities can be checked by the contraction map proof method.