Jensen's inequality

In mathematics, Jensen's inequality, named after the Danish mathematician Johan Jensen, relates the value of a convex function of an integral to the integral of the convex function.

It was proved by Jensen in 1906,[1] building on an earlier proof of the same inequality for doubly-differentiable functions by Otto Hölder in 1889.[2]

Given its generality, the inequality appears in many forms depending on the context, some of which are presented below.

The inequality can be stated quite generally using either the language of measure theory or (equivalently) probability.

Let (Ω, A, μ) be a probability space, let g be a real-valued μ-integrable function, and let φ be a convex function on the real line. Then Jensen's inequality can be applied to get[6]

φ(∫_Ω g dμ) ≤ ∫_Ω φ ∘ g dμ.

The same result can be equivalently stated in a probability theory setting by a simple change of notation: if X is an integrable real-valued random variable and φ is a convex function, then

φ(E[X]) ≤ E[φ(X)].
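The probabilistic form is easy to check numerically; the sketch below uses NumPy with purely illustrative choices (X exponentially distributed with mean 1 and φ(x) = x²):

    import numpy as np

    # Monte Carlo sanity check of φ(E[X]) ≤ E[φ(X)].
    # Illustrative choices: X ~ Exponential(1), φ(x) = x**2.
    rng = np.random.default_rng(0)
    x = rng.exponential(scale=1.0, size=100_000)

    phi = lambda t: t ** 2
    lhs = phi(x.mean())          # φ(E[X]), estimated from the sample mean
    rhs = phi(x).mean()          # E[φ(X)], estimated from the sample

    print(lhs, rhs)              # ≈ 1 versus ≈ 2 (since E[X] = 1, E[X²] = 2)
    assert lhs <= rhs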

More generally, let T be a real topological vector space, and X a T-valued integrable random variable. Then, for a measurable convex function φ on T and any sub-σ-algebra G of the underlying σ-algebra,

φ(E[X | G]) ≤ E[φ(X) | G].

This general statement reduces to the previous ones when the topological vector space T is the real axis, and G is the trivial σ-algebra {∅, Ω} (where Ω is the sample space).
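For a concrete (unconditional, i.e. trivial sub-σ-algebra) illustration of the vector-valued case, the sketch below takes T = R², X uniform on the unit square, and the Euclidean norm as the convex function φ; all of these choices are purely illustrative:

    import numpy as np

    # Vector-valued Jensen: φ(E[X]) ≤ E[φ(X)] for X taking values in R².
    # Illustrative choices: X uniform on [0, 1]², φ(x) = Euclidean norm (convex).
    rng = np.random.default_rng(1)
    x = rng.uniform(0.0, 1.0, size=(100_000, 2))

    phi = lambda v: np.linalg.norm(v, axis=-1)
    lhs = phi(x.mean(axis=0))    # φ(E[X]) = ||(0.5, 0.5)|| ≈ 0.707
    rhs = phi(x).mean()          # E[||X||] is slightly larger (≈ 0.765)

    print(lhs, rhs)
    assert lhs <= rhs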

Before embarking on these mathematical derivations, however, it is worth analyzing an intuitive graphical argument based on the probabilistic case where X is a real number (see figure).

Assuming a hypothetical distribution of X values, one can immediately identify the positions of E[X] and of its image φ(E[X]) on the graph. Noticing that for a convex mapping the corresponding distribution of Y = φ(X) values is increasingly "stretched out" for increasing values of X, the expectation of Y is always shifted upwards with respect to φ(E[X]).

This "proves" the inequality, i.e. E[φ(X)] ≥ φ(E[X]), with equality when φ(X) is not strictly convex, e.g. when it is a straight line, or when X follows a degenerate distribution (i.e. is a constant).

The finite form of Jensen's inequality can be proved by induction: by the convexity hypothesis, the statement is true for n = 2; assuming it holds for some n, one writes a convex combination of n + 1 points as a combination of the last point and a convex combination of the first n points, applies the case n = 2, and then applies the induction hypothesis.

In order to obtain the general inequality from this finite form, one needs to use a density argument.

The finite form can be rewritten as:

φ(∫ x dμn(x)) ≤ ∫ φ(x) dμn(x),

where μn is a measure given by an arbitrary convex combination of Dirac deltas:

μn = λ1 δ_{x1} + ... + λn δ_{xn}.

Since convex functions are continuous, and since convex combinations of Dirac deltas are weakly dense in the set of probability measures (as can be easily verified), the general statement is obtained simply by a limiting procedure.
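The limiting procedure can be illustrated numerically: drawing atoms from a continuous distribution and giving them equal weights produces such a convex combination of Dirac deltas, and both sides of the finite form approach the corresponding continuous quantities as the number of atoms grows (the standard normal distribution and φ(x) = eˣ below are illustrative choices):

    import numpy as np

    # Finite form over an empirical measure μn = (1/n) Σ δ_{xi}, with φ(x) = exp(x).
    # As n grows, both sides approach the continuous-case values φ(E[X]) and E[φ(X)].
    rng = np.random.default_rng(2)
    phi = np.exp

    for n in (10, 1_000, 100_000):
        xs = rng.standard_normal(n)          # atoms of μn
        lam = np.full(n, 1.0 / n)            # equal weights (a convex combination)
        lhs = phi(np.dot(lam, xs))           # φ(Σ λi xi)
        rhs = np.dot(lam, phi(xs))           # Σ λi φ(xi)
        print(n, lhs, rhs)                   # lhs ≤ rhs for every n
        # Exact continuous-case values: φ(E[X]) = 1, E[φ(X)] = e^{1/2} ≈ 1.6487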

Since φ is convex, at each real number x we have a nonempty set of subderivatives, which may be thought of as lines touching the graph of φ at x but lying at or below the graph of φ at all points (support lines of the graph).

Let X be an integrable random variable that takes values in a real topological vector space T. Since φ : T → R is convex, for any x, y ∈ T the quantity

(φ(x + θy) − φ(x)) / θ

is decreasing as θ approaches 0+. In particular, the subdifferential of φ evaluated at x in the direction y is well-defined by

(Dφ)(x)·y := lim_{θ → 0+} (φ(x + θy) − φ(x)) / θ = inf_{θ > 0} (φ(x + θy) − φ(x)) / θ.

It is easily seen that the subdifferential is linear in y [citation needed] (strictly speaking, this is not automatic and requires the Hahn–Banach theorem) and, since the infimum taken in the right-hand side of the previous formula is smaller than the value of the same term for θ = 1, one gets

φ(x) ≤ φ(x + y) − (Dφ)(x)·y.

In particular, for an arbitrary sub-σ-algebra G we can evaluate the last inequality at x = E[X | G], y = X − E[X | G] to obtain

φ(E[X | G]) ≤ φ(X) − (Dφ)(E[X | G])·(X − E[X | G]).

Now, taking the expectation conditioned on G on both sides of the previous expression, we get the result since:

E[(Dφ)(E[X | G])·(X − E[X | G]) | G] = (Dφ)(E[X | G])·E[X − E[X | G] | G] = 0,

by the linearity of the subdifferential in the y variable, and the following well-known property of the conditional expectation: E[X − E[X | G] | G] = 0.
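To see the support-line step concretely, the following sketch uses the illustrative choice φ(x) = x² on the real line at the point x = 1, approximates (Dφ)(x)·y by the infimum of difference quotients over a grid of positive θ, and verifies the inequality φ(x + y) ≥ φ(x) + (Dφ)(x)·y:

    import numpy as np

    # Support-line step of the proof, for the illustrative choice φ(x) = x² at x = 1.
    # (Dφ)(x)·y = inf_{θ>0} (φ(x + θy) − φ(x))/θ; here it equals 2y for this smooth φ.
    phi = lambda t: t ** 2
    x = 1.0
    thetas = np.logspace(-8, 0, 200)          # positive θ values approaching 0+

    for y in (-2.0, -0.5, 0.3, 1.7):
        quotients = (phi(x + thetas * y) - phi(x)) / thetas
        d = quotients.min()                   # numerical stand-in for the infimum
        # support-line inequality: φ(x + y) ≥ φ(x) + (Dφ)(x)·y
        print(y, d, phi(x + y), phi(x) + d)
        assert phi(x + y) >= phi(x) + d - 1e-9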

Suppose Ω is a measurable subset of the real line and f(x) is a non-negative function such that

∫_Ω f(x) dx = 1.

In probabilistic language, f is a probability density function. Then Jensen's inequality becomes the following statement about convex integrals: if g is any real-valued measurable function and φ is convex over the range of g, then

φ(∫_Ω g(x) f(x) dx) ≤ ∫_Ω φ(g(x)) f(x) dx.

If g(x) = x, then this form of the inequality reduces to a commonly used special case:

φ(∫_Ω x f(x) dx) ≤ ∫_Ω φ(x) f(x) dx.

This is applied in Variational Bayesian methods.
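As a concrete check of the density form, the following sketch uses purely illustrative choices (f the uniform density on [0, 1], g(x) = x, φ(t) = t²) and simple Riemann sums for the integrals:

    import numpy as np

    # Density form of Jensen's inequality, checked by crude numerical integration.
    # Illustrative choices: f = uniform density on [0, 1], g(x) = x, φ(t) = t².
    xs = np.linspace(0.0, 1.0, 200_001)
    dx = xs[1] - xs[0]
    f = np.ones_like(xs)                  # uniform density on [0, 1]
    g = xs                                # g(x) = x
    phi = lambda t: t ** 2

    lhs = phi(np.sum(g * f) * dx)         # φ(∫ g(x) f(x) dx) ≈ (1/2)² = 0.25
    rhs = np.sum(phi(g) * f) * dx         # ∫ φ(g(x)) f(x) dx ≈ 1/3
    print(lhs, rhs)
    assert lhs <= rhs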

Let Ω = {x1, ... xn}, and take μ to be the probability measure on Ω that assigns mass λi to each point xi; then the general form reduces to a statement about sums:

φ(λ1 g(x1) + ... + λn g(xn)) ≤ λ1 φ(g(x1)) + ... + λn φ(g(xn)),

provided that λi ≥ 0 and λ1 + ... + λn = 1. There is also an infinite discrete form.
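For example, applying the finite form with φ(t) = −log t (convex on the positive reals) recovers the weighted AM–GM inequality; the sketch below, with arbitrarily chosen positive numbers and weights, checks both statements numerically:

    import numpy as np

    # Finite form of Jensen's inequality with φ(t) = -log(t), convex on (0, ∞).
    # It implies the weighted AM–GM inequality: Σ λi xi ≥ Π xi^λi.
    xs = np.array([0.5, 2.0, 3.0, 10.0])        # arbitrary positive numbers
    lam = np.array([0.1, 0.4, 0.3, 0.2])        # weights: λi ≥ 0, Σ λi = 1

    phi = lambda t: -np.log(t)
    lhs = np.dot(lam, phi(xs))                   # Σ λi φ(xi)
    rhs = phi(np.dot(lam, xs))                   # φ(Σ λi xi)
    assert rhs <= lhs

    arithmetic = np.dot(lam, xs)                 # weighted arithmetic mean
    geometric = np.prod(xs ** lam)               # weighted geometric mean
    print(arithmetic, geometric)                 # arithmetic ≥ geometric
    assert arithmetic >= geometric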

Jensen's inequality is of particular importance in statistical physics when the convex function is an exponential, giving:

e^{E[X]} ≤ E[e^{X}],

where the expected values are with respect to some probability distribution of the random variable X.
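A short Monte Carlo check of the exponential form, with an arbitrarily chosen standard normal distribution for X (for which both sides are known exactly):

    import numpy as np

    # exp(E[X]) ≤ E[exp(X)]; for X ~ N(0, 1) the two sides are 1 and e^{1/2} ≈ 1.6487.
    rng = np.random.default_rng(3)
    x = rng.standard_normal(500_000)
    lhs, rhs = np.exp(x.mean()), np.exp(x).mean()
    print(lhs, rhs)
    assert lhs <= rhs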

If p(x) is the true probability density for X, and q(x) is another density, then applying Jensen's inequality for the random variable Y(X) = q(X)/p(X) and the convex function φ(y) = −log(y) gives

E[φ(Y)] ≥ φ(E[Y]).

Therefore:

−D(p ‖ q) = ∫ p(x) log(q(x)/p(x)) dx ≤ log ∫ p(x) (q(x)/p(x)) dx = log ∫ q(x) dx = 0,

so that D(p ‖ q) ≥ 0, a result called Gibbs' inequality.

It shows that the average message length is minimised when codes are assigned on the basis of the true probabilities p rather than any other distribution q.
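A discrete sanity check of Gibbs' inequality, with two arbitrary probability mass functions standing in for p and q:

    import numpy as np

    # Gibbs' inequality: D(p‖q) = Σ p(x) log(p(x)/q(x)) ≥ 0, with equality iff p = q.
    p = np.array([0.1, 0.2, 0.3, 0.4])       # "true" distribution (illustrative)
    q = np.array([0.25, 0.25, 0.25, 0.25])   # any other distribution

    kl = np.sum(p * np.log(p / q))
    print(kl)                                 # ≈ 0.117 > 0
    assert kl >= 0
    assert np.isclose(np.sum(p * np.log(p / p)), 0.0)   # equality when q = p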

If L is a convex loss function and G is a sub-sigma-algebra, then, from the conditional version of Jensen's inequality, we get

L(E[δ(X) | G]) ≤ E[L(δ(X)) | G], and hence E[L(E[δ(X) | G])] ≤ E[L(δ(X))].

So if δ(X) is some estimator of an unobserved parameter θ given a vector of observables X, and if T(X) is a sufficient statistic for θ, then an improved estimator, in the sense of having a smaller expected loss L, can be obtained by calculating

δ1(X) = E[δ(X′) | T(X′) = T(X)],

the expected value of δ with respect to θ, taken over all possible vectors of observations X compatible with the same value of T(X) as that observed. Because T is a sufficient statistic, δ1(X) does not depend on θ and is therefore itself a statistic; this result is known as the Rao–Blackwell theorem.
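The improvement can be seen in a small simulation; the setup below is purely illustrative (Bernoulli observations, the crude estimator δ(X) = X1, the sufficient statistic T(X) = X1 + ... + Xn, and squared-error loss), in which case E[δ(X) | T] is the sample mean:

    import numpy as np

    # Rao–Blackwell: E[ L(E[δ(X)|T]) ] ≤ E[ L(δ(X)) ] for a convex loss L.
    # Illustrative setup: X1..Xn iid Bernoulli(θ), δ(X) = X1, T(X) = Σ Xi,
    # so E[δ(X) | T] = T/n (the sample mean), and L is squared error.
    rng = np.random.default_rng(4)
    theta, n, trials = 0.3, 10, 200_000

    x = rng.binomial(1, theta, size=(trials, n))
    delta = x[:, 0]                     # crude estimator: the first observation
    delta_rb = x.mean(axis=1)           # Rao–Blackwellized estimator: T/n

    mse = lambda est: np.mean((est - theta) ** 2)
    print(mse(delta), mse(delta_rb))    # ≈ 0.21 versus ≈ 0.021
    assert mse(delta_rb) <= mse(delta)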

The relation between risk aversion and declining marginal utility for scalar outcomes can be stated formally with Jensen's inequality: risk aversion can be stated as preferring the certain outcome E[X] to a fair gamble with a potentially larger but uncertain outcome X, i.e. as u(E[X]) ≥ E[u(X)]. If the utility function u is concave, i.e. exhibits declining marginal utility, Jensen's inequality applied to the convex function −u yields exactly this preference.[11]
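A numerical illustration with arbitrary figures (a 50/50 gamble between 50 and 150, and the concave utility u(x) = log x):

    import numpy as np

    # Risk aversion via Jensen: for concave utility u, u(E[X]) ≥ E[u(X)].
    # Illustrative gamble: X is 50 or 150 with equal probability, u(x) = log(x).
    outcomes = np.array([50.0, 150.0])
    probs = np.array([0.5, 0.5])
    u = np.log

    utility_of_certain = u(np.dot(probs, outcomes))   # u(E[X]) = log(100) ≈ 4.61
    expected_utility = np.dot(probs, u(outcomes))     # E[u(X)] ≈ 4.46
    print(utility_of_certain, expected_utility)
    assert utility_of_certain >= expected_utility     # the sure 100 is preferred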

Beyond its classical formulation for real numbers and convex functions, Jensen's inequality has been extended to the realm of operator theory.

Hansen and Pedersen[12] established a definitive version of this inequality by considering genuine non-commutative convex combinations.

If f is an operator convex function on an interval I, if x1, ..., xn are self-adjoint operators with spectra contained in I, and if a1, ..., an are operators satisfying

a1* a1 + ... + an* an = 1,

then the following operator Jensen inequality holds:

f(a1* x1 a1 + ... + an* xn an) ≤ a1* f(x1) a1 + ... + an* f(xn) an.

This result shows that the convex transformation “respects” non-commutative convex combinations, thereby extending the classical inequality to operators without the need for additional restrictions on the interval of definition.
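The following sketch checks this operator inequality numerically for the operator convex function f(t) = t², with randomly generated matrices standing in for the xk and ak (chosen purely for illustration); the inequality is meant in the positive-semidefinite ordering:

    import numpy as np

    # Operator Jensen inequality for the operator convex function f(t) = t²:
    # f(Σ ak* xk ak) ≤ Σ ak* f(xk) ak, given Σ ak* ak = 1 (identity matrix here).
    rng = np.random.default_rng(5)
    d, n = 4, 3

    xs = [(m + m.T) / 2 for m in rng.standard_normal((n, d, d))]   # self-adjoint xk
    bs = rng.standard_normal((n, d, d))
    s = sum(b.T @ b for b in bs)                                   # Σ bk* bk
    w, v = np.linalg.eigh(s)
    s_inv_sqrt = v @ np.diag(w ** -0.5) @ v.T
    a_ops = [b @ s_inv_sqrt for b in bs]                           # now Σ ak* ak = I

    mean = sum(a.T @ x @ a for a, x in zip(a_ops, xs))             # Σ ak* xk ak
    lhs = mean @ mean                                              # f(Σ ak* xk ak)
    rhs = sum(a.T @ (x @ x) @ a for a, x in zip(a_ops, xs))        # Σ ak* xk² ak

    gap = np.linalg.eigvalsh(rhs - lhs)
    print(gap)                            # all eigenvalues ≥ 0 (up to rounding)
    assert gap.min() >= -1e-10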

If f is a convex function on the real line, τ is the normalized trace on the n × n matrices, and A is a self-adjoint matrix, then one has

f(τ(A)) ≤ τ(f(A)).

This inequality naturally extends to C*-algebras equipped with a finite trace and is particularly useful in applications ranging from quantum statistical mechanics to information theory.
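A quick matrix check of the trace version (illustrative choices: a random 5 × 5 real symmetric matrix and f(t) = e^t, applied to A via its eigenvalues):

    import numpy as np

    # Trace version of Jensen's inequality: f(τ(A)) ≤ τ(f(A)) for convex f
    # and the normalized trace τ(A) = tr(A)/n.
    rng = np.random.default_rng(6)
    m = rng.standard_normal((5, 5))
    a = (m + m.T) / 2                          # self-adjoint matrix

    w, v = np.linalg.eigh(a)
    f_of_a = v @ np.diag(np.exp(w)) @ v.T      # f(A) via the functional calculus

    tau = lambda x: np.trace(x) / x.shape[0]   # normalized trace
    lhs, rhs = np.exp(tau(a)), tau(f_of_a)
    print(lhs, rhs)
    assert lhs <= rhs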

Extensions to continuous fields of operators and to settings involving conditional expectations on C*-algebras further illustrate the broad applicability of these generalizations.

Jensen's inequality generalizes the statement that a secant line of a convex function lies above its graph.
Visualizing convexity and Jensen's inequality
A graphical "proof" of Jensen's inequality for the probabilistic case. The dashed curve along the X axis is the hypothetical distribution of X, while the dashed curve along the Y axis is the corresponding distribution of Y values. Note that the convex mapping Y(X) increasingly "stretches" the distribution for increasing values of X.
This is a proof without words of Jensen's inequality for n variables. Without loss of generality, the sum of the positive weights is 1. It follows that the weighted point lies in the convex hull of the original points, which lies above the function itself by the definition of convexity. The conclusion follows.[10]