In information theory and statistics, Kullback's inequality is a lower bound on the Kullback–Leibler divergence expressed in terms of the large deviations rate function.
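[1] If P and Q are probability distributions on the real line whose first moments exist, and P is absolutely continuous with respect to Q, i.e. P << Q, then
{\displaystyle D_{KL}(P\parallel Q)\geq \Psi _{Q}^{*}(\mu '_{1}(P)),}
where {\displaystyle \Psi _{Q}^{*}} is the rate function, i.e. the convex conjugate of the cumulant-generating function, of Q, and {\displaystyle \mu '_{1}(P)} is the first moment of P.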
The Cramér–Rao bound is a corollary of this result.
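As an illustration, the inequality can be checked in closed form when Q is the standard normal distribution and P is normal with mean μ and standard deviation σ. This is a minimal sketch under that assumption; the function names are illustrative and not taken from any library. Here the cumulant-generating function of Q is t²/2, so its convex conjugate evaluated at the mean of P is μ²/2.

```python
import math


def kl_normal_vs_standard(mu, sigma):
    """D_KL(N(mu, sigma^2) || N(0, 1)) in nats, from the closed form."""
    return 0.5 * (sigma**2 + mu**2 - 1.0) - math.log(sigma)


def rate_standard_normal(x):
    """Rate function of Q = N(0, 1): convex conjugate of Psi_Q(t) = t^2 / 2."""
    return 0.5 * x**2


def kullback_gap(mu, sigma):
    """Left side minus right side of Kullback's inequality (non-negative)."""
    return kl_normal_vs_standard(mu, sigma) - rate_standard_normal(mu)
```

In this example the gap equals (σ² − 1)/2 − log σ, which does not depend on μ and vanishes exactly when σ = 1, that is, when P belongs to the natural exponential family of Q.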
Let P and Q be probability distributions (measures) on the real line, whose first moments exist, and such that P << Q.
Consider the natural exponential family of Q given by
{\displaystyle Q_{\theta }(A)={\frac {\int _{A}e^{\theta x}Q(dx)}{\int _{-\infty }^{\infty }e^{\theta x}Q(dx)}}={\frac {1}{M_{Q}(\theta )}}\int _{A}e^{\theta x}Q(dx)}
for every measurable set A, where {\displaystyle M_{Q}} is the moment-generating function of Q.
(Note that {\displaystyle Q_{0}=Q}.) Then
{\displaystyle D_{KL}(P\parallel Q)=D_{KL}(P\parallel Q_{\theta })+\int _{\operatorname {supp} P}\left(\log {\frac {\mathrm {d} Q_{\theta }}{\mathrm {d} Q}}\right)\mathrm {d} P.}
By Gibbs' inequality we have {\displaystyle D_{KL}(P\parallel Q_{\theta })\geq 0}, so that
{\displaystyle D_{KL}(P\parallel Q)\geq \int _{\operatorname {supp} P}\left(\log {\frac {\mathrm {d} Q_{\theta }}{\mathrm {d} Q}}\right)\mathrm {d} P=\int _{\operatorname {supp} P}\left(\log {\frac {e^{\theta x}}{M_{Q}(\theta )}}\right)P(dx)}
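Simplifying the right side, we have, for every real θ where {\displaystyle M_{Q}(\theta )<\infty }:
{\displaystyle D_{KL}(P\parallel Q)\geq \mu '_{1}(P)\theta -\Psi _{Q}(\theta ),}
where {\displaystyle \mu '_{1}(P)} is the first moment, or mean, of P, and {\displaystyle \Psi _{Q}=\log M_{Q}} is called the cumulant-generating function.
Taking the supremum completes the process of convex conjugation and yields the rate function:
{\displaystyle D_{KL}(P\parallel Q)\geq \sup _{\theta }\left\{\mu '_{1}(P)\theta -\Psi _{Q}(\theta )\right\}=\Psi _{Q}^{*}(\mu '_{1}(P)).}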
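The steps of the proof can be followed numerically for distributions on a finite set of real points. The sketch below is illustrative only: the function names, the search interval and the iteration count are assumptions, and the ternary search relies on the objective being concave in θ with its maximum inside the interval, which holds when the mean of P lies strictly between the smallest and largest support points and is not too close to either.

```python
import math


def kl_divergence(p, q):
    """D_KL(P || Q) in nats for probability vectors on a common finite support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def cgf(q, xs, theta):
    """Cumulant-generating function Psi_Q(theta) = log sum_x q(x) exp(theta * x)."""
    return math.log(sum(qi * math.exp(theta * x) for qi, x in zip(q, xs)))


def rate_function(q, xs, mean, lo=-50.0, hi=50.0, iters=200):
    """Approximate sup_theta { mean * theta - Psi_Q(theta) } by ternary search.

    The objective is concave in theta because Psi_Q is convex.
    """
    def objective(t):
        return mean * t - cgf(q, xs, t)

    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if objective(m1) < objective(m2):
            lo = m1
        else:
            hi = m2
    return objective(0.5 * (lo + hi))


def first_moment(p, xs):
    """Mean of the distribution p on the points xs."""
    return sum(pi * x for pi, x in zip(p, xs))
```

For example, with support points `xs = [0, 1, 2]`, `q = [0.5, 0.3, 0.2]` and `p = [0.1, 0.6, 0.3]`, the value `rate_function(q, xs, first_moment(p, xs))` falls strictly below `kl_divergence(p, q)`. With `p = [0.2, 0.3, 0.5]`, which is an exponential tilt of `q`, the two values agree up to numerical error.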
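Let Xθ be a family of probability distributions on the real line indexed by the real parameter θ, and satisfying certain regularity conditions. Then
{\displaystyle \lim _{h\to 0}{\frac {D_{KL}(X_{\theta +h}\parallel X_{\theta })}{h^{2}}}\geq \lim _{h\to 0}{\frac {\Psi _{\theta }^{*}(\mu _{\theta +h})}{h^{2}}},}
where {\displaystyle \Psi _{\theta }^{*}} is the convex conjugate of the cumulant-generating function of {\displaystyle X_{\theta }} and {\displaystyle \mu _{\theta +h}} is the first moment of {\displaystyle X_{\theta +h}}.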
The left side of this inequality can be simplified as follows:
{\displaystyle {\begin{aligned}\lim _{h\to 0}{\frac {D_{KL}(X_{\theta +h}\parallel X_{\theta })}{h^{2}}}&=\lim _{h\to 0}{\frac {1}{h^{2}}}\int _{-\infty }^{\infty }\log \left({\frac {\mathrm {d} X_{\theta +h}}{\mathrm {d} X_{\theta }}}\right)\mathrm {d} X_{\theta +h}\\&=-\lim _{h\to 0}{\frac {1}{h^{2}}}\int _{-\infty }^{\infty }\log \left({\frac {\mathrm {d} X_{\theta }}{\mathrm {d} X_{\theta +h}}}\right)\mathrm {d} X_{\theta +h}\\&=-\lim _{h\to 0}{\frac {1}{h^{2}}}\int _{-\infty }^{\infty }\log \left(1-\left(1-{\frac {\mathrm {d} X_{\theta }}{\mathrm {d} X_{\theta +h}}}\right)\right)\mathrm {d} X_{\theta +h}\\&=\lim _{h\to 0}{\frac {1}{h^{2}}}\int _{-\infty }^{\infty }\left[\left(1-{\frac {\mathrm {d} X_{\theta }}{\mathrm {d} X_{\theta +h}}}\right)+{\frac {1}{2}}\left(1-{\frac {\mathrm {d} X_{\theta }}{\mathrm {d} X_{\theta +h}}}\right)^{2}+o\left(\left(1-{\frac {\mathrm {d} X_{\theta }}{\mathrm {d} X_{\theta +h}}}\right)^{2}\right)\right]\mathrm {d} X_{\theta +h}&&{\text{Taylor series for }}\log(1-t)\\&=\lim _{h\to 0}{\frac {1}{h^{2}}}\int _{-\infty }^{\infty }\left[{\frac {1}{2}}\left(1-{\frac {\mathrm {d} X_{\theta }}{\mathrm {d} X_{\theta +h}}}\right)^{2}\right]\mathrm {d} X_{\theta +h}\\&=\lim _{h\to 0}{\frac {1}{h^{2}}}\int _{-\infty }^{\infty }\left[{\frac {1}{2}}\left({\frac {\mathrm {d} X_{\theta +h}-\mathrm {d} X_{\theta }}{\mathrm {d} X_{\theta +h}}}\right)^{2}\right]\mathrm {d} X_{\theta +h}\\&={\frac {1}{2}}{\mathcal {I}}_{X}(\theta )\end{aligned}}}
which is half the Fisher information of the parameter θ.
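The limit on the left side can be illustrated with a family whose divergence has a closed form. The sketch below assumes the exponential distributions with rate parameter θ, for which the Fisher information of θ is 1/θ²; the function names are illustrative.

```python
import math


def kl_exponential(rate_p, rate_q):
    """D_KL(Exp(rate_p) || Exp(rate_q)) in nats, from the closed form."""
    return math.log(rate_p / rate_q) + rate_q / rate_p - 1.0


def scaled_divergence(theta, h):
    """D_KL(X_{theta+h} || X_theta) / h^2 for the exponential family."""
    return kl_exponential(theta + h, theta) / h**2


def half_fisher_information(theta):
    """Half the Fisher information of the rate parameter, 1 / (2 theta^2)."""
    return 0.5 / theta**2
```

As h decreases, `scaled_divergence(theta, h)` approaches `half_fisher_information(theta)`; the leading correction is of relative order h/θ.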
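The right side of the inequality can be developed as follows:
{\displaystyle \lim _{h\to 0}{\frac {\Psi _{\theta }^{*}(\mu _{\theta +h})}{h^{2}}}=\lim _{h\to 0}{\frac {1}{h^{2}}}\sup _{t}\left\{\mu _{\theta +h}t-\Psi _{\theta }(t)\right\}.}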
This supremum is attained at a value of t=τ where the first derivative of the cumulant-generating function is
{\displaystyle \Psi '_{\theta }(\tau )=\mu _{\theta +h}}