Stein's lemma

Stein's lemma, named in honor of Charles Stein, is a theorem of probability theory that is of interest primarily because of its applications to statistical inference — in particular, to James–Stein estimation and empirical Bayes methods — and its applications to portfolio choice theory.[1]

The theorem gives a formula for the covariance of one random variable with the value of a function of another, when the two random variables are jointly normally distributed.

Note that the name "Stein's lemma" is also commonly used[2] to refer to a different result in the area of statistical hypothesis testing, which connects the error exponents in hypothesis testing with the Kullback–Leibler divergence.

This result is also known as the Chernoff–Stein lemma[3] and is not related to the lemma discussed in this article.

Suppose X is a normally distributed random variable with expectation μ and variance σ².

Further suppose g is a differentiable function for which the two expectations E(g(X)(X − μ)) and E(g′(X)) both exist.

(The existence of the expectation of any random variable is equivalent to the finiteness of the expectation of its absolute value.)

Then

{\displaystyle \operatorname {E} {\bigl (}g(X)(X-\mu ){\bigr )}=\sigma ^{2}\operatorname {E} {\bigl (}g'(X){\bigr )}.}

In general, suppose X and Y are jointly normally distributed.

Then

{\displaystyle \operatorname {Cov} (g(X),Y)=\operatorname {Cov} (X,Y)\operatorname {E} {\bigl (}g'(X){\bigr )}.}

For a general multivariate Gaussian random vector (X₁, …, Xₙ) ∼ N(μ, Σ)

it follows that

{\displaystyle \operatorname {E} {\bigl (}g(X)(X-\mu ){\bigr )}=\Sigma \operatorname {E} {\bigl (}\nabla g(X){\bigr )}.}

Similarly, when X ∼ N(μ, σ²Iₙ),

{\displaystyle \operatorname {E} {\bigl (}g(X)(X-\mu ){\bigr )}=\sigma ^{2}\operatorname {E} {\bigl (}\nabla g(X){\bigr )}.}
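As a quick numerical illustration (not part of the article), the univariate identity above can be checked by Monte Carlo simulation; the test function g(x) = sin(x) and the parameters μ = 0.5, σ = 1.3 are arbitrary choices:

```python
import numpy as np

# Monte Carlo check of Stein's lemma: E[g(X)(X - mu)] = sigma^2 * E[g'(X)]
# for X ~ N(mu, sigma^2). g(x) = sin(x) is an arbitrary smooth test function.
rng = np.random.default_rng(0)
mu, sigma = 0.5, 1.3
x = rng.normal(mu, sigma, size=1_000_000)

lhs = np.mean(np.sin(x) * (x - mu))   # E[g(X)(X - mu)]
rhs = sigma**2 * np.mean(np.cos(x))   # sigma^2 * E[g'(X)]
print(lhs, rhs)  # the two estimates agree up to Monte Carlo error
```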

Stein's lemma can be used to stochastically estimate a gradient:

{\displaystyle \nabla _{\mu }\operatorname {E} _{X\sim N(\mu ,\sigma ^{2}I_{n})}[f(X)]={\frac {1}{\sigma ^{2}}}\operatorname {E} {\bigl [}f(X)(X-\mu ){\bigr ]}\approx {\frac {1}{N\sigma }}\sum _{i=1}^{N}f(\mu +\sigma \epsilon _{i})\,\epsilon _{i},}

where ε₁, …, ε_N are IID samples from the standard normal distribution.
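A minimal sketch of this estimator, assuming an arbitrary test function f(x) = x² (for which E[f(X)] = μ² + σ², so the exact gradient with respect to μ is 2μ):

```python
import numpy as np

# Stochastic gradient of mu -> E[f(X)] for X ~ N(mu, sigma^2) via Stein's lemma:
#   grad_mu E[f(X)] ≈ (1/(N*sigma)) * sum_i f(mu + sigma*eps_i) * eps_i.
# The test function f(x) = x**2 is an assumption for illustration; for it,
# the exact gradient in mu is 2*mu.
def stein_grad(f, mu, sigma, n, rng):
    eps = rng.standard_normal(n)  # IID samples from N(0, 1)
    return np.mean(f(mu + sigma * eps) * eps) / sigma

rng = np.random.default_rng(0)
mu, sigma = 1.0, 0.5
grad_est = stein_grad(lambda x: x**2, mu, sigma, 1_000_000, rng)
print(grad_est)  # close to the exact gradient 2*mu = 2.0
```

Note that the estimator uses only evaluations of f, never its derivative, which is what makes it attractive when f can only be queried as a black box.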

This form has applications in Stein variational gradient descent[4] and Stein variational policy gradient.[5]

The univariate probability density function for the univariate normal distribution with expectation 0 and variance 1 is

{\displaystyle \varphi (x)={\frac {1}{\sqrt {2\pi }}}e^{-x^{2}/2}.}

Since

{\displaystyle \varphi '(x)=-x\varphi (x),}

we get from integration by parts:

{\displaystyle \operatorname {E} [g(X)X]=\int g(x)x\varphi (x)\,dx=\int g'(x)\varphi (x)\,dx=\operatorname {E} [g'(X)].}

The case of general expectation μ and variance σ² follows by writing X = μ + σZ with Z standard normal.
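The integration-by-parts identity in this proof can be illustrated numerically; the test function g(x) = x³ and the truncation of the integrals to [−10, 10] are arbitrary choices for this sketch:

```python
import numpy as np

# Numerical check of the integration-by-parts identity behind the proof:
#   E[g(X) X] = E[g'(X)]  for X ~ N(0, 1),
# using g(x) = x**3, so both sides equal E[X^4] = 3 = 3*E[X^2].
x = np.linspace(-10.0, 10.0, 400_001)
dx = x[1] - x[0]
phi = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)  # standard normal density

lhs = np.sum(x**3 * x * phi) * dx   # E[g(X) X] = E[X^4]
rhs = np.sum(3 * x**2 * phi) * dx   # E[g'(X)]  = 3 E[X^2]
print(lhs, rhs)  # both close to 3
```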

Isserlis' theorem is equivalently stated as

{\displaystyle \operatorname {E} (X_{1}f(X_{1},\ldots ,X_{n}))=\sum _{i=1}^{n}\operatorname {Cov} (X_{1},X_{i})\operatorname {E} (\partial _{X_{i}}f(X_{1},\ldots ,X_{n})),}

where (X₁, …, Xₙ) is a zero-mean multivariate normal random vector.
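As an illustrative check of this identity (a sketch, not from the article), take a zero-mean bivariate normal with correlation ρ = 0.6 and the arbitrary test function f(x₁, x₂) = x₁x₂², for which both sides equal 1 + 2ρ²:

```python
import numpy as np

# Monte Carlo check of the multivariate form
#   E[X1 f(X)] = sum_i Cov(X1, Xi) * E[df/dx_i]
# for a zero-mean bivariate normal; f(x1, x2) = x1 * x2**2 is an
# arbitrary test function chosen for illustration.
rng = np.random.default_rng(0)
rho = 0.6
cov = np.array([[1.0, rho], [rho, 1.0]])
x1, x2 = rng.multivariate_normal([0.0, 0.0], cov, size=1_000_000).T

lhs = np.mean(x1 * (x1 * x2**2))          # E[X1 f(X)]
rhs = (cov[0, 0] * np.mean(x2**2)         # Cov(X1, X1) * E[df/dx1]
       + cov[0, 1] * np.mean(2 * x1 * x2))  # Cov(X1, X2) * E[df/dx2]
print(lhs, rhs)  # both close to 1 + 2*rho**2 = 1.72
```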

Suppose X is in an exponential family, that is, X has the density

{\displaystyle f_{\eta }(x)=\exp(\eta 'T(x))h(x).}

Suppose this density has support (a, b), where a and b may be −∞ or +∞, and suppose that exp(η′T(x))h(x)g(x) → 0 as x → a and as x → b, where g is any differentiable function such that E|g′(X)| < ∞. Then

{\displaystyle \operatorname {E} \left[\left({\frac {h'(X)}{h(X)}}+\eta 'T'(X)\right)g(X)\right]=-\operatorname {E} [g'(X)].}

The derivation is the same as in the special case, namely, integration by parts.
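As a concrete instance (chosen here for illustration), the Exp(1) density e^(−x) on (0, ∞) fits this form with η = −1, T(x) = x, h(x) = 1, so the identity reduces to E[g(X)] = E[g′(X)] for any g satisfying the boundary conditions; g(x) = x² does (g(0) = 0 and g(x)e^(−x) → 0), and a Monte Carlo check is straightforward:

```python
import numpy as np

# Monte Carlo check of the exponential-family identity for the Exp(1)
# density f(x) = exp(-x) on (0, inf): here eta = -1, T(x) = x, h(x) = 1,
# so E[(h'/h + eta*T'(X)) g(X)] = -E[g'(X)] reduces to E[g(X)] = E[g'(X)].
# g(x) = x**2 satisfies the boundary conditions (g(0) = 0, g(x)*exp(-x) -> 0).
rng = np.random.default_rng(0)
x = rng.exponential(1.0, size=1_000_000)

lhs = np.mean(-1.0 * x**2)  # E[(h'/h + eta*T'(X)) g(X)] = E[-g(X)]
rhs = -np.mean(2 * x)       # -E[g'(X)]
print(lhs, rhs)  # both close to -E[X^2] = -2
```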

If we only know that the density has support ℝ, then it could be the case that E|g(X)| < ∞ and E|g′(X)| < ∞ but

{\displaystyle \lim _{x\to \infty }g(x)f_{\eta }(x)\neq 0,}

so that the boundary term does not vanish and the identity fails. To see this, simply put g(x) = 1 and let f_η(x) have infinitely many spikes towards infinity while still being integrable.

Extensions to elliptically-contoured distributions also exist.