Stein's lemma, named in honor of Charles Stein, is a theorem of probability theory that is of interest primarily because of its applications to statistical inference — in particular, to James–Stein estimation and empirical Bayes methods — and its applications to portfolio choice theory.[1] The theorem gives a formula for the covariance of one random variable with the value of a function of another, when the two random variables are jointly normally distributed.
Note that the name "Stein's lemma" is also commonly used[2] to refer to a different result in the area of statistical hypothesis testing, which connects the error exponents in hypothesis testing with the Kullback–Leibler divergence.
This result is also known as the Chernoff–Stein lemma[3] and is not related to the lemma discussed in this article.
Suppose X is a normally distributed random variable with expectation μ and variance σ². Further suppose g is a differentiable function for which the two expectations E(g(X)(X − μ)) and E(g′(X)) both exist.
(The existence of the expectation of any random variable is equivalent to the finiteness of the expectation of its absolute value.)
Then

\operatorname{E}(g(X)(X-\mu)) = \sigma^{2}\operatorname{E}(g'(X)).

In general, suppose X and Y are jointly normally distributed. Then

\operatorname{Cov}(g(X),Y) = \operatorname{Cov}(X,Y)\operatorname{E}(g'(X)).

For a general multivariate Gaussian random vector (X₁, …, Xₙ) ~ N(μ, Σ) it follows that

\operatorname{E}(g(X)(X-\mu)) = \Sigma\,\operatorname{E}(\nabla g(X)).

Similarly, when X ~ N(μ, σ²Iₙ),

\operatorname{E}(g(X)(X-\mu)) = \sigma^{2}\operatorname{E}(\nabla g(X)).
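As a quick sanity check, the univariate identity can be verified by Monte Carlo simulation. The following sketch assumes NumPy and uses g(x) = x³ as an arbitrary test function; both sides of E(g(X)(X − μ)) = σ²E(g′(X)) are estimated from the same samples.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 1.5, 2.0
x = rng.normal(mu, sigma, size=1_000_000)

g = lambda t: t**3            # arbitrary differentiable test function
g_prime = lambda t: 3 * t**2

lhs = np.mean(g(x) * (x - mu))         # estimates E(g(X)(X - mu))
rhs = sigma**2 * np.mean(g_prime(x))   # estimates sigma^2 E(g'(X))
print(lhs, rhs)                        # agree up to Monte Carlo error
```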
Stein's lemma can be used to stochastically estimate gradients:

\nabla_{\mu}\operatorname{E}_{X\sim N(\mu,\sigma^{2}I)}(g(X)) = \frac{1}{\sigma}\operatorname{E}_{Z\sim N(0,I)}(g(\mu+\sigma Z)Z) \approx \frac{1}{N\sigma}\sum_{i=1}^{N} g(\mu+\sigma z_{i})z_{i},

where z₁, …, z_N are IID samples from the standard normal distribution N(0, I). This form has applications in Stein variational gradient descent[4] and Stein variational policy gradient.[5]
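A minimal sketch of such an estimator, assuming NumPy; the objective g(x) = −‖x‖² is an arbitrary example whose smoothed gradient is exactly −2μ:

```python
import numpy as np

rng = np.random.default_rng(0)

def g(x):
    # Example objective: g(x) = -||x||^2; grad_mu E[g(X)] = -2 * mu.
    return -np.sum(x**2, axis=-1)

def stein_grad(g, mu, sigma, n_samples=100_000):
    # Estimate grad_mu E_{X ~ N(mu, sigma^2 I)}[g(X)] via Stein's lemma:
    # (1 / (N * sigma)) * sum_i g(mu + sigma * z_i) * z_i,  z_i ~ N(0, I).
    z = rng.standard_normal((n_samples, mu.size))
    return (g(mu + sigma * z)[:, None] * z).mean(axis=0) / sigma

mu = np.array([2.0, -1.0])
print(stein_grad(g, mu, sigma=0.5))   # approximately -2 * mu = [-4, 2]
```

As with other score-function estimators, the variance can be large; subtracting a baseline such as g(μ) from g before averaging is a common remedy.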
The probability density function of the univariate normal distribution with expectation 0 and variance 1 is

\varphi(x) = \frac{1}{\sqrt{2\pi}}e^{-x^{2}/2}.

Since \varphi'(x) = -x\varphi(x), we get from integration by parts:

\operatorname{E}(g(X)X) = \int_{-\infty}^{\infty} g(x)x\varphi(x)\,dx = -\int_{-\infty}^{\infty} g(x)\varphi'(x)\,dx = \int_{-\infty}^{\infty} g'(x)\varphi(x)\,dx = \operatorname{E}(g'(X)),

where the boundary term vanishes because g(x)φ(x) → 0 as x → ±∞, given that both expectations exist. The case of general expectation and variance follows by the substitution x = μ + σz.
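The integration-by-parts identity can also be checked symbolically; a small sketch using SymPy, again with g(x) = x³ as an arbitrary test function:

```python
import sympy as sp

x = sp.symbols('x', real=True)
phi = sp.exp(-x**2 / 2) / sp.sqrt(2 * sp.pi)   # standard normal density
g = x**3                                       # arbitrary test function

lhs = sp.integrate(g * x * phi, (x, -sp.oo, sp.oo))           # E(g(X) X)
rhs = sp.integrate(sp.diff(g, x) * phi, (x, -sp.oo, sp.oo))   # E(g'(X))
print(lhs, rhs)                                # both equal 3
```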
Isserlis' theorem is equivalently stated as

\operatorname{E}(X_{1}f(X_{1},\ldots ,X_{n}))=\sum _{i=1}^{n}\operatorname{Cov}(X_{1},X_{i})\operatorname{E}(\partial _{X_{i}}f(X_{1},\ldots ,X_{n})),

where (X₁, …, Xₙ) is a zero-mean multivariate normal random vector.
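As an illustration, a Monte Carlo check of this form, assuming NumPy and taking f(x₁, x₂) = x₁x₂² as an arbitrary test function:

```python
import numpy as np

rng = np.random.default_rng(0)
cov = np.array([[2.0, 0.6],
                [0.6, 1.0]])
x = rng.multivariate_normal(np.zeros(2), cov, size=1_000_000)
x1, x2 = x[:, 0], x[:, 1]

# f(X1, X2) = X1 * X2^2, with partials df/dX1 = X2^2 and df/dX2 = 2 X1 X2.
lhs = np.mean(x1 * (x1 * x2**2))                                    # E(X1 f)
rhs = cov[0, 0] * np.mean(x2**2) + cov[0, 1] * np.mean(2 * x1 * x2)
print(lhs, rhs)   # both approximately 2.72 for this covariance matrix
```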
Suppose X is in an exponential family, that is, X has the density

f_{\eta}(x) = \exp(\eta' T(x) - \Psi(\eta))\,h(x).

Suppose this density has support (a, b), where a and b may be −∞ and ∞ respectively, and that

\lim_{x\to a}\exp(\eta' T(x))h(x)g(x) = \lim_{x\to b}\exp(\eta' T(x))h(x)g(x) = 0,

where g is any differentiable function such that E|g′(X)| < ∞. Then

\operatorname{E}\left(\left(\frac{h'(X)}{h(X)} + \eta' T'(X)\right)g(X)\right) = -\operatorname{E}(g'(X)).

The derivation is the same as in the special case, namely, integration by parts.
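For a concrete instance, take the exponential distribution with rate λ, viewed as an exponential family with h(x) = 1 on (0, ∞), T(x) = x and η = −λ; the identity then reduces to λE(g(X)) = E(g′(X)) for g with g(0) = 0. A sketch assuming NumPy, with g(x) = x² as an arbitrary test function:

```python
import numpy as np

rng = np.random.default_rng(0)
lam = 1.3
x = rng.exponential(1 / lam, size=1_000_000)   # rate-lam exponential samples

g = lambda t: t**2            # test function satisfying g(0) = 0
g_prime = lambda t: 2 * t

# Here h'(x)/h(x) = 0 and eta * T'(x) = -lam, so the identity reads
# E(-lam * g(X)) = -E(g'(X)),  i.e.  lam * E(g(X)) = E(g'(X)).
print(lam * np.mean(g(x)), np.mean(g_prime(x)))   # both approximately 2 / lam
```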
If we only know that X has support ℝ, then it could be the case that E|g(X)| < ∞ and E|g′(X)| < ∞ but lim_{x→∞} g(x)f_η(x) ≠ 0. To see this, simply put g(x) = 1 and let f_η(x) have infinitely many spikes towards infinity, while still remaining integrable.
Extensions to elliptically-contoured distributions also exist.