In decision theory and estimation theory, Stein's example (also known as Stein's phenomenon or Stein's paradox) is the observation that when three or more parameters are estimated simultaneously, there exist combined estimators more accurate on average (that is, having lower expected mean squared error) than any method that handles the parameters separately.
It is named after Charles Stein of Stanford University, who discovered the phenomenon in 1955.[1]
An intuitive explanation is that optimizing for the mean-squared error of a combined estimator is not the same as optimizing for the errors of separate estimators of the individual parameters.
Let $\boldsymbol\theta \in \mathbb{R}^n$ be a vector of $n \ge 3$ unknown parameters, and suppose a single measurement $X_i$ is taken of each parameter $\theta_i$, giving a measurement vector $\mathbf{X}$ of length $n$. Suppose the measurements are known to be independent Gaussian random variables with mean $\boldsymbol\theta$ and variance 1, that is, $\mathbf{X} \sim N(\boldsymbol\theta, I_n)$.
Under these conditions, it is intuitive and common to use each measurement as an estimate of its corresponding parameter.
This so-called "ordinary" decision rule can be written as $\hat{\boldsymbol\theta} = \mathbf{X}$.
The quality of such an estimator is measured by its risk function.
A commonly used risk function is the mean squared error, defined as $\operatorname{E}\left[\|\boldsymbol\theta - \hat{\boldsymbol\theta}\|^2\right]$.
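For orientation, note that under the unit-variance model above the ordinary rule has the same risk for every value of the parameter (a standard computation, spelled out here rather than taken from the original passage):

$$\operatorname{E}\!\left[\|\boldsymbol\theta - \mathbf{X}\|^{2}\right] \;=\; \sum_{i=1}^{n} \operatorname{E}\!\left[(\theta_i - X_i)^{2}\right] \;=\; \sum_{i=1}^{n} \operatorname{Var}(X_i) \;=\; n.$$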
Surprisingly, it turns out that the "ordinary" decision rule is suboptimal (inadmissible) in terms of mean squared error when $n \ge 3$.
In other words, in the setting discussed here, there exist alternative estimators which always achieve lower mean squared error, no matter what the value of $\boldsymbol\theta$ is.
Thus, Stein's example can be simply stated as follows: The "ordinary" decision rule for estimating the mean of a multivariate Gaussian distribution is inadmissible under mean squared error risk.
Many simple, practical estimators achieve better performance than the "ordinary" decision rule.
The best-known example is the James–Stein estimator, which shrinks $\mathbf{X}$ towards a particular point (such as the origin) by an amount inversely proportional to the distance of $\mathbf{X}$ from that point.
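The explicit formula is not given in the passage above; for reference, the standard James–Stein estimator shrinking towards the origin in this unit-variance setting is

$$\hat{\boldsymbol\theta}_{JS} \;=\; \left(1 - \frac{n-2}{\|\mathbf{X}\|^{2}}\right)\mathbf{X},$$

which moves $\mathbf{X}$ towards the origin by a distance of $(n-2)/\|\mathbf{X}\|$, consistent with the inverse proportionality just described.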
An alternative proof is due to Larry Brown: he proved that the ordinary estimator for an $n$-dimensional multivariate normal mean vector is admissible if and only if the $n$-dimensional Brownian motion is recurrent. Since Brownian motion is recurrent only in dimensions one and two, the ordinary estimator is inadmissible for $n \ge 3$.
Because the combined mean squared error is the sum of the errors of the individual components, for any particular value of $\boldsymbol\theta$ the new estimator will improve at least one of the individual mean square errors $\operatorname{E}\left[(\theta_i - \hat\theta_i)^2\right]$. That is, there is at least one parameter $\theta_i$ whose mean square error is improved, and its improvement more than compensates for any degradation in mean square error that might occur for another parameter $\theta_j$. Unfortunately, it is not known in advance which of the $n$ mean square errors are improved, so you can't use the Stein estimator only for those parameters.
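A small Monte Carlo sketch makes the aggregate-versus-individual distinction concrete. The dimension, true mean vector, trial count, and random seed below are arbitrary illustrative choices, not values from the text; the simulation simply compares the ordinary rule with the James–Stein rule quoted above.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 5                                            # dimension (Stein's phenomenon needs n >= 3)
theta = np.array([1.0, -2.0, 0.5, 3.0, 0.0])     # arbitrary "true" mean vector
trials = 100_000

# Each row is one measurement vector X ~ N(theta, I_n) with unit variance.
X = rng.normal(loc=theta, size=(trials, n))

# Ordinary ("measure and report") estimator: theta_hat = X.
ordinary = X

# James-Stein estimator shrinking towards the origin (unit variance assumed).
shrink = 1.0 - (n - 2) / np.sum(X**2, axis=1, keepdims=True)
james_stein = shrink * X

sq_err_ord = (ordinary - theta) ** 2
sq_err_js = (james_stein - theta) ** 2

print("total MSE, ordinary    :", sq_err_ord.sum(axis=1).mean())
print("total MSE, James-Stein :", sq_err_js.sum(axis=1).mean())
print("per-coordinate MSE, ordinary    :", sq_err_ord.mean(axis=0))
print("per-coordinate MSE, James-Stein :", sq_err_js.mean(axis=0))
```

The total mean squared error of the James–Stein rule comes out below that of the ordinary rule (which is close to $n$), while individual coordinates can come out worse, illustrating the point above. In practice a positive-part variant that truncates the shrinkage factor at zero is usually preferred; the plain form is kept here to match the estimator described in the text.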
Stein's example is surprising, since the "ordinary" decision rule is intuitive and commonly used.
To demonstrate the unintuitive nature of Stein's example, consider the following real-world example.
Suppose we are to estimate three unrelated parameters, such as the US wheat yield for 1993, the number of spectators at the Wimbledon tennis tournament in 2001, and the weight of a randomly chosen candy bar from the supermarket.
Suppose we have independent Gaussian measurements of each of these quantities.
Stein's example now tells us that we can get a better estimate (on average) for the vector of three parameters by simultaneously using the three unrelated measurements.
At first sight it appears that somehow we get a better estimator for US wheat yield by measuring some other unrelated statistics such as the number of spectators at Wimbledon and the weight of a candy bar.
However, we have not obtained a better estimator for US wheat yield by itself, but we have produced an estimator for the vector of the means of all three random variables, which has a reduced total risk.
Also, a specific set of the three estimated mean values obtained with the new estimator will not necessarily be better than the ordinary set (the measured values).
A sketch of the proof of Stein's result is as follows. The risk of the ordinary rule $d(\mathbf{x}) = \mathbf{x}$ is constant, $R(\boldsymbol\theta, d) = \operatorname{E}_\theta\left[\|\boldsymbol\theta - \mathbf{X}\|^{2}\right] = n$. Consider the alternative decision rule

$$d'(\mathbf{x}) = \mathbf{x} - \frac{\alpha}{\|\mathbf{x}\|^{2}}\,\mathbf{x},$$

where $\alpha$ is a constant to be chosen. Expanding the square in its risk gives

$$R(\boldsymbol\theta, d') = \operatorname{E}_\theta\!\left[\|\boldsymbol\theta - \mathbf{X}\|^{2}\right] + 2\alpha\,\operatorname{E}_\theta\!\left[\frac{(\boldsymbol\theta - \mathbf{X})^{\mathsf T}\mathbf{X}}{\|\mathbf{X}\|^{2}}\right] + \alpha^{2}\,\operatorname{E}_\theta\!\left[\frac{1}{\|\mathbf{X}\|^{2}}\right].$$

We may simplify the middle term by considering a general "well-behaved" function $h : \mathbf{x} \mapsto h(\mathbf{x}) \in \mathbb{R}$ and using integration by parts: for each coordinate $i$,

$$\operatorname{E}_\theta\!\left[(\theta_i - X_i)\,h(\mathbf{X})\right] = -\operatorname{E}_\theta\!\left[\frac{\partial h}{\partial x_i}(\mathbf{X})\right].$$

Now choose $h(\mathbf{x}) = \alpha x_i / \|\mathbf{x}\|^{2}$. If this function met the "well-behaved" condition (it doesn't, but this can be remedied; see below), we would have

$$\frac{\partial h}{\partial x_i} = \frac{\alpha}{\|\mathbf{x}\|^{2}} - \frac{2\alpha x_i^{2}}{\|\mathbf{x}\|^{4}},$$

and so

$$\operatorname{E}_\theta\!\left[(\theta_i - X_i)\,\frac{\alpha X_i}{\|\mathbf{X}\|^{2}}\right] = -\operatorname{E}_\theta\!\left[\frac{\alpha}{\|\mathbf{X}\|^{2}} - \frac{2\alpha X_i^{2}}{\|\mathbf{X}\|^{4}}\right].$$

Then, returning to the risk function of $d'$ and summing over $i$,

$$R(\boldsymbol\theta, d') = n - \operatorname{E}_\theta\!\left[\frac{2\alpha(n-2) - \alpha^{2}}{\|\mathbf{X}\|^{2}}\right],$$

where for $n \ge 3$ the expectation $\operatorname{E}_\theta\left[1/\|\mathbf{X}\|^{2}\right]$ is finite. This risk is strictly smaller than $R(\boldsymbol\theta, d) = n$ for every $\boldsymbol\theta$ whenever $0 < \alpha < 2(n-2)$, and is minimized at $\alpha = n-2$; hence the ordinary rule is inadmissible.

It remains to justify the use of $h(\mathbf{x}) = \alpha x_i / \|\mathbf{x}\|^{2}$. This function is not continuously differentiable, since it is singular at $\mathbf{x} = \mathbf{0}$. However, the function $h(\mathbf{x}) = \alpha x_i / (\varepsilon + \|\mathbf{x}\|^{2})$ is continuously differentiable, and after following the algebra through and letting $\varepsilon \to 0$, one obtains the same result.
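To make that last step explicit (this is routine calculus rather than anything stated in the passage), the regularized function and its partial derivative are

$$h_\varepsilon(\mathbf{x}) = \frac{\alpha x_i}{\varepsilon + \|\mathbf{x}\|^{2}}, \qquad \frac{\partial h_\varepsilon}{\partial x_i} = \frac{\alpha}{\varepsilon + \|\mathbf{x}\|^{2}} - \frac{2\alpha x_i^{2}}{\left(\varepsilon + \|\mathbf{x}\|^{2}\right)^{2}},$$

which, for $\mathbf{x} \neq \mathbf{0}$, tends to the expression used above as $\varepsilon \to 0$.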