Bayes estimator

An estimator which minimizes the posterior expected loss for each x also minimizes the Bayes risk and is therefore a Bayes estimator.[1] If the prior is improper, then an estimator which minimizes the posterior expected loss for each x is called a generalized Bayes estimator.

Using the MSE as risk, the Bayes estimate of the unknown parameter is simply the mean of the posterior distribution,[3]

$$\hat{\theta}(x) = E[\theta \mid x] = \int \theta \, p(\theta \mid x) \, d\theta.$$

This is known as the minimum mean square error (MMSE) estimator.

In sequential estimation, unless a conjugate prior is used, the posterior distribution typically becomes more complex with each added measurement, and the Bayes estimator cannot usually be calculated without resorting to numerical methods.
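As a concrete illustration, the following is a minimal sketch of sequential estimation with a conjugate prior, assuming Bernoulli observations and a Beta prior (the parameter values and names are illustrative, not from the text). Conjugacy keeps the posterior in closed form after every measurement, and the posterior mean is the MMSE estimate:

```python
import random

def update_beta(a, b, observation):
    """Conjugate update: a Beta(a, b) prior combined with one Bernoulli
    observation yields a Beta posterior in closed form."""
    return (a + 1, b) if observation == 1 else (a, b + 1)

# Start from a uniform prior Beta(1, 1) over the success probability theta.
a, b = 1.0, 1.0
random.seed(0)
true_theta = 0.7

for n in range(1, 101):
    x = 1 if random.random() < true_theta else 0
    a, b = update_beta(a, b, x)
    # Under squared-error loss, the Bayes (MMSE) estimate is the posterior
    # mean, which for a Beta(a, b) distribution is a / (a + b).
    if n % 25 == 0:
        print(f"n = {n:3d}   posterior mean = {a / (a + b):.3f}")
```

Without conjugacy, the update step would instead require numerical integration or sampling, which is the difficulty described above.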

Risk functions are chosen depending on how one measures the distance between the estimate and the unknown parameter.

The MSE is the most common risk function in use, primarily due to its simplicity. It is defined as $\mathrm{MSE} = E[(\hat{\theta}(x) - \theta)^2]$, where the expectation is taken over the joint distribution of θ and x.

Other loss functions can be conceived, although the mean squared error is the most widely used and validated.

The prior distribution p has thus far been assumed to be a true probability distribution, in that $\int p(\theta) \, d\theta = 1$. However, occasionally this can be a restrictive requirement.

For example, one might wish to assign equal weight to every value of θ on the real line by taking $p(\theta) = 1$, but this would not be a proper probability distribution since it has infinite mass, $\int p(\theta) \, d\theta = \infty$. Such measures, which are not probability distributions, are referred to as improper priors.

Nevertheless, the posterior distribution can often still be defined by $p(\theta \mid x) = p(x \mid \theta) \, p(\theta) / \int p(x \mid \theta) \, p(\theta) \, d\theta$, and it is not uncommon for the result to be a valid probability distribution. In this case, the posterior expected loss $\int L(\theta, a) \, p(\theta \mid x) \, d\theta$ is typically well-defined and finite.

Recall that, for a proper prior, the Bayes estimator minimizes the posterior expected loss.

A typical example is estimation of a location parameter with a loss function of the type L(a − θ). Here θ is a location parameter, i.e., p(x | θ) = f(x − θ).[2]

It is common to use the improper prior p(θ) = 1 in this case. This yields

$$p(\theta \mid x) = \frac{p(x \mid \theta) \, p(\theta)}{p(x)} = \frac{f(x - \theta)}{p(x)},$$

so the posterior expected loss is

$$E[L(a - \theta) \mid x] = \int L(a - \theta) \, \frac{f(x - \theta)}{p(x)} \, d\theta.$$

The generalized Bayes estimator is the value a(x) that minimizes this expression for each x.

This is equivalent to minimizing

$$\int L(a - \theta) \, f(x - \theta) \, d\theta.$$

In this case it can be shown that the generalized Bayes estimator has the form x + a₀, for some constant a₀.
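The minimization can be carried out numerically. Below is a minimal sketch under assumed, illustrative choices (a standard normal sampling density f and an asymmetric quadratic loss, neither taken from the text): it evaluates the integral on a grid for each candidate action and confirms that the minimizer is a constant shift x + a₀.

```python
import numpy as np

def generalized_bayes_location(x, loss, density, half_width=10.0, num=2001):
    """Numerically minimize the posterior expected loss
    integral of L(a - theta) * f(x - theta) d(theta) under the flat prior
    p(theta) = 1, over a grid of candidate actions a."""
    thetas = np.linspace(x - half_width, x + half_width, num)
    dtheta = thetas[1] - thetas[0]
    f_vals = density(x - thetas)
    actions = thetas  # search for the best action a on the same grid
    exp_loss = [np.sum(loss(a - thetas) * f_vals) * dtheta for a in actions]
    return actions[int(np.argmin(exp_loss))]

# Illustrative choices: a standard normal sampling density f, and an
# asymmetric squared loss that penalizes overestimates twice as much.
density = lambda t: np.exp(-0.5 * t ** 2)
loss = lambda e: np.where(e > 0, 2.0 * e ** 2, e ** 2)

# The resulting estimator is a constant shift of x: a(x) = x + a0.
for x in (0.0, 1.0, 5.0):
    a_hat = generalized_bayes_location(x, loss, density)
    print(f"x = {x:4.1f}  ->  a(x) - x = {a_hat - x:+.4f}")
```

The printed shift a(x) − x is the same for every x, as the translation argument predicts; under a symmetric loss it would be zero.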

Empirical Bayes methods enable the use of auxiliary empirical data, from observations of related parameters, in the development of a Bayes estimator. This is done under the assumption that the estimated parameters are obtained from a common prior.

There are both parametric and non-parametric approaches to empirical Bayes estimation.

The following is a simple example of parametric empirical Bayes estimation.[4]

Given past observations x₁, …, xₙ having conditional distribution f(xᵢ | θᵢ), the moments of the marginal distribution of the observations can be estimated using the maximum likelihood approach:

$$\hat{\mu}_m = \frac{1}{n} \sum_i x_i, \qquad \hat{\sigma}_m^2 = \frac{1}{n} \sum_i (x_i - \hat{\mu}_m)^2.$$

Next, we use the law of total expectation to compute μ_m and the law of total variance to compute σ_m²:

$$\mu_m = E_\pi[\mu_f(\theta)], \qquad \sigma_m^2 = E_\pi[\sigma_f^2(\theta)] + \operatorname{Var}_\pi(\mu_f(\theta)),$$

where μ_f(θ) and σ_f²(θ) are the mean and variance of the conditional distribution f. Matching these relations to the estimated marginal moments yields estimates of the prior's parameters.
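A minimal numerical sketch of this procedure, assuming (as an illustration, not from the text) a normal prior and normal observations with known sampling variance: the marginal moments are estimated by maximum likelihood, the total-expectation and total-variance relations are inverted to recover the prior's parameters, and the plug-in Bayes estimate then shrinks a new observation toward the estimated prior mean.

```python
import numpy as np

rng = np.random.default_rng(0)

sigma2 = 1.0                  # known sampling variance of x_i given theta_i
mu_pi, sigma2_pi = 3.0, 4.0   # "unknown" prior parameters to be recovered

# Each x_i is one noisy observation of its own theta_i, with all theta_i
# drawn from a common prior -- the empirical Bayes setting.
n = 5000
theta = rng.normal(mu_pi, np.sqrt(sigma2_pi), size=n)
x = rng.normal(theta, np.sqrt(sigma2))

# Maximum likelihood estimates of the marginal moments of x.
mu_m = x.mean()
sigma2_m = x.var()

# Law of total expectation:  E[x] = E[theta]            ->  mu_pi  = mu_m
# Law of total variance:     Var[x] = sigma2_pi + sigma2
mu_pi_hat = mu_m
sigma2_pi_hat = max(sigma2_m - sigma2, 0.0)

# Plug-in Bayes (shrinkage) estimate for a new observation x_new.
x_new = x[0]
shrink = sigma2_pi_hat / (sigma2_pi_hat + sigma2)
theta_hat = shrink * x_new + (1 - shrink) * mu_pi_hat
print(f"estimated prior: mean = {mu_pi_hat:.2f}, var = {sigma2_pi_hat:.2f}")
print(f"x_new = {x_new:.2f} -> empirical Bayes estimate {theta_hat:.2f}")
```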

For example, the generalized Bayes estimator of a location parameter θ based on Gaussian samples (described in the "Generalized Bayes estimator" section above) is inadmissible when the dimension of θ is greater than two; this is known as Stein's phenomenon.
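A small simulation can illustrate Stein's phenomenon. The sketch below uses the James–Stein shrinkage estimator, the standard estimator known to dominate the observation vector itself in this Gaussian setting (the dimension and parameter values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
p, trials = 10, 100_000
theta = rng.normal(0.0, 2.0, size=p)  # arbitrary fixed true mean vector

# x ~ N(theta, I_p), replicated over many independent trials.
x = theta + rng.normal(size=(trials, p))

# James-Stein estimator: shrink each observation vector toward the origin.
norms_sq = np.sum(x ** 2, axis=1, keepdims=True)
js = (1.0 - (p - 2) / norms_sq) * x

mse_mle = np.mean(np.sum((x - theta) ** 2, axis=1))   # approximately p
mse_js = np.mean(np.sum((js - theta) ** 2, axis=1))   # strictly smaller
print(f"risk of x itself   : {mse_mle:.3f}")
print(f"risk of James-Stein: {mse_js:.3f}")
```

The shrunk estimator achieves lower total squared error for every true θ when p > 2, which is exactly what inadmissibility of the unshrunk estimator means.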

Let δₙ = δₙ(x₁, …, xₙ) be a sequence of Bayes estimators of θ based on an increasing number of measurements. We are interested in the performance of δₙ for large n. To this end, it is customary to regard θ as a deterministic parameter whose true value is θ₀.

In other words, for large n, the effect of the prior probability on the posterior is negligible.

The relations between the maximum likelihood and Bayes estimators can be shown in the following simple example. Consider estimating θ from x successes in n Bernoulli trials. The maximum likelihood estimator is x/n, while under a Beta prior B(a,b) the Bayes estimator (the posterior mean) is (a + x)/(a + b + n); when n is large relative to a and b, the two essentially coincide.
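A minimal sketch of this comparison (the helper names and the 80% success rate are illustrative):

```python
def ml_estimate(x, n):
    """Maximum likelihood estimate of theta from x successes in n trials."""
    return x / n

def bayes_estimate(x, n, a, b):
    """Posterior mean under a Beta(a, b) prior (the MMSE estimator)."""
    return (a + x) / (a + b + n)

# With a Beta(2, 2) prior, the prior acts like a + b = 4 pseudo-observations:
# decisive against n = 5, negligible against n = 5000.
for n in (5, 50, 5000):
    x = round(0.8 * n)  # suppose 80% of the trials were successes
    print(f"n = {n:4d}   ML = {ml_estimate(x, n):.3f}   "
          f"Bayes = {bayes_estimate(x, n, 2, 2):.3f}")
```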

On the other hand, when n is small, the prior information is still relevant to the decision problem and affects the estimate.

In applications, one often knows very little about fine details of the prior distribution; in particular, there is no reason to assume that it coincides with B(a,b) exactly.

In such a case, one possible interpretation of this calculation is: "there is a non-pathological prior distribution with the mean value 0.5 and the standard deviation d which gives the weight of prior information equal to 1/(4d²)−1 bits of new information."
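One way to read the count 1/(4d²)−1 is as the pseudo-sample size of a symmetric Beta prior matched to that mean and standard deviation; a short derivation under that assumption:

```latex
\text{A symmetric prior } \mathrm{Beta}(a, a) \text{ has mean } \tfrac{1}{2}
\text{ and variance }
d^2 = \frac{a \cdot a}{(a + a)^2 (a + a + 1)} = \frac{1}{4(2a + 1)},
\quad\text{so its weight in observations is}\quad
a + b = 2a = \frac{1}{4d^2} - 1 .
```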

Another example of the same phenomenon is the case when the prior estimate and a measurement are normally distributed.

If the prior is centered at B with deviation Σ, and a single measurement with deviation σ is made, the posterior is a weighted average of the two, with weights proportional to the precisions Σ⁻² and σ⁻²; the prior thus acts like (σ/Σ)² additional measurements. For example, if Σ = σ/2, the prior carries the weight of four measurements. Combining this prior with n measurements with average v results in the posterior centered at (4/(4 + n))B + (n/(4 + n))v; in particular, the prior plays the same role as four measurements made in advance.

Compare to the example of binomial distribution: there the prior has the weight of (σ/Σ)²−1 measurements.

One can see that the exact weight does depend on the details of the distribution, but when σ≫Σ, the difference becomes small.
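A minimal sketch of this precision-weighted combination (the function name and numbers are illustrative):

```python
def combine_normal(prior_mean, prior_sd, meas_mean, meas_sd, n):
    """Posterior mean and sd for a normal prior N(prior_mean, prior_sd^2)
    combined with n independent measurements averaging meas_mean, each with
    standard deviation meas_sd (a precision-weighted average)."""
    prior_prec = 1.0 / prior_sd ** 2   # weight of the prior
    data_prec = n / meas_sd ** 2       # weight of n measurements
    post_mean = (prior_prec * prior_mean + data_prec * meas_mean) \
        / (prior_prec + data_prec)
    post_sd = (prior_prec + data_prec) ** -0.5
    return post_mean, post_sd

# With Sigma = sigma/2 the prior carries the weight of (sigma/Sigma)^2 = 4
# measurements, so the posterior mean is (4B + nv) / (4 + n).
B, v, sigma = 10.0, 12.0, 1.0
Sigma = sigma / 2
for n in (1, 4, 100):
    m, s = combine_normal(B, Sigma, v, sigma, n)
    print(f"n = {n:3d}   posterior mean = {m:.3f}   "
          f"(formula: {(4 * B + n * v) / (4 + n):.3f})")
```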

The Internet Movie Database uses a formula for calculating and comparing the ratings of films by its users, including their Top Rated 250 Titles which is claimed to give "a true Bayesian estimate".
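The weighted-rating formula IMDb published for this purpose had the form W = (v/(v + m))R + (m/(v + m))C, where R is the film's mean vote from v votes, C is the mean vote across all films, and m is a minimum-votes threshold acting as the weight of the prior. A minimal sketch (the values of m and C below are illustrative, not IMDb's):

```python
def weighted_rating(R, v, m, C):
    """Bayesian-style weighted rating: a film's mean vote R (from v votes)
    shrunk toward the across-the-board mean vote C, with m playing the role
    of the prior's pseudo-vote count."""
    return (v / (v + m)) * R + (m / (v + m)) * C

# A film with few votes is pulled strongly toward the global mean C;
# a heavily voted film keeps essentially its own average.
C, m = 7.0, 25000  # illustrative values only
print(weighted_rating(9.2, 500, m, C))        # ~7.04: few votes, near C
print(weighted_rating(9.2, 2_000_000, m, C))  # ~9.17: many votes, near R
```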