In the Bayesian setting, the term MMSE more specifically refers to estimation with quadratic loss function.
Since the posterior mean is cumbersome to calculate, the form of the MMSE estimator is usually constrained to be within a certain class of functions.
This is in contrast to non-Bayesian approaches such as the minimum-variance unbiased estimator (MVUE), where nothing is assumed to be known about the parameter in advance and prior information therefore cannot be exploited.
Two basic numerical approaches to obtaining the MMSE estimate depend on either finding the conditional expectation $\mathrm{E}\{x \mid y\}$ directly or finding the minimum of the MSE.
Direct numerical evaluation of the conditional expectation is computationally expensive since it often requires multidimensional integration usually done via Monte Carlo methods.
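As a rough illustration of this brute-force approach, the sketch below draws joint samples from an assumed scalar model (x standard normal, y equal to x plus Gaussian noise; none of these choices come from the article) and approximates $\mathrm{E}\{x \mid y\}$ by averaging the samples whose $y$ falls near the observed value.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed scalar model for illustration only: x ~ N(0, 1), y = x + N(0, 0.5^2).
n = 200_000
x = rng.normal(0.0, 1.0, n)
y = x + rng.normal(0.0, 0.5, n)

def mc_conditional_mean(y_obs, width=0.05):
    """Crude Monte Carlo estimate of E{x | y ~ y_obs}: average x over the
    samples whose y lands within a small window around the observed value."""
    mask = np.abs(y - y_obs) < width
    return x[mask].mean()

y_obs = 1.0
print(mc_conditional_mean(y_obs))   # Monte Carlo approximation of E{x | y = 1}
print(y_obs / (1.0 + 0.25))         # exact posterior mean for this Gaussian model
```

Even in this one-dimensional toy case a large number of samples is needed; the cost grows rapidly with dimension, which is the point made above.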
While these numerical methods have been fruitful, a closed form expression for the MMSE estimator is nevertheless possible if we are willing to make some compromises.
That is, it solves the following optimization problem:

$$\min_{W,\,b} \ \operatorname{MSE} \qquad \text{s.t.} \qquad \hat{x} = W y + b.$$

One advantage of such a linear MMSE estimator is that it is not necessary to explicitly calculate the posterior probability density function of $x$: the linear estimator depends only on the first two moments of $x$ and $y$. So although it may be convenient to assume that $x$ and $y$ are jointly Gaussian, it is not necessary to make this assumption, so long as the assumed distribution has well-defined first and second moments.
The form of the linear estimator does not depend on the type of the assumed underlying distribution.
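A minimal sketch of this computation, assuming the means and the covariance matrices $C_{XY}$ and $C_Y$ are already known (the function name and arguments are illustrative):

```python
import numpy as np

def lmmse(y, x_mean, y_mean, C_XY, C_Y):
    """Linear MMSE estimate x_hat = x_mean + W (y - y_mean) with W = C_XY C_Y^{-1}.
    Only first and second moments are used; no posterior density is required."""
    # Solve W C_Y = C_XY (C_Y is symmetric) instead of forming an explicit inverse.
    W = np.linalg.solve(C_Y, C_XY.T).T
    return x_mean + W @ (y - y_mean)
```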
Substituting the expressions for $W$ and $b$ into the estimator, we get

$$\hat{x} = C_{XY} C_Y^{-1} (y - \bar{y}) + \bar{x}.$$

Lastly, the covariance of the linear MMSE estimation error will then be given by

$$\begin{aligned}
C_e &= \mathrm{E}\{(\hat{x} - x)(\hat{x} - x)^T\} \\
    &= \mathrm{E}\{(\hat{x} - x)\,(W(y - \bar{y}) - (x - \bar{x}))^T\} \\
    &= \mathrm{E}\{(\hat{x} - x)(y - \bar{y})^T\}\, W^T - \mathrm{E}\{(\hat{x} - x)(x - \bar{x})^T\}.
\end{aligned}$$

The first term in the third line is zero due to the orthogonality principle. Since $W = C_{XY} C_Y^{-1}$, the error covariance can be rewritten in terms of covariance matrices as

$$C_e = C_X - C_{XY} C_Y^{-1} C_{YX}.$$

Thus the minimum mean square error achievable by such a linear estimator is

$$\mathrm{LMMSE} = \operatorname{tr}\{C_e\}.$$

For the special case when both $x$ and $y$ are scalars, the above relations simplify to

$$\hat{x} = \frac{\sigma_{XY}}{\sigma_Y^2}(y - \bar{y}) + \bar{x} = \rho\,\frac{\sigma_X}{\sigma_Y}(y - \bar{y}) + \bar{x},
\qquad
\sigma_e^2 = \sigma_X^2 - \frac{\sigma_{XY}^2}{\sigma_Y^2} = (1 - \rho^2)\,\sigma_X^2,$$

where $\rho = \sigma_{XY}/(\sigma_X \sigma_Y)$ is the Pearson correlation coefficient between $x$ and $y$. The above two equations allow us to interpret the correlation coefficient either as the normalized slope of linear regression,

$$\frac{\hat{x} - \bar{x}}{\sigma_X} = \rho\,\frac{y - \bar{y}}{\sigma_Y},$$

or as the square root of the ratio of the two variances,

$$\rho^2 = \frac{\sigma_X^2 - \sigma_e^2}{\sigma_X^2}.$$

When $\rho = 0$, the observation is of no help and $\hat{x} = \bar{x}$ with $\sigma_e^2 = \sigma_X^2$, while when $\rho = \pm 1$ the estimate is error-free, $\sigma_e^2 = 0$.
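As a quick numerical check of the scalar formulas (the sampled model below is an arbitrary assumption, used only to verify $\sigma_e^2 = (1-\rho^2)\sigma_X^2$ empirically):

```python
import numpy as np

rng = np.random.default_rng(1)

# Arbitrary correlated scalar pair for illustration: x ~ N(2, 3^2), y = 0.5 x + noise.
x = rng.normal(2.0, 3.0, 500_000)
y = 0.5 * x + rng.normal(0.0, 1.0, x.size)

sigma_X, sigma_Y = x.std(), y.std()
rho = np.corrcoef(x, y)[0, 1]

# Scalar linear MMSE estimator and its achieved error variance.
x_hat = x.mean() + rho * (sigma_X / sigma_Y) * (y - y.mean())
print(np.var(x - x_hat))            # empirical mean square error
print((1 - rho**2) * sigma_X**2)    # predicted (1 - rho^2) * sigma_X^2
```

The two printed values should agree to within sampling noise.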
The matrix equation $W C_Y = C_{XY}$ can be solved by standard methods such as Gaussian elimination. Since $C_Y$ is a symmetric positive-definite matrix, $W$ can be solved for twice as fast with the Cholesky decomposition, while for large sparse systems the conjugate gradient method is more effective.
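The sketch below contrasts the two routes on an assumed symmetric positive-definite $C_Y$ (the matrices are randomly generated placeholders): a Cholesky solve of $C_Y W^T = C_{XY}^T$ versus conjugate gradients on the same system.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve
from scipy.sparse.linalg import cg

rng = np.random.default_rng(2)

# Placeholder symmetric positive-definite C_Y and a 1-by-m cross-covariance C_XY.
m = 50
B = rng.normal(size=(m, m))
C_Y = B @ B.T + m * np.eye(m)
C_XY = rng.normal(size=(1, m))

# Dense route: Cholesky factorization of C_Y, then solve C_Y W^T = C_XY^T.
W_chol = cho_solve(cho_factor(C_Y), C_XY.T).T

# Iterative route, preferable when C_Y is large and sparse: conjugate gradients.
w_cg, info = cg(C_Y, C_XY.ravel())

print(np.abs(W_chol.ravel() - w_cg).max())   # the two solutions should agree closely
```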
Let us further model the underlying process of observation as a linear process: $y = A x + z$, where $A$ is a known matrix and $z$ is a random noise vector with mean $\mathrm{E}\{z\} = 0$ and cross-covariance $C_{XZ} = 0$. Here the required mean and covariance matrices will be

$$\mathrm{E}\{y\} = A\bar{x}, \qquad C_Y = A C_X A^T + C_Z, \qquad C_{XY} = C_X A^T.$$

Thus the expression for the linear MMSE estimator matrix $W$ further modifies to

$$W = C_X A^T \left(A C_X A^T + C_Z\right)^{-1},$$

so that the estimate and its error covariance become

$$\hat{x} = W (y - A\bar{x}) + \bar{x}, \qquad C_e = C_X - W A C_X.$$

The estimate for the linear observation process exists so long as the $m \times m$ matrix $\left(A C_X A^T + C_Z\right)^{-1}$ exists.
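A sketch of these formulas for the linear observation model (the function and argument names are illustrative; the inputs are the known $A$, the noise covariance $C_Z$, and the prior mean and covariance of $x$):

```python
import numpy as np

def lmmse_linear_obs(y, A, x_mean, C_X, C_Z):
    """Linear MMSE estimate for y = A x + z, with z zero mean, covariance C_Z,
    and uncorrelated with x. Returns the estimate and its error covariance."""
    S = A @ C_X @ A.T + C_Z                  # the m-by-m matrix that must be invertible
    W = C_X @ A.T @ np.linalg.inv(S)         # W = C_X A^T (A C_X A^T + C_Z)^{-1}
    x_hat = x_mean + W @ (y - A @ x_mean)
    C_e = C_X - W @ A @ C_X                  # C_X - C_X A^T S^{-1} A C_X
    return x_hat, C_e
```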
Since $x$ is now a random variable, it is possible to form a meaningful estimate (namely its mean) even with no measurements.
Every new measurement simply provides additional information which may modify our original estimate.
An alternative form of expression can be obtained by using the matrix identity

$$C_X A^T \left(A C_X A^T + C_Z\right)^{-1} = \left(A^T C_Z^{-1} A + C_X^{-1}\right)^{-1} A^T C_Z^{-1},$$

which can be established by post-multiplying by $(A C_X A^T + C_Z)$ and pre-multiplying by $(A^T C_Z^{-1} A + C_X^{-1})$, to obtain

$$W = \left(A^T C_Z^{-1} A + C_X^{-1}\right)^{-1} A^T C_Z^{-1}
\qquad \text{and} \qquad
C_e = \left(A^T C_Z^{-1} A + C_X^{-1}\right)^{-1}.$$

Since $W$ can now be written in terms of $C_e$ as $W = C_e A^T C_Z^{-1}$, we have a simplified expression for the estimate:

$$\hat{x} = C_e A^T C_Z^{-1} (y - A\bar{x}) + \bar{x}.$$

In this form the above expression can be easily compared with ridge regression, weighted least squares, and the Gauss–Markov estimate.
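The identity is easy to verify numerically; the sketch below builds an arbitrary $A$, $C_X$, and $C_Z$ (placeholder values, not tied to the article) and checks that the covariance form and the alternative "information" form of $W$ agree.

```python
import numpy as np

rng = np.random.default_rng(3)

# Placeholder model: n unknowns, m observations, arbitrary A, diagonal C_X and C_Z.
n, m = 3, 5
A = rng.normal(size=(m, n))
C_X = np.diag([2.0, 1.0, 0.5])
C_Z = 0.1 * np.eye(m)

# Covariance form of the gain: W = C_X A^T (A C_X A^T + C_Z)^{-1}.
W_cov = C_X @ A.T @ np.linalg.inv(A @ C_X @ A.T + C_Z)

# Alternative form: C_e = (A^T C_Z^{-1} A + C_X^{-1})^{-1} and W = C_e A^T C_Z^{-1}.
C_e = np.linalg.inv(A.T @ np.linalg.inv(C_Z) @ A + np.linalg.inv(C_X))
W_info = C_e @ A.T @ np.linalg.inv(C_Z)

print(np.allclose(W_cov, W_info))   # True: the two expressions coincide
```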
In the Bayesian framework, such recursive estimation is easily facilitated using Bayes' rule.
Suppose the observations arrive sequentially, with the $k$-th observation given by $y_k = A_k x + z_k$. It is more convenient to represent the linear MMSE in terms of the prediction error $\tilde{y}_k = y_k - A_k \hat{x}_{k-1}$, whose mean and covariance are $\mathrm{E}\{\tilde{y}_k\} = 0$ and $C_{\tilde{Y}_k} = A_k C_{e_{k-1}} A_k^T + C_{Z_k}$.
The generalization of this idea to non-stationary cases gives rise to the Kalman filter.
As an important special case, an easy-to-use recursive expression can be derived when at each $k$-th time instant the underlying linear observation process yields a scalar such that $y_k = a_k^T x_k + z_k$, where $a_k$ is a known $n \times 1$ column vector whose values can change with time, $x_k$ is the $n \times 1$ random column vector to be estimated, and $z_k$ is a scalar noise term with variance $\sigma_k^2$.
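A sketch of the resulting recursion (written here for a static unknown vector, with variable names of my own choosing): the gain, the estimate, and the error covariance are updated one scalar measurement at a time, starting from the prior mean and covariance.

```python
import numpy as np

def scalar_update(x_hat, C_e, a, y, sigma2):
    """One recursive linear MMSE update for a scalar observation y = a^T x + z
    with noise variance sigma2: gain, estimate, and error covariance."""
    a = a.reshape(-1, 1)
    gain = C_e @ a / (a.T @ C_e @ a + sigma2)        # n-by-1 gain vector
    x_hat = x_hat + (gain * (y - a.T @ x_hat)).ravel()
    C_e = C_e - gain @ a.T @ C_e                     # updated error covariance
    return x_hat, C_e

# Usage: start from the prior, then fold in measurements one at a time.
rng = np.random.default_rng(4)
x_true = np.array([1.0, -2.0])
x_hat, C_e = np.zeros(2), np.eye(2)      # prior mean and covariance (illustrative)
for _ in range(100):
    a = rng.normal(size=2)
    y = float(a @ x_true) + rng.normal(0.0, 0.1)
    x_hat, C_e = scalar_update(x_hat, C_e, a, y, 0.1**2)
print(x_hat)                             # approaches x_true as measurements accumulate
```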
Since the quantity of interest cannot be directly observed, these methods try to minimize the mean squared prediction error.
Consider a linear prediction problem in which a linear combination of observed scalar random variables $z_1$, $z_2$ and $z_3$ is used to estimate another future scalar random variable $z_4$, so that $\hat{z}_4 = w_1 z_1 + w_2 z_2 + w_3 z_3$. If the random variables $z_1, z_2, z_3, z_4$ are real Gaussian random variables with zero mean and a given covariance matrix, then our task is to find the coefficients $w_i$ that yield an optimal linear estimate $\hat{z}_4$. In terms of the terminology developed in the previous sections, for this problem we have the observation vector $y = [z_1, z_2, z_3]^T$, the estimator matrix $W = [w_1, w_2, w_3]$ as a row vector, and the estimated variable $x = z_4$ as a scalar quantity.
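Since the numerical covariance matrix is not reproduced here, the sketch below uses a placeholder symmetric positive-definite matrix purely to show the mechanics: partition the joint covariance, solve for the row vector $W$, and read off the resulting minimum mean square error.

```python
import numpy as np

# Placeholder 4x4 covariance of [z1, z2, z3, z4]^T (illustrative values only,
# not the article's); any symmetric positive-definite matrix would do.
C = np.array([[4.0, 2.0, 1.0, 1.5],
              [2.0, 3.0, 1.0, 1.0],
              [1.0, 1.0, 2.0, 0.5],
              [1.5, 1.0, 0.5, 2.5]])

C_Y  = C[:3, :3]    # covariance of the observations y = [z1, z2, z3]^T
C_XY = C[3:, :3]    # cross-covariance between x = z4 and y (a 1-by-3 row)

W = C_XY @ np.linalg.inv(C_Y)       # prediction coefficients [w1, w2, w3]
mse = C[3, 3] - W @ C_Y @ W.T       # LMMSE = var(z4) - W C_Y W^T
print(W, mse)
```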
A few weeks before the election, two independent public opinion polls were conducted by two different pollsters.
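Assuming the polls are modeled, as in the linear observation setting above, as independent noisy measurements $y_i = x + z_i$ of the quantity $x$ being estimated, with a prior mean and variance for $x$ (all numbers below are invented for illustration), the LMMSE combination is a precision-weighted average:

```python
import numpy as np

# Invented numbers for illustration: prior on x plus two independent polls
# y_i = x + z_i with known sampling variances.
x_mean, var_x = 0.5, 0.1**2            # prior mean and variance of x
y     = np.array([0.53, 0.49])         # the two poll results
var_z = np.array([0.02**2, 0.03**2])   # noise variance of each poll

# Precisions add, and the estimate is the precision-weighted average of the
# prior mean and the two polls (the scalar case of the alternative form above).
var_e = 1.0 / (1.0 / var_x + np.sum(1.0 / var_z))
x_hat = var_e * (x_mean / var_x + np.sum(y / var_z))
print(x_hat, var_e)
```

The more precise poll pulls the estimate harder, while the prior contributes only weakly when its variance is large.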
Suppose that a musician is playing an instrument and that the sound is received by two microphones, each located at a different place.
Let $x$ denote the sound produced by the musician, which is a random variable with zero mean and variance $\sigma_X^2$.
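Treating each microphone, consistent with the linear observation model above, as $y_i = a_i x + z_i$ with its own attenuation $a_i$ and independent noise $z_i$ (the numbers below are assumptions for illustration), the LMMSE weights applied to the two recordings are:

```python
import numpy as np

# Assumed attenuations and noise levels for the two microphones (illustrative only).
var_x = 1.0                              # variance of the sound x
a     = np.array([[0.9], [0.6]])         # attenuation a_1, a_2 at each microphone
var_z = np.array([0.2**2, 0.4**2])       # independent noise variance at each microphone

# W = C_X A^T (A C_X A^T + C_Z)^{-1}, applied to the stacked recordings [y1, y2].
S = (a * var_x) @ a.T + np.diag(var_z)
W = (var_x * a.T) @ np.linalg.inv(S)     # 1-by-2 row of combining weights
print(W)                                 # the stronger, less noisy channel gets more weight
```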