Linear least squares is a set of formulations for solving statistical problems involved in linear regression, including variants for ordinary (unweighted), weighted, and generalized (correlated) residuals.
Because such an overdetermined system generally has no exact solution, the goal of solving (1) exactly is typically replaced by finding the value of $\boldsymbol\beta$ that minimizes the objective function $S(\boldsymbol\beta) = \|\mathbf y - X\boldsymbol\beta\|^2$, the sum of squared residuals.
The three main linear least squares formulations are ordinary least squares (OLS), weighted least squares (WLS), and generalized least squares (GLS). Other formulations include variants such as iteratively reweighted least squares and total least squares. In OLS (i.e., assuming unweighted observations), the optimal value of the objective function is found by substituting into it the optimal expression for the coefficient vector, giving $S(\hat{\boldsymbol\beta}) = \mathbf y^\mathsf{T}\mathbf y - \mathbf y^\mathsf{T} X (X^\mathsf{T}X)^{-1} X^\mathsf{T}\mathbf y$.
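Spelling out that substitution (a standard derivation; here $\hat{\boldsymbol\beta} = (X^\mathsf{T}X)^{-1}X^\mathsf{T}\mathbf y$ is the OLS coefficient vector):

$$
S(\hat{\boldsymbol\beta}) = \|\mathbf y - X\hat{\boldsymbol\beta}\|^2
= \mathbf y^\mathsf{T}\mathbf y - 2\,\hat{\boldsymbol\beta}^\mathsf{T}X^\mathsf{T}\mathbf y + \hat{\boldsymbol\beta}^\mathsf{T}X^\mathsf{T}X\hat{\boldsymbol\beta}
= \mathbf y^\mathsf{T}\mathbf y - \mathbf y^\mathsf{T}X(X^\mathsf{T}X)^{-1}X^\mathsf{T}\mathbf y,
$$

where the last step uses the normal equations $X^\mathsf{T}X\hat{\boldsymbol\beta} = X^\mathsf{T}\mathbf y$ to cancel one of the two identical terms.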
It can be shown from this[9] that under an appropriate assignment of weights the expected value of S is $m - n$, where $m$ is the number of observations and $n$ the number of fitted parameters.
For WLS, the ordinary objective function above is replaced by a weighted sum of squared residuals, $S = \sum_{i} W_{ii} r_i^2$, where $W_{ii}$ is the weight assigned to the $i$-th observation.
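As a concrete illustration (a minimal numpy sketch of my own, not from the source), minimizing this weighted sum amounts to solving the weighted normal equations $(X^\mathsf{T}WX)\boldsymbol\beta = X^\mathsf{T}W\mathbf y$ with $W = \operatorname{diag}(w)$:

```python
import numpy as np

def wls(X, y, w):
    """Weighted least squares: minimize sum_i w_i * r_i**2."""
    Xw = X * w[:, None]  # scale row i of X by the weight w_i, i.e. Xw = W @ X
    # Weighted normal equations: (X^T W X) beta = X^T W y
    return np.linalg.solve(Xw.T @ X, Xw.T @ y)
```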
In statistics and mathematics, linear least squares is an approach to fitting a mathematical or statistical model to data in cases where the idealized value provided by the model for any data point is expressed linearly in terms of the unknown parameters of the model.
Linear least squares problems are convex and have a closed-form solution that is unique, provided that the number of data points used for fitting equals or exceeds the number of unknown parameters, except in degenerate situations where the design matrix fails to have full column rank.
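For instance (an illustrative numpy sketch; the data here are synthetic), the unique minimizer can be computed either from the normal equations or, more stably, with an SVD-based solver:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))              # 20 data points, 3 unknown parameters
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + 0.1 * rng.normal(size=20)

# Closed form via the normal equations (X^T X) beta = X^T y.
beta_normal = np.linalg.solve(X.T @ X, X.T @ y)

# SVD-based solver; returns the same unique minimizer when X has full column rank.
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

assert np.allclose(beta_normal, beta_lstsq)
```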
If prior distributions are available, then even an underdetermined system can be solved using the Bayesian MMSE estimator.
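A minimal sketch of that case (my own illustration, assuming a zero-mean Gaussian prior $\boldsymbol\beta \sim N(0, \tau^2 I)$ and Gaussian noise of variance $\sigma^2$, under which the MMSE estimator is the posterior mean):

```python
import numpy as np

def mmse_estimate(X, y, sigma2, tau2):
    """Posterior mean of beta under beta ~ N(0, tau2*I), noise ~ N(0, sigma2*I).

    Well-defined even for an underdetermined system (fewer rows than columns).
    """
    m = X.shape[0]
    # Dual form: beta = X^T (X X^T + (sigma2/tau2) I)^{-1} y,
    # which only requires solving an m x m system.
    G = X @ X.T + (sigma2 / tau2) * np.eye(m)
    return X.T @ np.linalg.solve(G, y)
```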
In statistics, linear least squares problems correspond to a particularly important type of statistical model called linear regression, which arises as a specific form of regression analysis.
The present article concentrates on the mathematical aspects of linear least squares problems; the formulation and interpretation of statistical regression models, and the statistical inferences related to them, are dealt with in the articles just mentioned.
If the experimental errors are uncorrelated, have zero mean, and have a constant variance, the Gauss–Markov theorem states that the least-squares estimator, $\hat{\boldsymbol\beta}$, has the minimum variance of all unbiased estimators that are linear combinations of the observations.
Note particularly that this property is independent of the statistical distribution function of the errors.
However, for some probability distributions, there is no guarantee that the least-squares solution will perform well given the observations; still, in such cases it is the best estimator that is both linear and unbiased.
For example, the arithmetic mean of a set of measurements of a quantity is the least-squares estimator of the value of that quantity. If the conditions of the Gauss–Markov theorem apply, the arithmetic mean is optimal, whatever the distribution of the measurement errors might be.[11] These properties underpin the use of the method of least squares for all types of data fitting, even when the assumptions are not strictly valid.
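To see why the arithmetic mean is the least-squares estimator of a single repeatedly measured quantity $\mu$, set the derivative of the sum of squared residuals to zero:

$$
\frac{d}{d\mu}\sum_{i=1}^{m}(y_i-\mu)^2 = -2\sum_{i=1}^{m}(y_i-\mu) = 0
\quad\Longrightarrow\quad
\hat\mu = \frac{1}{m}\sum_{i=1}^{m} y_i .
$$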
An assumption underlying the treatment given above is that the independent variable, x, is free of error. When that assumption is not realistic, errors on both variables must be modelled, as in total least squares and other errors-in-variables methods. This can be done by adjusting the weighting scheme to take into account errors on both the dependent and independent variables and then following the standard procedure.[12][13]
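One standard errors-in-variables technique is total least squares; the following SVD-based sketch (my own illustration, assuming equal, uncorrelated error variances on every entry of the data) shows the classical construction:

```python
import numpy as np

def total_least_squares(X, y):
    """Classical SVD-based total least squares for X @ beta ~ y.

    The right singular vector of [X | y] belonging to the smallest singular
    value spans the direction [beta; -1], up to scale.
    """
    C = np.column_stack([X, y])
    _, _, Vt = np.linalg.svd(C)
    v = Vt[-1]              # right singular vector for the smallest singular value
    return -v[:-1] / v[-1]  # rescale so the last component equals -1
```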
In some cases the (weighted) normal equations matrix $X^\mathsf{T}X$ is ill-conditioned.
In these cases, the least squares estimate amplifies the measurement noise and may be grossly inaccurate. Various regularization techniques can be applied in such cases, the most common of which is called ridge regression.
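Ridge regression replaces the normal equations with a regularized system $(X^\mathsf{T}X + \lambda I)\boldsymbol\beta = X^\mathsf{T}\mathbf y$; a minimal numpy sketch (the function name and the default $\lambda$ are illustrative choices of mine):

```python
import numpy as np

def ridge(X, y, lam=1e-3):
    """Ridge regression: minimize ||y - X @ beta||**2 + lam * ||beta||**2.

    Adding lam to every eigenvalue of X^T X bounds the condition number,
    keeping the solve stable when X^T X is nearly singular.
    """
    n = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n), X.T @ y)
```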
Another drawback of the least squares estimator is the fact that the norm of the residuals, $\|\mathbf y - X\hat{\boldsymbol\beta}\|$, is minimized, whereas in some cases one is truly interested in obtaining small error in the parameter $\hat{\boldsymbol\beta}$, e.g., a small value of $\|\boldsymbol\beta - \hat{\boldsymbol\beta}\|$.
Ridge regression is an example of the more general class of shrinkage estimators that have been applied to regression problems.
The primary application of linear least squares is in data fitting.
The approach chosen then is to find the minimal possible value of the sum of squares of the residuals, $S = \sum_{i=1}^{m} r_i^2$, where the residual $r_i = y_i - f(x_i, \boldsymbol\beta)$ is the difference between the measured value and the value predicted by the model.
Through exploratory data analysis or prior knowledge of the subject matter, the researcher suspects that the $y$-values contain some uncertainty or "noise", because of the phenomenon being studied, imperfections in the measurements, and so on.
In other words, the researcher would like to solve the system of linear equations $X\boldsymbol\beta = \mathbf y$, with one equation per data point; since there are more data points than unknown parameters, such a system generally has no exact solution.
Suppose that the hypothetical researcher wishes to fit a parabola of the form $y = \beta_1 x^2$. Importantly, this model is still linear in the unknown parameter $\beta_1$, so the problem remains a linear least squares problem.
The figure shows an extension to fitting the three-parameter parabola $y = \beta_1 + \beta_2 x + \beta_3 x^2$ using a design matrix $X$ with three columns, one each for $x^0$, $x^1$, and $x^2$.
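Putting the example together (an illustrative sketch; the data below are synthetic stand-ins, since the article's own data points are not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0.0, 4.0, 25)
y = 2.0 - 1.0 * x + 0.5 * x**2 + 0.2 * rng.normal(size=x.size)  # noisy parabola

# Design matrix with one column each for x^0, x^1 and x^2.
X = np.column_stack([np.ones_like(x), x, x**2])

beta, residual_ss, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta)         # estimated (beta_1, beta_2, beta_3), close to (2.0, -1.0, 0.5)
print(residual_ss)  # the minimized sum of squared residuals S
```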