Weighted least squares

Weighted least squares (WLS), also known as weighted linear regression, is a generalization of ordinary least squares in which knowledge of the unequal variance of the observations (heteroscedasticity) is incorporated into the regression. WLS is also a specialization of generalized least squares, in which all the off-diagonal entries of the covariance matrix of the errors are null.

The fit of a model to a data point is measured by its residual, $r_i$, defined as the difference between a measured value of the dependent variable, $y_i$, and the value predicted by the model, $f(x_i, \boldsymbol\beta)$:

$$r_i(\boldsymbol\beta) = y_i - f(x_i, \boldsymbol\beta).$$

If the errors are uncorrelated and have equal variance, then the function

$$S(\boldsymbol\beta) = \sum_{i} r_i^2(\boldsymbol\beta)$$

is minimised at $\hat{\boldsymbol\beta}$, such that $\frac{\partial S}{\partial \beta_j}(\hat{\boldsymbol\beta}) = 0$. The Gauss–Markov theorem shows that, when this is so, $\hat{\boldsymbol\beta}$ is a best linear unbiased estimator (BLUE).

If, however, the measurements are uncorrelated but have different uncertainties, a modified approach might be adopted.

Aitken showed that when a weighted sum of squared residuals is minimized,

$$S = \sum_{i=1}^{n} W_{ii}\, r_i^2,$$

$\hat{\boldsymbol\beta}$ is the BLUE if each weight is equal to the reciprocal of the variance of the measurement,

$$W_{ii} = \frac{1}{\sigma_i^2}.$$

Setting the gradient of this weighted sum of squares with respect to each parameter to zero gives, in a linear least squares system, the modified normal equations

$$\sum_{i=1}^{n} \sum_{k=1}^{m} X_{ij} W_{ii} X_{ik}\, \hat\beta_k = \sum_{i=1}^{n} X_{ij} W_{ii}\, y_i, \qquad j = 1, \ldots, m.$$

When the observational errors are uncorrelated and the weight matrix, $\mathbf{W} = \mathbf{\Omega}^{-1}$, is diagonal, these may be written in matrix form as

$$\left(\mathbf{X}^\mathsf{T} \mathbf{W} \mathbf{X}\right) \hat{\boldsymbol\beta} = \mathbf{X}^\mathsf{T} \mathbf{W}\, \mathbf{y}.$$
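As a concrete illustration of this matrix form, a minimal sketch in Python with NumPy might look as follows; the straight-line model, the data, and the measurement uncertainties are invented for the example.

```python
import numpy as np

# Invented straight-line model y = a + b*x with per-observation standard
# deviations sigma_i assumed known.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 30)
sigma = 0.2 + 0.1 * x                        # assumed measurement uncertainties
y = 1.5 + 0.7 * x + rng.normal(0.0, sigma)   # heteroscedastic observations

X = np.column_stack([np.ones_like(x), x])    # design matrix
W = np.diag(1.0 / sigma**2)                  # diagonal weights W_ii = 1/sigma_i^2

# Modified normal equations: (X^T W X) beta_hat = X^T W y
beta_hat = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
print(beta_hat)                              # estimated intercept and slope
```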

When the errors are uncorrelated, it is convenient to simplify the calculations by factoring the weight matrix as

$$W_{ii} = \sqrt{W_{ii}}\,\sqrt{W_{ii}}.$$

The normal equations can then be written in the same form as ordinary least squares:

$$\left(\mathbf{X'}^\mathsf{T} \mathbf{X'}\right) \hat{\boldsymbol\beta} = \mathbf{X'}^\mathsf{T} \mathbf{y'},$$

where the rows of the design matrix and the elements of the observation vector are scaled by the square roots of the weights:

$$X'_{ij} = \sqrt{W_{ii}}\, X_{ij}, \qquad y'_i = \sqrt{W_{ii}}\, y_i = \frac{y_i}{\sigma_i}.$$

This is a type of whitening transformation; the last expression involves an entrywise division.
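A minimal sketch of this rescaling, under the same kind of invented straight-line setup as above: each row of the design matrix and each observation is divided by its standard deviation, after which an ordinary least squares routine can be used.

```python
import numpy as np

# Invented heteroscedastic data for a straight-line model y = a + b*x.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 30)
sigma = 0.2 + 0.1 * x
y = 1.5 + 0.7 * x + rng.normal(0.0, sigma)
X = np.column_stack([np.ones_like(x), x])

X_scaled = X / sigma[:, None]               # X'_{ij} = sqrt(W_ii) * X_{ij}
y_scaled = y / sigma                        # entrywise division y'_i = y_i / sigma_i
beta_hat, *_ = np.linalg.lstsq(X_scaled, y_scaled, rcond=None)
print(beta_hat)                             # same estimates as the normal-equation form
```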

For non-linear least squares systems a similar argument shows that the normal equations should be modified by the weight matrix, giving, in Gauss–Newton form,

$$\left(\mathbf{J}^\mathsf{T} \mathbf{W} \mathbf{J}\right) \Delta\boldsymbol\beta = \mathbf{J}^\mathsf{T} \mathbf{W}\, \Delta\mathbf{y},$$

where J is the Jacobian of the model function with respect to the parameters and Δy is the vector of residuals.
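A possible sketch of such an iteration (weighted Gauss–Newton) is given below; the exponential model, the data, and the starting guess are assumptions made purely for illustration. For harder problems a damping strategy such as Levenberg–Marquardt may be needed for the iteration to converge.

```python
import numpy as np

# Invented non-linear model f(x; a, b) = a * exp(b * x) with known uncertainties.
rng = np.random.default_rng(1)
x = np.linspace(0.0, 4.0, 25)
sigma = 0.02 + 0.01 * x                     # assumed measurement uncertainties
y = 2.0 * np.exp(-0.5 * x) + rng.normal(0.0, sigma)
W = np.diag(1.0 / sigma**2)

beta = np.array([1.8, -0.6])                # starting guess for (a, b)
for _ in range(50):
    f = beta[0] * np.exp(beta[1] * x)       # model values
    r = y - f                               # residual vector, Delta y
    J = np.column_stack([np.exp(beta[1] * x),                  # df/da
                         beta[0] * x * np.exp(beta[1] * x)])   # df/db
    # Weighted normal equations for the parameter shift: (J^T W J) delta = J^T W r
    delta = np.linalg.solve(J.T @ W @ J, J.T @ W @ r)
    beta = beta + delta
    if np.max(np.abs(delta)) < 1e-12:
        break
print(beta)                                 # converges near (2.0, -0.5)
```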

In practice the appropriate weight matrix is not known with certainty and must be estimated from the data. For this, feasible generalized least squares (FGLS) techniques may be used; in this case the technique is specialized to a diagonal covariance matrix, thus yielding a feasible weighted least squares solution.

Weights estimated in this way can also be used to identify outliers in the data; after the outliers have been removed from the data set, the weights should be reset to one.

The weights should, ideally, be equal to the reciprocal of the variance of the measurement.
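One simple way such weights might be estimated, sketched below under the assumption that the error variance varies smoothly with the regressor, is a two-step procedure: an ordinary least squares fit, a fit of a variance function to the squared residuals, and a weighted refit. The data and the variance model are invented for the example.

```python
import numpy as np

# Invented heteroscedastic data: the noise standard deviation grows with x.
rng = np.random.default_rng(2)
x = np.linspace(1.0, 10.0, 50)
y = 2.0 + 0.5 * x + rng.normal(0.0, 0.1 * x)
X = np.column_stack([np.ones_like(x), x])

# Step 1: ordinary least squares fit, ignoring the unequal variances.
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta_ols

# Step 2: model log(residual^2) linearly in x to estimate a variance function,
# then refit with weights equal to the reciprocal of the fitted variances.
gamma, *_ = np.linalg.lstsq(X, np.log(resid**2 + 1e-12), rcond=None)
w = 1.0 / np.exp(X @ gamma)                     # estimated weights
beta_fwls = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
print(beta_fwls)                                # feasible WLS estimates
```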

The estimated parameter values are linear combinations of the observed values, so an expression for their variance-covariance matrix can be obtained by propagating the errors in the observations. Let the variance-covariance matrix for the observations be denoted by M and that of the estimated parameters by $\mathbf{M}^{\boldsymbol\beta}$. Then

$$\mathbf{M}^{\boldsymbol\beta} = \left(\mathbf{X}^\mathsf{T} \mathbf{W} \mathbf{X}\right)^{-1} \mathbf{X}^\mathsf{T} \mathbf{W}\, \mathbf{M}\, \mathbf{W}^\mathsf{T} \mathbf{X} \left(\mathbf{X}^\mathsf{T} \mathbf{W} \mathbf{X}\right)^{-1},$$

which, when $\mathbf{W} = \mathbf{M}^{-1}$, simplifies to

$$\mathbf{M}^{\boldsymbol\beta} = \left(\mathbf{X}^\mathsf{T} \mathbf{W} \mathbf{X}\right)^{-1}.$$

When unit weights are used (W = I, the identity matrix), it is implied that the experimental errors are uncorrelated and all equal: M = σ²I, where σ² is the a priori variance of an observation. In that case σ² is approximated by the reduced chi-squared statistic, $\chi^2_\nu$:

$$\mathbf{M}^{\boldsymbol\beta} = \chi^2_\nu \left(\mathbf{X}^\mathsf{T} \mathbf{W} \mathbf{X}\right)^{-1}, \qquad \chi^2_\nu = \frac{S}{n - m},$$

where S is the minimum value of the weighted objective function:

$$S = \hat{\mathbf{r}}^\mathsf{T} \mathbf{W}\, \hat{\mathbf{r}} = \sum_{i=1}^{n} W_{ii}\, \hat r_i^{\,2},$$

and n − m is the number of degrees of freedom.

The standard deviation of a parameter is the square root of its variance, $\sigma_{\beta_j} = \sqrt{M^{\beta}_{jj}}$, and the correlation coefficient between two parameters is $\rho_{jk} = M^{\beta}_{jk} / (\sigma_{\beta_j} \sigma_{\beta_k})$. These error estimates reflect only random errors in the measurements.

The true uncertainty in the parameters is larger due to the presence of systematic errors, which, by definition, cannot be quantified.

Note that even though the observations may be uncorrelated, the parameters are typically correlated.
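The quantities above can be computed directly; the sketch below reuses the invented straight-line data from the earlier example, and the reduced chi-squared scaling assumes the weights are known only up to an overall factor. The off-diagonal terms of the resulting matrix illustrate the correlation between the parameters.

```python
import numpy as np

# Invented heteroscedastic straight-line data, as in the earlier sketch.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 30)
sigma = 0.2 + 0.1 * x
y = 1.5 + 0.7 * x + rng.normal(0.0, sigma)
X = np.column_stack([np.ones_like(x), x])
w = 1.0 / sigma**2

XtWX = X.T @ (w[:, None] * X)
beta_hat = np.linalg.solve(XtWX, X.T @ (w * y))
resid = y - X @ beta_hat

n, m = X.shape
S = np.sum(w * resid**2)                     # minimum of the weighted objective
chi2_red = S / (n - m)                       # reduced chi-squared
M_beta = chi2_red * np.linalg.inv(XtWX)      # covariance matrix of the parameters

std_err = np.sqrt(np.diag(M_beta))           # standard deviations of the parameters
corr = M_beta / np.outer(std_err, std_err)   # correlation matrix of the parameters
print(std_err)
print(corr)                                  # off-diagonal terms are generally nonzero
```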

It is often assumed, for want of any concrete evidence but often appealing to the central limit theorem—see Normal distribution#Occurrence and applications—that the error on each observation belongs to a normal distribution with a mean of zero and standard deviation σ. Under that assumption the following probabilities can be derived for a single scalar parameter estimate in terms of its estimated standard error $\sigma_\beta$: there is a 68% probability that the interval $\hat\beta \pm \sigma_\beta$ encompasses the true coefficient value, a 95% probability for the interval $\hat\beta \pm 2\sigma_\beta$, and a 99.7% probability for the interval $\hat\beta \pm 3\sigma_\beta$. The assumption is not unreasonable when n ≫ m. If the experimental errors are normally distributed the parameters will belong to a Student's t-distribution with n − m degrees of freedom.

When n ≫ m Student's t-distribution approximates a normal distribution.

Note, however, that these confidence limits cannot take systematic error into account.
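For illustration, approximate confidence limits based on Student's t-distribution might be computed as follows; the sample size, parameter estimate, and standard error are invented numbers.

```python
from scipy import stats

# Hypothetical numbers: n = 30 observations, m = 2 fitted parameters,
# a parameter estimate b with standard error se taken from a fit such as the one above.
n, m = 30, 2
b, se = 0.70, 0.03

t95 = stats.t.ppf(0.975, df=n - m)   # two-sided 95% point of Student's t, n - m d.o.f.
print(b - t95 * se, b + t95 * se)    # approximate 95% confidence limits for the parameter
```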

When the number of observations is relatively small, Chebyshev's inequality can be used for an upper bound on probabilities, regardless of any assumptions about the distribution of experimental errors: the maximum probabilities that a parameter will be more than 1, 2, or 3 standard deviations away from its expectation value are 100%, 25% and 11% respectively.
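These bounds are simply Chebyshev's inequality evaluated at k = 1, 2, 3:

$$\Pr\left(\left|\hat\beta - \operatorname{E}[\hat\beta]\right| \ge k\sigma\right) \le \frac{1}{k^2}, \qquad \frac{1}{1^2} = 100\%, \quad \frac{1}{2^2} = 25\%, \quad \frac{1}{3^2} \approx 11\%.$$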

The sum of weighted residual values is equal to zero whenever the model function contains a constant term. Left-multiply the expression for the residuals, $\hat{\mathbf{r}} = \mathbf{y} - \mathbf{X}\hat{\boldsymbol\beta}$, by $\mathbf{X}^\mathsf{T} \mathbf{W}^\mathsf{T}$ (which equals $\mathbf{X}^\mathsf{T} \mathbf{W}$, since W is diagonal):

$$\mathbf{X}^\mathsf{T} \mathbf{W}\, \hat{\mathbf{r}} = \mathbf{X}^\mathsf{T} \mathbf{W} \mathbf{y} - \mathbf{X}^\mathsf{T} \mathbf{W} \mathbf{X} \hat{\boldsymbol\beta} = \mathbf{X}^\mathsf{T} \mathbf{W} \mathbf{y} - \left(\mathbf{X}^\mathsf{T} \mathbf{W} \mathbf{X}\right) \left(\mathbf{X}^\mathsf{T} \mathbf{W} \mathbf{X}\right)^{-1} \mathbf{X}^\mathsf{T} \mathbf{W} \mathbf{y} = \mathbf{0}.$$

Say, for example, that the first term of the model is a constant, so that $X_{i1} = 1$ for all i. In that case it follows that

$$\sum_{i=1}^{n} X_{i1} W_{ii}\, \hat r_i = \sum_{i=1}^{n} W_{ii}\, \hat r_i = 0.$$

Thus, in the motivational example, above, the fact that the sum of residual values is equal to zero is not accidental, but is a consequence of the presence of the constant term, α, in the model.
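As a quick numerical check of this identity, with invented data:

```python
import numpy as np

# When the model contains a constant term (a column of ones in X),
# the weighted residuals of the WLS fit sum to zero.
rng = np.random.default_rng(3)
x = np.linspace(0.0, 5.0, 20)
sigma = 0.1 + 0.3 * rng.random(20)
y = 0.8 + 1.2 * x + rng.normal(0.0, sigma)

X = np.column_stack([np.ones_like(x), x])    # first column is the constant term
w = 1.0 / sigma**2
beta_hat = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
residuals = y - X @ beta_hat

print(np.sum(w * residuals))                 # zero up to floating-point error
```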