Generalized least squares

In statistics, generalized least squares (GLS) is a method used to estimate the unknown parameters in a linear regression model.

It is used when there is a non-zero amount of correlation between the residuals in the regression model.

GLS is employed to improve statistical efficiency and reduce the risk of drawing erroneous inferences, as compared to conventional least squares and weighted least squares methods.

GLS requires knowledge of the covariance matrix of the errors. If this matrix is unknown, estimating it gives the method of feasible generalized least squares (FGLS).

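In standard linear regression models, one observes data $\{y_i, x_{ij}\}_{i=1,\dots,n;\ j=2,\dots,k}$ on $n$ statistical units, with $k-1$ predictor values and one response value each. The response values are collected in a vector $y \in \mathbb{R}^n$, and the predictor values, together with a column of ones for the intercept, in a design matrix $X \in \mathbb{R}^{n \times k}$.

The model assumes that the conditional mean of $y$ given $X$ is a linear function of $X$, and that the conditional covariance of the error term given $X$ is a known non-singular matrix $\Omega$:

$$y = X\beta + \varepsilon, \qquad \operatorname{E}[\varepsilon \mid X] = 0, \qquad \operatorname{Cov}[\varepsilon \mid X] = \Omega,$$

where $\beta \in \mathbb{R}^k$ is a vector of unknown constants, called "regression coefficients", which are estimated from the data.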

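If $b$ is a candidate estimate for $\beta$, then the residual vector for $b$ is $y - Xb$. The GLS method estimates $\beta$ by minimizing the squared Mahalanobis length of this residual vector:

$$\hat{\beta} = \underset{b}{\operatorname{argmin}}\,(y - Xb)^{\mathsf T}\,\Omega^{-1}\,(y - Xb).$$

Since the objective is a quadratic form in $b$, setting its gradient to zero gives the explicit solution

$$\hat{\beta} = \left(X^{\mathsf T}\Omega^{-1}X\right)^{-1} X^{\mathsf T}\Omega^{-1}y.$$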

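The GLS estimator is unbiased, consistent, efficient, and asymptotically normal with

$$\operatorname{E}[\hat{\beta} \mid X] = \beta \quad\text{and}\quad \operatorname{Cov}[\hat{\beta} \mid X] = \left(X^{\mathsf T}\Omega^{-1}X\right)^{-1}.$$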
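As a minimal sketch, assuming the usual closed form $\hat{\beta} = (X^{\mathsf T}\Omega^{-1}X)^{-1}X^{\mathsf T}\Omega^{-1}y$ for the minimizer of the squared Mahalanobis length, the estimator and its conditional covariance can be computed with NumPy as follows. The function name `gls`, the simulated data, and the AR(1)-style covariance are illustrative assumptions, not part of the method itself.

```python
import numpy as np

def gls(X, y, Omega):
    """Closed-form GLS estimate and its conditional covariance.

    Assumes Omega is a known, symmetric, non-singular covariance matrix.
    """
    # Solve linear systems instead of forming Omega^{-1} explicitly.
    Oinv_X = np.linalg.solve(Omega, X)          # Omega^{-1} X
    Oinv_y = np.linalg.solve(Omega, y)          # Omega^{-1} y
    A = X.T @ Oinv_X                            # X^T Omega^{-1} X
    beta_hat = np.linalg.solve(A, X.T @ Oinv_y)
    cov_beta = np.linalg.inv(A)                 # (X^T Omega^{-1} X)^{-1}
    return beta_hat, cov_beta

# Hypothetical example data: correlated errors with Omega[i, j] = 0.7^|i-j|.
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([1.0, 2.0])
idx = np.arange(n)
Omega = 0.7 ** np.abs(idx[:, None] - idx[None, :])
eps = np.linalg.cholesky(Omega) @ rng.normal(size=n)   # errors with covariance Omega
y = X @ beta_true + eps

beta_hat, cov_beta = gls(X, y, Omega)
print("GLS estimate:", beta_hat)
print("standard errors:", np.sqrt(np.diag(cov_beta)))
```

Solving against $\Omega$ rather than inverting it is a numerical convenience only; the result is the same estimator.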

GLS is equivalent to applying ordinary least squares (OLS) to a linearly transformed version of the data.

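To see this, factor $\Omega = CC^{\mathsf T}$, for instance using the Cholesky decomposition. Left-multiplying both sides of $y = X\beta + \varepsilon$ by $C^{-1}$ yields the equivalent linear model $y^* = X^*\beta + \varepsilon^*$, where $y^* = C^{-1}y$, $X^* = C^{-1}X$, and $\varepsilon^* = C^{-1}\varepsilon$. In this model, $\operatorname{Var}[\varepsilon^* \mid X] = C^{-1}\Omega\,(C^{-1})^{\mathsf T} = I$, the identity matrix. Thus $\beta$ can be efficiently estimated by applying OLS to the transformed data, which requires minimizing the objective

$$(y^* - X^*b)^{\mathsf T}(y^* - X^*b) = (y - Xb)^{\mathsf T}\,\Omega^{-1}\,(y - Xb).$$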

This transformation effectively standardizes the scale of and de-correlates the errors.
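The equivalence between GLS and OLS on transformed data can be checked numerically. The sketch below assumes a Cholesky factor $C$ with $\Omega = CC^{\mathsf T}$ as the transformation (other factorizations would also work) and uses simulated data chosen purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=n)])
idx = np.arange(n)
Omega = 0.5 ** np.abs(idx[:, None] - idx[None, :])   # hypothetical known covariance
C = np.linalg.cholesky(Omega)                         # Omega = C @ C.T, C lower triangular
y = X @ np.array([1.0, -3.0]) + C @ rng.normal(size=n)

# Transform the data: X* = C^{-1} X, y* = C^{-1} y.
X_star = np.linalg.solve(C, X)
y_star = np.linalg.solve(C, y)

# OLS on the transformed data.
beta_whitened = np.linalg.lstsq(X_star, y_star, rcond=None)[0]

# Direct GLS formula for comparison.
Oinv = np.linalg.inv(Omega)
beta_direct = np.linalg.solve(X.T @ Oinv @ X, X.T @ Oinv @ y)

# Covariance of the transformed errors: C^{-1} Omega C^{-T}, which should be I.
whitened_cov = np.linalg.solve(C, np.linalg.solve(C, Omega).T)

print(beta_whitened, beta_direct)
```

Both routes give the same coefficients up to floating-point error, and the transformed error covariance is the identity matrix.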

A special case of GLS, called weighted least squares (WLS), occurs when all the off-diagonal entries of Ω are 0.
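As an illustration of this special case, the sketch below assumes the error variances are known and stored in a hypothetical array `sigma2`. With a diagonal $\Omega$, the GLS formula reduces to a least-squares fit in which each observation is weighted by the reciprocal of its error variance.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
x = rng.uniform(1.0, 5.0, size=n)
X = np.column_stack([np.ones(n), x])
sigma2 = x ** 2                                   # assumed known error variances
y = X @ np.array([0.5, 1.5]) + np.sqrt(sigma2) * rng.normal(size=n)

# WLS: solve (X^T W X) b = X^T W y with W = diag(1 / sigma2).
w = 1.0 / sigma2
XtW = X.T * w                                     # scales observation i by w[i]
beta_wls = np.linalg.solve(XtW @ X, XtW @ y)

# The same estimate from the general GLS formula with a diagonal Omega.
Omega = np.diag(sigma2)
Oinv = np.linalg.inv(Omega)
beta_gls = np.linalg.solve(X.T @ Oinv @ X, X.T @ Oinv @ y)

print(beta_wls, beta_gls)
```

The weighted form avoids building the full $n \times n$ matrix, which matters when $n$ is large.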

The WLS case arises when the variances of the observed values are unequal (that is, when heteroscedasticity is present) but the errors are uncorrelated with one another.[2]

Ordinary least squares can be interpreted as maximum likelihood estimation under the assumption that the errors are independent and normally distributed with zero mean and common variance.

In GLS, this assumption is generalized to the case where errors may not be independent and may have differing variances.

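For given fit parameters $b$, the conditional probability density function of the errors is assumed to be:

$$p(\varepsilon \mid b) = \frac{1}{\sqrt{(2\pi)^n \det \Omega}} \exp\left(-\frac{1}{2}\,\varepsilon^{\mathsf T}\Omega^{-1}\varepsilon\right), \qquad \varepsilon = y - Xb.$$

Maximizing this likelihood over $b$ is equivalent to minimizing the squared Mahalanobis length of the residual vector:

$$\hat{\beta} = \underset{b}{\operatorname{argmax}}\; p(\varepsilon \mid b) = \underset{b}{\operatorname{argmax}}\; \log p(\varepsilon \mid b) = \underset{b}{\operatorname{argmin}}\; \frac{1}{2}(y - Xb)^{\mathsf T}\,\Omega^{-1}\,(y - Xb),$$

where the optimization problem has been re-written using the fact that the logarithm is a strictly increasing function and the property that the solution of an optimization problem is unaffected by terms in the objective function that do not involve the optimization variable $b$.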
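The link between the Gaussian likelihood and the GLS objective can be illustrated numerically. The sketch below assumes multivariate normal errors with known covariance $\Omega$. It evaluates the log-likelihood at the GLS estimate, at the OLS estimate, and at a perturbed coefficient vector. The helper name `gaussian_loglik` and the simulated data are assumptions made for the example.

```python
import numpy as np

def gaussian_loglik(b, X, y, Omega):
    """Log-density of the residual y - X b under N(0, Omega)."""
    r = y - X @ b
    n = y.shape[0]
    _, logdet = np.linalg.slogdet(Omega)
    quad = r @ np.linalg.solve(Omega, r)          # r^T Omega^{-1} r
    return -0.5 * (n * np.log(2.0 * np.pi) + logdet + quad)

rng = np.random.default_rng(3)
n = 80
X = np.column_stack([np.ones(n), rng.normal(size=n)])
idx = np.arange(n)
Omega = 0.6 ** np.abs(idx[:, None] - idx[None, :])
y = X @ np.array([2.0, -1.0]) + np.linalg.cholesky(Omega) @ rng.normal(size=n)

# GLS estimate (Omega is symmetric, so (Omega^{-1} X)^T y = X^T Omega^{-1} y).
Oinv_X = np.linalg.solve(Omega, X)
beta_gls = np.linalg.solve(X.T @ Oinv_X, Oinv_X.T @ y)
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]

ll_gls = gaussian_loglik(beta_gls, X, y, Omega)
ll_ols = gaussian_loglik(beta_ols, X, y, Omega)
ll_perturbed = gaussian_loglik(beta_gls + np.array([0.1, 0.0]), X, y, Omega)

print(ll_gls, ll_ols, ll_perturbed)
```

Only the quadratic term depends on $b$, so the GLS estimate attains the largest log-likelihood of the three.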

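If the covariance of the errors $\Omega$ is unknown, one can obtain a consistent estimate of $\Omega$, say $\widehat{\Omega}$,[3] using an implementable version of GLS known as the feasible generalized least squares (FGLS) estimator.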

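In FGLS, modeling proceeds in two stages:

1. The model is estimated by OLS or another consistent (but inefficient) estimator, and the residuals are used to build a consistent estimator of the error covariance matrix.
2. Using this consistent estimator of the error covariance matrix, GLS is then applied.

Whereas GLS is more efficient than OLS under heteroscedasticity (also spelled heteroskedasticity) or autocorrelation, this is not guaranteed for FGLS.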

The feasible estimator is asymptotically more efficient (provided the error covariance matrix is consistently estimated), but for a small to medium-sized sample, it can actually be less efficient than OLS.

This is why some authors prefer to use OLS and base their inferences on an alternative estimator of the OLS estimator's variance that is robust to heteroscedasticity or serial autocorrelation.

However, for large samples, FGLS is preferred over OLS under heteroskedasticity or serial correlation.

One case in which FGLS might be inconsistent is if there are individual-specific fixed effects.

For large samples (i.e., asymptotically), FGLS shares the properties of GLS under appropriate conditions, but for finite samples, the properties of FGLS estimators are unknown: they vary dramatically with each particular model, and as a general rule, their exact distributions cannot be derived analytically.

For finite samples, FGLS may be less efficient than OLS in some cases.

Thus, while GLS can be made feasible, it is not always wise to apply this method when the sample is small.

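One method used to improve the accuracy of the estimators in finite samples is iteration: the residuals from FGLS are used to update the estimate of the error covariance matrix, the FGLS estimate is then recomputed, and the same idea is applied repeatedly until the estimates change by less than some tolerance. However, this method does not necessarily improve the efficiency of the estimator very much if the original sample was small.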
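One way to refine FGLS in finite samples is to iterate between re-estimating the error variances from the current residuals and re-fitting the coefficients. The following is a rough sketch of such a loop under strong assumptions: the errors are uncorrelated, and the log-variance is modeled as a linear function of a hypothetical matrix `Z` (here simply the regressors). The function names, the variance model, and the stopping rule are illustrative choices, and convergence is not guaranteed in general.

```python
import numpy as np

def weighted_fit(X, y, sigma2):
    """GLS with diagonal covariance diag(sigma2)."""
    XtW = X.T / sigma2
    return np.linalg.solve(XtW @ X, XtW @ y)

def iterated_fgls(X, y, Z, tol=1e-8, max_iter=100):
    """Iterate variance estimation and re-fitting until the coefficients settle."""
    tiny = np.finfo(float).tiny                       # guards against log(0)
    beta = np.linalg.lstsq(X, y, rcond=None)[0]       # start from OLS
    for it in range(1, max_iter + 1):
        resid = y - X @ beta
        # Assumed variance model: log(resid^2) is roughly linear in Z.
        gamma = np.linalg.lstsq(Z, np.log(resid ** 2 + tiny), rcond=None)[0]
        sigma2_hat = np.exp(Z @ gamma)
        beta_new = weighted_fit(X, y, sigma2_hat)
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new, it
        beta = beta_new
    return beta, max_iter                             # stopped without meeting tol

# Hypothetical heteroscedastic data.
rng = np.random.default_rng(6)
n = 500
x = rng.uniform(1.0, 5.0, size=n)
X = np.column_stack([np.ones(n), x])
beta_true = np.array([1.0, 2.0])
y = X @ beta_true + np.sqrt(np.exp(-1.0 + 0.5 * x)) * rng.normal(size=n)

beta_iter, n_iter = iterated_fgls(X, y, Z=X)
print(beta_iter, "after", n_iter, "iterations")
```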

A reasonable option when samples are not too large is to apply OLS but discard the classical variance estimator (which is inconsistent in this framework) and instead use a HAC (Heteroskedasticity and Autocorrelation Consistent) estimator.

This approach is much safer, and it is the appropriate path to take unless the sample is large, where "large" is sometimes a slippery issue (e.g., if the error distribution is asymmetric the required sample will be much larger).
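As a hedged illustration of this approach, the sketch below keeps the OLS coefficients but replaces the classical variance estimator with a heteroskedasticity-robust "sandwich" estimator of the Eicker-White (HC0) form. This covers only the heteroskedasticity part of the HAC family; handling autocorrelation as well would require additional weighted cross-product terms (as in the Newey-West estimator), which are not shown here. The simulated data are an assumption of the example.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 300
x = rng.uniform(1.0, 5.0, size=n)
X = np.column_stack([np.ones(n), x])
y = X @ np.array([1.0, 0.5]) + x * rng.normal(size=n)   # error s.d. grows with x

# OLS point estimate and residuals.
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta_ols
XtX_inv = np.linalg.inv(X.T @ X)

# Classical variance estimator: s^2 (X^T X)^{-1}, valid only under homoscedasticity.
k = X.shape[1]
cov_classical = (resid @ resid) / (n - k) * XtX_inv

# Robust sandwich estimator (HC0): (X^T X)^{-1} [X^T diag(resid^2) X] (X^T X)^{-1}.
meat = (X.T * resid ** 2) @ X
cov_robust = XtX_inv @ meat @ XtX_inv

print("classical s.e.:", np.sqrt(np.diag(cov_classical)))
print("robust s.e.:   ", np.sqrt(np.diag(cov_robust)))
```

The coefficient estimates are unchanged; only the standard errors used for inference differ.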

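For simplicity, consider the model for heteroscedastic and non-autocorrelated errors, in which $\Omega$ is diagonal. After an initial OLS fit, each diagonal entry can be estimated from the fitted residuals $\hat{u}_j = (y - X\hat{\beta}_{\text{OLS}})_j$, so that $\widehat{\Omega}_{\text{OLS}}$ may be constructed by:

$$\widehat{\Omega}_{\text{OLS}} = \operatorname{diag}\left(\hat{\sigma}_1^2, \hat{\sigma}_2^2, \dots, \hat{\sigma}_n^2\right).$$

It is important to notice that the squared residuals cannot be used in the previous expression; an estimator of the errors' variances is needed. To obtain one, a parametric heteroskedasticity model or a nonparametric estimator can be used.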
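A two-stage FGLS fit for the diagonal case might be sketched as follows. Because raw squared residuals are not themselves variance estimates, this example assumes a simple parametric variance model in which the log-variance is linear in a hypothetical matrix `Z` (here the regressors themselves). That model, the function names, and the simulated data are assumptions for illustration; other parametric or nonparametric variance estimators could be substituted.

```python
import numpy as np

def ols(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

def fgls_diag(X, y, Z):
    """Two-stage FGLS with a diagonal estimated covariance matrix."""
    tiny = np.finfo(float).tiny                   # guards against log(0)
    # Stage 1: OLS fit and residuals.
    resid = y - X @ ols(X, y)
    # Assumed variance model: regress log(resid^2) on Z, then exponentiate the
    # fitted values to get smoothed, strictly positive variance estimates.
    gamma = ols(Z, np.log(resid ** 2 + tiny))
    sigma2_hat = np.exp(Z @ gamma)                # diagonal of the estimated Omega
    # Stage 2: GLS using the estimated diagonal covariance.
    XtW = X.T / sigma2_hat
    beta_fgls = np.linalg.solve(XtW @ X, XtW @ y)
    return beta_fgls, sigma2_hat

# Hypothetical heteroscedastic data.
rng = np.random.default_rng(5)
n = 500
x = rng.uniform(1.0, 5.0, size=n)
X = np.column_stack([np.ones(n), x])
beta_true = np.array([1.0, 2.0])
y = X @ beta_true + np.sqrt(np.exp(-1.0 + 0.5 * x)) * rng.normal(size=n)

beta_fgls, sigma2_hat = fgls_diag(X, y, Z=X)
print("OLS: ", ols(X, y))
print("FGLS:", beta_fgls)
```

The fitted variances only need to be right up to a common scale factor, since multiplying the estimated covariance matrix by a constant leaves the coefficient estimate unchanged.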