Total least squares

In applied statistics, total least squares is a type of errors-in-variables regression, a least squares data modeling technique in which observational errors on both dependent and independent variables are taken into account.[1]

In the least squares method of data modeling, the objective function to be minimized, S, is a quadratic form

$S = \mathbf{r}^\mathsf{T} W \mathbf{r},$

where r is the vector of residuals and W is a weighting matrix.

In linear least squares the model contains equations which are linear in the parameters appearing in the parameter vector $\boldsymbol\beta$, so the residuals are given by $\mathbf{r} = \mathbf{y} - X\boldsymbol\beta$. There are m observations in y and n parameters in β with m > n. X is an m×n matrix whose elements are either constants or functions of the independent variables, x.

The parameter estimates are found by setting the gradient equations to zero, which results in the normal equations[note 1]

$X^\mathsf{T} W X \boldsymbol\beta = X^\mathsf{T} W \mathbf{y}.$

Now, suppose that both x and y are observed subject to error, with variance-covariance matrices $M_x$ and $M_y$ respectively. In this case the objective function becomes

$S = \mathbf{r}_x^\mathsf{T} M_x^{-1} \mathbf{r}_x + \mathbf{r}_y^\mathsf{T} M_y^{-1} \mathbf{r}_y,$

where $\mathbf{r}_x$ and $\mathbf{r}_y$ are the residuals in x and y respectively.

These residuals cannot be independent of each other: the adjusted observations must still satisfy the model, so the two sets of residuals are linked by m constraint equations, one per observation.[2]

Thus, the problem is to minimize the objective function subject to the m constraints. It is solved by the use of Lagrange multipliers, which, after some algebraic manipulation, yields

$X^\mathsf{T} M^{-1} X \boldsymbol\beta = X^\mathsf{T} M^{-1} \mathbf{y},$

where M is the variance-covariance matrix relative to both independent and dependent variables.
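For illustration (a standard special case rather than a statement from the text above), suppose the data errors are uncorrelated, so that $M_x$ and $M_y$ are diagonal, and a straight line $f(x_i) = \alpha + \beta x_i$ is fitted. Each diagonal element of M then combines both error variances,

$M_{ii} = \sigma_{y,i}^2 + \beta^2 \sigma_{x,i}^2,$

so the effective variance at the i-th point depends on the errors in both variables and on the slope of the fitted model.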

As was shown in 1980 by Golub and Van Loan, the TLS problem does not have a solution in general.[4]

The following considers the simple case where a unique solution exists without making any particular assumptions.

The computation of the TLS using singular value decomposition (SVD) is described in standard texts.[5]

We can solve the equation

$X B \approx Y$

for B, where X is m-by-n and Y is m-by-k.[note 2] That is, we seek the B that minimizes the error matrices E and F for X and Y respectively:

$\min_{B,\,E,\,F} \left\lVert \begin{bmatrix} E & F \end{bmatrix} \right\rVert_F \quad\text{subject to}\quad (X+E)\,B = Y+F,$

where $[E\;\;F]$ is the augmented matrix with E and F side by side and $\lVert\cdot\rVert_F$ is the Frobenius norm.

The constraint makes the last k columns of the augmented matrix $[X+E\;\;Y+F]$ linear combinations of its first n columns, so the task is to find the closest matrix of rank n to $[X\;\;Y]$. Write the singular value decomposition of the augmented matrix, with V partitioned into blocks matching the shapes of X and Y, as

$\begin{bmatrix} X & Y \end{bmatrix} = \begin{bmatrix} U_X & U_Y \end{bmatrix} \begin{bmatrix} \Sigma_X & 0 \\ 0 & \Sigma_Y \end{bmatrix} \begin{bmatrix} V_{XX} & V_{XY} \\ V_{YX} & V_{YY} \end{bmatrix}^*.$

Using the Eckart–Young theorem, the approximation minimising the norm of the error is such that the matrices U and Σ are unchanged, while the k smallest singular values are replaced with zeros.

That is, we want

$\begin{bmatrix} X+E & Y+F \end{bmatrix} = \begin{bmatrix} U_X & U_Y \end{bmatrix} \begin{bmatrix} \Sigma_X & 0 \\ 0 & 0_{k\times k} \end{bmatrix} \begin{bmatrix} V_{XX} & V_{XY} \\ V_{YX} & V_{YY} \end{bmatrix}^*,$

so, by linearity,

$\begin{bmatrix} E & F \end{bmatrix} = -\begin{bmatrix} U_X & U_Y \end{bmatrix} \begin{bmatrix} 0 & 0 \\ 0 & \Sigma_Y \end{bmatrix} \begin{bmatrix} V_{XX} & V_{XY} \\ V_{YX} & V_{YY} \end{bmatrix}^*.$

We can then remove blocks from the U and Σ matrices, simplifying to

$\begin{bmatrix} E & F \end{bmatrix} = -U_Y \Sigma_Y \begin{bmatrix} V_{XY} \\ V_{YY} \end{bmatrix}^*.$

This provides E and F so that

$\begin{bmatrix} X+E & Y+F \end{bmatrix} \begin{bmatrix} V_{XY} \\ V_{YY} \end{bmatrix} = 0.$

Now, if $V_{YY}$ is nonsingular, we can right-multiply both sides by $-V_{YY}^{-1}$

to bring the bottom block of the right matrix to the negative identity, giving[6]

$\begin{bmatrix} X+E & Y+F \end{bmatrix} \begin{bmatrix} -V_{XY} V_{YY}^{-1} \\ -I_k \end{bmatrix} = 0,$

and so

$B = -V_{XY} V_{YY}^{-1}.$

A naive GNU Octave implementation of this is given below.
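A minimal sketch of such an implementation, reconstructed from the derivation above (the listing is illustrative, not the article's original code):

function B = tls(X, Y)
  [m, n]    = size(X);             % X is m by n
  Z         = [X Y];               % augmented matrix [X Y], m by (n + k)
  [U, S, V] = svd(Z, 0);           % SVD of the augmented matrix
  VXY       = V(1:n, n+1:end);     % top-right block of V (first n rows, last k columns)
  VYY       = V(n+1:end, n+1:end); % bottom-right k-by-k block of V
  B         = -VXY / VYY;          % B = -V_XY * inv(V_YY)
end

For a Y with a single column (k = 1), B reduces to the usual vector of TLS coefficients.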

The way of solving the problem described above, which requires that the matrix $V_{YY}$ be nonsingular, can be slightly extended by the so-called classical TLS algorithm.[7]

The standard implementation of the classical TLS algorithm is available through Netlib, see also [8][9].

All modern implementations, based, for example, on solving a sequence of ordinary least squares problems, approximate the matrix B defined above; the result, however, is not the exact TLS solution in every case.[10][11]

For non-linear systems, similar reasoning shows that the normal equations for an iteration cycle can be written as

$J^\mathsf{T} M^{-1} J\, \Delta\boldsymbol\beta = J^\mathsf{T} M^{-1}\, \Delta\mathbf{y},$

where J is the Jacobian matrix.

When the independent variable is error-free a residual represents the "vertical" distance between the observed data point and the fitted curve (or surface).

In total least squares a residual represents the distance between a data point and the fitted curve measured along some direction.

In fact, if both variables are measured in the same units and the errors on both variables are the same, then the residual represents the shortest distance between the data point and the fitted curve, that is, the residual vector is perpendicular to the tangent of the curve.
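A small illustrative sketch (not from the text above; function and variable names are hypothetical): with equal error variances in both variables, the orthogonally fitted line passes through the centroid of the data in the direction of largest variance, which can be read off the SVD of the centred data.

function [slope, intercept] = orth_fit(x, y)
  % x and y are column vectors of equal length (assumption)
  xc = x - mean(x);               % centre both variables
  yc = y - mean(y);
  [U, S, V] = svd([xc yc], 0);    % principal directions of the centred point cloud
  d = V(:, 1);                    % direction of largest variance defines the line
  slope = d(2) / d(1);            % assumes the line is not vertical
  intercept = mean(y) - slope * mean(x);
end

Each residual is then the perpendicular distance from a data point to this line, consistent with the geometric description above.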

A serious difficulty arises if the variables are not measured in the same units. First, consider the distance between a data point and the line: if it is computed with Pythagoras' theorem we are adding quantities measured in different units, which is meaningless. Secondly, if we rescale one of the variables, e.g., measure in grams rather than kilograms, then we shall end up with different results (a different line).

To avoid these problems it is sometimes suggested that we convert to dimensionless variables—this may be called normalization or standardization.

However, there are various ways of doing this, and these lead to fitted models which are not equivalent to each other.

One approach is to normalize by known (or estimated) measurement precision thereby minimizing the Mahalanobis distance from the points to the line, providing a maximum-likelihood solution;[citation needed] the unknown precisions could be found via analysis of variance.

A way forward is to realise that residuals (distances) measured in different units can be combined if multiplication is used instead of addition. Consider fitting a line: for each data point, the product of the vertical and horizontal residuals equals twice the area of the triangle formed by the residual lines and the fitted line, and we choose the line which minimizes the sum of these areas.

Nobel laureate Paul Samuelson proved in 1942 that, in two dimensions, it is the only line expressible solely in terms of the ratios of standard deviations and the correlation coefficient which (1) fits the correct equation when the observations fall on a straight line, (2) exhibits scale invariance, and (3) exhibits invariance under interchange of variables.[13]

This solution has been rediscovered in different disciplines and is variously known as standardised major axis (Ricker 1975, Warton et al., 2006),[14][15] the reduced major axis, the geometric mean functional relationship (Draper and Smith, 1998),[16] least products regression, diagonal regression, line of organic correlation, and the least areas line (Tofallis, 2002).[17]

Tofallis (2015, 2023)[18][19] has extended this approach to deal with multiple variables.

The calculations are simpler than for total least squares as they only require knowledge of covariances, and can be computed using standard spreadsheet functions.
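As a sketch (illustrative names; x and y assumed to be column vectors), the two-variable reduced major axis / geometric mean line can be computed directly from the standard deviations and the sign of the covariance:

function [slope, intercept] = rma_fit(x, y)
  c = mean((x - mean(x)) .* (y - mean(y)));  % sample covariance (scaling does not affect the sign)
  slope = sign(c) * std(y) / std(x);         % ratio of standard deviations, signed by the correlation
  intercept = mean(y) - slope * mean(x);     % the line passes through the centroid
end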

[Figure caption: The bivariate (Deming regression) case of total least squares. The red lines show the error in both x and y. This is different from the traditional least squares method, which measures error parallel to the y-axis. The case shown, with deviations measured perpendicularly, arises when the errors in x and y have equal variances.]