Least absolute deviations

Least absolute deviations (LAD) attempts to find a function which closely approximates a set of data by minimizing the sum of the absolute residuals between the points generated by the function and the corresponding data points.

Suppose that the data set consists of the points (xi, yi) with i = 1, 2, ..., n.[1] We want to find a function f such that f(xi) ≈ yi.

To attain this goal, we suppose that the function f is of a particular form containing some parameters that need to be determined.

For instance, the simplest form would be linear: f(x) = bx + c, where b and c are parameters whose values are not known but which we would like to estimate.

(More generally, there could be not just one explanator x, but rather multiple explanators, all appearing as arguments of the function f.) We now seek estimated values of the unknown parameters that minimize the sum of the absolute values of the residuals:

S = Σi |yi − f(xi)|, summing over i = 1, ..., n.

Though the idea of least absolute deviations regression is just as straightforward as that of least squares regression, the least absolute deviations line is not as simple to compute efficiently.
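Before turning to the specialized methods below, here is a minimal sketch, assuming hypothetical data, that minimizes S directly for the linear form f(x) = bx + c. It uses SciPy's derivative-free Nelder-Mead search, which copes with the non-differentiable absolute-value objective well enough for a tiny example, though the linear programming approach described later is the standard route:

```python
# A minimal sketch (hypothetical data, not from the original text) of
# minimizing the LAD objective S for the linear model f(x) = b*x + c.
import numpy as np
from scipy.optimize import minimize

# Hypothetical data points (x_i, y_i).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.2, 1.9, 3.2, 4.1, 4.8])

def sum_abs_residuals(params):
    """S = sum_i |y_i - f(x_i)| for the linear model f(x) = b*x + c."""
    b, c = params
    return np.sum(np.abs(y - (b * x + c)))

# Nelder-Mead is derivative-free, which suits the non-differentiable
# absolute-value objective; the starting guess (0, 0) is arbitrary.
result = minimize(sum_abs_residuals, x0=[0.0, 0.0], method="Nelder-Mead")
b_hat, c_hat = result.x
print(f"LAD estimates: b = {b_hat:.3f}, c = {c_hat:.3f}")
```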

Simplex-based methods are the “preferred” way to solve the least absolute deviations problem.

The simplest method, checking every line through each pair of data points and keeping the line with the smallest sum of absolute errors, is easy to implement but inefficient for large sets of data.
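As an illustration (a sketch with hypothetical data, not from the original text), the enumeration below relies on the property, discussed later in this article, that at least one optimal line passes through two of the data points. Checking all O(n^2) candidate lines, each at O(n) cost, gives O(n^3) work overall, which is why the method does not scale:

```python
# A sketch of the naive method: try the line through every pair of data
# points and keep the one with the smallest sum of absolute errors.
from itertools import combinations
import numpy as np

def lad_by_enumeration(x, y):
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue  # skip vertical lines
        b = (y[j] - y[i]) / (x[j] - x[i])   # slope through points i, j
        c = y[i] - b * x[i]                 # intercept
        sae = np.sum(np.abs(y - (b * x + c)))
        if best is None or sae < best[0]:
            best = (sae, b, c)
    return best  # (sum of absolute errors, slope, intercept)

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # hypothetical data
y = np.array([1.2, 1.9, 3.2, 4.1, 4.8])
print(lad_by_enumeration(x, y))
```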

The problem can be posed as a linear program. We wish to choose the coefficients βj to minimize

Σi |yi − Σj xij βj|,

where yi is the value of the ith observation of the dependent variable, and xij is the value of the ith observation of the jth independent variable (j = 1, ..., k). We rewrite this problem in terms of artificial variables ui as:

minimize Σi ui, subject to ui ≥ yi − Σj xij βj and ui ≥ −(yi − Σj xij βj) for each i.

These constraints have the effect of forcing each ui to equal |yi − Σj xij βj| when the objective is minimized.

Since this version of the problem statement does not contain the absolute value operator, it is in a format that can be solved with any linear programming package.
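For concreteness, here is a minimal sketch, assuming hypothetical data, of this formulation solved with SciPy's linprog. The decision vector stacks the coefficients β and the artificial variables u, and each absolute-value constraint becomes a pair of linear inequalities:

```python
# A sketch (hypothetical data, not from the original text) of the LP
# formulation above.  The decision vector is z = (beta_1..beta_k,
# u_1..u_n) and we minimize sum(u).
import numpy as np
from scipy.optimize import linprog

# Design matrix X (first column of ones = intercept) and responses y.
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0], [1.0, 5.0]])
y = np.array([1.2, 1.9, 3.2, 4.1, 4.8])
n, k = X.shape

c = np.concatenate([np.zeros(k), np.ones(n)])      # objective: sum of u_i
I = np.eye(n)
# u_i >= y_i - x_i'beta   ->  -X beta - u <= -y
# u_i >= x_i'beta - y_i   ->   X beta - u <=  y
A_ub = np.block([[-X, -I], [X, -I]])
b_ub = np.concatenate([-y, y])
bounds = [(None, None)] * k + [(0, None)] * n      # beta free, u >= 0

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
print("coefficients:", res.x[:k])                  # [intercept, slope]
print("sum of absolute errors:", res.fun)
```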

The least absolute deviations line has some notable properties. In simple regression, at least one optimal least absolute deviations line passes through ("latches onto") at least two of the data points.

More generally, if there are k regressors (including the constant), then at least one optimal regression surface will pass through k of the data points.

The "latching" also helps to understand the "robustness" property: if there exists an outlier, and a least absolute deviations line must latch onto two data points, the outlier will most likely not be one of those two points because that will not minimize the sum of absolute deviations in most cases.

The least absolute deviations solution is not always unique. One known case in which multiple solutions exist is a set of points symmetric about a horizontal line, as shown in Figure A below.

To understand why there are multiple solutions in the case shown in Figure A, consider the pink line in the green region. Tilting it slightly, while keeping it within the green region, leaves the sum of absolute errors unchanged: the distances to the points on one side of the line grow by exactly the amount by which the distances to the points on the other side shrink. Since the line can be tilted by arbitrarily small amounts, there are infinitely many optimal lines.
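A small numeric check (hypothetical symmetric points, not from the original text) makes this concrete: tilting the line about the mean of the x-values leaves the total absolute error unchanged:

```python
# A numerical sketch of the symmetric case: several different slopes
# through the point (mean(x), 0) all give the same sum of absolute
# errors, illustrating the infinitely many solutions.
import numpy as np

x = np.array([1.0, 1.0, 2.0, 2.0])
y = np.array([1.0, -1.0, 1.0, -1.0])   # symmetric about the line y = 0

def sae(b, c):
    return np.sum(np.abs(y - (b * x + c)))

x_mid = x.mean()
for b in [0.0, 0.2, -0.2, 0.5]:
    # Each line passes through (x_mid, 0) with a different slope.
    print(f"slope {b:+.1f}: sum of absolute errors = {sae(b, -b * x_mid):.1f}")
# All four slopes report the same total error (4.0).
```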

Least absolute deviations is robust in that it is resistant to outliers in the data.

LAD gives equal emphasis to all observations, in contrast to ordinary least squares (OLS) which, by squaring the residuals, gives more weight to large residuals, that is, outliers in which predicted values are far from actual observations.

This may be helpful in studies where outliers do not need to be given greater weight than other observations.

If it is important to give greater weight to outliers, the method of least squares is a better choice.
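As an illustration of this contrast (a sketch with hypothetical data, not from the original text), the following fits both criteria to data containing a single outlier; the least squares line is dragged toward the outlier while the least absolute deviations line barely moves:

```python
# A sketch comparing OLS and LAD on data with one outlier.
from itertools import combinations
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 30.0])   # last point is an outlier

# OLS fit: minimizes squared residuals, so the outlier pulls the line up.
b_ols, c_ols = np.polyfit(x, y, 1)

# LAD fit via pairwise enumeration (see the earlier sketch).
def lad_fit(x, y):
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue
        b = (y[j] - y[i]) / (x[j] - x[i])
        c = y[i] - b * x[i]
        sae = np.sum(np.abs(y - (b * x + c)))
        if best is None or sae < best[0]:
            best = (sae, b, c)
    return best[1], best[2]

b_lad, c_lad = lad_fit(x, y)
print(f"OLS: y = {b_ols:.2f}x + {c_ols:.2f}")   # dragged toward the outlier
print(f"LAD: y = {b_lad:.2f}x + {c_lad:.2f}")   # stays near y = x
```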

The least absolute deviations problem can be extended to include multiple explanators, constraints, and regularization; for example, a linear model with linear constraints: minimize S(β, b) = Σi |yi − xi'β − b| subject to, e.g., x1'β + b ≤ k, where β is a column vector of coefficients to be estimated, b is an intercept to be estimated, xi is a column vector of the ith observations on the various explanators, yi is the ith observation on the dependent variable, and k is a known constant.

Regularization with LASSO (least absolute shrinkage and selection operator) may also be combined with LAD.
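As a sketch of how an L1 (LASSO-type) penalty can be folded into the same linear programming formulation used above (hypothetical data; the penalty weight lam and the choice to penalize every coefficient, including the intercept, are assumptions made for brevity):

```python
# A sketch of L1-regularized LAD as a linear program: minimize
# sum(u) + lam * sum(v), where u_i >= |residual_i| and v_j >= |beta_j|
# are enforced by pairs of linear inequalities.
import numpy as np
from scipy.optimize import linprog

X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0], [1.0, 5.0]])
y = np.array([1.2, 1.9, 3.2, 4.1, 4.8])
n, k = X.shape
lam = 0.5                      # regularization strength (assumed value)

c = np.concatenate([np.zeros(k), np.ones(n), lam * np.ones(k)])
I_n, I_k = np.eye(n), np.eye(k)
Z_nk, Z_kn = np.zeros((n, k)), np.zeros((k, n))
A_ub = np.block([
    [-X, -I_n, Z_nk],          # u_i >= y_i - x_i'beta
    [ X, -I_n, Z_nk],          # u_i >= x_i'beta - y_i
    [ I_k, Z_kn, -I_k],        # v_j >= beta_j
    [-I_k, Z_kn, -I_k],        # v_j >= -beta_j
])
b_ub = np.concatenate([-y, y, np.zeros(k), np.zeros(k)])
bounds = [(None, None)] * k + [(0, None)] * (n + k)

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
print("regularized coefficients:", res.x[:k])
# In practice the intercept column is usually left unpenalized; this
# sketch penalizes every coefficient for brevity.
```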

Figure A: A set of data points with reflection symmetry and multiple least absolute deviations solutions. The “solution area” is shown in green. The vertical blue lines represent the absolute errors from the pink line to each data point. The pink line is one of infinitely many solutions within the green area.