In the presence of outliers that do not come from the same data-generating process as the rest of the data, least squares estimation is inefficient and can be biased.
(In many situations, including some areas of geostatistics and medical statistics, it is precisely the outliers that are of interest.)
Classical methods are sometimes described as robust only in the sense that the type I error rate does not increase under violations of the model; when outliers are present, the type I error rate tends to fall below the nominal level while the type II error rate can rise dramatically. This reduction of the type I error rate has been labelled the conservatism of classical methods.
Despite their superior performance over least squares estimation in many situations, robust methods for regression are still not widely used.
One possible reason is that there are several competing methods and the field got off to many false starts.
Another reason may be that some popular statistical software packages failed to implement the methods (Stromberg, 2004).
Perhaps the most important reason for the unpopularity of robust regression methods is that when the error variance is very large or does not exist, any estimate of the regression coefficients for a given dataset, robust or otherwise, is likely to be practically worthless unless the sample is quite large.
Nevertheless, modern statistical software packages such as R, SAS, Statsmodels, Stata and S-PLUS include considerable functionality for robust estimation (see, for example, the books by Venables and Ripley and by Maronna et al.).
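For example, in Python's statsmodels an M-estimation fit with Huber's weighting is available through RLM; the following sketch uses synthetic data and illustrative variable names:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 2.0 + 3.0 * x + rng.normal(size=100)
y[:5] += 20                          # a few gross outliers in the response

X = sm.add_constant(x)
robust_fit = sm.RLM(y, X, M=sm.robust.norms.HuberT()).fit()
print(robust_fit.params)             # robust intercept and slope estimates
```

On contaminated data of this kind the robust coefficient estimates typically remain close to the values used to generate the data, whereas least squares estimates are pulled towards the outliers.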
Even then, gross outliers, particularly those at high-leverage points, can still have a considerable impact on the model, motivating research into even more robust approaches.
Least trimmed squares (LTS) is a viable alternative and is the preferred choice of Rousseeuw and Ryan (1997, 2008).
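LTS chooses the coefficients that minimize the sum of the h smallest squared residuals. The numpy sketch below illustrates the idea with random elemental starts followed by concentration steps; it is a toy version, not the optimized FAST-LTS algorithm, and the function name and defaults are arbitrary choices:

```python
import numpy as np

def lts_fit(X, y, h=None, n_starts=200, n_csteps=20, seed=0):
    """Crude least trimmed squares: minimize the sum of the h smallest
    squared residuals via random starts plus concentration (C-) steps.
    X should already contain a column of ones if an intercept is wanted."""
    n, p = X.shape
    h = h or (n + p + 1) // 2                   # coverage: roughly half the data
    rng = np.random.default_rng(seed)
    best_beta, best_obj = None, np.inf
    for _ in range(n_starts):
        subset = rng.choice(n, size=p, replace=False)        # random elemental start
        beta, *_ = np.linalg.lstsq(X[subset], y[subset], rcond=None)
        for _ in range(n_csteps):                            # concentration steps
            r2 = (y - X @ beta) ** 2
            keep = np.argsort(r2)[:h]                        # h best-fitting points
            beta, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        obj = np.sort((y - X @ beta) ** 2)[:h].sum()
        if obj < best_obj:
            best_beta, best_obj = beta, obj
    return best_beta
```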
The Theil–Sen estimator has a lower breakdown point than LTS but is statistically efficient and popular.
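For simple regression, the Theil–Sen slope is the median of the slopes through all pairs of points, with the intercept also obtained from a median. A minimal sketch on synthetic data follows, with scipy's theilslopes shown for comparison (scipy's default intercept rule differs slightly from the one used here):

```python
import numpy as np
from scipy.stats import theilslopes

rng = np.random.default_rng(1)
x = np.arange(50.0)
y = 1.0 + 0.5 * x + rng.normal(scale=0.5, size=50)
y[-3:] += 15                                # contaminate a few responses

i, j = np.triu_indices(len(x), k=1)         # all pairs i < j
slope = np.median((y[j] - y[i]) / (x[j] - x[i]))
intercept = np.median(y - slope * x)
print(slope, intercept)                     # median-of-pairwise-slopes estimate

ts_slope, ts_intercept, _, _ = theilslopes(y, x)
print(ts_slope, ts_intercept)               # scipy's implementation, for comparison
```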
Another proposed solution is S-estimation, which finds the coefficients that minimize a robust estimate of the scale of the residuals. This method is highly resistant to leverage points and is robust to outliers in the response, but it was found to be statistically inefficient.
MM-estimation attempts to retain the robustness and resistance of S-estimation, whilst gaining the efficiency of M-estimation.
The procedure first finds a highly robust and resistant S-estimate that minimizes an M-estimate of the scale of the residuals (the first M in the method's name). The estimated scale is then held constant whilst a nearby M-estimate of the parameters is located (the second M).
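A minimal sketch of that second M-step, assuming the starting coefficients and the fixed scale come from an S-estimation stage that is not shown, is iteratively reweighted least squares with Tukey's bisquare weights; the function name, tuning constant and convergence rule here are illustrative choices:

```python
import numpy as np

def second_m_step(X, y, beta_start, scale, c=4.685, n_iter=100, tol=1e-10):
    """Second 'M' of MM-estimation (a sketch): refine the coefficients by
    iteratively reweighted least squares with Tukey's bisquare weights,
    keeping the scale fixed at the value supplied by the S-estimation stage."""
    beta = np.asarray(beta_start, dtype=float).copy()
    for _ in range(n_iter):
        r = (y - X @ beta) / scale                       # residuals scaled by the fixed scale
        w = np.where(np.abs(r) <= c, (1 - (r / c) ** 2) ** 2, 0.0)
        XtW = X.T * w                                    # weight each observation's row
        beta_new = np.linalg.solve(XtW @ X, XtW @ y)     # weighted normal equations
        if np.max(np.abs(beta_new - beta)) < tol:
            beta = beta_new
            break
        beta = beta_new
    return beta
```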
Another approach is to fit a parametric model with heavy-tailed errors, such as a t-distribution, in place of the normal distribution; a t-distribution with 4–6 degrees of freedom has been reported to be a good choice in various practical situations.
Bayesian robust regression, being fully parametric, relies heavily on such distributions.
Lange, Little and Taylor (1989) discuss this model in some depth from a non-Bayesian point of view.
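As an illustration only (not the specific formulation discussed by Lange, Little and Taylor), a Bayesian simple regression with Student-t errors and the degrees of freedom fixed at 4 could be written in PyMC roughly as follows; the priors and variable names are arbitrary assumptions:

```python
import numpy as np
import pymc as pm

rng = np.random.default_rng(2)
x = rng.normal(size=80)
y = 1.0 + 2.0 * x + rng.standard_t(df=4, size=80)    # heavy-tailed synthetic errors

with pm.Model():
    alpha = pm.Normal("alpha", mu=0.0, sigma=10.0)
    beta = pm.Normal("beta", mu=0.0, sigma=10.0)
    sigma = pm.HalfNormal("sigma", sigma=5.0)
    # Student-t likelihood with degrees of freedom fixed at 4, per the range quoted above
    pm.StudentT("y_obs", nu=4, mu=alpha + beta * x, sigma=sigma, observed=y)
    idata = pm.sample(1000, tune=1000, chains=2, random_seed=2)
```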
Samuel S. Wilks (1938) showed that nearly all sets of regression weights produce composites that are very highly correlated with one another, including unit weights, a result referred to as Wilks' theorem (Ree, Carretta, & Earles, 1998).
Robyn Dawes (1979) examined decision making in applied settings, showing that simple models with unit weights often outperformed human experts.
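A small synthetic illustration of Wilks' point (the data and settings below are invented for the example, not taken from Wilks or Dawes): when the predictors are positively intercorrelated and the regression weights share a sign, a unit-weighted composite tends to correlate very highly with the regression-weighted composite.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 500, 4
common = rng.normal(size=(n, 1))
X = 0.6 * common + 0.8 * rng.normal(size=(n, p))      # positively intercorrelated predictors
y = X @ np.array([0.4, 0.3, 0.2, 0.1]) + rng.normal(size=n)

Xs = (X - X.mean(axis=0)) / X.std(axis=0)             # standardize the predictors
ols_w, *_ = np.linalg.lstsq(Xs, y - y.mean(), rcond=None)
composite_ols = Xs @ ols_w                            # regression-weighted composite
composite_unit = Xs.sum(axis=1)                       # unit-weighted composite
print(np.corrcoef(composite_ols, composite_unit)[0, 1])
```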
The least squares and robust fitted regression lines appear to be very similar (which is not unusual in a data set of this size).
However, the advantage of the robust approach comes to light when the estimates of residual scale are considered.
The least squares estimate of residual scale is inflated by the outliers, which leads to a loss of power in hypothesis tests and to unnecessarily wide confidence intervals on the estimated parameters.
The horizontal reference lines are at 2 and −2, so that any observed scaled residual beyond these boundaries can be considered to be an outlier.
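A minimal sketch of both diagnostics on synthetic contaminated data, assuming a Huber M-estimation fit from statsmodels as the robust fit: compare the least squares and robust estimates of residual scale, then flag observations whose scaled residuals fall outside ±2.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
x = rng.normal(size=60)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=60)
y[:4] += 10                                           # a few gross response outliers

X = sm.add_constant(x)
ols_res = sm.OLS(y, X).fit()
rlm_res = sm.RLM(y, X, M=sm.robust.norms.HuberT()).fit()

print("OLS residual scale:   ", np.sqrt(ols_res.scale))   # inflated by the outliers
print("Robust residual scale:", rlm_res.scale)            # MAD-based, much smaller here

scaled = rlm_res.resid / rlm_res.scale                # scaled residuals from the robust fit
print("Flagged observations:", np.where(np.abs(scaled) > 2)[0])
```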