Ridge regression is a method of estimating the coefficients of multiple-regression models in scenarios where the independent variables are highly correlated.
[a] It is particularly useful for mitigating the problem of multicollinearity in linear regression, which commonly occurs in models with large numbers of parameters.
[3] In general, the method provides improved efficiency in parameter estimation problems in exchange for a tolerable amount of bias (see bias–variance tradeoff).
[4] The theory was first introduced by Hoerl and Kennard in 1970 in their Technometrics papers "Ridge Regression: Biased Estimation for Nonorthogonal Problems" and "Ridge Regression: Applications to Nonorthogonal Problems".
In the simplest case, the problem of a near-singular moment matrix $X^\mathsf{T}X$ is alleviated by adding positive elements to the diagonals, thereby decreasing its condition number. Analogous to the ordinary least squares estimator, the simple ridge estimator is then given by
$$\hat{\beta}_R = (X^\mathsf{T}X + \lambda I)^{-1} X^\mathsf{T} y,$$
where $y$ is the regressand, $X$ is the design matrix, $I$ is the identity matrix, and the ridge parameter $\lambda \geq 0$ serves as the constant shifting the diagonals of the moment matrix.[8] It can be shown that this estimator is the solution to the least squares problem subject to the constraint $\beta^\mathsf{T}\beta = c$, with $\lambda$ playing the role of the Lagrange multiplier of the constraint. Typically, one chooses $\lambda$ heuristically or finds it via additional data-fitting strategies; see Determination of the Tikhonov factor. For $\lambda = 0$, in which case the constraint is non-binding, the ridge estimator reduces to ordinary least squares.
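For illustration, the ridge estimator can be computed directly from this closed form. The following sketch uses synthetic, nearly collinear data and arbitrary variable names; it is an illustration, not part of the original treatment.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic design matrix with two nearly collinear columns.
n = 100
x1 = rng.normal(size=n)
X = np.column_stack([x1, x1 + 1e-3 * rng.normal(size=n)])
beta_true = np.array([1.0, 2.0])
y = X @ beta_true + 0.1 * rng.normal(size=n)

lam = 0.1              # ridge parameter, lambda >= 0
p = X.shape[1]

# Ridge estimator: (X'X + lambda I)^{-1} X'y
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Ordinary least squares for comparison (lambda = 0); can be unstable here
# because X'X is nearly singular.
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]

print("ridge:", beta_ridge)
print("OLS:  ", beta_ols)
```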
The method became widely known through its application to integral equations in the works of Andrey Tikhonov[10][11][12][13][14] and David L. Phillips.
The finite-dimensional case was expounded by Arthur E. Hoerl, who took a statistical approach,[16] and by Manus Foster, who interpreted this method as a Wiener–Kolmogorov (Kriging) filter.
Suppose that for a known matrix $A$ and vector $b$ we wish to find a vector $x$ such that $Ax = b$, where $x$ and $b$ may have different sizes and $A$ may be non-square; if no $x$ satisfies the equation, or if more than one does, the problem is ill posed. In such cases, ordinary least squares estimation leads to an overdetermined, or more often an underdetermined, system of equations.
Most real-world phenomena have the effect of low-pass filters in the forward direction, where $A$ maps $x$ to $b$; in solving the inverse problem, the inverse mapping therefore operates as a high-pass filter with the undesirable tendency of amplifying noise (the singular values that are largest in the reverse mapping were smallest in the forward mapping). In addition, ordinary least squares implicitly nullifies every element of the reconstructed version of $x$ that lies in the null space of $A$, rather than allowing a model to be used as a prior for $x$.
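The noise-amplification point can be made concrete with a small numerical sketch; the smoothing operator, noise level, and signal below are arbitrary choices for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)

# Forward operator A: a Gaussian smoothing (low-pass) matrix,
# which is severely ill-conditioned.
n = 50
i, j = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
A = np.exp(-0.5 * ((i - j) / 3.0) ** 2)

x_true = np.sin(np.linspace(0, 3 * np.pi, n))
b = A @ x_true + 1e-3 * rng.normal(size=n)   # slightly noisy data

# Naive inversion: the inverse mapping acts as a high-pass filter
# and amplifies the small measurement noise enormously.
x_naive = np.linalg.solve(A, b)

print("condition number of A:", np.linalg.cond(A))
print("reconstruction error (unregularized):", np.linalg.norm(x_naive - x_true))
```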
Ordinary least squares seeks to minimize the sum of squared residuals, $\|Ax - b\|_2^2$. In order to give preference to a particular solution with desirable properties, a regularization term can be included in this minimization:
$$\|Ax - b\|_2^2 + \|\Gamma x\|_2^2$$
for some suitably chosen Tikhonov matrix $\Gamma$, often taken as a scalar multiple of the identity matrix ($\Gamma = \alpha I$), which gives preference to solutions with smaller norms.
This regularization improves the conditioning of the problem, thus enabling a direct numerical solution.
An explicit solution, denoted by $\hat{x}$, is given by $\hat{x} = (A^\mathsf{T}A + \Gamma^\mathsf{T}\Gamma)^{-1} A^\mathsf{T} b$. The effect of regularization may be varied by the scale of the matrix $\Gamma$; for $\Gamma = 0$ this reduces to the unregularized least-squares solution, provided that $(A^\mathsf{T}A)^{-1}$ exists.
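A direct sketch of this explicit solution, here with $\Gamma = \alpha I$ applied to the same kind of ill-conditioned smoothing operator as above (all names and values are illustrative):

```python
import numpy as np

def tikhonov_solve(A, b, Gamma):
    """Closed-form Tikhonov solution x = (A'A + Gamma'Gamma)^{-1} A'b."""
    return np.linalg.solve(A.T @ A + Gamma.T @ Gamma, A.T @ b)

rng = np.random.default_rng(2)
n = 50
i, j = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
A = np.exp(-0.5 * ((i - j) / 3.0) ** 2)          # ill-conditioned operator
x_true = np.sin(np.linspace(0, 3 * np.pi, n))
b = A @ x_true + 1e-3 * rng.normal(size=n)

alpha = 1e-2
x_reg = tikhonov_solve(A, b, alpha * np.eye(n))   # regularized solution
x_naive = np.linalg.solve(A, b)                   # Gamma = 0, unregularized

print("error, regularized:  ", np.linalg.norm(x_reg - x_true))
print("error, unregularized:", np.linalg.norm(x_naive - x_true))
```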
[22] Since Tikhonov regularization simply adds a quadratic term to the objective function in optimization problems, the regularization can also be applied after the unregularized optimization has taken place.
No detailed knowledge of the underlying likelihood function is needed.
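To see one way this can work, assume $(A^\mathsf{T}A)^{-1}$ exists so that the unregularized solution $\hat{x}_0 = (A^\mathsf{T}A)^{-1}A^\mathsf{T}b$ is well defined; then $A^\mathsf{T}b = A^\mathsf{T}A\,\hat{x}_0$, and the regularized solution can be recovered from $\hat{x}_0$ and the Gram matrix alone as $\hat{x} = (A^\mathsf{T}A + \Gamma^\mathsf{T}\Gamma)^{-1}A^\mathsf{T}A\,\hat{x}_0$. A quick numerical check of this identity (illustrative data and names):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.normal(size=(30, 5))          # well-conditioned, so (A'A)^{-1} exists
b = rng.normal(size=30)
Gamma = 0.5 * np.eye(5)

# Regularized solution computed directly from the data.
x_direct = np.linalg.solve(A.T @ A + Gamma.T @ Gamma, A.T @ b)

# The same solution recovered from the unregularized solution x0,
# using A'b = (A'A) x0.
x0 = np.linalg.solve(A.T @ A, A.T @ b)
x_posthoc = np.linalg.solve(A.T @ A + Gamma.T @ Gamma, A.T @ A @ x0)

print(np.allclose(x_direct, x_posthoc))   # True
```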
For general multivariate normal distributions for $x$ and the data error, one can apply a transformation of the variables to reduce to the case above.
Typically discrete linear ill-conditioned problems result from discretization of integral equations, and one can formulate a Tikhonov regularization in the original infinite-dimensional context.
Adding the regularization term shifts the eigenvalues of $A^\mathsf{T}A$ away from zero (for $\Gamma = \alpha I$, from $\sigma_i^2$ to $\sigma_i^2 + \alpha^2$, where the $\sigma_i$ are the singular values of $A$); this demonstrates the effect of the Tikhonov parameter on the condition number of the regularized problem.
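A small numerical check of this effect, using an arbitrary ill-conditioned operator and a few illustrative values of $\alpha$:

```python
import numpy as np

n = 50
i, j = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
A = np.exp(-0.5 * ((i - j) / 3.0) ** 2)   # ill-conditioned operator

# Condition number of the regularized normal-equations matrix A'A + alpha^2 I
# drops rapidly as alpha grows.
for alpha in [0.0, 1e-4, 1e-2, 1e0]:
    M = A.T @ A + alpha**2 * np.eye(n)
    print(f"alpha = {alpha:g}: cond = {np.linalg.cond(M):.3e}")
```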
The optimal regularization parameter $\alpha$ is usually unknown and in practical problems is often determined by an ad hoc method.
Other approaches include the discrepancy principle, cross-validation, the L-curve method,[26] restricted maximum likelihood, and the unbiased predictive risk estimator.
Grace Wahba proved that the optimal parameter, in the sense of leave-one-out cross-validation, minimizes[27][28]
$$G = \frac{\operatorname{RSS}}{\tau^2} = \frac{\left\| X\hat{\beta} - y \right\|^2}{\left[ \operatorname{Tr}\!\left( I - X (X^\mathsf{T}X + \alpha^2 I)^{-1} X^\mathsf{T} \right) \right]^2},$$
where $\operatorname{RSS}$ is the residual sum of squares and $\tau$ is the effective number of degrees of freedom.
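A minimal sketch of selecting $\alpha$ by minimizing this criterion over a grid; the data are synthetic and the helper function is an illustrative implementation of the formula above, not a library routine.

```python
import numpy as np

def gcv_score(X, y, alpha):
    """Cross-validation criterion G(alpha) = RSS / tau^2."""
    n, p = X.shape
    H = X @ np.linalg.solve(X.T @ X + alpha**2 * np.eye(p), X.T)  # hat matrix
    resid = y - H @ y
    tau = np.trace(np.eye(n) - H)   # effective residual degrees of freedom
    return (resid @ resid) / tau**2

rng = np.random.default_rng(5)
n, p = 80, 10
X = rng.normal(size=(n, p))
beta = rng.normal(size=p)
y = X @ beta + 0.5 * rng.normal(size=n)

alphas = np.logspace(-3, 2, 50)
scores = [gcv_score(X, y, a) for a in alphas]
best = alphas[int(np.argmin(scores))]
print("alpha minimizing the criterion:", best)
```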
The probabilistic formulation of an inverse problem introduces (when all uncertainties are Gaussian) a covariance matrix $C_M$ representing the a priori uncertainties on the model parameters, and a covariance matrix $C_D$ representing the uncertainties on the observed parameters.
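As a sketch of how these covariance matrices enter the estimate, the block below uses the standard Gaussian maximum a posteriori formula (it is not quoted from the text above, and the names and values are illustrative); in the isotropic case $C_M = \sigma_M^2 I$, $C_D = \sigma_D^2 I$ it reduces to Tikhonov regularization with $\alpha = \sigma_D/\sigma_M$.

```python
import numpy as np

def map_estimate(A, b, C_D, C_M, x0):
    """Gaussian MAP estimate: prior x ~ N(x0, C_M), noise ~ N(0, C_D)."""
    W = A.T @ np.linalg.inv(C_D)
    return x0 + np.linalg.solve(W @ A + np.linalg.inv(C_M), W @ (b - A @ x0))

rng = np.random.default_rng(6)
n, p = 40, 8
A = rng.normal(size=(n, p))
b = rng.normal(size=n)

sigma_D, sigma_M = 0.3, 2.0
x_map = map_estimate(A, b,
                     C_D=sigma_D**2 * np.eye(n),
                     C_M=sigma_M**2 * np.eye(p),
                     x0=np.zeros(p))

# In this isotropic case the MAP estimate coincides with Tikhonov/ridge
# regularization with alpha = sigma_D / sigma_M.
alpha = sigma_D / sigma_M
x_ridge = np.linalg.solve(A.T @ A + alpha**2 * np.eye(p), A.T @ b)
print(np.allclose(x_map, x_ridge))   # True
```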
Although at first the choice of the solution to this regularized problem may seem artificial, and indeed the matrix $\Gamma$ seems rather arbitrary, the process can be justified from a Bayesian point of view.
[32] Note that for an ill-posed problem one must introduce some additional assumptions in order to get a unique solution.
[34] If the assumption of normality is replaced by assumptions of homoscedasticity and uncorrelatedness of errors, and if one still assumes zero mean, then the Gauss–Markov theorem entails that the solution is the minimum-variance linear unbiased estimator.