Principal component regression

One typically uses only a subset of all the principal components for regression, making PCR a kind of regularized procedure and also a type of shrinkage estimator.[2]

One major use of PCR lies in overcoming the multicollinearity problem, which arises when two or more of the explanatory variables are close to being collinear.[3]

PCR can aptly deal with such situations by excluding some of the low-variance principal components in the regression step.

In addition, by typically regressing on only a subset of all the principal components, PCR can achieve dimension reduction by substantially lowering the effective number of parameters characterizing the underlying model.

Also, through appropriate selection of the principal components to be used for regression, PCR can lead to efficient prediction of the outcome based on the assumed model.

The PCR method may be broadly divided into three major steps: performing PCA on the observed data matrix of covariates, regressing the observed vector of outcomes on a selected subset of the resulting principal components, and transforming the fitted coefficients back to the scale of the original covariates.

Data representation: Let $\mathbf{Y}_{n \times 1}$ denote the vector of observed outcomes and $\mathbf{X}_{n \times p}$ denote the corresponding data matrix of observed covariates, with each of the $p$ columns of $\mathbf{X}$ assumed to be centered. Write the eigendecomposition of $\mathbf{X}^{\mathsf T}\mathbf{X}$ as $\mathbf{V}\boldsymbol{\Lambda}\mathbf{V}^{\mathsf T}$ with eigenvalues $\lambda_1 \geq \cdots \geq \lambda_p \geq 0$. For any $k \in \{1, \dots, p\}$, let $\mathbf{V}_k$ denote the matrix of the first $k$ columns of $\mathbf{V}$ (the selected principal component directions), so that $\mathbf{W}_k = \mathbf{X}\mathbf{V}_k$ is the derived data matrix whose columns are the first $k$ principal components.

The fitting process for obtaining the PCR estimator involves regressing the response vector on the derived data matrix $\mathbf{W}_k$, whose columns are mutually orthogonal, and then pre-multiplying the resulting coefficient vector by $\mathbf{V}_k$ to obtain the estimator $\hat{\boldsymbol\beta}_k$ on the scale of the original covariates.
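
The following is an illustrative NumPy sketch of these steps, not code from the article; the function name pcr_fit, the variable names, and the simulated data at the end are all hypothetical. It performs PCA on the centered covariates, regresses the outcome on the first k principal components, and maps the coefficients back to the original covariate scale.

```python
import numpy as np

def pcr_fit(X, y, k):
    """Principal component regression on the first k components (illustrative sketch)."""
    Xc = X - X.mean(axis=0)                     # center the covariates
    eigvals, V = np.linalg.eigh(Xc.T @ Xc)      # PCA step: eigendecomposition of X^T X
    order = np.argsort(eigvals)[::-1]           # sort eigenvalues in decreasing order
    eigvals, V = eigvals[order], V[:, order]
    V_k = V[:, :k]                              # selected principal component directions
    W_k = Xc @ V_k                              # derived data matrix of the first k components
    gamma_k, *_ = np.linalg.lstsq(W_k, y, rcond=None)   # regression step
    beta_k = V_k @ gamma_k                      # back to the scale of the original covariates
    return beta_k, eigvals

# Hypothetical usage on simulated data:
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 0.0, 0.0]) + rng.normal(size=100)
beta_3, eigvals = pcr_fit(X, y, k=3)
```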

Under multicollinearity, two or more of the covariates are highly correlated, so that one can be linearly predicted from the others with a non-trivial degree of accuracy.

Multicollinearity makes $\mathbf{X}^{\mathsf T}\mathbf{X}$ nearly singular, so some of its eigenvalues are close to zero. This issue can be effectively addressed by using a PCR estimator that excludes the principal components corresponding to these small eigenvalues.
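
As a rough sketch of why this helps (a toy example, not from the article, reusing the hypothetical pcr_fit above): with two nearly collinear covariates, the smallest eigenvalue of $\mathbf{X}^{\mathsf T}\mathbf{X}$ is close to zero, and excluding the corresponding component avoids dividing by it in the regression step.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
z = rng.normal(size=(n, 1))
# Columns 0 and 1 are nearly collinear; columns 2 and 3 are independent noise covariates.
X = np.hstack([z, z + 1e-3 * rng.normal(size=(n, 1)), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 1.0, 0.5, 0.0]) + rng.normal(size=n)

eigvals = np.sort(np.linalg.eigvalsh(X.T @ X))[::-1]
print(eigvals)                      # the last eigenvalue is close to zero under collinearity

beta_3, _ = pcr_fit(X, y, k=3)      # drop the low-variance component (pcr_fit as sketched above)
```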

The corresponding reconstruction error is given by $\bigl\|\mathbf{X} - \mathbf{X}\mathbf{V}_k\mathbf{V}_k^{\mathsf T}\bigr\|_F^2 = \sum_{j=k+1}^{p}\lambda_j$. Thus any potential dimension reduction may be achieved by choosing $k$, the number of principal components to be used, through appropriate thresholding on the cumulative sum of the eigenvalues of $\mathbf{X}^{\mathsf T}\mathbf{X}$.

Since the smaller eigenvalues do not contribute significantly to the cumulative sum, the corresponding principal components can continue to be dropped as long as the desired threshold limit is not exceeded.

The same criterion may also be used for addressing the multicollinearity issue, whereby the principal components corresponding to the smaller eigenvalues may be ignored as long as the threshold limit is maintained.
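
A minimal sketch of such a thresholding rule follows (an illustration only; the 90% threshold is an arbitrary assumption, not a recommendation from the article):

```python
import numpy as np

def choose_k(eigvals, threshold=0.90):
    """Smallest k whose leading eigenvalues account for the given fraction of the total sum."""
    eigvals = np.sort(eigvals)[::-1]
    frac = np.cumsum(eigvals) / eigvals.sum()
    return int(np.searchsorted(frac, threshold) + 1)

# Hypothetical usage with the eigenvalues returned by pcr_fit above:
# k = choose_k(eigvals, threshold=0.90)
```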

Since the PCR estimator typically uses only a subset of all the principal components for regression, it can be viewed as a kind of regularized procedure.

More specifically, for any $k \in \{1, \dots, p\}$, the PCR estimator $\hat{\boldsymbol\beta}_k$ denotes the regularized solution to the following constrained minimization problem: minimize $\|\mathbf{Y} - \mathbf{X}\boldsymbol\beta^{*}\|^{2}$ over $\boldsymbol\beta^{*}$ subject to $\boldsymbol\beta^{*}$ lying in the column space of $\mathbf{V}_k$. The constraint may be equivalently written as $\mathbf{V}_{p-k}^{\mathsf T}\boldsymbol\beta^{*} = \mathbf{0}$, where $\mathbf{V}_{p-k}$ denotes the matrix whose columns are the $p-k$ excluded principal component directions. Thus, when only a proper subset of all the principal components is selected for regression, the PCR estimator so obtained is based on a hard form of regularization that constrains the resulting solution to the column space of the selected principal component directions, and consequently restricts it to be orthogonal to the excluded directions.

In practice, the choice of the principal components to be used is usually such that the excluded principal components correspond to the smaller eigenvalues, thereby resulting in lower bias.
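
In the notation introduced above, the constrained problem and its closed-form solution can be written as follows; this is a standard formulation supplied here for concreteness rather than a quotation from the article:

```latex
\hat{\boldsymbol\beta}_k
  = \arg\min_{\boldsymbol\beta^{*}} \,\bigl\|\mathbf{Y} - \mathbf{X}\boldsymbol\beta^{*}\bigr\|^{2}
  \quad \text{subject to} \quad \mathbf{V}_{p-k}^{\mathsf T}\boldsymbol\beta^{*} = \mathbf{0},
\qquad\text{so that}\qquad
\hat{\boldsymbol\beta}_k
  = \mathbf{V}_k \bigl(\mathbf{W}_k^{\mathsf T}\mathbf{W}_k\bigr)^{-1}\mathbf{W}_k^{\mathsf T}\mathbf{Y}.
```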

With this in mind, Park (1981) [4] proposes the following guideline for selecting the principal components to be used for regression: drop the $j$th principal component when its eigenvalue $\lambda_j$ falls below a threshold that depends on the unknown model parameters $\sigma^2$ and $\boldsymbol\beta$.

Practical implementation of this guideline of course requires estimates for the unknown model parameters $\sigma^2$ and $\boldsymbol\beta$.

These may be estimated, for example, from the ordinary least squares fit of the full model; Park (1981), however, provides a slightly modified set of estimates that may be better suited for this purpose.

Unlike the criterion based on the cumulative sum of the eigenvalues of $\mathbf{X}^{\mathsf T}\mathbf{X}$, which is probably more suited to addressing the multicollinearity problem and to performing dimension reduction, the above criterion actually attempts to improve the prediction and estimation efficiency of the PCR estimator by involving both the outcome and the covariates in the process of selecting the principal components to be used in the regression step.

Alternative approaches with similar goals include selection of the principal components based on cross-validation or the Mallows's $C_p$ criterion.
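
A rough sketch of the cross-validation route follows (an illustration only, reusing the hypothetical pcr_fit from above; the fold count and seed are arbitrary assumptions):

```python
import numpy as np

def cv_mse_for_k(X, y, k, n_folds=5, seed=0):
    """Cross-validated mean squared prediction error of PCR with k components (illustrative)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    errors = []
    for test in np.array_split(idx, n_folds):
        train = np.setdiff1d(idx, test)
        x_mean, y_mean = X[train].mean(axis=0), y[train].mean()
        beta_k, _ = pcr_fit(X[train], y[train], k)
        pred = (X[test] - x_mean) @ beta_k + y_mean      # center test data with training means
        errors.append(np.mean((y[test] - pred) ** 2))
    return float(np.mean(errors))

# Pick the k with the smallest estimated prediction error (hypothetical usage):
# best_k = min(range(1, X.shape[1] + 1), key=lambda k: cv_mse_for_k(X, y, k))
```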

In general, PCR is essentially a shrinkage estimator that usually retains the high-variance principal components (corresponding to the larger eigenvalues of $\mathbf{X}^{\mathsf T}\mathbf{X}$) as covariates in the model and discards the remaining low-variance components (corresponding to the smaller eigenvalues of $\mathbf{X}^{\mathsf T}\mathbf{X}$).

Thus it exerts a discrete shrinkage effect on the low-variance components, nullifying their contribution completely in the original model.
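
This discrete shrinkage can be made explicit by writing both the ordinary least squares estimator and the PCR estimator in the eigenbasis of $\mathbf{X}^{\mathsf T}\mathbf{X}$; the following spectral form is standard and is stated here for illustration (it assumes all $\lambda_j > 0$, so that the OLS estimator exists):

```latex
\hat{\boldsymbol\beta}_{\mathrm{OLS}}
  = \sum_{j=1}^{p} \frac{1}{\lambda_j}\,\mathbf{v}_j \mathbf{v}_j^{\mathsf T}\mathbf{X}^{\mathsf T}\mathbf{Y},
\qquad
\hat{\boldsymbol\beta}_{k}
  = \sum_{j=1}^{p} \frac{f_j}{\lambda_j}\,\mathbf{v}_j \mathbf{v}_j^{\mathsf T}\mathbf{X}^{\mathsf T}\mathbf{Y},
\qquad
f_j = \begin{cases} 1 & j \le k,\\ 0 & j > k. \end{cases}
```

Here $\mathbf{v}_j$ denotes the $j$th column of $\mathbf{V}$: each component is either retained without any shrinkage or removed entirely, rather than being shrunk by a smooth factor.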

However, the principal components are derived from the covariates alone, without reference to the outcome, so the discarded low-variance components may still be strongly related to it. Therefore, the resulting PCR estimator obtained by using these principal components as covariates need not have satisfactory predictive performance for the outcome.

One approach that does take the outcome into account applies a preliminary screening step: the covariates that turn out to be the most correlated with the outcome (based on the degree of significance of the corresponding estimated regression coefficients) are selected for further use.
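
A minimal sketch of such a screening step follows (an illustration only; it assumes the significance measure is the absolute t-statistic from a univariate least squares fit of the outcome on each covariate, and the function name is hypothetical):

```python
import numpy as np

def screen_covariates(X, y, m):
    """Indices of the m covariates most strongly related to the outcome,
    ranked by the absolute t-statistic of a univariate regression (illustrative)."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    t_stats = np.empty(p)
    for j in range(p):
        xj = Xc[:, j]
        slope = (xj @ yc) / (xj @ xj)
        resid = yc - slope * xj
        se = np.sqrt((resid @ resid) / (n - 2) / (xj @ xj))
        t_stats[j] = abs(slope / se)
    return np.argsort(t_stats)[::-1][:m]

# Hypothetical usage: keep the 10 most significant covariates before the PCA/regression steps.
# selected = screen_covariates(X, y, m=10)
```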

Thus, the underlying regression model in the kernel machine setting is essentially a linear regression model, with the understanding that instead of the original set of covariates, the predictors are now given by the (potentially infinite-dimensional) vector of feature elements obtained by transforming the actual covariates using the feature map.

It turns out that it is sufficient to compute only the pairwise inner products among the feature maps of the observed covariate vectors, and these inner products are simply given by the values of the kernel function evaluated at the corresponding pairs of covariate vectors.

It can be easily shown that this is the same as regressing the outcome vector on the corresponding principal components (which are finite-dimensional in this case), as defined in the context of the classical PCR.

However, for arbitrary (and possibly non-linear) kernels, this primal formulation may become intractable owing to the infinite dimensionality of the associated feature map.
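
A rough sketch of the dual (kernel) route just described follows, under simplifying assumptions: a Gaussian RBF kernel, fitted values on the training sample only, and hypothetical function names not taken from the article. The computation uses only the kernel matrix: center it, take its leading eigenvectors as the kernel principal component scores, and regress the outcome on those scores.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of A and B."""
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)

def kernel_pcr_fitted(X, y, k, gamma=1.0):
    """Fitted values of kernel PCR with k kernel principal components (illustrative)."""
    n = len(y)
    K = rbf_kernel(X, X, gamma)
    J = np.eye(n) - np.ones((n, n)) / n           # centering matrix
    Kc = J @ K @ J                                # center the (implicit) feature vectors
    eigvals, U = np.linalg.eigh(Kc)               # eigenvectors of the centered kernel matrix
    order = np.argsort(eigvals)[::-1][:k]
    scores = U[:, order] * np.sqrt(np.clip(eigvals[order], 0.0, None))   # kernel PC scores
    gamma_hat, *_ = np.linalg.lstsq(scores, y - y.mean(), rcond=None)    # regression step
    return scores @ gamma_hat + y.mean()
```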