Sparse PCA

Sparse principal component analysis (sparse PCA or SPCA) extends the classic method of principal component analysis (PCA) for the reduction of dimensionality of data by introducing sparsity structures to the input variables.

A particular disadvantage of ordinary PCA is that the principal components are usually linear combinations of all input variables.

Sparse PCA overcomes this disadvantage by finding components that are linear combinations of just a few input variables, called sparse principal components (SPCs).

This means that some of the coefficients of the linear combinations defining the SPCs, called loadings,[note 1] are equal to zero.

The number of nonzero loadings is called the cardinality of the SPC.

Consider a data matrix, $X$, where each of the $p$ columns represents an input variable, and each of the $n$ rows represents an independent sample from the data population. One assumes each column of $X$ has mean zero; otherwise one can subtract the column-wise mean from each element of $X$. Let $\Sigma = X^{T}X/(n-1)$ be the empirical covariance matrix of $X$, which has dimension $p \times p$. Given an integer $k$ with $1 \le k \le p$, the sparse PCA problem can be formulated as maximizing the variance along a direction represented by the vector $v \in \mathbb{R}^{p}$ while constraining its cardinality:

$$\max_{v}\ v^{T}\Sigma v \quad \text{subject to} \quad \|v\|_{2}=1,\ \|v\|_{0}\le k. \qquad \text{(Eq. 1)}$$

The first constraint specifies that $v$ is a unit vector. In the second constraint, $\|v\|_{0}$ represents the $\ell_{0}$ pseudo-norm of $v$, which is defined as the number of its non-zero components. So the second constraint specifies that the number of non-zero components in $v$ is less than or equal to $k$, which is typically an integer that is much smaller than the dimension $p$. The optimal value of Eq. (1) is known as the $k$-sparse largest eigenvalue.

If one takes $k=p$, the problem reduces to ordinary PCA, and the optimal value becomes the largest eigenvalue of the covariance matrix $\Sigma$.
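
To make the formulation concrete, the following is a minimal brute-force sketch in Python (with NumPy and illustrative sizes, not a method from the literature) that solves Eq. (1) exactly by enumerating supports, and checks that $k=p$ recovers ordinary PCA:

```python
import itertools

import numpy as np

def k_sparse_largest_eigenvalue(Sigma, k):
    """Solve Eq. (1) exactly by enumerating every size-k support.

    For a fixed support S, the best unit vector supported on S is the
    top eigenvector of the k x k submatrix Sigma[S, S], so the inner
    problem is an ordinary eigenvalue problem. The enumeration is
    exponential in p, so this is only feasible for toy problems.
    """
    p = Sigma.shape[0]
    best_val, best_v = -np.inf, None
    for S in itertools.combinations(range(p), k):
        vals, vecs = np.linalg.eigh(Sigma[np.ix_(S, S)])
        if vals[-1] > best_val:
            best_val = vals[-1]
            best_v = np.zeros(p)
            best_v[list(S)] = vecs[:, -1]
    return best_val, best_v

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 6))
Sigma = np.cov(X, rowvar=False)

val, v = k_sparse_largest_eigenvalue(Sigma, k=2)
print("2-sparse largest eigenvalue:", val)

# With k = p the cardinality constraint is vacuous and we recover
# ordinary PCA: the optimal value is the largest eigenvalue of Sigma.
val_p, _ = k_sparse_largest_eigenvalue(Sigma, k=6)
print(np.isclose(val_p, np.linalg.eigvalsh(Sigma)[-1]))  # True
```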

Unlike ordinary PCA, sparse PCA cannot guarantee that different principal components are orthogonal. In order to achieve orthogonality, additional constraints must be enforced.

The following equivalent definition is in matrix form. Let $V$ be a $p \times p$ symmetric matrix; identifying $V$ with $vv^{T}$, one can rewrite the sparse PCA problem of Eq. (1) as

$$\max_{V}\ \mathrm{Tr}(\Sigma V) \quad \text{subject to} \quad \mathrm{Tr}(V)=1,\ \|V\|_{0}\le k^{2},\ \mathrm{Rank}(V)=1,\ V\succeq 0. \qquad \text{(Eq. 2)}$$

Moreover, the rank constraint in this formulation is actually redundant, and therefore sparse PCA can be cast as the following mixed-integer semidefinite program,[1] in which the binary variable $z_{i}$ indicates whether input variable $i$ may enter the support:

$$\max_{V,\,z}\ \mathrm{Tr}(\Sigma V) \quad \text{subject to} \quad \mathrm{Tr}(V)=1,\ V\succeq 0,\ V_{ii}\le z_{i}\ \forall i,\ \textstyle\sum_{i=1}^{p} z_{i}\le k,\ z\in\{0,1\}^{p}. \qquad \text{(Eq. 3)}$$

Because of the cardinality constraint, the maximization problem is hard to solve exactly, especially when the dimension $p$ is high.

In fact, the sparse PCA problem in Eq. (1) is NP-hard in the strong sense.[2] As with most sparse problems, variable selection in sparse PCA is a nonconvex, computationally intractable NP-hard problem,[3] and therefore greedy sub-optimal algorithms are often employed to find solutions.
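
As an illustration of the greedy approach, the sketch below implements a simple forward-selection heuristic for Eq. (1); this is a generic greedy scheme on toy data, not the specific algorithm of any of the cited works:

```python
import numpy as np

def greedy_sparse_pc(Sigma, k):
    """Forward selection for Eq. (1): grow the support one variable at
    a time, always adding the variable that most increases the largest
    eigenvalue of the selected principal submatrix. Sub-optimal in
    general, but needs only O(p*k) small eigen-decompositions instead
    of an exponential enumeration.
    """
    p = Sigma.shape[0]
    support = []
    for _ in range(k):
        best_val, best_j = -np.inf, None
        for j in range(p):
            if j in support:
                continue
            trial = support + [j]
            val = np.linalg.eigvalsh(Sigma[np.ix_(trial, trial)])[-1]
            if val > best_val:
                best_val, best_j = val, j
        support.append(best_j)
    vals, vecs = np.linalg.eigh(Sigma[np.ix_(support, support)])
    v = np.zeros(p)
    v[support] = vecs[:, -1]
    return vals[-1], v

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 30))
Sigma = np.cov(X, rowvar=False)
val, v = greedy_sparse_pc(Sigma, k=5)
print(val, np.flatnonzero(v))  # objective value and selected variables
```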

Several alternative approaches to Eq. (1) have been proposed. The methodological and theoretical developments of sparse PCA, as well as its applications in scientific studies, have recently been reviewed in a survey paper.[12]

It has been proposed that sparse PCA can be approximated by semidefinite programming (SDP). If one drops the rank constraint in Eq. (2) and relaxes the cardinality constraint by a convex $\ell_{1}$ constraint, one obtains a semidefinite relaxation that can be solved efficiently in polynomial time:

$$\max_{V}\ \mathrm{Tr}(\Sigma V) \quad \text{subject to} \quad \mathrm{Tr}(V)=1,\ \mathbf{1}^{T}|V|\mathbf{1}\le k,\ V\succeq 0. \qquad \text{(Eq. 4)}$$

While the semidefinite program does not scale beyond $n=300$ covariates, it has been shown that a second-order cone relaxation of the semidefinite relaxation is almost as tight and successfully solves problems with $n$ in the 1000s of covariates.[13]
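
A minimal sketch of the relaxation in Eq. (4), assuming the CVXPY modeling library and its default SDP solver (the sizes are illustrative); the top eigenvector of the optimal $V$ can be thresholded to obtain a candidate sparse loading:

```python
import cvxpy as cp
import numpy as np

def sdp_relaxation(Sigma, k):
    """l1-relaxed SDP of Eq. (4): max Tr(Sigma V) subject to
    Tr(V) = 1, sum_ij |V_ij| <= k, V positive semidefinite.
    The optimal value upper-bounds the k-sparse largest eigenvalue.
    """
    p = Sigma.shape[0]
    V = cp.Variable((p, p), symmetric=True)
    constraints = [V >> 0, cp.trace(V) == 1, cp.sum(cp.abs(V)) <= k]
    problem = cp.Problem(cp.Maximize(cp.trace(Sigma @ V)), constraints)
    problem.solve()
    # Extract a candidate loading vector from the leading eigenvector of V.
    _, eigvecs = np.linalg.eigh(V.value)
    return problem.value, eigvecs[:, -1]

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 8))
bound, v = sdp_relaxation(np.cov(X, rowvar=False), k=3)
print("upper bound on the 3-sparse largest eigenvalue:", bound)
```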

Suppose ordinary PCA is applied to a dataset where each input variable represents a different asset; it may generate principal components that are weighted combinations of all the assets. In contrast, sparse PCA would produce principal components that are weighted combinations of only a few input assets, so one can easily interpret their meaning.

Furthermore, if one uses a trading strategy based on these principal components, fewer assets imply lower transaction costs.
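
A hedged sketch of this use case with scikit-learn: note that scikit-learn's SparsePCA solves an $\ell_{1}$-penalized (regression-style) variant rather than the cardinality-constrained Eq. (1), and the "returns" below are synthetic stand-ins for asset data; the sizes, factor structure, and alpha penalty are illustrative assumptions:

```python
import numpy as np
from sklearn.decomposition import PCA, SparsePCA

# Synthetic stand-in for asset returns: 250 days x 20 assets driven by
# three sector-like factors, each touching only 5 assets (illustrative).
rng = np.random.default_rng(0)
factors = rng.standard_normal((250, 3))
loadings = np.zeros((3, 20))
loadings[0, :5] = 1.0
loadings[1, 5:10] = 1.0
loadings[2, 10:15] = 1.0
returns = factors @ loadings + 0.1 * rng.standard_normal((250, 20))

pca = PCA(n_components=3).fit(returns)
spca = SparsePCA(n_components=3, alpha=1.0, random_state=0).fit(returns)

# Ordinary PCA loads on (almost) every asset; sparse PCA zeroes most
# loadings, so each component reads as a small, tradeable basket.
print((np.abs(pca.components_) > 1e-10).sum(axis=1))  # dense, e.g. [20 20 20]
print((spca.components_ != 0).sum(axis=1))            # far fewer nonzeros
```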

Consider a dataset where each input variable corresponds to a specific gene. Sparse PCA can produce a principal component that involves only a few genes, so researchers can focus on these specific genes for further analysis.

Contemporary datasets often have the number of input variables ($p$) comparable with or even much larger than the number of samples ($n$). It has been shown that if $p/n$ does not converge to zero, classical PCA is not consistent. In other words, if we let $k=p$ in Eq. (1), then the optimal value does not converge to the largest eigenvalue of the data population when the sample size $n \to \infty$, and the optimal solution does not converge to the direction of maximum variance. But sparse PCA can retain consistency even if $p \gg n$.
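
A small simulation sketch of this phenomenon, with illustrative values of $n$, $k$, and $\theta$ (assumptions, not values from the literature): the correlation between the classical PCA direction and the true spiked direction decays as $p$ grows with $n$ fixed.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k, theta = 200, 10, 1.0
for p in [50, 200, 800]:
    v = np.zeros(p)
    v[:k] = 1 / np.sqrt(k)                      # true k-sparse direction
    # Spiked-model samples: covariance I_p + theta * v v^T.
    X = (rng.standard_normal((n, p))
         + np.sqrt(theta) * np.outer(rng.standard_normal(n), v))
    Sigma_hat = np.cov(X, rowvar=False)
    u = np.linalg.eigh(Sigma_hat)[1][:, -1]     # classical PCA direction
    print(p, round(abs(u @ v), 3))              # decays toward 0 as p grows
```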

The $k$-sparse largest eigenvalue (the optimal value of Eq. (1)) can be used to discriminate an isometric model, where every direction has the same variance, from a spiked covariance model in the high-dimensional setting. Consider a hypothesis test where the null hypothesis specifies that the data $X$ are generated from a multivariate normal distribution with mean 0 and covariance equal to an identity matrix, and the alternative hypothesis specifies that the data $X$ are generated from a spiked model with signal strength $\theta$:

$$H_{0}:\ X\sim N(0,\ I_{p}), \qquad H_{1}:\ X\sim N(0,\ I_{p}+\theta vv^{T}),$$

where $v \in \mathbb{R}^{p}$ is a unit vector of cardinality $k$. The largest $k$-sparse eigenvalue can discriminate the two hypotheses if and only if $\theta \ge \Theta\!\left(\sqrt{k\log p/n}\right)$.

Since computing the $k$-sparse eigenvalue is NP-hard, one can approximate it by the optimal value of the semidefinite programming relaxation (Eq. 4). In that case, one can discriminate the two hypotheses if $\theta \ge \Theta\!\left(\sqrt{k^{2}\log p/n}\right)$. The additional $\sqrt{k}$ term cannot be improved by any other polynomial-time algorithm if the planted clique conjecture holds.
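
A closing sketch of the test statistic, with illustrative sizes ($n$, $p$, $k$, and $\theta$ are assumptions, and $p$ is kept small so the NP-hard statistic can be computed by enumeration): with $\theta$ well above the detection threshold, the $k$-sparse largest eigenvalue separates from its null distribution.

```python
import itertools

import numpy as np

def k_sparse_eig(Sigma, k):
    # Exact k-sparse largest eigenvalue by support enumeration (small p only).
    p = Sigma.shape[0]
    return max(np.linalg.eigvalsh(Sigma[np.ix_(S, S)])[-1]
               for S in itertools.combinations(range(p), k))

rng = np.random.default_rng(0)
n, p, k, theta = 100, 12, 3, 1.5
v = np.zeros(p)
v[:k] = 1 / np.sqrt(k)                           # spiked direction under H1

def sample_cov(spiked):
    X = rng.standard_normal((n, p))
    if spiked:                                   # covariance I_p + theta * v v^T
        X += np.sqrt(theta) * np.outer(rng.standard_normal(n), v)
    return np.cov(X, rowvar=False)

null = [k_sparse_eig(sample_cov(False), k) for _ in range(20)]
alt  = [k_sparse_eig(sample_cov(True),  k) for _ in range(20)]
print("H0 statistic: mean %.2f, max %.2f" % (np.mean(null), np.max(null)))
print("H1 statistic: mean %.2f, min %.2f" % (np.mean(alt), np.min(alt)))
```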