Sliced inverse regression (SIR) is a tool for dimensionality reduction in the field of multivariate statistics.[1]
In statistics, regression analysis is a method of studying the relationship between a response variable $y$ and its input variable $x$, which is a $p$-dimensional vector. There are several approaches to regression.
For example, parametric methods include multiple linear regression, and non-parametric methods include local smoothing.
Because the number of observations needed for local smoothing grows exponentially with the number of input dimensions $p$, reducing the number of dimensions can make the problem computationally tractable.
Dimensionality reduction aims to achieve this by retaining only the most important directions of the data.
SIR uses the inverse regression curve, $E[x \mid y]$ (the expected input given the response), to perform a weighted principal component analysis.
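As a rough illustration of the exponential scaling of local smoothing, the following Python sketch counts how many observations would be needed to keep a fixed number of points along each axis. The function name `grid_points_needed` and the figure of 10 points per axis are assumptions made for illustration; real local smoothing methods have more nuanced sample-size requirements.

```python
def grid_points_needed(points_per_axis: int, p: int) -> int:
    """Observations needed to place `points_per_axis` points along each of p axes."""
    return points_per_axis ** p


if __name__ == "__main__":
    # The count grows exponentially in the dimension p.
    for p in (1, 2, 5, 10):
        print(f"p = {p:2d}: {grid_points_needed(10, p):,} observations")
```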
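Given a response variable $Y$ and a random vector $X \in \mathbb{R}^p$ of explanatory variables, SIR is based on the model

$Y = f(\beta_1^\top X, \ldots, \beta_k^\top X, \varepsilon)$    (1)

where $\beta_1, \ldots, \beta_k$ are unknown projection vectors, $k$ is an unknown number smaller than $p$, $f$ is an unknown function on $\mathbb{R}^{k+1}$, and $\varepsilon$ is a random variable representing error with $E[\varepsilon \mid X] = 0$ and finite variance $\sigma^2$.
The model describes an ideal solution, where $Y$ depends on $X \in \mathbb{R}^p$ only through a $k$-dimensional subspace; i.e., one can reduce the dimension of the explanatory variables from $p$ to a smaller number $k$ without losing any information.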
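Equivalently, the conditional distribution of $Y$ given $X$ depends on $X$ only through the $k$-dimensional random vector $(\beta_1^\top X, \ldots, \beta_k^\top X)$. It is assumed that this reduced vector is as informative as the original $X$ in explaining $Y$.
The unknown vectors $\beta_1, \ldots, \beta_k$ are called the effective dimension reducing directions (EDR-directions).
The space that is spanned by these vectors is called the effective dimension reducing space (EDR-space).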
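To make the model concrete, here is a minimal Python sketch that simulates data in which the response depends on a $p$-dimensional input only through $k = 2$ projections. The link function, the directions stored in `beta`, the noise level, and the function names `response` and `simulate` are all hypothetical choices for illustration; the model itself leaves $f$ and the directions unknown.

```python
import numpy as np


def response(X, beta, eps):
    """Toy link function f: y depends on X only through the projections X @ beta."""
    u = X @ beta  # shape (n, k): the k indices beta_i^T x for each observation
    return u[:, 0] / (0.5 + (u[:, 1] + 1.5) ** 2) + eps


def simulate(n=500, p=6, noise_sd=0.1, seed=0):
    """Draw (X, y) from a toy instance of the model with k = 2 directions."""
    rng = np.random.default_rng(seed)
    beta = np.zeros((p, 2))  # hypothetical EDR-directions (first two axes)
    beta[0, 0] = 1.0
    beta[1, 1] = 1.0
    X = rng.standard_normal((n, p))
    eps = noise_sd * rng.standard_normal(n)  # error with mean zero, finite variance
    return X, response(X, beta, eps), beta, eps
```

Changing any coordinate of `X` outside the span of the columns of `beta` leaves `y` unchanged, which is exactly the sense in which the reduced vector carries all the information about the response.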
Given vectors $a_1, \ldots, a_r \in \mathbb{R}^n$, the set $V$ of all linear combinations of these vectors is called a linear subspace and is therefore a vector space.
The dimension of $V$ is equal to the maximum number of linearly independent vectors in $V$.
The dimension of a vector space is unique, but the basis itself is not.
Linearly dependent vectors can still span a space, but its dimension is smaller than the number of vectors; for example, two collinear vectors span only a straight line.
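The following short Python sketch illustrates these facts with NumPy's `numpy.linalg.matrix_rank`, which returns the dimension of the space spanned by the columns of a matrix. The vectors `a1`, `a2`, and `a3` are arbitrary examples chosen for illustration.

```python
import numpy as np

a1 = np.array([1.0, 2.0, 3.0])
a2 = np.array([2.0, 4.0, 6.0])  # a2 = 2 * a1, so a1 and a2 are linearly dependent
a3 = np.array([0.0, 1.0, 0.0])

# Two dependent vectors span only a line (dimension 1).
dim_dependent = np.linalg.matrix_rank(np.column_stack([a1, a2]))

# Two independent vectors span a plane (dimension 2).
dim_independent = np.linalg.matrix_rank(np.column_stack([a1, a3]))

# A different spanning set of the same plane: the dimension is unchanged,
# which shows that the spanning vectors (and the basis) are not unique.
dim_other_basis = np.linalg.matrix_rank(np.column_stack([a1 + a3, a1 - a3, a1]))
```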
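Computing the inverse regression curve (IR) means that, instead of looking for $E[Y \mid X = x]$, the regression of the response on the explanatory variables, one computes $E[X \mid Y = y]$, which is a curve in $\mathbb{R}^p$ consisting of $p$ one-dimensional regressions. The center of the inverse regression curve is located at $E[E[X \mid Y]] = E[X]$.
Therefore, the centered inverse regression curve is $E[X \mid Y = y] - E[X]$, which is a curve in $\mathbb{R}^p$.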
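One simple way to approximate the centered inverse regression curve from data is to sort the observations by the response, cut them into slices, and average the explanatory variables within each slice. The sketch below assumes equal-count slices and uses a hypothetical helper name, `centered_slice_means`; other slicing schemes are possible.

```python
import numpy as np


def centered_slice_means(X, y, n_slices=10):
    """Approximate E[X | Y = y] - E[X] by slice means of X, with slices ordered by y.

    Returns the (n_slices, p) array of centered slice means and the
    proportion of observations falling in each slice.
    """
    order = np.argsort(y)                      # indices that sort the response
    slices = np.array_split(order, n_slices)   # nearly equal-count slices
    x_bar = X.mean(axis=0)
    means = np.vstack([X[idx].mean(axis=0) - x_bar for idx in slices])
    weights = np.array([len(idx) / len(y) for idx in slices])
    return means, weights
```

Because the curve is centered, the slice means averaged with their slice proportions sum to zero, mirroring $E[E[X \mid Y]] = E[X]$.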
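The centered inverse regression curve lies in a $k$-dimensional subspace spanned by the vectors $\Sigma_{xx}\beta_i$, where $\Sigma_{xx} = \operatorname{Cov}(X)$.
This is the connection between the model and inverse regression.
More precisely, under the model above and a linearity condition on the distribution of $X$ (satisfied, for example, when $X$ is elliptically symmetric, as with the multivariate normal distribution), the centered inverse regression curve $E[X \mid Y = y] - E[X]$ is contained in the linear subspace spanned by $\Sigma_{xx}\beta_i$ ($i = 1, \ldots, k$).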
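This containment can be checked numerically. The sketch below is an illustration under assumed settings: correlated Gaussian predictors, a single hypothetical direction `beta`, a cubic link, and ten equal-count slices. It measures how much of the centered slice means falls outside the line spanned by $\Sigma_{xx}\beta$, and, for contrast, outside the line spanned by $\beta$ itself.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 20_000, 4

# Correlated predictors: X = Z A^T has covariance Sigma = A A^T.
A = np.eye(p) + 0.5 * np.eye(p, k=-1)
sigma = A @ A.T
X = rng.standard_normal((n, p)) @ A.T

beta = np.array([1.0, -1.0, 0.0, 0.0])  # hypothetical single EDR-direction (k = 1)
y = (X @ beta) ** 3 + 0.1 * rng.standard_normal(n)

# Centered slice means approximate E[X | Y = y] - E[X].
order = np.argsort(y)
x_bar = X.mean(axis=0)
means = np.vstack(
    [X[idx].mean(axis=0) - x_bar for idx in np.array_split(order, 10)]
)


def off_line_fraction(points, direction):
    """Share of the points' norm that lies outside the line spanned by `direction`."""
    d = direction / np.linalg.norm(direction)
    residual = points - np.outer(points @ d, d)
    return np.linalg.norm(residual) / np.linalg.norm(points)


frac_sigma_beta = off_line_fraction(means, sigma @ beta)  # small: curve lies along Sigma beta
frac_beta = off_line_fraction(means, beta)                # larger: beta alone is not the right line
```

With correlated predictors the slice means line up with $\Sigma_{xx}\beta$ rather than with $\beta$, which is why the estimation procedure works with standardized variables and transforms back at the end.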
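With these theoretical properties in place, the aim is now to estimate the EDR-directions.
For that purpose, a weighted principal component analysis is needed: $X$ is standardized to $Z = \Sigma_{xx}^{-1/2}(X - E[X])$, and the eigenvectors of $\operatorname{Cov}(E[Z \mid Y])$ with the largest eigenvalues are the standardized EDR-directions $\eta_i = \Sigma_{xx}^{1/2}\beta_i$.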
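The algorithm to estimate the EDR-directions via SIR from $n$ observations $(x_i, y_i)$ is as follows.
Standardize the observations to $z_i = \hat\Sigma_{xx}^{-1/2}(x_i - \bar{x})$, where $\bar{x}$ and $\hat\Sigma_{xx}$ are the sample mean and sample covariance matrix of $X$.
Divide the range of the $y_i$ into $S$ non-overlapping slices $H_1, \ldots, H_S$, and let $n_s$ be the number of observations in slice $H_s$.
Compute the mean of the $z_i$ within each slice, $\bar{z}_s = n_s^{-1} \sum_{i : y_i \in H_s} z_i$.
Estimate $\operatorname{Cov}(E[Z \mid Y])$ by the weighted covariance matrix $\hat V = n^{-1} \sum_{s=1}^{S} n_s \bar{z}_s \bar{z}_s^\top$.
Compute the eigenvalues and eigenvectors of $\hat V$; the eigenvectors $\hat\eta_1, \ldots, \hat\eta_k$ belonging to the $k$ largest eigenvalues are the estimated standardized EDR-directions.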
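Transform the standardized EDR-directions back to the original scale.
The estimates for the EDR-directions are given by $\hat\beta_i = \hat\Sigma_{xx}^{-1/2}\hat\eta_i$, where $\hat\eta_i$ are the estimated standardized EDR-directions and $\hat\Sigma_{xx}$ is the sample covariance matrix of $X$; the $\hat\beta_i$ are not necessarily orthogonal.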
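The estimation procedure can be sketched in a few lines of NumPy. This is a minimal illustration rather than a reference implementation: the function name `sir_directions`, the use of equal-count slices, and the eigendecomposition-based inverse square root are assumptions, and the sample covariance matrix is assumed to be positive definite.

```python
import numpy as np


def sir_directions(X, y, n_slices=10, k=1):
    """Estimate k EDR-directions by sliced inverse regression (illustrative sketch).

    Returns the (p, k) matrix of estimated directions on the original scale
    and the corresponding k largest eigenvalues.
    """
    n, p = X.shape
    x_bar = X.mean(axis=0)
    sigma = np.cov(X, rowvar=False)  # sample covariance, assumed positive definite

    # Symmetric inverse square root of the covariance via its eigendecomposition.
    evals, evecs = np.linalg.eigh(sigma)
    sigma_inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T

    # Standardize the observations.
    Z = (X - x_bar) @ sigma_inv_sqrt

    # Slice on the sorted response and accumulate the weighted
    # covariance matrix of the slice means.
    order = np.argsort(y)
    V = np.zeros((p, p))
    for idx in np.array_split(order, n_slices):
        z_bar = Z[idx].mean(axis=0)
        V += (len(idx) / n) * np.outer(z_bar, z_bar)

    # Eigenvectors with the k largest eigenvalues are the standardized directions.
    lam, eta = np.linalg.eigh(V)  # eigenvalues in ascending order
    top = np.argsort(lam)[::-1][:k]
    eta_hat = eta[:, top]

    # Transform back to the original scale.
    beta_hat = sigma_inv_sqrt @ eta_hat
    return beta_hat, lam[top]
```

The returned directions are identified only up to sign and scale, so a recovered direction is best compared with a known one through the absolute cosine of the angle between them.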