Maximum Variance Unfolding (MVU), also known as Semidefinite Embedding (SDE), is an algorithm in computer science that uses semidefinite programming to perform non-linear dimensionality reduction of high-dimensional vectorial input data.[1][2][3]

It is motivated by the observation that kernel Principal Component Analysis (kPCA) does not reduce the data dimensionality,[4] as it leverages the kernel trick to non-linearly map the original data into an inner-product space.
MVU creates a mapping from the high-dimensional input vectors to some low-dimensional Euclidean vector space in the following steps:[5]

1. A neighbourhood graph is created: each input is connected to its k nearest input vectors (according to Euclidean distance).
2. The neighbourhood graph is "unfolded" with the help of semidefinite programming: instead of learning the output vectors directly, the semidefinite program finds an inner-product matrix that maximizes the pairwise distances between any two inputs that are not connected in the neighbourhood graph, while preserving the distances between neighbours.
3. The low-dimensional embedding is finally obtained by applying multidimensional scaling to the learned inner-product matrix.

The steps of applying semidefinite programming followed by a linear dimensionality reduction step to recover a low-dimensional embedding into a Euclidean space were first proposed by Linial, London, and Rabinovich.
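The first step, building the neighbourhood graph, can be sketched in plain NumPy. This is a minimal illustration, not the reference implementation; the helper name `neighbourhood_graph` and the toy data are choices made here for demonstration:

```python
import numpy as np

def neighbourhood_graph(X, k):
    """Connect each point to its k nearest neighbours (Euclidean distance).

    Returns a symmetric 0/1 adjacency matrix eta, where eta[i, j] = 1 if j is
    among the k nearest neighbours of i, or vice versa.
    """
    n = len(X)
    # Pairwise squared Euclidean distances.
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    np.fill_diagonal(d2, np.inf)          # a point is not its own neighbour
    eta = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in np.argsort(d2[i])[:k]:
            eta[i, j] = eta[j, i] = 1     # symmetrise the neighbour relation
    return eta

# Toy data: four points on a line; with k = 1 each point links to its closest.
X = np.array([[0.0], [1.0], [2.0], [10.0]])
eta = neighbourhood_graph(X, k=1)
```

If the data is sampled densely enough, this graph approximates the underlying manifold, which is the assumption the rest of the algorithm relies on.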
Let X be the original input and Y be the embedding. If i, j are two neighbours, then the local isometry constraint that needs to be satisfied is:[7][8][9]

{\displaystyle |X_{i}-X_{j}|^{2}=|Y_{i}-Y_{j}|^{2}.}

Let G, K be the Gram matrices of X and Y (i.e. G_{ij} = X_i · X_j and K_{ij} = Y_i · Y_j). We can express the above constraint for every pair of neighbour points i, j:[10][11]

{\displaystyle G_{ii}+G_{jj}-G_{ij}-G_{ji}=K_{ii}+K_{jj}-K_{ij}-K_{ji}.}

In addition, we also want to constrain the embedding Y to center at the origin:[12][13][14]

{\displaystyle 0=\left|\sum _{i}Y_{i}\right|^{2}=\sum _{ij}Y_{i}\cdot Y_{j}=\sum _{ij}K_{ij}.}
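The identities relating squared distances and Gram-matrix entries, and the effect of centering, can be checked numerically. A minimal sketch, assuming NumPy; the random data is illustrative only:

```python
import numpy as np

# Random inputs; G is the Gram matrix of X (G_ij = X_i . X_j).
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 3))
G = X @ X.T

# For any pair (i, j): |X_i - X_j|^2 = G_ii + G_jj - G_ij - G_ji.
i, j = 1, 3
lhs = np.sum((X[i] - X[j]) ** 2)
rhs = G[i, i] + G[j, j] - G[i, j] - G[j, i]
assert np.isclose(lhs, rhs)

# Centering the points at the origin makes the Gram entries sum to zero:
# |sum_i X_i|^2 = sum_ij X_i . X_j = sum_ij G_ij = 0.
Xc = X - X.mean(axis=0)
Gc = Xc @ Xc.T
assert abs(Gc.sum()) < 1e-8
```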
As described above, while the distances between neighbour points are preserved, the algorithm aims to maximize the pairwise distance between every pair of points. The objective function to be maximized is:[15][16][17]

{\displaystyle T(Y)={\frac {1}{2N}}\sum _{i,j}|Y_{i}-Y_{j}|^{2}.}
Intuitively, maximizing the function above is equivalent to pulling the points as far away from each other as possible, thereby "unfolding" the manifold.
The local isometry constraint[18]

{\displaystyle |Y_{i}-Y_{j}|^{2}=|X_{i}-X_{j}|^{2}\quad \forall i,j{\mbox{ where }}\eta _{ij}=1,}

where

{\displaystyle \eta _{ij}:={\begin{cases}1&{\mbox{if}}\ i{\mbox{ is a neighbour of }}j\\0&{\mbox{otherwise}},\end{cases}}}

prevents the objective function from diverging (going to infinity).
Since the graph has N points, any two points are joined by a path of at most N edges, each no longer than the largest neighbour distance, so by the triangle inequality

{\displaystyle |Y_{i}-Y_{j}|\leq N\max _{kl}\left(\eta _{kl}|X_{k}-X_{l}|\right).}

We can then bound the objective function as follows:[19][20]

{\displaystyle T(Y)={\frac {1}{2N}}\sum _{i,j}|Y_{i}-Y_{j}|^{2}\leq {\frac {N^{3}}{2}}\max _{kl}\left(\eta _{kl}|X_{k}-X_{l}|^{2}\right).}

The objective function can be rewritten purely in the form of the Gram matrix:[21][22][23]

{\displaystyle {\begin{aligned}T(Y)&={\frac {1}{2N}}\sum _{i,j}|Y_{i}-Y_{j}|^{2}={\frac {1}{2N}}\sum _{i,j}\left(K_{ii}+K_{jj}-K_{ij}-K_{ji}\right)\\&={\frac {1}{2N}}\left(2N\,Tr(\mathbf {K} )-2\sum _{i,j}K_{ij}\right)=Tr(\mathbf {K} ),\end{aligned}}}

where the last equality uses the centering constraint Σ_{ij} K_{ij} = 0. Finally, the optimization can be formulated as:[24][25][26]
{\displaystyle {\begin{aligned}&{\text{Maximize}}&&Tr(\mathbf {K} )\\&{\text{subject to}}&&\mathbf {K} \succeq 0,\sum _{ij}\mathbf {K} _{ij}=0\\&{\text{and}}&&G_{ii}+G_{jj}-G_{ij}-G_{ji}=K_{ii}+K_{jj}-K_{ij}-K_{ji},\forall i,j{\mbox{ where }}\eta _{ij}=1,\end{aligned}}}
After the Gram matrix K is learned by semidefinite programming, the output Y can be obtained via Cholesky decomposition.

In particular, the Gram matrix can be written as

{\displaystyle K_{ij}=\sum _{\alpha =1}^{N}\lambda _{\alpha }V_{\alpha i}V_{\alpha j},}

where V_{αi} is the i-th element of the eigenvector V_α corresponding to the eigenvalue λ_α. It follows that the α-th element of the output Y_i is √λ_α · V_{αi}.
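The recovery of the embedding from the learned Gram matrix can be sketched with NumPy's eigendecomposition. To keep the example self-contained, a toy PSD Gram matrix is built from known coordinates rather than taken from an actual SDP solve:

```python
import numpy as np

# Build a toy centred embedding and its Gram matrix (rank 2) to stand in
# for the K returned by the semidefinite program.
rng = np.random.default_rng(1)
Y_true = rng.standard_normal((6, 2))
Y_true -= Y_true.mean(axis=0)            # centred at the origin
K = Y_true @ Y_true.T

# Eigendecomposition K = V diag(lam) V^T; eigh returns ascending eigenvalues.
lam, V = np.linalg.eigh(K)
lam, V = lam[::-1], V[:, ::-1]           # reorder to descending
lam = np.clip(lam, 0.0, None)            # clip tiny negative round-off

# The alpha-th coordinate of point i is sqrt(lam_alpha) * V[i, alpha].
Y = V * np.sqrt(lam)

# The recovered Y reproduces K; truncating to the top-d columns of Y gives
# the d-dimensional embedding.
assert np.allclose(Y @ Y.T, K)
```

Since the SDP tends to concentrate the spectrum of K on a few large eigenvalues, keeping only the leading columns of Y yields the low-dimensional embedding with little distortion.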