Maximum Variance Unfolding (MVU), also known as Semidefinite Embedding (SDE), is an algorithm in computer science that uses semidefinite programming to perform non-linear dimensionality reduction of high-dimensional vectorial input data.[1][2][3]

It is motivated by the observation that kernel Principal Component Analysis (kPCA) does not reduce the data dimensionality,[4] as it leverages the kernel trick to non-linearly map the original data into an inner-product space.
MVU creates a mapping from the high dimensional input vectors to some low dimensional Euclidean vector space in the following steps:[5]

1. A neighbourhood graph is created. Each input is connected with its k-nearest input vectors (according to the Euclidean distance metric) and all k-nearest neighbours are connected with each other (see the sketch below). If the data is sampled well enough, the resulting graph is a discrete approximation of the underlying manifold.
2. The neighbourhood graph is "unfolded" with the help of semidefinite programming. Instead of learning the output vectors directly, the semidefinite programming aims to find an inner product matrix that maximizes the pairwise distances between any two inputs that are not connected in the neighbourhood graph, while preserving the distances between neighbours.
3. The low-dimensional embedding is finally obtained by application of multidimensional scaling on the learned inner product matrix.

The steps of applying semidefinite programming followed by a linear dimensionality reduction step to recover a low-dimensional embedding into a Euclidean space were first proposed by Linial, London, and Rabinovich.
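The source does not prescribe an implementation of step 1; the following is a minimal NumPy sketch, where the function name `neighbourhood_graph` and the choice of `k` are illustrative assumptions:

```python
import numpy as np

def neighbourhood_graph(X, k):
    """Return a symmetric boolean adjacency matrix eta, where
    eta[i, j] is True if j is among the k nearest neighbours of i
    (or vice versa), using Euclidean distances."""
    n = X.shape[0]
    # Pairwise squared Euclidean distances via the expansion
    # |x_i - x_j|^2 = |x_i|^2 + |x_j|^2 - 2 x_i . x_j
    sq_norms = (X ** 2).sum(axis=1)
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2 * X @ X.T
    np.fill_diagonal(d2, np.inf)          # a point is not its own neighbour
    eta = np.zeros((n, n), dtype=bool)
    idx = np.argsort(d2, axis=1)[:, :k]   # k nearest neighbours of each point
    rows = np.repeat(np.arange(n), k)
    eta[rows, idx.ravel()] = True
    return eta | eta.T                    # symmetrize the graph
```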
Let X = (X_1, ..., X_N) be the original input and Y = (Y_1, ..., Y_N) be the embedding. If i and j are two neighbours, then the local isometry constraint that needs to be satisfied is:[7][8][9]

{\displaystyle |X_{i}-X_{j}|^{2}=|Y_{i}-Y_{j}|^{2}.}

Let G, K be the Gram matrices of X and Y (i.e. G_{ij} = X_i · X_j and K_{ij} = Y_i · Y_j). We can express the above constraint for every pair of neighbour points i, j in terms of G and K:[10][11]

{\displaystyle G_{ii}+G_{jj}-G_{ij}-G_{ji}=K_{ii}+K_{jj}-K_{ij}-K_{ji}.}

In addition, we also want to constrain the embedding Y to center at the origin:[12][13][14]

{\displaystyle 0=\left|\sum _{i}Y_{i}\right|^{2}=\sum _{i,j}Y_{i}\cdot Y_{j}=\sum _{i,j}K_{ij}.}
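The identity behind the Gram-matrix form of the constraint can be checked numerically. The following sketch (not part of the source; the toy data is illustrative) verifies that for any embedding Y with Gram matrix K = Y Yᵀ, the squared distance |Y_i - Y_j|² equals K_ii + K_jj - K_ij - K_ji:

```python
import numpy as np

rng = np.random.default_rng(0)
Y = rng.normal(size=(5, 3))   # toy embedding: 5 points in R^3
K = Y @ Y.T                   # Gram matrix, K[i, j] = Y_i . Y_j

i, j = 1, 4
lhs = np.sum((Y[i] - Y[j]) ** 2)               # |Y_i - Y_j|^2
rhs = K[i, i] + K[j, j] - K[i, j] - K[j, i]    # Gram-matrix form
assert np.isclose(lhs, rhs)
```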
As described above, while the distances between neighbouring points are preserved, the algorithm aims to maximize the pairwise distance between every pair of points.
The objective function to be maximized is:[15][16][17]

{\displaystyle T(Y)={\dfrac {1}{2N}}\sum _{i,j}|Y_{i}-Y_{j}|^{2}.}
Intuitively, maximizing the function above is equivalent to pulling the points as far away from each other as possible, and therefore "unfolds" the manifold.
The local isometry constraint prevents the objective function from diverging (going to infinity). Define[18]

{\displaystyle \tau :=\max _{i,j}\{\eta _{ij}|Y_{i}-Y_{j}|^{2}\},{\mbox{ where }}\eta _{ij}:={\begin{cases}1&{\mbox{if}}\ i{\mbox{ is a neighbour of }}j\\0&{\mbox{otherwise}}.\end{cases}}}
Since the graph has N points, any two points are connected by a path of at most N edges, so the distance between any two points satisfies |Y_i − Y_j|² ≤ N²τ. We can then bound the objective function as follows:[19][20]

{\displaystyle T(Y)={\dfrac {1}{2N}}\sum _{i,j}|Y_{i}-Y_{j}|^{2}\leq {\dfrac {1}{2N}}\sum _{i,j}N^{2}\tau ={\dfrac {N^{3}\tau }{2}}.}

The objective function can be rewritten purely in terms of the Gram matrix:[21][22][23]

{\displaystyle {\begin{aligned}T(Y)&={\dfrac {1}{2N}}\sum _{i,j}|Y_{i}-Y_{j}|^{2}={\dfrac {1}{2N}}\sum _{i,j}(K_{ii}+K_{jj}-K_{ij}-K_{ji})\\&={\dfrac {1}{2N}}\left(2N\,Tr(\mathbf {K} )-2\sum _{i,j}K_{ij}\right)=Tr(\mathbf {K} ),\end{aligned}}}

where the last equality uses the centering constraint \sum _{i,j}K_{ij}=0. Finally, the optimization can be formulated as:[24][25][26]
{\displaystyle {\begin{aligned}&{\text{Maximize}}&&Tr(\mathbf {K} )\\&{\text{subject to}}&&\mathbf {K} \succeq 0,\sum _{ij}\mathbf {K} _{ij}=0\\&{\text{and}}&&G_{ii}+G_{jj}-G_{ij}-G_{ji}=K_{ii}+K_{jj}-K_{ij}-K_{ji},\forall i,j{\mbox{ where }}\eta _{ij}=1,\end{aligned}}}
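As an illustration, this semidefinite program can be written down directly in a modeling library such as CVXPY (an assumption; the source does not name a solver). Here `G` is the input Gram matrix X Xᵀ and `eta` is the neighbourhood indicator, e.g. the boolean matrix from the earlier sketch:

```python
import cvxpy as cp

def mvu_gram(G, eta):
    """Sketch: solve the MVU semidefinite program for the Gram matrix K."""
    n = G.shape[0]
    K = cp.Variable((n, n), PSD=True)          # K must be positive semidefinite
    constraints = [cp.sum(K) == 0]             # centering: sum_ij K_ij = 0
    for i in range(n):
        for j in range(i + 1, n):
            if eta[i, j]:                      # local isometry on neighbours
                constraints.append(
                    K[i, i] + K[j, j] - K[i, j] - K[j, i]
                    == G[i, i] + G[j, j] - G[i, j] - G[j, i]
                )
    problem = cp.Problem(cp.Maximize(cp.trace(K)), constraints)
    problem.solve()
    return K.value
```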
After the Gram matrix K is learned by semidefinite programming, the output Y_i can be obtained via Cholesky decomposition. In particular, the Gram matrix can be written as

{\displaystyle K_{ij}=\sum _{\alpha =1}^{N}\lambda _{\alpha }V_{\alpha i}V_{\alpha j},}

where V_{\alpha i} is the i-th element of the eigenvector V_{\alpha } corresponding to the eigenvalue \lambda _{\alpha }. It follows that the \alpha-th element of the output Y_i is {\sqrt {\lambda _{\alpha }}}V_{\alpha i}.
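A sketch of this recovery step in NumPy follows. It uses the eigendecomposition formula above (rather than a literal Cholesky factor); the target dimension `d` is a hypothetical parameter, typically chosen as the number of significantly non-zero eigenvalues:

```python
import numpy as np

def embedding_from_gram(K, d):
    """Recover a d-dimensional embedding Y from the learned Gram matrix K,
    so that Y @ Y.T approximates K using its top d eigenvalues."""
    lam, V = np.linalg.eigh(K)            # eigenvalues in ascending order
    lam, V = lam[::-1], V[:, ::-1]        # reorder: largest eigenvalues first
    lam = np.clip(lam[:d], 0.0, None)     # guard against tiny negative values
    Y = V[:, :d] * np.sqrt(lam)           # Y[i, alpha] = sqrt(lam_a) * V[i, a]
    return Y
```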