Semidefinite embedding

Maximum Variance Unfolding (MVU), also known as Semidefinite Embedding (SDE), is an algorithm in computer science that uses semidefinite programming to perform non-linear dimensionality reduction of high-dimensional vectorial input data.[1][2][3]

It is motivated by the observation that kernel Principal Component Analysis (kPCA) does not reduce the data dimensionality,[4] as it leverages the kernel trick to non-linearly map the original data into an inner-product space.

MVU creates a mapping from the high dimensional input vectors to some low dimensional Euclidean vector space in the following steps:[5]

1. A neighbourhood graph is created (sketched below). Each input is connected with its k nearest input vectors (according to the Euclidean distance metric), and all k nearest neighbours are connected with each other. If the data is sampled well enough, the resulting graph is a discrete approximation of the underlying manifold.
2. The neighbourhood graph is "unfolded" with the help of semidefinite programming. Instead of learning the output vectors directly, the semidefinite program aims to find an inner-product matrix that maximizes the pairwise distances between any two inputs that are not connected in the neighbourhood graph, while preserving the nearest-neighbour distances.
3. The low-dimensional embedding is finally obtained by applying multidimensional scaling to the learned inner-product matrix.

The steps of applying semidefinite programming followed by a linear dimensionality reduction step to recover a low-dimensional embedding into a Euclidean space were first proposed by Linial, London, and Rabinovich.
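For concreteness, here is a minimal sketch of step 1, the construction of the neighbourhood graph, assuming a NumPy data matrix X of shape (N, d); the function name and parameters are illustrative, not part of any reference implementation:

import numpy as np

def neighbourhood_graph(X, k):
    # Return a symmetric 0/1 matrix eta with eta[i, j] = 1 when j is
    # among the k nearest neighbours of i (or vice versa).
    N = X.shape[0]
    # Pairwise squared Euclidean distances between all inputs.
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    np.fill_diagonal(sq, np.inf)            # a point is not its own neighbour
    eta = np.zeros((N, N), dtype=int)
    for i in range(N):
        eta[i, np.argsort(sq[i])[:k]] = 1   # mark the k nearest points
    return np.maximum(eta, eta.T)           # symmetrise the graph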

Let X be the original input and Y be the embedding. If i, j are two neighbours, then the local isometry constraint that needs to be satisfied is:[7][8][9]

{\displaystyle |X_{i}-X_{j}|^{2}=|Y_{i}-Y_{j}|^{2}}

Let G, K be the Gram matrices of X and Y:

{\displaystyle G_{ij}=X_{i}\cdot X_{j},\qquad K_{ij}=Y_{i}\cdot Y_{j}.}

We can express the above constraint for every pair of neighbour points i, j in terms of the Gram matrices:[10][11]

{\displaystyle G_{ii}+G_{jj}-G_{ij}-G_{ji}=K_{ii}+K_{jj}-K_{ij}-K_{ji}}

In addition, we also want to constrain the embedding Y to be centered at the origin:[12][13][14]

{\displaystyle 0=\left|\sum _{i}Y_{i}\right|^{2}=\sum _{ij}Y_{i}\cdot Y_{j}=\sum _{ij}K_{ij}}
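As a quick sanity check (toy data, illustrative names), the following sketch forms the input Gram matrix G and verifies that the left-hand side of the isometry constraint equals the squared distance between two inputs:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 3))       # toy input: N = 10 points in 3 dimensions

G = X @ X.T                        # Gram matrix, G[i, j] = X_i . X_j

i, j = 2, 5                        # any pair of (assumed) neighbours
lhs = G[i, i] + G[j, j] - G[i, j] - G[j, i]
rhs = np.sum((X[i] - X[j]) ** 2)   # |X_i - X_j|^2
assert np.isclose(lhs, rhs)        # the identity behind the constraint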

As described above, while the distances between neighbouring points are preserved, the algorithm aims to maximize the pairwise distances of all pairs of points.

The objective function to be maximized is:[15][16][17]

{\displaystyle T(Y)={\dfrac {1}{2N}}\sum _{i,j}|Y_{i}-Y_{j}|^{2}}

Intuitively, maximizing the function above is equivalent to pulling the points as far away from each other as possible, therefore "unfolding" the manifold.

The local isometry constraint[18] prevents the objective function from diverging (going to infinity). To see this, let

{\displaystyle \tau =\max _{ij}\{\eta _{ij}|Y_{i}-Y_{j}|^{2}\}\,,\qquad {\mbox{where }}\eta _{ij}:={\begin{cases}1&{\mbox{if}}\ i{\mbox{ is a neighbour of }}j\\0&{\mbox{otherwise}}.\end{cases}}}

Since the graph has N points, any two of them are joined by a path of at most N neighbour edges, each of squared length at most {\displaystyle \tau }, so the triangle inequality gives {\displaystyle |Y_{i}-Y_{j}|^{2}\leq N^{2}\tau } for every pair. We can then bound the objective function as follows:[19][20]

{\displaystyle T(Y)={\dfrac {1}{2N}}\sum _{i,j}|Y_{i}-Y_{j}|^{2}\leq {\dfrac {1}{2N}}\sum _{i,j}N^{2}\tau ={\dfrac {N^{3}\tau }{2}}.}

The objective function can be rewritten purely in the form of the Gram matrix:[21][22][23]

{\displaystyle {\begin{aligned}T(Y)&={\dfrac {1}{2N}}\sum _{i,j}|Y_{i}-Y_{j}|^{2}\\&={\dfrac {1}{2N}}\sum _{i,j}\left(K_{ii}+K_{jj}-K_{ij}-K_{ji}\right)\\&={\dfrac {1}{2N}}\left(2N\sum _{i}K_{ii}-2\sum _{i,j}K_{ij}\right)=\sum _{i}K_{ii}=Tr(\mathbf {K} ),\end{aligned}}}

where the cross term vanishes by the centering constraint. Finally, the optimization can be formulated as:[24][25][26]

{\displaystyle {\begin{aligned}&{\text{Maximize}}&&Tr(\mathbf {K} )\\&{\text{subject to}}&&\mathbf {K} \succeq 0,\sum _{ij}\mathbf {K} _{ij}=0\\&{\text{and}}&&G_{ii}+G_{jj}-G_{ij}-G_{ji}=K_{ii}+K_{jj}-K_{ij}-K_{ji},\forall i,j{\mbox{ where }}\eta _{ij}=1,\end{aligned}}}
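This program maps directly onto an off-the-shelf SDP solver. Below is a sketch using CVXPY, assuming G is the input Gram matrix and eta the 0/1 neighbourhood matrix from the earlier sketches (the function name is illustrative, and the solver choice is left at CVXPY's default):

import cvxpy as cp

def solve_mvu(G, eta):
    # Solve the MVU semidefinite program for the output Gram matrix K.
    N = G.shape[0]
    K = cp.Variable((N, N), PSD=True)   # K must be positive semidefinite
    constraints = [cp.sum(K) == 0]      # centering: sum_ij K_ij = 0
    for i in range(N):
        for j in range(i + 1, N):
            if eta[i, j]:               # local isometry for neighbour pairs
                constraints.append(
                    K[i, i] + K[j, j] - K[i, j] - K[j, i]
                    == G[i, i] + G[j, j] - G[i, j] - G[j, i]
                )
    cp.Problem(cp.Maximize(cp.trace(K)), constraints).solve()
    return K.value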

After the Gram matrix {\displaystyle \mathbf {K} } is learned by semidefinite programming, the output {\displaystyle Y} can be obtained via Cholesky decomposition.

In particular, the Gram matrix can be written as

{\displaystyle K_{ij}=\sum _{\alpha =1}^{N}\lambda _{\alpha }V_{\alpha i}V_{\alpha j}\,,}

where {\displaystyle V_{\alpha i}} is the i-th element of eigenvector {\displaystyle V_{\alpha }} with eigenvalue {\displaystyle \lambda _{\alpha }}.

It follows that the {\displaystyle \alpha }-th element of the output {\displaystyle Y_{i}} is {\displaystyle {\sqrt {\lambda _{\alpha }}}V_{\alpha i}}.
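A short sketch of this recovery step with NumPy, using an eigendecomposition, which realises the spectral formula above directly (a Cholesky factor would equally serve when K is positive definite); the kept dimension r is an illustrative choice:

import numpy as np

def embed_from_gram(K, r=2):
    # Recover an r-dimensional embedding Y from the learned Gram matrix K,
    # so that Y @ Y.T approximates K (coordinates ordered by eigenvalue).
    lam, V = np.linalg.eigh(K)          # eigenvalues in ascending order
    lam, V = lam[::-1], V[:, ::-1]      # reorder so largest come first
    lam = np.clip(lam, 0.0, None)       # clamp tiny negative round-off
    return V[:, :r] * np.sqrt(lam[:r])  # Y[i, a] = sqrt(lam_a) * V_{a i}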