Manifold regularization

Manifold regularization assumes that the data of interest lie near a low-dimensional manifold inside the input space, and that the function to be learned is smooth: data with different labels are not likely to be close together, and so the labeling function should not change quickly in areas where there are likely to be many data points.

Because of this assumption, a manifold regularization algorithm can use unlabeled data to inform where the learned function is allowed to change quickly and where it is not, using an extension of the technique of Tikhonov regularization.

Manifold regularization is a type of regularization, a family of techniques that reduces overfitting and ensures that a problem is well-posed by penalizing complex solutions.

Under standard Tikhonov regularization on a reproducing kernel Hilbert space (RKHS), a learning algorithm attempts to learn a function $f$ from among a hypothesis space of functions $\mathcal{H}$. The hypothesis space is an RKHS, meaning that it is associated with a kernel $K$, and so every candidate function $f$ has a norm $\|f\|_K$, which represents the complexity of the candidate function in the hypothesis space. When the algorithm considers a candidate function, it takes this norm into account in order to penalize complex functions.

Formally, given labeled training data $(x_1, y_1), \ldots, (x_\ell, y_\ell)$ with $x_i \in X$, $y_i \in \mathbb{R}$, and a loss function $V$, a learning algorithm using Tikhonov regularization will attempt to solve the expression

$$f^* = \underset{f \in \mathcal{H}}{\operatorname{argmin}} \; \frac{1}{\ell} \sum_{i=1}^{\ell} V(f(x_i), y_i) + \gamma \|f\|_K^2$$

where $\gamma$ is a hyperparameter that controls how strongly the algorithm prefers simpler functions over functions that fit the data more closely.
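As an illustration of this objective (a minimal sketch, not part of the standard presentation), the following Python code evaluates the Tikhonov expression for a candidate function written as a kernel expansion $f(x) = \sum_i c_i K(x_i, x)$ with a Gaussian kernel and squared loss; the function names and default parameter values are illustrative assumptions.

```python
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq_dists / (2 * sigma ** 2))

def tikhonov_objective(coef, X, y, gamma=0.1, sigma=1.0):
    """(1/l) * sum_i V(f(x_i), y_i) + gamma * ||f||_K^2 with squared loss,
    for a candidate function f(x) = sum_i coef_i K(x_i, x)."""
    K = gaussian_kernel(X, X, sigma)
    f_vals = K @ coef                    # f evaluated at the training points
    loss = np.mean((f_vals - y) ** 2)    # empirical loss term
    rkhs_norm_sq = coef @ K @ coef       # ||f||_K^2 for a kernel expansion
    return loss + gamma * rkhs_norm_sq
```

A learning algorithm would minimize this quantity over the candidate functions.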

Manifold regularization adds a second regularization term, the intrinsic regularizer, to the ambient regularizer used in standard Tikhonov regularization. Under the manifold assumption in machine learning, the data in question do not come from the entire input space $X$, but instead from a nonlinear manifold $\mathcal{M} \subset X$. The geometry of this manifold, the intrinsic space, is used to determine the regularization norm.

Denote the intrinsic regularization norm by $\|f\|_I$. A smooth function should change slowly where the input data are dense; that is, the gradient $\nabla_{\mathcal{M}} f(x)$ should be small where the marginal probability density $\mathcal{P}_X(x)$, the probability density of a randomly drawn data point appearing at $x$, is large. This gives one appropriate choice for the intrinsic regularizer:

$$\|f\|_I^2 = \int_{\mathcal{M}} \|\nabla_{\mathcal{M}} f(x)\|^2 \, d\mathcal{P}_X(x)$$

In practice, this norm cannot be computed directly because the marginal distribution $\mathcal{P}_X$ is unknown, but it can be estimated from the provided data. Interpreting the distances between input points as the edges of a graph, the graph Laplacian can serve as such an estimate. Suppose the input data include $\ell$ labeled examples (pairs of an input $x$ and a label $y$) and $u$ unlabeled examples (inputs without associated labels). Define $W$ to be a matrix of edge weights, where $W_{ij}$ is a similarity measure between the data points $x_i$ and $x_j$, such as $W_{ij} = e^{-\|x_i - x_j\|^2 / \epsilon}$. Define $D$ to be the diagonal matrix with $D_{ii} = \sum_j W_{ij}$, and let $L = D - W$ be the graph Laplacian. Then, with $\mathbf{f} = [f(x_1), \ldots, f(x_{\ell+u})]^{\mathsf{T}}$ the vector of values of $f$ at the labeled and unlabeled data points, the intrinsic norm can be estimated:

$$\|f\|_I^2 \approx \frac{1}{(\ell + u)^2} \mathbf{f}^{\mathsf{T}} L \mathbf{f}$$

As the number of data points $\ell + u$ increases, the graph Laplacian $L$ converges to the Laplace–Beltrami operator $\Delta_{\mathcal{M}}$, the divergence of the gradient $\nabla_{\mathcal{M}}$, and the empirical estimate converges to the intrinsic norm defined above.
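The graph-based estimate translates directly into code. The sketch below (illustrative only; the function names are not from any standard library) builds the weight matrix $W$, the degree matrix $D$, and the Laplacian $L = D - W$, and evaluates $\mathbf{f}^{\mathsf{T}} L \mathbf{f} / (\ell + u)^2$ for a vector of function values.

```python
import numpy as np

def graph_laplacian(X, eps=1.0):
    """Unnormalized graph Laplacian L = D - W with Gaussian edge weights
    W_ij = exp(-||x_i - x_j||^2 / eps) over all labeled and unlabeled points."""
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-sq_dists / eps)
    np.fill_diagonal(W, 0.0)          # no self-loops
    D = np.diag(W.sum(axis=1))        # degree matrix
    return D - W

def intrinsic_norm_sq(f_vals, L):
    """Empirical estimate of ||f||_I^2 from the values of f at the n data points."""
    n = len(f_vals)
    return f_vals @ L @ f_vals / n ** 2
```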

With weights $\gamma_A$ and $\gamma_I$ for the ambient and intrinsic regularizers, the final expression to be solved becomes:

$$f^* = \underset{f \in \mathcal{H}}{\operatorname{argmin}} \; \frac{1}{\ell} \sum_{i=1}^{\ell} V(f(x_i), y_i) + \gamma_A \|f\|_K^2 + \frac{\gamma_I}{(\ell + u)^2} \mathbf{f}^{\mathsf{T}} L \mathbf{f}$$

As with other kernel methods, $\mathcal{H}$ may be an infinite-dimensional space, so if the regularization expression cannot be solved explicitly, it is impossible to search the entire space for a solution. Instead, a representer theorem shows that under certain conditions on the choice of the norm $\|f\|_I$, the optimal solution $f^*$ must be a linear combination of the kernel centered at each of the input points:

$$f^*(x) = \sum_{i=1}^{\ell + u} \alpha_i K(x_i, x)$$

for some weights $\alpha_i$. Using this result, it is possible to search for the optimal solution $f^*$ in the finite-dimensional space defined by the possible choices of $\alpha_i$.
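With the representer theorem, the search over $\mathcal{H}$ reduces to a search over the coefficient vector $\alpha$. The following sketch (again illustrative, with squared loss standing in for a generic loss $V$) evaluates the full manifold-regularized objective in terms of $\alpha$, the kernel matrix $K$, and the graph Laplacian $L$.

```python
import numpy as np

def manifold_objective(alpha, K, L, y_labeled, gamma_A, gamma_I):
    """Manifold-regularized objective for f(x) = sum_i alpha_i K(x_i, x).

    K, L      : (l+u, l+u) kernel matrix and graph Laplacian, labeled points first
    y_labeled : labels for the first l points
    """
    n = K.shape[0]
    l = len(y_labeled)
    f_vals = K @ alpha                              # f at all l+u points
    loss = np.mean((f_vals[:l] - y_labeled) ** 2)   # loss on labeled points only
    ambient = gamma_A * (alpha @ K @ alpha)         # gamma_A * ||f||_K^2
    intrinsic = gamma_I * (f_vals @ L @ f_vals) / n ** 2
    return loss + ambient + intrinsic
```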

An alternative to the graph-based estimate of the intrinsic norm works directly with the kernel. This method consists in estimating the Laplacian operator by means of the derivatives of the kernel, written $\partial_{1,j} K(x_i, \cdot)$, where $\partial_{1,j}$ denotes the partial derivative with respect to the $j$-th coordinate of the first variable.

Manifold regularization can extend a variety of algorithms that can be expressed using Tikhonov regularization, by choosing an appropriate loss function $V$ and hypothesis space $\mathcal{H}$. Two commonly used examples are the families of support vector machines and regularized least squares algorithms.

(Regularized least squares includes the ridge regression algorithm; the related algorithms of LASSO and elastic net regularization can be expressed as support vector machines.[5][6]) The extended versions of these algorithms are called Laplacian Regularized Least Squares (abbreviated LapRLS) and Laplacian Support Vector Machines (LapSVM), respectively.

Regularized least squares (RLS) is a family of regression algorithms: algorithms that predict a value $y = f(x)$ for an input $x$, with the goal that the predicted values should be close to the true labels for the data. In particular, RLS is designed to minimize the mean squared error between the predicted values and the true labels, subject to regularization.[citation needed]

The problem statement for RLS results from choosing the loss function $V$ in Tikhonov regularization to be the mean squared error:

$$f^* = \underset{f \in \mathcal{H}}{\operatorname{argmin}} \; \frac{1}{\ell} \sum_{i=1}^{\ell} (f(x_i) - y_i)^2 + \gamma \|f\|_K^2$$

Thanks to the representer theorem, the solution can be written as a weighted sum of the kernel evaluated at the data points:

$$f^*(x) = \sum_{i=1}^{\ell} \alpha_i^* K(x_i, x)$$

and solving for $\alpha^*$ gives

$$\alpha^* = (K + \gamma \ell \, I)^{-1} Y$$

where $K$ is the kernel matrix with $K_{ij} = K(x_i, x_j)$ and $Y$ is the vector of labels.
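The closed-form solution for $\alpha^*$ amounts to a single linear solve. A minimal sketch, assuming a precomputed kernel matrix (the helper names are hypothetical):

```python
import numpy as np

def rls_fit(K, y, gamma):
    """Regularized least squares: solve (K + gamma * l * I) alpha = y."""
    l = len(y)
    return np.linalg.solve(K + gamma * l * np.eye(l), y)

def rls_predict(K_test_train, alpha):
    """Evaluate f(x) = sum_i alpha_i K(x_i, x) at test points,
    given the kernel matrix between test and training points."""
    return K_test_train @ alpha
```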

Adding a Laplacian term for manifold regularization gives the Laplacian RLS statement:

$$f^* = \underset{f \in \mathcal{H}}{\operatorname{argmin}} \; \frac{1}{\ell} \sum_{i=1}^{\ell} (f(x_i) - y_i)^2 + \gamma_A \|f\|_K^2 + \frac{\gamma_I}{(\ell + u)^2} \mathbf{f}^{\mathsf{T}} L \mathbf{f}$$

The representer theorem for manifold regularization again gives

$$f^*(x) = \sum_{i=1}^{\ell + u} \alpha_i^* K(x_i, x)$$

and this yields an expression for the vector $\alpha^*$. Letting $K$ be the kernel matrix as above, $Y$ be the vector of labels padded with zeros for the unlabeled points, and $J$ be the $(\ell + u) \times (\ell + u)$ diagonal matrix whose first $\ell$ diagonal entries are 1 and whose remaining entries are 0:

$$\alpha^* = \underset{\alpha \in \mathbb{R}^{\ell + u}}{\operatorname{argmin}} \; \frac{1}{\ell} (Y - J K \alpha)^{\mathsf{T}} (Y - J K \alpha) + \gamma_A \alpha^{\mathsf{T}} K \alpha + \frac{\gamma_I}{(\ell + u)^2} \alpha^{\mathsf{T}} K L K \alpha$$

with a solution of

$$\alpha^* = \left(J K + \gamma_A \ell \, I + \frac{\gamma_I \ell}{(\ell + u)^2} L K\right)^{-1} Y$$

LapRLS has been applied to problems including sensor networks,[7] medical imaging,[8][9] object detection,[10] spectroscopy,[11] document classification,[12] drug-protein interactions,[13] and compressing images and videos.[14]
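The LapRLS coefficients are likewise obtained from a single linear solve over the labeled and unlabeled points together. A minimal sketch of the closed-form expression above, assuming the labeled points come first (the function name is hypothetical):

```python
import numpy as np

def laprls_fit(K, L, y_labeled, gamma_A, gamma_I):
    """Laplacian RLS coefficients alpha* over labeled + unlabeled points.

    K, L      : (l+u, l+u) kernel matrix and graph Laplacian, labeled points first
    y_labeled : length-l vector of labels
    """
    n = K.shape[0]
    l = len(y_labeled)
    J = np.diag(np.r_[np.ones(l), np.zeros(n - l)])   # selects the labeled points
    Y = np.r_[y_labeled, np.zeros(n - l)]             # labels padded with zeros
    A = J @ K + gamma_A * l * np.eye(n) + (gamma_I * l / n ** 2) * (L @ K)
    return np.linalg.solve(A, Y)
```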

Support vector machines (SVMs) are a family of algorithms often used for classifying data into two or more groups, or classes. Intuitively, an SVM draws a boundary between classes so that the closest labeled examples to the boundary lie as far from it as possible.

This margin-maximization problem can be expressed directly as a quadratic program, but it is also equivalent to Tikhonov regularization with the hinge loss function, $V(f(x), y) = \max(0, 1 - y f(x))$:

$$f^* = \underset{f \in \mathcal{H}}{\operatorname{argmin}} \; \frac{1}{\ell} \sum_{i=1}^{\ell} \max(0, 1 - y_i f(x_i)) + \gamma \|f\|_K^2$$

Adding the intrinsic regularization term to this expression gives the LapSVM problem statement:

$$f^* = \underset{f \in \mathcal{H}}{\operatorname{argmin}} \; \frac{1}{\ell} \sum_{i=1}^{\ell} \max(0, 1 - y_i f(x_i)) + \gamma_A \|f\|_K^2 + \frac{\gamma_I}{(\ell + u)^2} \mathbf{f}^{\mathsf{T}} L \mathbf{f}$$

Again, the representer theorem allows the solution to be expressed in terms of the kernel evaluated at the data points:

$$f^*(x) = \sum_{i=1}^{\ell + u} \alpha_i^* K(x_i, x)$$

The vector $\alpha^*$ can be found by solving the dual of the resulting quadratic program. Letting $K$ be the kernel matrix, $J = [I \;\; 0]$ the $\ell \times (\ell + u)$ matrix that selects the labeled points, and $\mathcal{Y} = \operatorname{diag}(y_1, \ldots, y_\ell)$,

$$\alpha^* = \left(2 \gamma_A I + \frac{2 \gamma_I}{(\ell + u)^2} L K\right)^{-1} J^{\mathsf{T}} \mathcal{Y} \beta^*$$

where $\beta^*$ is the solution of the dual problem

$$\beta^* = \underset{\beta \in \mathbb{R}^{\ell}}{\operatorname{argmax}} \; \sum_{i=1}^{\ell} \beta_i - \frac{1}{2} \beta^{\mathsf{T}} Q \beta \quad \text{subject to} \quad \sum_{i=1}^{\ell} \beta_i y_i = 0, \;\; 0 \le \beta_i \le \frac{1}{\ell}$$

and $Q$ is defined by

$$Q = \mathcal{Y} J K \left(2 \gamma_A I + \frac{2 \gamma_I}{(\ell + u)^2} L K\right)^{-1} J^{\mathsf{T}} \mathcal{Y}$$

LapSVM has been applied to problems including geographical imaging,[17][18][19] medical imaging,[20][21][22] face recognition,[23] machine maintenance,[24] and brain–computer interfaces.
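The dual problem above is a small quadratic program over $\beta$ and can be handed to any constrained solver. The sketch below uses SciPy's general-purpose SLSQP routine purely for illustration (dedicated QP solvers are normally preferred in practice, and the function name is hypothetical); labels are assumed to be $\pm 1$ and the labeled points are assumed to come first.

```python
import numpy as np
from scipy.optimize import minimize

def lapsvm_fit(K, L, y_labeled, gamma_A, gamma_I):
    """Laplacian SVM coefficients alpha* via the dual quadratic program.

    K, L      : (l+u, l+u) kernel matrix and graph Laplacian, labeled points first
    y_labeled : length-l vector of labels in {-1, +1}
    """
    n = K.shape[0]
    l = len(y_labeled)
    J = np.hstack([np.eye(l), np.zeros((l, n - l))])   # selects the labeled points
    Y = np.diag(y_labeled)
    M = 2 * gamma_A * np.eye(n) + (2 * gamma_I / n ** 2) * (L @ K)
    M_inv_JTY = np.linalg.solve(M, J.T @ Y)            # M^{-1} J^T Y, shape (n, l)
    Q = Y @ J @ K @ M_inv_JTY
    Q = 0.5 * (Q + Q.T)                                # symmetrize for the solver

    # Maximize sum(beta) - 0.5 * beta^T Q beta
    # subject to beta^T y = 0 and 0 <= beta_i <= 1/l.
    objective = lambda b: 0.5 * b @ Q @ b - b.sum()
    gradient = lambda b: Q @ b - np.ones(l)
    res = minimize(objective, np.zeros(l), jac=gradient,
                   bounds=[(0.0, 1.0 / l)] * l,
                   constraints=[{"type": "eq", "fun": lambda b: b @ y_labeled}],
                   method="SLSQP")
    return M_inv_JTY @ res.x                           # alpha* = M^{-1} J^T Y beta*
```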

Manifold regularization can classify data when labeled data (black and white circles) are sparse, by taking advantage of unlabeled data (gray circles). Without many labeled data points, supervised learning algorithms can only learn very simple decision boundaries (top panel). Manifold learning can draw a decision boundary between the natural classes of the unlabeled data, under the assumption that close-together points probably belong to the same class, and so the decision boundary should avoid areas with many unlabeled points. This is one version of semi-supervised learning.
A two-dimensional manifold embedded in three-dimensional space (left). Manifold regularization attempts to learn a function that is smooth on the unrolled manifold (right).