Diffusion maps is a dimensionality reduction or feature extraction algorithm introduced by Coifman and Lafon[1][2][3][4] which computes a family of embeddings of a data set into Euclidean space (often low-dimensional) whose coordinates can be computed from the eigenvectors and eigenvalues of a diffusion operator on the data.
The Euclidean distance between points in the embedded space is equal to the "diffusion distance" between probability distributions centered at those points.
Unlike linear dimensionality reduction methods such as principal component analysis (PCA), diffusion maps belong to the family of nonlinear dimensionality reduction methods, which focus on discovering the underlying manifold from which the data has been sampled.
By integrating local similarities at different scales, diffusion maps give a global description of the data-set.
Compared with other methods, the diffusion map algorithm is robust to noise perturbation and computationally inexpensive.
The connectivity between two data points x and y can be defined as the probability of walking from x to y in one step of a random walk on the data. Usually, this probability is specified in terms of a kernel function of the two points, $k(x, y)$; a popular choice is the Gaussian kernel
$$k(x, y) = \exp\left(-\frac{\|x - y\|^2}{\epsilon}\right).$$
More generally, the kernel is required to be symmetric ($k(x,y) = k(y,x)$) and positivity preserving ($k(x,y) \geq 0$).
The kernel constitutes the prior definition of the local geometry of the data-set.
Since a given kernel will capture a specific feature of the data set, its choice should be guided by the application that one has in mind.
This is a major difference from methods such as principal component analysis, where correlations between all data points are taken into account at once.
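As an illustration, here is a minimal NumPy sketch of such a kernel construction; the function name, the one-point-per-row data layout, and the bandwidth parameter eps are choices made here for exposition, not part of the original formulation:

```python
import numpy as np

def gaussian_kernel(X, eps=1.0):
    """Pairwise Gaussian kernel: K[i, j] = exp(-||x_i - x_j||^2 / eps).

    X is an (n, d) array with one data point per row; the result is a
    symmetric, positivity-preserving (n, n) kernel matrix.
    """
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / eps)
```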
From $k$, we can then construct a reversible discrete-time Markov chain on the data set $X$ (a process known as the normalized graph Laplacian construction). Letting $\mu$ denote the distribution of the points on $X$, define
$$d(x) = \int_X k(x, y) \, d\mu(y)$$
and the one-step transition kernel
$$p(x, y) = \frac{k(x, y)}{d(x)}.$$
From $p(x, y)$, we can construct a transition matrix of a Markov chain ($M$) on $X$; running the chain forward in time (taking larger and larger powers of $M$) reveals the geometric structure of $X$ at larger and larger scales, so the number of steps $t$ plays the role of a scale parameter. In the matrix formulation, the kernel matrix $L_{ij} = k(x_i, x_j)$ is first renormalized by a parameter $\alpha$,
$$L^{(\alpha)} = D^{-\alpha} L D^{-\alpha},$$
where $D$ is the diagonal degree matrix with $D_{ii} = \sum_j L_{ij}$. We apply the graph Laplacian normalization to this new kernel:
$$M = \left(D^{(\alpha)}\right)^{-1} L^{(\alpha)},$$
where $D^{(\alpha)}$ is the diagonal matrix with $D^{(\alpha)}_{ii} = \sum_j L^{(\alpha)}_{ij}$, so that $[M^t]_{ij} = p(x_j, t \mid x_i)$ is the probability of transitioning from $x_i$ to $x_j$ in $t$ steps. An eigendecomposition of $M^t$ yields
$$M^t_{ij} = \sum_l \lambda_l^t \, \psi_l(x_i) \, \phi_l(x_j),$$
where $\{\lambda_l\}$ is the sequence of eigenvalues of $M$ and $\{\psi_l\}$ and $\{\phi_l\}$ are the corresponding biorthogonal right and left eigenvectors. Due to the spectrum decay of the eigenvalues, only a few terms are necessary to achieve a given relative accuracy in this sum.
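In matrix form, this construction might look as follows; a sketch assuming a dense kernel matrix K as computed above, with alpha as the density-normalization parameter (function and variable names are illustrative):

```python
import numpy as np

def transition_matrix(K, alpha=0.5):
    """Build the row-stochastic Markov matrix M from a kernel matrix K.

    L^(alpha) = D^-alpha K D^-alpha tunes the influence of the sampling
    density, and the graph Laplacian normalization
    M = (D^(alpha))^-1 L^(alpha) makes each row a probability
    distribution; alpha = 0 recovers the classical graph Laplacian
    normalization discussed below.
    """
    d = K.sum(axis=1)                                # degrees d(x_i)
    K_alpha = K / np.outer(d ** alpha, d ** alpha)   # L^(alpha)
    d_alpha = K_alpha.sum(axis=1)                    # D^(alpha) diagonal
    return K_alpha / d_alpha[:, None]                # row-stochastic M
```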
The reason to introduce the normalization step involving $\alpha$ is to tune the influence of the data point density on the infinitesimal transition of the diffusion.
In some applications, the sampling of the data is not related to the geometry of the manifold we are interested in describing. In this case we can set $\alpha = 1$, and the diffusion operator approximates the Laplace–Beltrami operator; we then recover the Riemannian geometry of the data set regardless of the distribution of the points.
To describe the long-term behavior of the point distribution of a system of stochastic differential equations, we can use $\alpha = 0.5$, and the resulting Markov chain approximates the Fokker–Planck diffusion.
With $\alpha = 0$, it reduces to the classical graph Laplacian normalization.
The diffusion distance at time $t$ between two points is defined as
$$D_t(x_i, x_j)^2 = \sum_y \frac{\left(p(y, t \mid x_i) - p(y, t \mid x_j)\right)^2}{\phi_0(y)},$$
where $\phi_0(y)$ is the stationary distribution of the Markov chain, given by the first left eigenvector of $M$; explicitly,
$$\phi_0(y) = \frac{d(y)}{\sum_{z \in X} d(z)}.$$
Intuitively, $D_t(x_i, x_j)$ is small if there is a large number of short paths connecting $x_i$ and $x_j$.
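For illustration, the diffusion distance can be evaluated directly from this definition; a sketch, with the stationary distribution obtained as the leading left eigenvector of M as described above:

```python
import numpy as np

def diffusion_distance(M, i, j, t=1):
    """D_t(x_i, x_j) computed from the definition above."""
    # Stationary distribution phi_0: leading left eigenvector of M,
    # normalized to sum to one (eigenvalue 1 for a row-stochastic M).
    w, V = np.linalg.eig(M.T)
    phi0 = np.abs(np.real(V[:, np.argmax(np.real(w))]))
    phi0 /= phi0.sum()
    Mt = np.linalg.matrix_power(M, t)   # t-step transition probabilities
    return np.sqrt(np.sum((Mt[i] - Mt[j]) ** 2 / phi0))
```

An equivalent and usually cheaper computation via eigenvalues and eigenvectors follows from the spectral identity given below.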
There are several interesting features associated with the diffusion distance, based on our previous discussion that $t$ also serves as a scale parameter: points are closer at a given scale if they are highly connected in the graph, which emphasizes the notion of a cluster, and the distance is robust to noise because it depends on all paths of length $t$ between the points. The diffusion distance can also be calculated using the eigenvectors:
$$D_t(x_i, x_j)^2 = \sum_l \lambda_l^{2t} \left(\psi_l(x_i) - \psi_l(x_j)\right)^2,$$
so the eigenvectors can be used as a new set of coordinates for the data.
The diffusion map is defined as:
$$\Psi_t(x) = \left(\lambda_1^t \psi_1(x), \lambda_2^t \psi_2(x), \ldots, \lambda_k^t \psi_k(x)\right)$$
Because of the spectrum decay, it is sufficient to use only the first $k$ eigenvectors and eigenvalues, and the Euclidean distance between the resulting diffusion coordinates approximates the diffusion distance.
The basic algorithm framework of diffusion map is as follows:
Step 1. Given the data set, compute the similarity matrix $L$ with $L_{ij} = k(x_i, x_j)$.
Step 2. Normalize the matrix according to parameter $\alpha$: $L^{(\alpha)} = D^{-\alpha} L D^{-\alpha}$.
Step 3. Form the normalized matrix $M = (D^{(\alpha)})^{-1} L^{(\alpha)}$.
Step 4. Compute the $k$ largest eigenvalues of $M^t$ and the corresponding eigenvectors.
Step 5. Use the diffusion map $\Psi_t$ to obtain the embedding in $k$ dimensions.
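Putting the steps together, here is a self-contained sketch of the framework; eps, alpha, t, and k are illustrative defaults chosen here, and a practical implementation would typically use a sparse kernel and a symmetrized eigensolver:

```python
import numpy as np

def diffusion_map(X, eps=1.0, alpha=0.5, t=1, k=2):
    """Embed the rows of X into k diffusion coordinates lambda_l^t * psi_l."""
    # Step 1: similarity matrix from a Gaussian kernel.
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    L = np.exp(-sq_dists / eps)
    # Step 2: normalize according to alpha.
    d = L.sum(axis=1)
    L_alpha = L / np.outer(d ** alpha, d ** alpha)
    # Step 3: form the row-stochastic matrix M.
    d_alpha = L_alpha.sum(axis=1)
    M = L_alpha / d_alpha[:, None]
    # Step 4: eigendecomposition, sorted by decreasing |eigenvalue|.
    w, V = np.linalg.eig(M)
    order = np.argsort(-np.abs(w))
    w, V = np.real(w[order]), np.real(V[:, order])
    # Step 5: diffusion coordinates, skipping the trivial eigenvector psi_0.
    return (w[1 : k + 1] ** t) * V[:, 1 : k + 1]

# Example: a noisy circle embedded into two diffusion coordinates.
theta = np.linspace(0, 2 * np.pi, 200, endpoint=False)
X = np.c_[np.cos(theta), np.sin(theta)] + 0.05 * np.random.randn(200, 2)
print(diffusion_map(X, eps=0.5).shape)   # (200, 2)
```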
In the paper [6], Nadler et al. showed how to design a kernel that reproduces the diffusion induced by a Fokker–Planck equation.
This computation is completely insensitive to the distribution of the points and therefore provides a separation of the statistics and the geometry of the data.
Since diffusion maps give a global description of the data-set, they can measure the distances between pairs of sample points in the manifold in which the data is embedded.
Applications based on diffusion maps include face recognition,[7] spectral clustering, low dimensional representation of images, image segmentation,[8] 3D model segmentation,[9] speaker verification[10] and identification,[11] sampling on manifolds, anomaly detection,[12][13] image inpainting,[14] revealing brain resting state networks organization [15] and so on.
Furthermore, the diffusion maps framework has been productively extended to complex networks,[16] revealing a functional organization of networks which differs from the purely topological or structural one.