Also, in applications such as processing of audio spectrograms or muscular activity, non-negativity is inherent to the data being considered.
NMF finds applications in such fields as astronomy,[3][4] computer vision, document clustering,[1] missing data imputation,[5] chemometrics, audio signal processing, recommender systems,[6][7] and bioinformatics.
When the error function used is the Kullback–Leibler divergence, NMF is identical to probabilistic latent semantic analysis (PLSA), a popular document clustering method.
This greatly improves the quality of the data representation given by W. Furthermore, the resulting matrix factor H becomes sparser and more orthogonal.
The different types arise from using different cost functions for measuring the divergence between V and WH and possibly by regularization of the W and/or H matrices.
find nonnegative matrices W and H that minimize the function F(W, H) = ||V − WH||²_F, the squared Frobenius norm of the approximation error. Another type of NMF for images is based on the total variation norm.
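As a concrete illustration, here is a minimal numpy sketch of two common divergence choices, the squared Frobenius norm and the generalized Kullback–Leibler divergence (function names are ours):

```python
import numpy as np

def frobenius_cost(V, W, H):
    """Squared Frobenius-norm cost ||V - WH||_F^2."""
    return np.sum((V - W @ H) ** 2)

def kl_cost(V, W, H, eps=1e-10):
    """Generalized Kullback-Leibler divergence D(V || WH);
    eps guards against log(0) and division by zero."""
    WH = W @ H + eps
    return np.sum(V * np.log((V + eps) / WH) - V + WH)
```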
[22] When L1 regularization (akin to Lasso) is added to NMF with the mean squared error cost function, the resulting problem may be called non-negative sparse coding due to the similarity to the sparse coding problem,[23][24] although it may also still be referred to as NMF.
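As a rough sketch of how such a penalty typically enters the updates, the L1 weight can be added to the denominator of the multiplicative H-update, shrinking H toward zero (a common heuristic form; the function name and constant are illustrative):

```python
import numpy as np

def sparse_nmf_update_H(V, W, H, lam=0.1, eps=1e-10):
    """One multiplicative H-update for the penalized objective
    ||V - WH||_F^2 + lam * sum(H); the additive lam in the
    denominator pushes entries of H toward zero (sparsity)."""
    return H * (W.T @ V) / (W.T @ W @ H + lam + eps)
```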
[26][27] If the columns of V represent data sampled over spatial or temporal dimensions, e.g. time signals, images, or video, features that are equivariant w.r.t. shifts along these dimensions can be learned by convolutional NMF.
In this case, W is sparse with columns having local non-zero weight windows that are shared across shifts along the spatio-temporal dimensions of V, representing convolution kernels.
By spatio-temporal pooling of H and repeatedly using the resulting representation as input to convolutional NMF, deep feature hierarchies can be learned.
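A minimal sketch of the convolutive NMF signal model, V ≈ Σ_t W_t · shift_t(H), under assumed shapes (T convolution frames, m features, r components, n time steps; names are ours):

```python
import numpy as np

def conv_nmf_reconstruct(W, H):
    """Reconstruct V_hat = sum_t W[t] @ shift_t(H) for convolutive NMF.
    W has shape (T, m, r); H has shape (r, n); shift_t moves the
    activations t columns to the right with zero padding."""
    T, m, r = W.shape
    n = H.shape[1]
    V_hat = np.zeros((m, n))
    for t in range(T):
        H_shift = np.zeros_like(H)
        H_shift[:, t:] = H[:, :n - t]  # right-shift by t, zero-padded
        V_hat += W[t] @ H_shift
    return V_hat
```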
[28] There are several ways in which W and H may be found: Lee and Seung's multiplicative update rule[14] has been a popular method due to its simplicity of implementation.
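A minimal numpy sketch of those multiplicative updates for the Frobenius objective (random initialization and convergence checks are simplified away; the small eps avoids division by zero):

```python
import numpy as np

def nmf_multiplicative(V, r, n_iter=200, eps=1e-10, seed=0):
    """Lee-Seung multiplicative updates for min ||V - WH||_F^2
    subject to W >= 0, H >= 0."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, r))
    H = rng.random((r, n))
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)  # update keeps H nonnegative
        W *= (V @ H.T) / (W @ H @ H.T + eps)  # update keeps W nonnegative
    return W, H
```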
Some options for initialization include complete randomization, SVD, k-means clustering, and more advanced strategies based on these and other paradigms.
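As one hedged example of SVD-based seeding, in the spirit of NNDSVD (the full algorithm also weighs the negative parts of each singular vector pair; this simplified sketch keeps only the positive parts, and the names are ours):

```python
import numpy as np

def svd_init(V, r):
    """Seed W and H from the positive parts of the r leading singular
    vectors, scaled by the square roots of the singular values."""
    U, S, Vt = np.linalg.svd(V, full_matrices=False)
    s = np.sqrt(S[:r])
    W = np.maximum(U[:, :r], 0) * s            # scale columns of W
    H = s[:, None] * np.maximum(Vt[:r, :], 0)  # scale rows of H
    return W, H
```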
The contribution of the sequential NMF components can be compared with the Karhunen–Loève theorem, an application of PCA, using the plot of eigenvalues.
A typical choice of the number of components with PCA is based on the "elbow" point: a flat plateau indicates that PCA is not capturing the data efficiently, while a sudden drop reflects the capture of random noise and marks entry into the regime of overfitting.
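A small sketch of the eigenvalue (scree) profile one would inspect for the elbow, plateau, and drop described above (assuming rows of V are observations; the name is ours):

```python
import numpy as np

def scree(V):
    """Fraction of total variance captured by each PCA component."""
    s = np.linalg.svd(V - V.mean(axis=0), compute_uv=False)
    eig = s ** 2
    return eig / eig.sum()
```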
Arora, Ge, Halpern, Mimno, Moitra, Sontag, Wu, & Zhu (2013) give a polynomial-time algorithm for exact NMF that works for the case where one of the factors, W, satisfies a separability condition.
[42] In "Learning the parts of objects by non-negative matrix factorization", Lee and Seung[43] proposed NMF mainly for parts-based decomposition of images.
It compares NMF to vector quantization and principal component analysis, and shows that although the three techniques may be written as factorizations, they implement different constraints and therefore produce different results.
It was later shown that some types of NMF are an instance of a more general probabilistic model called "multinomial PCA".
[44] When NMF is obtained by minimizing the Kullback–Leibler divergence, it is in fact equivalent to another instance of multinomial PCA, probabilistic latent semantic analysis,[45] trained by maximum likelihood estimation.
That method is commonly used for analyzing and clustering textual data and is also related to the latent class model.
However, SVM and NMF are related at a more intimate level than that of NQP (nonnegative quadratic programming), which allows direct application of the solution algorithms developed for either of the two methods to problems in both domains.
[54] In astronomy, NMF is a promising method for dimension reduction in the sense that astrophysical signals are non-negative.
Ren et al. (2018)[4] are able to prove the stability of NMF components when they are constructed sequentially (i.e., one by one), which enables the linearity of the NMF modeling process; this linearity property is used to separate the stellar light from the light scattered by exoplanets and circumstellar disks.
In direct imaging, various statistical methods have been adopted[56][57][38] to reveal faint exoplanets and circumstellar disks against the bright surrounding stellar light, which has a typical contrast from 10⁵ to 10¹⁰. However, the light from the exoplanets or circumstellar disks is usually over-fitted, so forward modeling has to be adopted to recover the true flux.
In addition, the imputation quality can be increased when more NMF components are used; see Figure 4 of Ren et al. (2020) for an illustration.
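A hedged sketch of one standard way to run NMF for imputation, restricting the multiplicative updates to observed entries with a binary mask and reading missing values off W @ H (names are ours; this is not Ren et al.'s exact procedure):

```python
import numpy as np

def masked_nmf(V, M, r, n_iter=200, eps=1e-10, seed=0):
    """Weighted NMF for missing data: M is a binary mask (1 = observed).
    Only observed entries drive the updates; V is imputed by W @ H."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, r))
    H = rng.random((r, n))
    Vm = M * V  # zero out missing entries
    for _ in range(n_iter):
        H *= (W.T @ Vm) / (W.T @ (M * (W @ H)) + eps)
        W *= (Vm @ H.T) / ((M * (W @ H)) @ H.T + eps)
    return W, H
```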
[62] Arora, Ge, Halpern, Mimno, Moitra, Sontag, Wu, & Zhu (2013) have given polynomial-time algorithms to learn topic models using NMF.
The algorithm assumes that the topic matrix satisfies a separability condition that is often found to hold in these settings.
[42] Hassani, Iranmanesh and Mansouri (2019) proposed a feature agglomeration method for term-document matrices which operates using NMF.
Schmidt et al.[67] use NMF for speech denoising under non-stationary noise, which is completely different from classical statistical approaches.
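Not Schmidt et al.'s exact algorithm, but a common supervised-NMF denoising recipe in the same spirit: with pretrained speech and noise bases held fixed, estimate activations on the noisy magnitude spectrogram and keep the speech part via a soft mask (all names and the pretraining step are assumptions):

```python
import numpy as np

def nmf_denoise(V_noisy, W_speech, W_noise, n_iter=100, eps=1e-10, seed=0):
    """Supervised NMF denoising sketch: W_speech and W_noise are assumed
    to be pretrained on clean speech and noise magnitude spectrograms."""
    W = np.hstack([W_speech, W_noise])       # fixed joint dictionary
    r_s = W_speech.shape[1]
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], V_noisy.shape[1]))
    for _ in range(n_iter):                  # update activations only
        H *= (W.T @ V_noisy) / (W.T @ W @ H + eps)
    V_speech = W[:, :r_s] @ H[:r_s]          # speech-only reconstruction
    mask = V_speech / (W @ H + eps)          # Wiener-style soft mask
    return mask * V_noisy
```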
[72] NMF techniques can identify sources of variation such as cell types, disease subtypes, population stratification, tissue composition, and tumor clonality.