Topic model

Topic modeling is a frequently used text-mining tool for the discovery of hidden semantic structures in a text body.

In the age of information, the amount of written material we encounter each day is simply beyond our processing capacity.

Topic models can help organize large collections of unstructured text and offer insights that make them easier to understand.

Originally developed as a text-mining tool, topic models have been used to detect instructive structures in data such as genetic information, images, and networks.[4]

Latent Dirichlet allocation (LDA), perhaps the most common topic model currently in use, is a generalization of probabilistic latent semantic analysis (PLSA).

Developed by David Blei, Andrew Ng, and Michael I. Jordan in 2002, LDA introduces sparse Dirichlet prior distributions over document-topic and topic-word distributions, encoding the intuition that documents cover a small number of topics and that topics often use a small number of words.
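A minimal sketch of this generative process in Python illustrates the role of the Dirichlet priors; all sizes and hyperparameter values below are hypothetical, chosen only for illustration. Concentration parameters below 1 yield sparse mixtures, matching the intuition that each document covers few topics and each topic favors few words:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes and hyperparameters (not from the article)
n_topics, vocab_size, doc_len = 5, 1000, 50
alpha, beta = 0.1, 0.01  # sparse Dirichlet priors (values < 1)

# Topic-word distributions: one categorical over the vocabulary per topic
phi = rng.dirichlet(np.full(vocab_size, beta), size=n_topics)

# Generate one document under the LDA generative process
theta = rng.dirichlet(np.full(n_topics, alpha))        # document-topic mixture
z = rng.choice(n_topics, size=doc_len, p=theta)        # a topic for each word slot
words = [rng.choice(vocab_size, p=phi[t]) for t in z]  # a word for each slot
```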

Hierarchical latent tree analysis (HLTA) was applied to a collection of recent research papers published at major AI and machine learning venues.[17]

Several groups of researchers, starting with Papadimitriou et al.,[3] have attempted to design algorithms with provable guarantees.[18]

In 2017, neural networks were leveraged in topic modeling to make inference faster,[19] an approach that has since been extended to a weakly supervised version.

Animation of the topic detection process in a document-word matrix through biclustering. Every column corresponds to a document, every row to a word. A cell stores the frequency of a word in a document, with dark cells indicating high word frequencies. This procedure groups documents that use similar words, just as it groups words that occur in a similar set of documents. Such groups of words are then called topics. More typical topic models, such as LDA, group only documents, based on a more sophisticated and probabilistic mechanism.
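A minimal sketch of this biclustering idea, using scikit-learn's SpectralCoclustering on a toy corpus invented for illustration; note that here rows are documents and columns are words, the transpose of the animation's layout:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.cluster import SpectralCoclustering

# Toy corpus with two latent themes (genetics, computer vision)
docs = [
    "gene dna sequence expression",
    "dna genetic expression protein",
    "image pixel segmentation vision",
    "vision image object detection",
]

# Document-word frequency matrix: rows are documents, columns are words
X = CountVectorizer().fit_transform(docs)

# Co-cluster rows (documents) and columns (words) simultaneously,
# so each bicluster pairs a group of documents with a group of words
model = SpectralCoclustering(n_clusters=2, random_state=0).fit(X)
print(model.row_labels_)     # bicluster assignment per document
print(model.column_labels_)  # bicluster assignment per word
```

Each group of co-assigned words here plays the role of a topic, found purely from co-occurrence structure rather than the probabilistic mechanism LDA uses.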