Statistical semantics

In linguistics, statistical semantics applies the methods of statistics to the problem of determining the meaning of words or phrases, ideally through unsupervised learning, to a degree of precision at least sufficient for the purpose of information retrieval.

The term statistical semantics was first used by Warren Weaver in his well-known paper on machine translation.

[4] "Furnas et al. 1983" is frequently cited as a foundational contribution to statistical semantics.

Research in statistical semantics has resulted in a wide variety of algorithms that use the distributional hypothesis to discover many aspects of semantics by applying statistical techniques to large corpora. Statistical semantics focuses on the meanings of common words and the relations between common words, unlike text mining, which tends to focus on whole documents, document collections, or named entities (names of people, places, and organizations).
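As an illustration of the distributional approach, the sketch below builds word-by-context co-occurrence counts from a toy corpus and compares words by the cosine similarity of their count vectors. This is a minimal, hypothetical example of the general technique, not an implementation of any particular published algorithm; the corpus, window size, and example words are invented for illustration.

    # Minimal distributional-semantics sketch. Assumptions (not from any
    # cited work): a toy corpus, a symmetric window of 2, and raw counts
    # rather than a weighting scheme.
    from collections import Counter, defaultdict
    from math import sqrt

    corpus = [
        "the cat drinks milk",
        "the dog drinks water",
        "the cat chases the dog",
    ]
    window = 2  # context window size (an arbitrary choice)

    # For each target word, count the words appearing within the window.
    vectors = defaultdict(Counter)
    for sentence in corpus:
        tokens = sentence.split()
        for i, word in enumerate(tokens):
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if j != i:
                    vectors[word][tokens[j]] += 1

    def cosine(u, v):
        # Cosine similarity between two sparse count vectors.
        dot = sum(u[k] * v[k] for k in u if k in v)
        norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
        return dot / norm if norm else 0.0

    # Words that occur in similar contexts get similar vectors.
    print(cosine(vectors["cat"], vectors["dog"]))    # relatively high
    print(cosine(vectors["cat"], vectors["water"]))  # relatively low

On this toy corpus, "cat" and "dog" share contexts such as "the" and "drinks", so their similarity score is higher than that of an unrelated pair. Real systems apply the same idea to much larger corpora, typically replacing raw counts with association weights (e.g., pointwise mutual information) and reducing the dimensionality of the resulting vectors.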

A further advantage of statistical approaches is that they are usually easier than lexicon-based algorithms to adapt to new languages or to noisier text types, such as text from social media.