Concern with the problem of finding relevant information dates back at least to the first publication of scientific journals in the 17th century.[1]
The information retrieval community has emphasized the use of test collections and benchmark tasks to measure topical relevance, starting with the Cranfield Experiments of the early 1960s and culminating in the TREC evaluations that continue to this day as the main evaluation framework for information retrieval research.
The cluster hypothesis, proposed by C. J. van Rijsbergen in 1979, asserts that two documents that are similar to each other have a high likelihood of being relevant to the same information need.[4]
The hypothesis can be interpreted globally or locally. The global interpretation assumes that there exists some fixed set of underlying topics derived from inter-document similarity.
Methods in this spirit include:
A second interpretation, most notably advanced by Ellen Voorhees,[8] focuses on the local relationships between documents.
The local interpretation avoids having to model the number or size of clusters in the collection and allows relevance at multiple scales.
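One way to make the local interpretation concrete is a nearest-neighbor check: for each document known to be relevant, look at its most similar documents and see how many of them are also relevant. The Python sketch below is illustrative only and is not taken from the cited work. The bag-of-words cosine similarity, the toy collection, the relevance judgments, the neighborhood size `k`, and the function names `cosine` and `relevant_neighbor_fraction` are all assumptions chosen for the example.

```python
from collections import Counter
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def relevant_neighbor_fraction(docs, relevant, k=2):
    """Average, over the relevant documents, of the fraction of each
    document's k most similar neighbors that are also relevant."""
    # Assumption: whitespace tokenization and raw term counts stand in
    # for whatever document representation a real system would use.
    vectors = {d: Counter(text.lower().split()) for d, text in docs.items()}
    fractions = []
    for doc_id in relevant:
        others = [d for d in vectors if d != doc_id]
        others.sort(key=lambda d: cosine(vectors[doc_id], vectors[d]),
                    reverse=True)
        neighbors = others[:k]
        if neighbors:
            hits = sum(1 for d in neighbors if d in relevant)
            fractions.append(hits / len(neighbors))
    return sum(fractions) / len(fractions) if fractions else 0.0


# Hypothetical toy collection and relevance judgments for one query.
docs = {
    "d1": "cluster hypothesis document retrieval",
    "d2": "document retrieval cluster evaluation",
    "d3": "retrieval evaluation cluster hypothesis test",
    "d4": "banana bread recipe flour",
    "d5": "sourdough bread recipe starter",
}
relevant = {"d1", "d2", "d3"}

score = relevant_neighbor_fraction(docs, relevant, k=2)
print(score)
```

A score near 1 means relevant documents sit close to other relevant documents, which is what the cluster hypothesis predicts. Note that the check never fixes a number or size of clusters: only each document's immediate neighborhood is examined.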