Cluster hypothesis

In machine learning and information retrieval, the cluster hypothesis is an assumption about the nature of the data handled in those fields, which takes various forms.

[1] In terms of classification, it states that if points are in the same cluster, they are likely to be of the same class.

The cluster hypothesis was formulated first by van Rijsbergen:[3] "closely associated documents tend to be relevant to the same requests".

Although experiments showed that the cluster hypothesis as such holds, exploiting it for retrieval did not lead to satisfying results.

In contrast the amount of adherence of data to this assumption can be quantitatively measured.