Heaps' law

Heaps' law can be formulated as

$$V_R(n) = K n^{\beta}$$

where V_R is the number of distinct words in an instance text of size n, and K and β are free parameters determined empirically.
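Because K and β are determined empirically, one way to estimate them is to record V_R(n) while reading a text token by token, then fit a straight line in log-log space, since log V_R = log K + β log n. The sketch below assumes the text is already split into word tokens (here by a plain whitespace split). The function names `vocabulary_growth` and `fit_heaps` are illustrative, not from any library, and ordinary least squares over every prefix length is only one possible fitting choice.

```python
import math
from typing import Sequence


def vocabulary_growth(tokens: Sequence[str]) -> list[int]:
    """Return V_R(n) for n = 1..len(tokens): distinct types among the first n tokens."""
    seen: set[str] = set()
    growth: list[int] = []
    for token in tokens:
        seen.add(token)
        growth.append(len(seen))
    return growth


def fit_heaps(growth: Sequence[float]) -> tuple[float, float]:
    """Estimate (K, beta) by least squares on log V_R(n) = log K + beta * log n.

    growth[i] is taken to be V_R(n) for n = i + 1; needs at least two points.
    """
    xs = [math.log(n) for n in range(1, len(growth) + 1)]
    ys = [math.log(v) for v in growth]
    m = len(xs)
    mean_x = sum(xs) / m
    mean_y = sum(ys) / m
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx                        # slope of the log-log line
    k = math.exp(mean_y - beta * mean_x)    # intercept, mapped back from log space
    return k, beta


if __name__ == "__main__":
    tokens = "the cat sat on the mat the end".split()
    print(vocabulary_growth(tokens))  # [1, 2, 3, 4, 4, 5, 5, 6]
    print(fit_heaps(vocabulary_growth(tokens)))
```

Fitting every prefix length weights large n heavily, because those points are dense on a logarithmic axis; sampling n at log-spaced intervals before fitting is a reasonable alternative.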

The law is frequently attributed to Harold Stanley Heaps, but was originally discovered by Gustav Herdan (1960).

[2] This is a consequence of the fact that, in general, the type-token relation of a homogeneous text can be derived from the distribution of its types.
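A minimal sketch of how a type-token relation follows from a type distribution, under the simplifying assumption (not from the source) that a homogeneous text behaves like n independent draws from fixed type probabilities p_i. Type i is missing from all n draws with probability (1 − p_i)^n, so by linearity of expectation

$$\mathbb{E}[V_R(n)] = \sum_i \left[1 - (1 - p_i)^n\right]$$

The Zipf-shaped distribution, its exponent `s`, and the function names below are hypothetical choices for illustration only.

```python
from typing import Sequence


def zipf_probabilities(num_types: int, s: float = 1.0) -> list[float]:
    """Hypothetical Zipf-shaped type distribution: p_i proportional to 1 / i**s."""
    weights = [1.0 / rank ** s for rank in range(1, num_types + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def expected_vocabulary(probs: Sequence[float], n: int) -> float:
    """Expected number of distinct types after n independent draws.

    Type i appears at least once with probability 1 - (1 - p_i)**n;
    summing over all types gives E[V_R(n)] by linearity of expectation.
    """
    return sum(1.0 - (1.0 - p) ** n for p in probs)


if __name__ == "__main__":
    probs = zipf_probabilities(50_000, s=1.0)
    for n in (1_000, 10_000, 100_000, 1_000_000):
        # The expected vocabulary grows far more slowly than the text size n.
        print(n, round(expected_vocabulary(probs, n)))
```

Each term 1 − (1 − p_i)^n is concave in n, so the expected vocabulary per token, E[V_R(n)]/n, can only shrink as the text grows, which is the sublinear behaviour Heaps' law describes.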

[6] Heaps' law also applies to situations in which the "vocabulary" is simply a set of distinct types that are attributes of a collection of objects. For example, the objects could be people and the types their countries of origin.

Heaps' law has also been observed in single-cell transcriptomes,[7] with genes treated as the distinct objects in the "vocabulary".

Verification of Heaps' law on War and Peace and on a randomly shuffled version of it. Both fit Heaps' law well, with very similar exponents β but different values of K.
A schematic Heaps' law plot. The x-axis represents the text size, and the y-axis represents the number of distinct vocabulary elements present in the text. Compare the values on the two axes: the vocabulary grows far more slowly than the text.