The Zipf–Mandelbrot law, also known as the Pareto–Zipf law, is a power-law distribution on ranked data. It is named after the linguist George Kingsley Zipf, who suggested a simpler distribution called Zipf's law, and the mathematician Benoit Mandelbrot, who subsequently generalized it.
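The probability mass function is given by
\[ f(k; N, q, s) = \frac{1}{H_{N,q,s}} \cdot \frac{1}{(k+q)^{s}}, \]
where $H_{N,q,s}$ is given by
\[ H_{N,q,s} = \sum_{i=1}^{N} \frac{1}{(i+q)^{s}}, \]
which may be thought of as a generalization of a harmonic number. Here $k$ is the rank of the data, and $q$ and $s$ are parameters of the distribution.
In the limit as $N$ approaches infinity, this becomes the Hurwitz zeta function $\zeta(s, q+1)$, under the convention $\zeta(s, a) = \sum_{n=0}^{\infty} (n+a)^{-s}$.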
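As a minimal sketch, the probability mass function $f(k) = (k+q)^{-s} / H_{N,q,s}$, with $H_{N,q,s} = \sum_{i=1}^{N} (i+q)^{-s}$, can be evaluated directly for finite $N$. The function names below are illustrative and do not come from any particular library.

```python
def generalized_harmonic(n, q, s):
    """Normalizing constant H_{N,q,s} = sum_{i=1}^{N} 1 / (i + q)^s."""
    return sum(1.0 / (i + q) ** s for i in range(1, n + 1))


def zipf_mandelbrot_pmf(k, n, q, s):
    """Probability of rank k (1 <= k <= n) under the Zipf-Mandelbrot law.

    Ranks outside 1..n get probability 0. Setting q = 0 gives Zipf's law.
    """
    if not 1 <= k <= n:
        return 0.0
    return 1.0 / ((k + q) ** s * generalized_harmonic(n, q, s))


if __name__ == "__main__":
    # Hypothetical parameters chosen only for illustration.
    n, q, s = 10, 2.7, 1.2
    probs = [zipf_mandelbrot_pmf(k, n, q, s) for k in range(1, n + 1)]
    print(probs[:3], sum(probs))
```

Recomputing the normalizing constant on every call keeps the sketch short; for repeated evaluation over many ranks one would compute it once and reuse it.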
The distribution of words ranked by their frequency in a random text corpus is approximated by a power-law distribution, known as Zipf's law.
If one plots the frequency rank of words contained in a moderately sized corpus of text data versus the number of occurrences or actual frequencies, one obtains a power-law distribution, with exponent close to one (but see Powers, 1998 and Gelbukh & Sidorov, 2001).
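The rank-frequency relationship described above can be checked numerically. The sketch below counts tokens, sorts the counts into rank order, and estimates the exponent as the slope of an ordinary least-squares line through the points (log rank, log frequency). This is a crude estimator used here only for illustration, and the helper names and the toy token list are assumptions, not part of any cited study.

```python
import math
from collections import Counter


def rank_frequency(tokens):
    """Return word counts sorted from most to least frequent (rank 1 first)."""
    return sorted(Counter(tokens).values(), reverse=True)


def loglog_slope(freqs):
    """Least-squares slope of log(frequency) against log(rank).

    For data following frequency ~ rank^(-s), the slope is close to -s.
    Requires at least two ranks.
    """
    xs = [math.log(r) for r in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


if __name__ == "__main__":
    # Toy input; a real check would use a tokenized corpus.
    tokens = "the cat and the dog and the bird".split()
    freqs = rank_frequency(tokens)
    print(freqs, loglog_slope(freqs))
```

On a real corpus the fit is usually restricted to a middle range of ranks, since the highest and lowest ranks tend to deviate from a straight line on log-log axes.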
Zipf's law implicitly assumes a fixed vocabulary size, but the harmonic series (s = 1) does not converge, whereas the Zipf–Mandelbrot generalization with s > 1 does.
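The convergence claim can be illustrated with partial sums of $\sum_{i=1}^{N} (i+q)^{-s}$. In this sketch (the function name is illustrative), the sum for s = 1 keeps growing roughly like $\ln N$ as the vocabulary size N increases, while for s = 2 and q = 0 it settles near $\pi^2/6$.

```python
import math


def partial_sum(n, s, q=0.0):
    """Partial sum of 1 / (i + q)^s for i = 1..n."""
    return sum(1.0 / (i + q) ** s for i in range(1, n + 1))


if __name__ == "__main__":
    for n in (10**3, 10**4, 10**5):
        # s = 1: grows without bound; s = 2: approaches pi^2 / 6.
        print(n, partial_sum(n, 1.0), partial_sum(n, 2.0))
    print("pi^2/6 =", math.pi ** 2 / 6)
```

A divergent sum means there is no valid normalizing constant for an unbounded vocabulary, which is why s = 1 only yields a proper distribution when N is finite.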
Furthermore, there is evidence that the closed class of functional words that define a language obeys a Zipf–Mandelbrot distribution with different parameters from the open classes of contentive words that vary by topic, field and register. [1]
In ecological field studies, the relative abundance distribution (i.e. the graph of the number of species observed as a function of their abundance) is often found to conform to a Zipf–Mandelbrot law.