Rank–size distribution

For example, if a data set consists of items of sizes 5, 100, 5, and 8, the rank-size distribution is 100, 8, 5, 5 (ranks 1 through 4).
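The example above can be reproduced with a minimal sketch. The function name `rank_size` is an assumption made for illustration; ties are simply given consecutive ranks in sorted order, which matches the example but is only one possible convention.

```python
def rank_size(sizes):
    """Return (rank, size) pairs, largest size first.

    Tied sizes receive consecutive ranks (an assumed convention).
    """
    ordered = sorted(sizes, reverse=True)
    return list(enumerate(ordered, start=1))


pairs = rank_size([5, 100, 5, 8])
for rank, size in pairs:
    print(rank, size)
```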

Rank-size distributions are of particular interest when the data vary significantly in scale, such as city size or word frequency.

The rank-size distribution is a discrete form of a quantile function (inverse cumulative distribution) in reverse order: it gives the size of the element at a given rank.
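One way to write this relation explicitly is sketched below. The notation is an assumption introduced here, not taken from the source: n is the number of items, x_(1) ≤ … ≤ x_(n) are the sizes in ascending order, F_n is the empirical cumulative distribution, and Q_n(p) = inf{x : F_n(x) ≥ p} is the corresponding empirical quantile function.

```latex
\[
  s(r) \;=\; x_{(n-r+1)} \;=\; Q_n\!\left(\frac{n-r+1}{n}\right),
  \qquad r = 1, \dots, n .
\]
```

For the sizes 5, 100, 5, 8 (n = 4), rank 1 gives Q_4(1) = 100 and rank 4 gives Q_4(1/4) = 5.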

Most simply and commonly, a distribution may be split in two pieces, termed the head and tail.

[6] These names are frequently qualified with adjectives, most significantly long tail, but also fat belly,[4] chunky middle, etc.

Segments may arise naturally due to actual changes in the behavior of the distribution as rank varies.

The Yule–Simon distribution that results from preferential attachment (intuitively, "the rich get richer" and "success breeds success") simulates a broken power law and has been shown to "very well capture" word frequency versus rank distributions.
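The "rich get richer" mechanism can be illustrated with a small simulation. This is a sketch of one common preferential-attachment process (usually attributed to Simon), not necessarily the procedure used in the cited work. The names `simon_process`, `n_tokens`, `alpha`, and `seed` are assumptions: with probability `alpha` a new word is introduced, and otherwise an earlier token is copied, so a word is repeated with probability proportional to its current count.

```python
import random
from collections import Counter


def simon_process(n_tokens, alpha, seed=0):
    """Simulate a preferential-attachment text and return its rank-size list.

    n_tokens: total number of tokens to generate (assumed parameter).
    alpha:    probability that the next token is a brand-new word.
    """
    rng = random.Random(seed)
    tokens = [0]      # start with a single token of word 0
    next_word = 1
    while len(tokens) < n_tokens:
        if rng.random() < alpha:
            # Introduce a word that has not been seen before.
            tokens.append(next_word)
            next_word += 1
        else:
            # Copy a uniformly chosen earlier token: frequent words
            # are chosen more often, in proportion to their counts.
            tokens.append(rng.choice(tokens))
    # Word counts sorted in descending order form the rank-size distribution.
    return sorted(Counter(tokens).values(), reverse=True)


sizes = simon_process(n_tokens=5000, alpha=0.1)
print(sizes[:10])
```

Plotting `sizes` against rank on log-log axes is one way to inspect the shape of the resulting distribution.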

[9] While Zipf's law works well in many cases, it tends not to fit the largest cities in many countries; one type of deviation is known as the King effect.

A 2002 study found that Zipf's law was rejected in 53 of 73 countries, far more than would be expected based on random chance.
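In its simplest form, Zipf's law says that size is inversely proportional to rank, so a plot of log size against log rank has slope close to -1. The sketch below estimates that slope by ordinary least squares. It is only a rough diagnostic under stated assumptions, not the statistical test used in the 2002 study; the function name `loglog_slope` is hypothetical, and the input data are synthetic.

```python
import math


def loglog_slope(sizes):
    """Least-squares slope of log(size) versus log(rank).

    Assumes at least two strictly positive sizes.
    """
    ordered = sorted(sizes, reverse=True)
    xs = [math.log(rank) for rank in range(1, len(ordered) + 1)]
    ys = [math.log(size) for size in ordered]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Synthetic data that follow size = 1000 / rank exactly.
ideal = [1000.0 / rank for rank in range(1, 51)]
slope = loglog_slope(ideal)
print(round(slope, 6))
```

A slope far from -1, or a poor fit at the top ranks, suggests a departure from Zipf's law, although a formal test is needed to reject it.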

For instance, in the Democratic Republic of the Congo, the capital, Kinshasa, is more than eight times larger than the second-largest city, Lubumbashi.

A distribution such as that in the United States or China does not exhibit a pattern of primacy, but countries with a dominant "primate city" clearly vary from the rank-size rule in the opposite manner.
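A simple way to quantify primacy is the ratio of the largest city to the second-largest. Under a Zipf law with exponent 1 this ratio would be about 2, so a much larger value points to a primate city. The sketch below is illustrative only: the function name `primacy_ratio` is hypothetical and the populations are made-up numbers, not real census data.

```python
def primacy_ratio(city_sizes):
    """Return largest size divided by second-largest size."""
    ordered = sorted(city_sizes, reverse=True)
    if len(ordered) < 2 or ordered[1] <= 0:
        raise ValueError("need at least two cities with positive size")
    return ordered[0] / ordered[1]


# Hypothetical populations chosen only to illustrate the calculation.
example = [8_000_000, 1_000_000, 900_000]
ratio = primacy_ratio(example)
print(ratio)
```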

Rank–size distribution of the population of countries follows a stretched exponential distribution[1] except in the cases of the two "Kings": China and India.
Wikipedia word frequency plot, showing three segments with distinct behavior.