Lexical diversity

[2] A common problem with lexical diversity measures, especially TTR, is that text samples containing a large number of tokens tend to give lower TTR values, because a writer or speaker must reuse many words as a text grows longer.
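The length effect can be seen in a minimal sketch, assuming TTR is computed as the number of distinct word forms (types) divided by the total number of tokens, and assuming simple whitespace tokenization. The function name and the toy sentence are illustrative only.

```python
def type_token_ratio(tokens):
    """Return distinct tokens (types) divided by total tokens."""
    if not tokens:
        raise ValueError("TTR is undefined for an empty sample")
    return len(set(tokens)) / len(tokens)


# Toy data (hypothetical): a short sample, and a longer sample that
# keeps reusing the same vocabulary.
short_sample = "the cat sat on the mat".split()
long_sample = short_sample * 10

ttr_short = type_token_ratio(short_sample)  # 5 types over 6 tokens
ttr_long = type_token_ratio(long_sample)    # 5 types over 60 tokens
```

The longer sample uses exactly the same vocabulary, yet its TTR is much lower, which is why raw TTR values from texts of different lengths are hard to compare.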

One consequence of this is that it is often assumed that lexical diversity can only be used to compare texts of the same length.[3]

Yet, many measures of lexical diversity attempt to account for sensitivity to text length.
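As one illustration of such a length-adjusted measure, the sketch below computes a moving-average TTR: the TTR of every fixed-size window of tokens, averaged over the text. The text above does not name this measure, so it is chosen here only as an example; the window size, the fallback for samples shorter than the window, and the whitespace tokenization are all assumptions.

```python
def moving_average_ttr(tokens, window=50):
    """Average the type-token ratio over every window of `window` tokens."""
    if window <= 0:
        raise ValueError("window must be positive")
    if not tokens:
        raise ValueError("undefined for an empty sample")
    if len(tokens) < window:
        # Assumption: fall back to plain TTR when the sample is too short.
        return len(set(tokens)) / len(tokens)
    ratios = [
        len(set(tokens[i:i + window])) / window
        for i in range(len(tokens) - window + 1)
    ]
    return sum(ratios) / len(ratios)


# Toy data (hypothetical): the same vocabulary at two different lengths.
short_sample = "the cat sat on the mat".split()
long_sample = short_sample * 10

mattr_short = moving_average_ttr(short_sample, window=6)
mattr_long = moving_average_ttr(long_sample, window=6)
```

Because every window has the same size, both samples receive the same score here, whereas plain TTR would drop sharply for the longer one.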

Surveys of such measures are provided in Harald Baayen's book (2001)[4] and more recently in.

According to Jarvis's model, lexical diversity includes variability, volume, evenness, rarity, dispersion and disparity.