In 1967, Kučera and Francis published their classic work, "Computational Analysis of Present-Day American English", which provided basic statistics on what is now known simply as the Brown Corpus.
[1] The Brown Corpus was a carefully compiled selection of current American English, totalling about a million words drawn from a wide variety of sources.
Kučera and Francis subjected it to a variety of computational analyses, from which they compiled a rich and variegated opus, combining elements of linguistics, psychology, statistics, and sociology.
[2] Shortly after publication of the first lexicostatistical analysis, Boston publisher Houghton-Mifflin approached Kučera to supply a million-word, three-line citation base for its new American Heritage Dictionary.
[3][4] Tagging the corpus enabled far more sophisticated statistical analysis, such as the work programmed by Andrew Mackie and documented in books on English grammar.