A text corpus is a large collection of samples of written and/or spoken language that has been carefully prepared for linguistic analysis.
The Real Academia Española (RAE) is Spain's official institution for documenting, planning, and standardising the Spanish language.
The second table lists the 100 most common lemmas found in a text corpus compiled by Mark Davies and other language researchers at Brigham Young University in the United States.
It includes books, magazines, and newspapers with a wide variety of content, as well as transcripts of spoken language from radio and television broadcasts and other sources.
[1] In 2006, Mark Davies, an associate professor of linguistics at Brigham Young University, published his estimate of the 5000 most common words in Modern Spanish.
Among the written sources are novels, plays, short stories, letters, essays, newspapers, and the encyclopedia Encarta.
A highlighted row indicates that the word was found to occur especially frequently in samples of spoken Spanish.