American National Corpus

It is annotated for part of speech and lemma, shallow parse, and named entities.

Unlike on-line searchable corpora, which due to copyright restrictions allow access only to individual sentences, the entire ANC is available to enable research involving, for example, development of statistical language models and full-text linguistic annotation.

Like the OANC, MASC is freely available for any use, and can be downloaded from the ANC site or from the Linguistic Data Consortium.

The ANC and its sub-corpora differ from similar corpora primarily in the range of linguistic annotations provided and the inclusion of modern genres that do not appear in resources like the British National Corpus.

Also, because the initial target use of the corpora was the development of statistical language models, the full data and all annotations are available, thus differing from the Corpus of Contemporary American English (COCA) which is available only selectively through a web browser.