National Corpus of Polish

A linguistic corpus is a collection of texts in which one can find the typical uses of a word or phrase, together with its meaning and grammatical function.

The National Corpus of Polish has been registered as a research and development project of the Ministry of Science and Higher Education.

The intended size of the whole National Corpus of Polish is over 1 billion words, of which a 300-million-word subcorpus has been carefully balanced, and a manually annotated 1-million-word subcorpus has been released under an open license.

The corpus is accessible online at http://nkjp.pl/poliqarp/. It contains classic literature, daily newspapers, specialist periodicals and journals, transcripts of conversations, and a variety of short-lived and internet texts.

Four teams joined forces in 2006, forming the Consortium for the National Corpus of Polish.