Croatian National Corpus

Compilation of the corpus started in 1998 at the Institute of Linguistics[1] of the Faculty of Humanities and Social Sciences, University of Zagreb, following the ideas of Marko Tadić.

The theoretical foundations, and calls for a general-purpose, representative, multi-million-word corpus of Croatian, had begun to appear even earlier.

The initial composition was divided into two constituents. Since 2004, with the adoption of the third-generation corpus concept, the two-constituent structure has been abandoned in favor of several subcorpora and a larger overall size.

The author of this corpus manager is Pavel Rychlý[4] from the Natural Language Processing Laboratory[5] of the Faculty of Informatics,[6] Masaryk University in Brno, Czech Republic.

Its interface supports complex queries over the corpus, several kinds of statistical results, full or partial word lists (with frequencies) built according to different query criteria, frequency distributions of types, automatic collocation detection, and more.