As of August 2010, it comprised more than 4.0 billion word tokens, making it the largest linguistically motivated collection of contemporary German texts.
Because different linguistic investigations generally target different language domains, the German Reference Corpus is intended to serve as a versatile superordinate sample, or primordial sample (German: Ur-Stichprobe), of contemporary written German. Corpus users can draw from it a specialised subsample (a so-called virtual corpus) that represents the language domain they wish to investigate.
Because of copyright and licence restrictions, the DeReKo archive may be neither copied nor offered for download.
It can, however, be queried and analysed free of charge via the COSMAS II system; users must register by name and agree to use the corpus data exclusively for non-commercial, academic purposes.
COSMAS II also lets users compile a virtual corpus from DeReKo that suits their specific research questions.