CorCenCC

The corpus is accompanied by an online teaching and learning toolkit – Y Tiwtiadur[1] – which draws directly on the data from the corpus to provide resources for Welsh language learning at all ages and levels.

The dataset, therefore, offers a snapshot of the Welsh language across a range of contexts of use, e.g. private conversations, group socialising, business and other work situations, in education, in the various published media, and in public spaces.

A full list of the contexts, genres and topics included is available on the project's website.

The published CorCenCC corpus was sampled from a range of different speakers and users of Welsh, from all regions of Wales, of all ages and genders, with a wide range of occupations, and with a variety of linguistic backgrounds (e.g. how they came to speak Welsh), to reflect the diversity of text types and of Welsh speakers found in contemporary Wales.[2]

The research on which the CorCenCC project was based was funded by the UK Economic and Social Research Council (ESRC) and the Arts and Humanities Research Council (AHRC) as "Corpws Cenedlaethol Cymraeg Cyfoes (The National Corpus of Contemporary Welsh): A community driven approach to linguistic corpus construction project" (Grant Number ES/M011348/1).

CorCenCC Project and Corpus Logo