The Constituent Likelihood Automatic Word-tagging System (CLAWS) is a program that performs part-of-speech tagging.[2]
Developed in the early 1980s,[1][3] CLAWS was built to keep pace with ever-changing part-of-speech (POS) tagging needs.
CLAWS was originally created to add part-of-speech tags to the LOB corpus of British English; its tagset has since been adapted to other languages as well, including Urdu and Arabic.
It is not without flaws, however: although its error rate is only 1.5% when judged on major categories, CLAWS still leaves c. 3.3% ambiguities unresolved.
CLAWS uses a hidden Markov model to estimate the likelihood of word sequences when predicting each part-of-speech label.[7]
From 1983 to 1986, updated versions leading to CLAWS2 were part of a larger effort to handle tasks such as recognizing sentence breaks. The aim was to avoid manual pre-processing of a text before tagging, relying instead on optional manual post-editing to adjust the automatic annotation where needed.[11]
In tagging the British National Corpus (BNC), the many rounds of work that went into CLAWS4 focused on making the CLAWS program independent of the tagsets.
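The hidden Markov model approach mentioned above can be illustrated with a minimal sketch of Viterbi decoding, a standard way to pick the most likely tag sequence under such a model. This is not the CLAWS implementation: the three-tag set, the vocabulary, every probability, and the names `TAGS`, `START`, `TRANS`, `EMIT`, `SMOOTH` and `viterbi` are invented for illustration only.

```python
import math

# Hypothetical toy model: tags, vocabulary, and probabilities are invented
# for illustration and are not taken from CLAWS or any CLAWS tagset.
TAGS = ["DET", "NOUN", "VERB"]

# P(first tag in the sentence)
START = {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1}

# P(next tag | previous tag)
TRANS = {
    "DET": {"DET": 0.05, "NOUN": 0.9, "VERB": 0.05},
    "NOUN": {"DET": 0.1, "NOUN": 0.3, "VERB": 0.6},
    "VERB": {"DET": 0.5, "NOUN": 0.4, "VERB": 0.1},
}

# P(word | tag)
EMIT = {
    "DET": {"the": 0.9, "a": 0.1},
    "NOUN": {"dog": 0.5, "walk": 0.2, "park": 0.3},
    "VERB": {"walk": 0.6, "dog": 0.1, "barks": 0.3},
}

SMOOTH = 1e-6  # floor probability for word/tag pairs missing from EMIT


def log_emit(tag, word):
    """Log-probability of `word` being emitted by `tag`, with a small floor."""
    return math.log(EMIT[tag].get(word, SMOOTH))


def viterbi(words):
    """Return the most likely tag sequence for `words` under the toy model."""
    if not words:
        return []
    # scores[i][t]: best log-probability of any tag path ending in tag t at word i
    scores = [{t: math.log(START[t]) + log_emit(t, words[0]) for t in TAGS}]
    back = []  # back[i][t]: best previous tag for tag t at word i + 1
    for word in words[1:]:
        prev = scores[-1]
        row, ptr = {}, {}
        for t in TAGS:
            best_prev = max(TAGS, key=lambda p: prev[p] + math.log(TRANS[p][t]))
            row[t] = prev[best_prev] + math.log(TRANS[best_prev][t]) + log_emit(t, word)
            ptr[t] = best_prev
        scores.append(row)
        back.append(ptr)
    # Follow the back-pointers from the best final tag to recover the path.
    tag = max(TAGS, key=lambda t: scores[-1][t])
    path = [tag]
    for ptr in reversed(back):
        tag = ptr[tag]
        path.append(tag)
    path.reverse()
    return path


print(viterbi(["the", "dog", "barks"]))  # ['DET', 'NOUN', 'VERB']
print(viterbi(["the", "walk"]))          # ['DET', 'NOUN']
```

In the second call, "walk" is more likely a verb than a noun when taken in isolation under these toy numbers, but the preceding determiner makes the noun reading win. This is the kind of contextual disambiguation a probabilistic tagger relies on.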