EXCLAIM

Early work on cross-language information retrieval (CLIR) depended on manually constructed parallel corpora for each pair of languages.

A more efficient way to find training data for a CLIR system is to use matching web pages that are written in different languages.

The role of EXCLAIM is to use semantic and linguistic analysis tools to align the information in the different language editions of Wikipedia so that they can be treated as parallel corpora.
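
As a rough illustration of what treating matching pages as a parallel corpus can mean, the sketch below pairs articles from two language collections that share a topic identifier. It is not EXCLAIM's method: the function name pair_articles, the topic keys, and the sample texts are all hypothetical, and real alignment would need the semantic and linguistic analysis described above rather than a simple key match.

```python
from typing import Dict, List, Tuple


def pair_articles(
    source: Dict[str, str],
    target: Dict[str, str],
) -> List[Tuple[str, str, str]]:
    """Pair articles whose topic identifier appears in both collections.

    Hypothetical helper: assumes each collection maps a shared,
    language-independent topic identifier to the article text.
    """
    # Only topics present in both languages can form a pair.
    shared = sorted(source.keys() & target.keys())
    return [(topic, source[topic], target[topic]) for topic in shared]


# Made-up sample data, for illustration only.
english = {
    "diabetes": "Diabetes is a chronic disease.",
    "asthma": "Asthma affects the airways.",
}
chinese = {
    "diabetes": "糖尿病是一种慢性疾病。",
    "influenza": "流行性感冒是一种传染病。",
}

pairs = pair_articles(english, chinese)
for topic, en_text, zh_text in pairs:
    print(topic, "|", en_text, "|", zh_text)
```

Topics found in only one collection are dropped, so the result contains only document pairs that can serve as comparable training material.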

EXCLAIM can also be extended to incorporate information from many other sources, such as the Chinese Community Health Resource Center (CCHRC).

One such project is a cross-linguistic readability software generation framework, detailed in work presented at ACL 2009.