Survey of English Usage

Many well-known linguists have spent time doing research at the Survey, including Bas Aarts, Valerie Adams, John Algeo, Dwight Bolinger, Noël Burton-Roberts, David Crystal, Derek Davy, Jan Firbas, Sidney Greenbaum, Liliane Haegeman, Robert Ilson, Ruth Kempson, Geoffrey Leech, Jan Rusiecki, Jan Svartvik, and Joe Taglicht.

ICE-GB was annotated in great detail, including a full grammatical analysis (parse) of every sentence in the corpus.

A recent project at the Survey undertook the parsing of a large (400,000-word) selection of the spoken part of the LLC in a manner directly comparable with ICE-GB, forming a new, 800,000-word diachronic corpus called the Diachronic Corpus of Present-Day Spoken English (DCPSE).

One of the consequences of forming large collections of valuable linguistic data is a pressing need for methods and tools to help researchers and other users make the most of them.

In parallel with the parsing of natural language data, the Survey team have therefore carried out research and development of software tools to help linguists use these corpora.

The ICECUP research platform uses an intuitive grammatical query representation called Fuzzy Tree Fragments (FTFs) to search parsed corpora.