WordNet

WordNet was first created in 1985, in English only, in the Cognitive Science Laboratory of Princeton University under the direction of psychology professor George Armitage Miller.[4]

George Miller and Christiane Fellbaum received the 2006 Antonio Zampolli Prize for their work with WordNet.

The database contains 155,327 words organized in 175,979 synsets for a total of 207,016 word-sense pairs; in compressed form, it is about 12 megabytes in size.[5]

It includes the lexical categories nouns, verbs, adjectives and adverbs but ignores prepositions, determiners and other function words.[3]

The morphology functions of the software distributed with the database try to deduce the lemma or stem form of a word from the user's input.
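As an illustration of that deduction step, the sketch below implements suffix-detachment lemmatization in the spirit of WordNet's "morphy" function. The rules and the small word list are simplified stand-ins for illustration, not the code or data distributed with WordNet:

```python
# A minimal sketch of suffix-detachment lemmatization (illustrative only,
# not WordNet's actual "morphy" implementation): rewrite known inflectional
# suffixes and keep a candidate only if it appears in the word list.
NOUN_RULES = [("ses", "s"), ("xes", "x"), ("ches", "ch"),
              ("shes", "sh"), ("ies", "y"), ("s", "")]
VERB_RULES = [("ies", "y"), ("ing", "e"), ("ing", ""),
              ("ed", "e"), ("ed", ""), ("es", "e"), ("es", ""), ("s", "")]

# Toy stand-in for WordNet's lemma list.
LEXICON = {"dog", "box", "church", "berry", "run", "bake", "walk"}

def morphy_sketch(word, rules):
    """Return the base form of `word`, or None if no rule yields a known lemma."""
    if word in LEXICON:                      # already a base form
        return word
    for suffix, replacement in rules:
        if word.endswith(suffix):
            candidate = word[:len(word) - len(suffix)] + replacement
            if candidate in LEXICON:
                return candidate
    return None

print(morphy_sketch("churches", NOUN_RULES))  # church
print(morphy_sketch("baking", VERB_RULES))    # bake
```

The real software additionally consults exception lists for irregular forms (e.g. "geese"), which a pure rule pass like this cannot handle.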

The initial goal of the WordNet project was to build a lexical database that would be consistent with theories of human semantic memory developed in the late 1960s.

While such psycholinguistic experiments and the underlying theories have been subject to criticism, some of WordNet's organization is consistent with experimental evidence.[6]

For example, anomic aphasia selectively affects speakers' ability to produce words from a specific semantic category, such as one corresponding to a WordNet hierarchy.

Although such corrections and transformations have been performed and documented as part of the integration of WordNet 1.7 into the cooperatively updatable knowledge base of WebKB-2,[7] most projects claiming to reuse WordNet for knowledge-based applications (typically, knowledge-oriented information retrieval) simply reuse it directly.

WordNet has also been converted to a formal specification by means of a hybrid bottom-up/top-down methodology that automatically extracts association relations from WordNet and interprets them in terms of a set of conceptual relations formally defined in the DOLCE foundational ontology.

Synonymy, hyponymy, meronymy, and antonymy occur in every language with a wordnet so far, but other semantic relationships are language-specific.[11]

However, this language-specificity also makes WordNet a resource for highlighting and studying the differences between languages, so it is not necessarily a limitation for all use cases.
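The relation types named above can be pictured as links on a synset record. The sketch below is an assumption for illustration (field names and example data are invented, not WordNet's actual storage format):

```python
# A minimal sketch of a synset record carrying the semantic relations
# discussed above (invented field names and data, not WordNet's format).
from dataclasses import dataclass, field

@dataclass
class Synset:
    lemmas: list                                    # synonymous words sharing one sense
    hypernyms: list = field(default_factory=list)   # "is a kind of" links
    meronyms: list = field(default_factory=list)    # "has as a part" links
    antonyms: list = field(default_factory=list)    # opposite-sense links

car = Synset(lemmas=["car", "automobile"],
             hypernyms=["motor_vehicle"],
             meronyms=["wheel", "engine"])
print(car.hypernyms)  # ['motor_vehicle']
```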

WordNet is the most commonly used computational lexicon of English for word-sense disambiguation (WSD), a task aimed at assigning the context-appropriate meanings (i.e. synset members) to words in a text.
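A classic WordNet-based WSD baseline is the Lesk algorithm, which picks the sense whose gloss overlaps most with the surrounding context. The sketch below uses invented glosses rather than real synset data:

```python
# A toy simplified-Lesk disambiguator (a common WordNet-based baseline,
# sketched with invented glosses, not actual WordNet entries): choose
# the sense whose gloss shares the most words with the context.
SENSES = {
    "bank.n.01": "sloping land beside a body of water such as a river",
    "bank.n.02": "a financial institution that accepts deposits and lends money",
}

def lesk_sketch(context, senses):
    context_words = set(context.lower().split())
    def overlap(sense):
        return len(context_words & set(senses[sense].lower().split()))
    return max(senses, key=overlap)

print(lesk_sketch("she sat on the bank of the river", SENSES))  # bank.n.01
```

Real systems refine this with stemming, stop-word removal, and glosses of related synsets, but the overlap idea is the same.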

WordNet's fine-grained sense distinctions prevent WSD systems from achieving a level of performance comparable to that of humans, who do not always agree when asked to select the dictionary sense that matches a word in context.

The granularity issue has been tackled by proposing clustering methods that automatically group together similar senses of the same word.

However, the presence of pejorative and offensive words is a limitation WordNet shares with other lexical resources, such as dictionaries and thesauruses.

Other, more sophisticated WordNet-based similarity techniques include ADW,[22] whose implementation is available in Java.[21]
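The simplest family of such measures scores similarity by path length through the hypernym hierarchy. The sketch below uses an invented toy graph; measures like ADW operate on WordNet's full synset graph with richer signals than path length alone:

```python
# A sketch of a path-based similarity measure over a toy hypernym
# hierarchy (the graph is invented for illustration).
HYPERNYM = {                      # child synset -> parent synset
    "dog": "canine", "cat": "feline",
    "canine": "carnivore", "feline": "carnivore",
    "carnivore": "mammal", "mammal": "animal",
}

def ancestor_depths(node):
    """Map each ancestor (including the node itself) to its distance from node."""
    depths, d = {}, 0
    while node is not None:
        depths[node] = d
        node, d = HYPERNYM.get(node), d + 1
    return depths

def path_similarity(a, b):
    """1 / (1 + shortest path between a and b through a common ancestor)."""
    da, db = ancestor_depths(a), ancestor_depths(b)
    distance = min(da[n] + db[n] for n in da.keys() & db.keys())
    return 1.0 / (1.0 + distance)

print(path_similarity("dog", "cat"))  # 0.2 (dog -> canine -> carnivore <- feline <- cat)
```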

Projects such as BalkaNet and EuroWordNet made it feasible to create standalone wordnets linked to the original one.[26]

Example entry "Hamburger" in WordNet