Named-entity recognition

Rigid designators include proper names as well as terms for certain biological species and substances,[7] but exclude pronouns (such as "it"; see coreference resolution), descriptions that pick out a referent by its properties (see also De dicto and de re), and names for kinds of things as opposed to individuals (for example "Bank").

While some instances of these types are good examples of rigid designators (e.g., the year 2001), there are also many invalid ones (e.g., I take my vacations in “June”).

[12] More recently, in 2011, Ritter used a hierarchy based on common Freebase entity types in ground-breaking experiments on NER over social media text.

Hand-crafted grammar-based systems typically obtain better precision, but at the cost of lower recall and months of work by experienced computational linguists.
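The precision and recall trade-off of hand-written rules can be illustrated with a minimal sketch. The rule below (a title followed by one or two capitalized words) and the example sentence are hypothetical and are not taken from any particular system; scoring uses exact span matching, which is only one of several possible evaluation conventions.

```python
import re

# Hypothetical hand-written rule: a title, then one or two capitalized words.
# The capturing group holds the name without the title.
TITLE_RULE = re.compile(r"\b(?:Mr|Ms|Dr|Prof)\. ([A-Z][a-z]+(?: [A-Z][a-z]+)?)")


def tag_persons(text):
    """Return the set of (start, end) character spans matched by the rule."""
    return {match.span(1) for match in TITLE_RULE.finditer(text)}


def span_of(text, mention):
    """Character span of the first occurrence of `mention` in `text`."""
    start = text.index(mention)
    return (start, start + len(mention))


def precision_recall(predicted, gold):
    """Exact-match precision and recall over two sets of spans."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


text = "Dr. Ada Lovelace met Charles Babbage and Prof. Turing in London."

# Illustrative gold annotation: three person mentions.
gold = {span_of(text, name) for name in ("Ada Lovelace", "Charles Babbage", "Turing")}

predicted = tag_persons(text)
precision, recall = precision_recall(predicted, gold)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy case every span the rule produces is correct, but the mention without a title is missed, so precision is high while recall is lower. Raising recall would require writing and maintaining further rules.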

Since about 1998, there has been a great deal of interest in entity identification in the molecular biology, bioinformatics, and medical natural language processing communities.

There has also been considerable interest in the recognition of chemical entities and drugs in the context of the CHEMDNER competition, in which 27 teams participated.

The main efforts are directed at reducing annotation labor by employing semi-supervised learning,[16][21] achieving robust performance across domains,[22][23] and scaling up to fine-grained entity types.[12][24]

In recent years, many projects have turned to crowdsourcing, which is a promising way to obtain high-quality aggregate human judgments for supervised and semi-supervised machine learning approaches to NER.[25]

Another challenging task is devising models to deal with linguistically complex contexts such as Twitter and search queries.
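The aggregation of crowdsourced judgments mentioned above can, in the simplest case, be a per-token majority vote. The following is a minimal sketch under that assumption; the tokens, worker labels, and the function name `aggregate_labels` are illustrative, and real projects often use more elaborate schemes that weight annotators by estimated reliability.

```python
from collections import Counter


def aggregate_labels(annotations):
    """Majority vote per token over several annotators' label sequences.

    `annotations` is a list of equal-length label lists, one per annotator.
    On a tie, the label that appears first among the annotators wins.
    """
    aggregated = []
    for token_labels in zip(*annotations):
        most_common_label, _count = Counter(token_labels).most_common(1)[0]
        aggregated.append(most_common_label)
    return aggregated


tokens = ["Paris", "Hilton", "visited", "Paris"]

# Hypothetical labels from three crowd workers for the tokens above.
worker_labels = [
    ["PER", "PER", "O", "LOC"],
    ["LOC", "PER", "O", "LOC"],
    ["PER", "PER", "O", "O"],
]

consensus = aggregate_labels(worker_labels)
for token, label in zip(tokens, consensus):
    print(token, label)
```

Each individual worker makes at most one mistake here, and the vote recovers a consistent labeling, which is the sense in which aggregate judgments can be of higher quality than single ones.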

[27] Some researchers have recently proposed graph-based semi-supervised learning models for language-specific NER tasks.
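As a rough illustration of the general idea behind graph-based semi-supervised learning, the sketch below runs a generic label propagation over a small similarity graph. It is not the model proposed in the cited work: the graph, the edge weights, the seed labels, and the function name `propagate_labels` are all assumptions made for the example.

```python
def propagate_labels(edges, seeds, labels, iterations=20):
    """Generic label propagation over a weighted, undirected graph.

    edges:  dict mapping each node to a dict of {neighbor: weight}
    seeds:  dict mapping the labeled nodes to their known label
    labels: list of all possible labels
    Seed nodes stay fixed; every other node repeatedly takes the
    weighted average of its neighbors' label scores.
    """
    scores = {node: {label: 0.0 for label in labels} for node in edges}
    for node, label in seeds.items():
        scores[node][label] = 1.0

    for _ in range(iterations):
        updated = {}
        for node, neighbors in edges.items():
            total = sum(neighbors.values())
            if node in seeds or total == 0:
                updated[node] = scores[node]  # clamp seeds, skip isolated nodes
                continue
            updated[node] = {
                label: sum(w * scores[n][label] for n, w in neighbors.items()) / total
                for label in labels
            }
        scores = updated

    # Each node receives its highest-scoring label.
    return {node: max(dist, key=dist.get) for node, dist in scores.items()}


# Hypothetical graph: edge weights stand for contextual similarity.
edges = {
    "Berlin": {"Munich": 1.0},
    "Munich": {"Berlin": 1.0, "Scholz": 0.1},
    "Merkel": {"Scholz": 1.0},
    "Scholz": {"Merkel": 1.0, "Munich": 0.1},
}
seeds = {"Berlin": "LOC", "Merkel": "PER"}

predicted = propagate_labels(edges, seeds, labels=["LOC", "PER"])
print(predicted)
```

Only two nodes are labeled by hand; the remaining nodes inherit labels from their most similar neighbors, which is how such methods reduce the amount of annotated data needed.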

Another field that has seen progress but remains challenging is the application of NER to Twitter and other microblogs, which are considered "noisy" due to non-standard orthography, shortness, and informality of texts.[32][33]

NER challenges on English tweets have been organized by research communities to compare the performance of various approaches, such as bidirectional LSTMs, Learning-to-Search, or CRFs.