While NLG is concerned with the conversion of non-linguistic information into natural language, REG focuses only on the creation of referring expressions (noun phrases) that identify specific entities called targets.
A variety of algorithms have been developed in the NLG community to generate different types of referring expressions.
A referring expression (RE), in linguistics, is any noun phrase, or surrogate for a noun phrase, whose function in discourse is to identify some individual object (thing, being, event, ...). The technical terminology for "identify" differs a great deal from one school of linguistics to another.
There has been a considerable amount of research on generating definite noun phrases, such as the big red book.[2] This has been extended in various ways; for example, Krahmer et al.[3] present a graph-theoretic model of definite NP generation with many nice properties.
In recent years a shared-task event has compared different algorithms for definite NP generation, using the TUNA[4] corpus.[6] Ideally, a good referring expression should satisfy a number of criteria.

REG goes back to the early days of NLG.
One of the first approaches was by Winograd,[7] who in 1972 developed an "incremental" REG algorithm for his SHRDLU program.
A new approach to the topic was influenced by the researchers Appelt and Kronfeld, who created the programs KAMP and BERTRAND[8][9][10] and considered referring expressions as parts of larger speech acts.[8] Furthermore, their skepticism concerning the naturalness of minimal descriptions made Appelt and Kronfeld's research a foundation of later work on REG.
A later line of research was led by Dale and Reiter, who stressed the identification of the referent as the central goal.[11][12][13][14] Like Appelt,[8] they discuss the connection between the Gricean maxims and referring expressions in their culminating paper,[2] in which they also propose a formal problem definition.
Subsequent algorithms often extend the IA from a single perspective, for example in relation to one particular phenomenon. Many simplifying assumptions are still in place or have only just begun to be addressed.
A combination of the different extensions has also yet to be attempted and is called a "non-trivial enterprise" by Krahmer and van Deemter.
Furthermore, research has extended its range to related topics such as the choice of knowledge representation (KR) frameworks.
In this area, the main question of which KR framework is most suitable for use in REG remains open.
Several different approaches have been explored.[note 1]

Dale and Reiter (1995) think of referring expressions as distinguishing descriptions.[2] The problem could trivially be solved by conjoining all the properties of the referent, but this often leads to long descriptions that violate the second Gricean Maxim of Quantity.
In practice, however, it is far more common to instead require that the referring expressions produced by an algorithm be as similar as possible to human-produced ones, although this condition is often not stated explicitly.
Furthermore, Dale and Reiter[2] stress the attribute type, which is always included in their descriptions even if it does not rule out any distractors.
The Incremental Algorithm (IA) is easy to implement and computationally efficient, running in polynomial time.[2]
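To make the procedure concrete, the following is a minimal Python sketch of the IA's core loop on a toy domain; the entity identifiers, attribute names, and preference order are invented for illustration, and the sketch omits the search through value hierarchies that the full algorithm performs.

```python
# Minimal sketch of the Incremental Algorithm (IA) on an invented toy
# domain; entity IDs, attributes, and the preference order are assumptions
# made for illustration, not part of Dale and Reiter's original examples.

def incremental_algorithm(domain, referent_id, preferred_attributes):
    """Select (attribute, value) pairs that distinguish the referent from
    all other entities, considering attributes in a fixed preference order.
    Simplification: the full IA also searches value hierarchies
    (e.g. chihuahua -> dog -> animal), which is omitted here."""
    referent = domain[referent_id]
    distractors = {e for e in domain if e != referent_id}
    description = []
    for attr in preferred_attributes:
        value = referent.get(attr)
        # Keep the property only if it rules out at least one distractor.
        ruled_out = {d for d in distractors if domain[d].get(attr) != value}
        if ruled_out:
            description.append((attr, value))
            distractors -= ruled_out
        if not distractors:
            break
    # The type attribute is always included, even if it rules out nothing.
    if all(attr != "type" for attr, _ in description):
        description.append(("type", referent["type"]))
    # None signals that no distinguishing description exists.
    return description if not distractors else None

domain = {
    "d1": {"type": "dog", "colour": "black", "size": "small"},
    "d2": {"type": "dog", "colour": "white", "size": "large"},
    "d3": {"type": "cat", "colour": "black", "size": "small"},
}

# Conjoining all of d1's properties ("the small black dog") over-describes;
# the IA stops once every distractor has been ruled out.
print(incremental_algorithm(domain, "d1", ["colour", "size", "type"]))
# [('colour', 'black'), ('type', 'dog')] -> e.g. "the black dog"
```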
More recently, empirical studies have become popular; these are mostly based on the assumption that generated expressions should be similar to human-produced ones.
Corpus-based evaluation began quite late in REG due to a lack of suitable data sets.
Typically, those data sets are fully "semantically transparent",[45] created in experiments using simple and controlled settings.
The TUNA corpus, which contains web-collected data on the two domains of furniture and people, has already been used in three shared REG challenges.[note 1]
To measure the correspondence between corpora and the results of REG algorithms, several metrics have been developed.
In an evaluation the scores are usually averaged over references made by different human participants in the corpus.
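For instance, the correspondence between the attribute set selected by a system and the sets used by human participants is often measured with a set-overlap metric such as the Dice coefficient; the sketch below, with invented attribute sets, shows how such a score can be averaged over several human references.

```python
# Sketch of a set-overlap evaluation: the Dice coefficient between the
# attribute set chosen by a system and each human-produced attribute set,
# averaged over the human references. The attribute sets are invented.

def dice(a, b):
    """Dice coefficient 2|A ∩ B| / (|A| + |B|)."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))

system_attrs = {("colour", "black"), ("type", "dog")}
human_attrs = [
    {("colour", "black"), ("type", "dog")},
    {("size", "small"), ("type", "dog")},
    {("colour", "black"), ("size", "small"), ("type", "dog")},
]

score = sum(dice(system_attrs, h) for h in human_attrs) / len(human_attrs)
print(round(score, 3))  # 0.767 for these example sets
```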
For the linguistic realization part of REG, the overlap between strings has been measured using metrics like BLEU[55] or NIST.
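As an illustration, such a string-overlap score can be computed with an off-the-shelf BLEU implementation; the sketch below uses NLTK's sentence-level BLEU on invented example strings (shared-task evaluations apply these metrics over whole corpora).

```python
# Sketch: string overlap between a realised referring expression and
# human-produced references, using NLTK's sentence-level BLEU.
# The example strings are invented for illustration.
from nltk.translate.bleu_score import sentence_bleu

human_references = [
    "the small black dog".split(),
    "the black dog on the left".split(),
]
system_output = "the black dog".split()

# Bigram BLEU (equal weights on 1- and 2-grams) suits these short strings;
# the brevity penalty still lowers the score of the shorter candidate.
score = sentence_bleu(human_references, system_output, weights=(0.5, 0.5))
print(round(score, 3))
```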
A more time-consuming way to evaluate REG algorithms is to have human judges rate the adequacy of the generated descriptions (how clear is the description?).