A widely cited survey of NLG methods describes NLG as "the subfield of artificial intelligence and computational linguistics that is concerned with the construction of computer systems that can produce understandable texts in English or other human languages from some underlying non-linguistic representation of information".
NLG systems can also be compared to translators of artificial computer languages, such as decompilers or transpilers, which likewise produce human-readable code from an intermediate representation.
NLG may be viewed as complementary to natural-language understanding (NLU): whereas in NLU the system must disambiguate an input sentence to produce a machine representation, in NLG it must decide how to put a representation into words.
NLU needs to deal with ambiguous or erroneous user input, whereas the ideas the system wants to express through NLG are generally known precisely.
NLG can also be accomplished by training a statistical model using machine learning, typically on a large corpus of human-written texts.
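As a rough illustration of this statistical approach, the sketch below trains a tiny bigram model on a toy corpus and samples text from it. Everything here (the corpus, the function names) is invented for illustration; real systems train far larger models, today usually neural ones, on large corpora.

```python
import random
from collections import defaultdict

# Toy corpus standing in for "a large corpus of human-written texts".
corpus = "the cat sat on the mat . the dog sat on the rug .".split()

# Count the observed successors of each word (a bigram model).
successors = defaultdict(list)
for prev, nxt in zip(corpus, corpus[1:]):
    successors[prev].append(nxt)

def generate(start="the", length=8):
    """Generate text by repeatedly sampling a successor of the last word."""
    words = [start]
    for _ in range(length - 1):
        choices = successors.get(words[-1])
        if not choices:
            break
        words.append(random.choice(choices))
    return " ".join(words)

print(generate())  # e.g. "the cat sat on the rug . the"
```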
A classic simple example is a pollen-forecast generator: it takes as input six numbers giving predicted pollen levels in different parts of Scotland and produces a short textual forecast from them.
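At its simplest, such a data-to-text system can be sketched as fill-in templates over the six input numbers. The region names, thresholds, and code below are hypothetical, not taken from the actual system.

```python
# Hypothetical region names; the real system's inputs may differ.
REGIONS = ["Central", "Tayside", "Grampian", "Highlands", "Borders", "Fife"]

def describe_level(n):
    """Map a numeric pollen level onto a verbal category (invented cut-offs)."""
    return "high" if n >= 7 else "moderate" if n >= 4 else "low"

def pollen_forecast(levels):
    """Realise the six numbers as one templated sentence."""
    parts = [f"{region}: {describe_level(n)} ({n})"
             for region, n in zip(REGIONS, levels)]
    return "Pollen counts are " + "; ".join(parts) + "."

print(pollen_forecast([6, 8, 5, 3, 7, 4]))
# Pollen counts are Central: moderate (6); Tayside: high (8); ...
```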
However, a sophisticated NLG system needs to include stages of planning and merging of information to enable the generation of text that looks natural and does not become repetitive.
The typical stages of natural-language generation, as proposed by Dale and Reiter,[6] are:

Content determination: deciding what information to mention in the text.
Document structuring: deciding the overall organisation of the information to convey.
Aggregation: merging similar sentences to improve readability and naturalness.
Lexical choice: putting words to the concepts.
Referring expression generation: creating referring expressions that identify objects and regions.
Realisation: creating the actual text, which should be correct according to the rules of syntax, morphology, and orthography.
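To make one of these stages concrete, the hypothetical sketch below performs simple aggregation: pollen messages that share the same level are merged into a single sentence instead of being realised as six repetitive ones. Region names and levels are invented.

```python
from collections import defaultdict

# Messages as (region, level) pairs; content determination is assumed done.
messages = [("Central", "high"), ("Tayside", "high"),
            ("Grampian", "moderate"), ("Highlands", "low"),
            ("Borders", "high"), ("Fife", "moderate")]

# Aggregation: group regions that share a pollen level.
grouped = defaultdict(list)
for region, level in messages:
    grouped[level].append(region)

# Simple realisation of each aggregated group as one sentence.
sentences = []
for level, regions in grouped.items():
    if len(regions) == 1:
        sentences.append(f"Pollen is {level} in {regions[0]}.")
    else:
        sentences.append(f"Pollen is {level} in "
                         f"{', '.join(regions[:-1])} and {regions[-1]}.")

print(" ".join(sentences))
# Pollen is high in Central, Tayside and Borders. Pollen is moderate in ...
```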
The earliest such system to be deployed was FoG,[3] which was used by Environment Canada to generate weather forecasts in French and English in the early 1990s.[19] NLG is also being used commercially in automated journalism, chatbots, generating product descriptions for e-commerce sites, summarising medical records,[20][4] and enhancing accessibility (for example by describing graphs and data sets to blind people[21]).
The same idea can be applied in a sports setting, with different reports generated for fans of specific teams.[22] Over the past few years, there has been an increased interest in automatically generating captions for images, as part of a broader endeavor to investigate the interface between vision and language.
Designing automatic measures that can mimic human judgments of the suitability of image descriptions is a further open need in the area. Other open challenges include visual question-answering (VQA),[25] as well as the construction and evaluation of multilingual repositories for image description.[22] Another area where NLG has been widely applied is automated dialogue systems, frequently in the form of chatbots.
A chatbot or chatterbot is a software application used to conduct an online chat conversation via text or text-to-speech, in lieu of direct contact with a live human agent. While natural-language processing (NLP) techniques are applied to interpret the human input, NLG drives the output side of the chatbot, producing its responses during real-time dialogue.
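As a toy illustration of this division of labour, the sketch below pairs a crude pattern-matching NLU step with template-based NLG output. The intents, patterns, and canned replies are invented for illustration; production chatbots use far richer models on both sides.

```python
import re

# "NLU" side: map the user's text to an intent via keyword patterns.
INTENTS = {
    r"\b(hi|hello|hey)\b": "greet",
    r"\bweather\b": "weather",
    r"\b(bye|goodbye)\b": "farewell",
}

# "NLG" side: realise each intent as a templated reply (canned data here).
TEMPLATES = {
    "greet": "Hello! How can I help you today?",
    "weather": "Today's forecast is sunny with a high of 21 degrees.",
    "farewell": "Goodbye!",
    "fallback": "Sorry, I didn't understand that.",
}

def understand(utterance):
    for pattern, intent in INTENTS.items():
        if re.search(pattern, utterance.lower()):
            return intent
    return "fallback"

def respond(utterance):
    return TEMPLATES[understand(utterance)]

print(respond("Hi there!"))            # Hello! How can I help you today?
print(respond("What's the weather?"))  # Today's forecast is sunny ...
```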
A recent pioneer in the area is Philip Parker, who has developed an arsenal of algorithms capable of automatically generating textbooks, crossword puzzles, poems and books on topics ranging from bookbinding to cataracts.[29] Despite this progress, many challenges remain in producing automated creative and humorous content that rivals human output.
However, task-based evaluations are time-consuming and expensive, and can be difficult to carry out (especially if they require subjects with specialised expertise, such as doctors).
In any case, human ratings are the most popular evaluation technique in NLG; this is in contrast to machine translation, where automatic metrics are widely used.
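For contrast, the sketch below shows the kind of automatic metric common in machine translation: BLEU, computed here with NLTK's implementation (assuming the nltk package is installed; the example sentences are invented).

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

# One or more human reference sentences, tokenised.
reference = [["pollen", "levels", "are", "high", "in", "the", "north"]]
# A system-generated hypothesis to score against the references.
hypothesis = ["pollen", "is", "high", "in", "the", "north"]

# Smoothing avoids zero scores on short sentences with missing n-grams.
score = sentence_bleu(reference, hypothesis,
                      smoothing_function=SmoothingFunction().method1)
print(f"BLEU: {score:.3f}")
```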