Controlled vocabulary

[4][5] Controlled vocabularies solve the problems of homographs, synonyms and polysemes by establishing a bijection (a one-to-one correspondence) between concepts and preferred terms.
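The correspondence between concepts and preferred terms can be sketched in a few lines of Python. The concept identifiers, terms and function names below are hypothetical and chosen only for illustration. The sketch assumes that synonyms are handled by pointing several entry terms at one concept, and that homographs are separated by adding a qualifier to the term.

```python
# Hypothetical mini-vocabulary (illustrative only, not taken from a real scheme).
# Each concept has exactly one preferred term: concept id -> preferred term.
PREFERRED_TERM = {
    "c1": "automobile",
    "c2": "mercury (planet)",
    "c3": "mercury (element)",
}

# Entry terms (synonyms and variants) point to a concept: many-to-one.
# The homograph "mercury" is split into two qualified terms.
ENTRY_TERMS = {
    "automobile": "c1",
    "car": "c1",
    "motorcar": "c1",
    "mercury (planet)": "c2",
    "mercury (element)": "c3",
    "quicksilver": "c3",
}


def preferred(term: str) -> str:
    """Return the preferred term for an entry term.

    Raises KeyError if the term is not in the vocabulary, which is the
    case for the unqualified, ambiguous "mercury".
    """
    concept = ENTRY_TERMS[term.strip().lower()]
    return PREFERRED_TERM[concept]


def is_bijection(mapping: dict) -> bool:
    """True if no two concepts share the same preferred term."""
    return len(set(mapping.values())) == len(mapping)
```

With this structure, `preferred("Car")` and `preferred("motorcar")` both resolve to `automobile`, while a lookup of the unqualified `mercury` fails and forces the indexer to pick one of the qualified terms.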

In short, controlled vocabularies ensure consistency and reduce the unwanted ambiguity inherent in natural human languages, where the same concept can be given different names.

[3] For example, in the Library of Congress Subject Headings[6] (a subject heading system that uses a controlled vocabulary), preferred terms (subject headings, in this case) have to be chosen to settle choices between variant spellings of the same word (American versus British), between popular and scientific terms (cockroach versus Periplaneta americana), and between synonyms (automobile versus car), among other difficult issues.

The terms are chosen and organized by trained professionals (including librarians and information scientists) who possess expertise in the subject area.

Controlled vocabulary elements (terms or phrases) employed as tags to help identify the content of documents or of other information system entities (e.g. DBMS, Web Services) qualify as metadata.

[9] Controlled vocabularies are often claimed to improve the accuracy of free text searching, for example by reducing the number of irrelevant items in the retrieval list.

A controlled vocabulary addresses this problem by tagging the documents in such a way that the ambiguities are eliminated.

Compared to free text searching, the use of a controlled vocabulary can dramatically increase the performance of an information retrieval system, if performance is measured by precision (the percentage of documents in the retrieval list that are actually relevant to the search topic).
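The definition of precision given above can be illustrated with a small Python sketch. The four-document corpus, its tags and the relevance judgments are invented for the example, and the function returns a fraction between 0 and 1 rather than a percentage. It is a toy comparison under those assumptions, not a measurement of any real system.

```python
def precision(retrieved, relevant):
    """Fraction of retrieved documents that are relevant.

    Returns 0.0 for an empty retrieval list (a convention chosen here).
    """
    retrieved = list(retrieved)
    if not retrieved:
        return 0.0
    relevant = set(relevant)
    hits = sum(1 for doc_id in retrieved if doc_id in relevant)
    return hits / len(retrieved)


# Hypothetical corpus: free text plus an indexer-assigned subject tag.
DOCS = {
    "d1": {"text": "mercury levels in fish", "tags": {"mercury (element)"}},
    "d2": {"text": "the orbit of mercury", "tags": {"mercury (planet)"}},
    "d3": {"text": "quicksilver mining history", "tags": {"mercury (element)"}},
    "d4": {"text": "mercury thermometers", "tags": {"mercury (element)"}},
}

# Assumed relevance judgments for a search about the chemical element.
RELEVANT = {"d1", "d3", "d4"}

# Free text search: match the word "mercury" anywhere in the text.
free_text = [d for d, doc in DOCS.items() if "mercury" in doc["text"].split()]

# Controlled vocabulary search: match the qualified subject tag.
tagged = [d for d, doc in DOCS.items() if "mercury (element)" in doc["tags"]]

print(precision(free_text, RELEVANT))  # 2 of 3 retrieved documents are relevant
print(precision(tagged, RELEVANT))     # all 3 retrieved documents are relevant
```

In this toy corpus the free text query also retrieves the document about the planet, which lowers its precision, while the tag-based query retrieves only documents about the element.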

Controlled vocabularies may become outdated rapidly in fast-developing fields of knowledge unless the preferred terms are updated regularly.

The use of controlled vocabularies can be costly compared to free text searches because human experts or expensive automated systems are necessary to index each entry.

Numerous methodologies have been developed to assist in the creation of controlled vocabularies, including faceted classification, which enables a given data record or document to be described in multiple ways.
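As a rough illustration of faceted classification, the Python sketch below describes each record along several independent facets and selects records by any combination of them. The facet names (`material`, `period`, `kind`), the records and the `select` helper are all hypothetical.

```python
# Hypothetical records, each described along several independent facets.
RECORDS = [
    {"id": "r1", "facets": {"material": "wood", "period": "18th century", "kind": "chair"}},
    {"id": "r2", "facets": {"material": "metal", "period": "18th century", "kind": "lamp"}},
    {"id": "r3", "facets": {"material": "wood", "period": "20th century", "kind": "chair"}},
]


def select(records, **wanted):
    """Return the ids of records that match every requested facet value."""
    return [
        record["id"]
        for record in records
        if all(record["facets"].get(facet) == value for facet, value in wanted.items())
    ]
```

The same record can then be reached from different starting points: `select(RECORDS, material="wood")` and `select(RECORDS, kind="chair")` both include `r1`, and combining facets, as in `select(RECORDS, material="wood", period="18th century")`, narrows the result to `r1` alone.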

[10] Controlled vocabularies, such as the Library of Congress Subject Headings, are an essential component of bibliography, the study and classification of books.

Consistency of terms is one of the most important concepts in technical writing and knowledge management, where effort is expended to use the same word throughout a document or organization rather than slightly different words for the same thing.

[12] Controlled vocabularies of the Semantic Web define the concepts and relationships (terms) used to describe a field of interest or area of concern.

To use machine-readable terms from any controlled vocabulary, web designers can choose from a variety of annotation formats, including RDFa, HTML5 Microdata, or JSON-LD in the markup, or RDF serializations (RDF/XML, Turtle, N3, TriG, TriX) in external files.
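As one possible illustration of the JSON-LD option, the Python sketch below builds a small annotation with terms from the schema.org vocabulary and wraps it in a script element for embedding in a page. The helper names and the sample book data are hypothetical, and the choice of `Book`, `name`, `author` and `about` is only one of many ways the vocabulary could be applied.

```python
import json


def book_jsonld(title: str, author: str, subject: str) -> dict:
    """Describe a book with schema.org terms (illustrative selection)."""
    return {
        "@context": "https://schema.org",
        "@type": "Book",
        "name": title,
        "author": {"@type": "Person", "name": author},
        "about": {"@type": "Thing", "name": subject},
    }


def as_script_tag(data: dict) -> str:
    """Serialize the annotation as an embeddable JSON-LD script element."""
    payload = json.dumps(data, indent=2)
    # Escape "</" so that a value cannot close the script element early;
    # "\/" is a valid JSON escape for "/".
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


# Hypothetical sample data.
print(as_script_tag(book_jsonld("Example Title", "Example Author", "Automobiles")))
```

The printed element can be placed in the HTML of a page. The same description could instead be written in RDFa or Microdata attributes, or stored in an external RDF file.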