Text Encoding Initiative

The Text Encoding Initiative (TEI) is a text-centric community of practice in the academic field of digital humanities, operating continuously since the 1980s.

The community currently runs a mailing list, meetings and conference series, and maintains the TEI technical standard, a journal,[1] a wiki, a GitHub repository and a toolchain.

The standard is split into two parts: a discursive textual description with extended examples and discussion, and a set of tag-by-tag definitions.

Schemas in the most widely used schema languages (DTD, RELAX NG and W3C XML Schema) are generated automatically from the tag-by-tag definitions.
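The TEI's tag-by-tag definitions are written in its ODD format. The toy translator below is only a sketch of the idea that a schema can be derived mechanically from such a definition. It is not the TEI's own tooling. It assumes a small subset of the "Pure ODD" content-model elements (`sequence`, `alternate`, `elementRef`, `textNode`, `empty`, with `minOccurs`/`maxOccurs`). It emits RELAX NG compact syntax only. The `myNote` element and the helper names `occurrence`, `pattern` and `element_to_rnc` are hypothetical.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"

# A hypothetical tag definition (not taken from the TEI Guidelines) written with
# "Pure ODD" content-model elements: text mixed freely with <hi> and <ref>.
ODD_SPEC = """
<elementSpec xmlns="http://www.tei-c.org/ns/1.0" ident="myNote" mode="add">
  <desc>A hypothetical annotation element.</desc>
  <content>
    <alternate minOccurs="0" maxOccurs="unbounded">
      <textNode/>
      <elementRef key="hi"/>
      <elementRef key="ref"/>
    </alternate>
  </content>
</elementSpec>
"""

def occurrence(node):
    """Translate minOccurs/maxOccurs (both default to 1) into an RNC suffix."""
    bounds = (node.get("minOccurs", "1"), node.get("maxOccurs", "1"))
    suffixes = {("1", "1"): "", ("0", "1"): "?",
                ("0", "unbounded"): "*", ("1", "unbounded"): "+"}
    if bounds not in suffixes:
        raise ValueError(f"unsupported occurrence {bounds}")
    return suffixes[bounds]

def pattern(node):
    """Recursively translate one content-model node into RNC syntax."""
    tag = node.tag.removeprefix(TEI)
    if tag == "textNode":
        body = "text"
    elif tag == "empty":
        body = "empty"
    elif tag == "elementRef":
        body = node.get("key")  # assumes one named RNC pattern per element
    elif tag in ("sequence", "alternate"):
        separator = ", " if tag == "sequence" else " | "
        body = "(" + separator.join(pattern(child) for child in node) + ")"
    else:
        raise ValueError(f"unsupported content-model element: {tag}")
    return body + occurrence(node)

def element_to_rnc(spec):
    """Emit one RELAX NG compact-syntax definition for an elementSpec."""
    ident = spec.get("ident")
    content = spec.find(TEI + "content")
    # Children of <content> are treated as an implicit sequence.
    model = ", ".join(pattern(child) for child in content) or "empty"
    return f"{ident} = element {ident} {{ {model} }}"

print(element_to_rnc(ET.fromstring(ODD_SPEC)))
```

Run as-is, this prints `myNote = element myNote { (text | hi | ref)* }`. A real generator must also handle attribute definitions, element classes, macros and the DTD and W3C XML Schema outputs. The sketch leaves all of these out.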

In addition to documenting and describing each TEI tag, its ODD (“One Document Does it all”) specification defines the tag's content model and other usage constraints, which may be expressed in Schematron.
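Schematron is useful for constraints that a content model cannot express, such as comparisons between attribute values. In an ODD, such a rule would be written as a Schematron assertion inside a `constraintSpec`. Python's standard library has no Schematron processor, so the sketch below simply mimics one such check directly. The rule is hypothetical and is not taken from the Guidelines: it requires that a date's `notBefore` must not be later than its `notAfter`, using the TEI dating attributes of those names. The function name `check_date_ranges` is likewise illustrative.

```python
import xml.etree.ElementTree as ET

NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# A small TEI-namespaced fragment (not a complete TEI document).
DOC = """
<p xmlns="http://www.tei-c.org/ns/1.0">Drafted
  <date notBefore="1801-03-01" notAfter="1801-06-30">in spring 1801</date>
  and revised <date notBefore="1805-01-01" notAfter="1804-12-31">later</date>.</p>
"""

def check_date_ranges(root):
    """Hypothetical rule in the spirit of a Schematron assertion:
    a tei:date carrying both @notBefore and @notAfter must have
    notBefore <= notAfter. Assumes both values are full YYYY-MM-DD dates,
    which compare correctly as plain strings."""
    errors = []
    for date in root.iterfind(".//tei:date", NS):
        lo, hi = date.get("notBefore"), date.get("notAfter")
        if lo is not None and hi is not None and lo > hi:
            errors.append(f"date '{date.text}': notBefore {lo} is after notAfter {hi}")
    return errors

for message in check_date_ranges(ET.fromstring(DOC)):
    print(message)
```

Only the second date violates the rule, so the script prints one message: `date 'later': notBefore 1805-01-01 is after notAfter 1804-12-31`.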

The ODD format is also used outside the TEI: the W3C's Internationalization Tag Set (ITS), for example, uses it to generate its schemas and document its vocabulary.

Even the off-the-shelf, pre-generated schemas that many users validate against are themselves built from freely available customization files.
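A TEI customization is itself an ODD document. Its `schemaSpec` selects modules with `moduleRef`, which can narrow a module using `include` or `except`, and it can remove individual tags with `elementSpec mode="delete"`. The sketch below resolves such a selection to a set of element names. Its module inventory is a toy: the module names are real TEI modules, but each lists only a few of its elements. The `myProject` customization and the `selected_elements` helper are hypothetical. The TEI's own tools do the actual schema generation.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"

# Toy inventory: real TEI module names, each with a small, incomplete
# sample of its elements (the real modules define many more).
MODULES = {
    "core": {"p", "hi", "note", "list", "item", "title"},
    "textstructure": {"TEI", "text", "body", "div"},
    "header": {"teiHeader", "fileDesc"},
}

# A hypothetical project customization.
CUSTOMIZATION = """
<schemaSpec xmlns="http://www.tei-c.org/ns/1.0" ident="myProject" start="TEI">
  <moduleRef key="textstructure"/>
  <moduleRef key="header"/>
  <moduleRef key="core" include="p hi title"/>
  <elementSpec ident="div" mode="delete"/>
</schemaSpec>
"""

def selected_elements(spec):
    """Return the element names a schemaSpec selects from MODULES."""
    chosen = set()
    for ref in spec.iter(TEI + "moduleRef"):
        available = MODULES[ref.get("key")]
        if ref.get("include"):    # keep only the listed elements
            chosen |= available & set(ref.get("include").split())
        elif ref.get("except"):   # keep everything but the listed elements
            chosen |= available - set(ref.get("except").split())
        else:                     # keep the whole module
            chosen |= available
    for element_spec in spec.iter(TEI + "elementSpec"):
        if element_spec.get("mode") == "delete":
            chosen.discard(element_spec.get("ident"))
    return chosen

print(sorted(selected_elements(ET.fromstring(CUSTOMIZATION))))
```

This prints `['TEI', 'body', 'fileDesc', 'hi', 'p', 'teiHeader', 'text', 'title']`. The other `core` elements are excluded by the `include` list, and `div` is excluded by the deletion.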

In 1987, a group of scholars representing fields in humanities, linguistics, and computing convened at Vassar College to put forth a set of guidelines known as the “Poughkeepsie Principles”.