Standard Generalized Markup Language

SGML descended from IBM's Generalized Markup Language (GML), which Charles Goldfarb, Edward Mosher, and Raymond Lorie developed in the 1960s.

Goldfarb, editor of the international standard, coined the "GML" term using their surname initials.

The advent of the XML profile has made SGML suitable for widespread application for small-scale, general-purpose use.

Integrally stored reflects the XML requirement that elements end in the same entity in which they started.

Reference-free reflects the HTML requirement that entity references are for special characters and do not contain markup.

In SGML, the entities and element types used in the document may be specified with a DTD; the different character sets, features, delimiter sets, and keywords are specified in the SGML Declaration to create the concrete syntax of the document.

SGML generalizes and supports a wide range of markup languages as found in the mid-1980s.

However, XML's well-formedness rules cannot support Wiki-like languages, leaving them unstandardized and difficult to integrate with non-text information systems.

Note: The OMITTAG feature is unrelated to the tagging of elements whose declared content is EMPTY as defined in the DTD. Such elements have no end tag, and specifying one in the document instance would result in invalid markup.

SGML markup languages whose concrete syntax enables the SHORTTAG VALUE feature do not require attribute values containing only alphanumeric characters to be enclosed within quotation marks, either double " " (LIT) or single ' ' (LITA), so the previous markup example could be written with its attribute value unquoted.

One feature of SGML markup languages is "presumptuous empty tagging": an empty end tag (</>) "inherits" its value from the nearest previous full start tag; in other words, it closes the most recently opened item.
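
The empty end tag rule (close the most recently opened item) can be illustrated with a short Python sketch. This is not an SGML parser: it assumes the default delimiters, ignores OMITTAG, EMPTY elements, and marked sections, and the function name expand_empty_end_tags and the element names in the usage note are hypothetical.

```python
import re

# A full start tag <name ...>, a full end tag </name>, or an empty end tag </>.
TAG = re.compile(r"<(/?)([A-Za-z][\w.-]*)?[^<>]*>")

def expand_empty_end_tags(text):
    """Replace each empty end tag </> with the end tag of the most recently opened element."""
    open_elements = []  # stack of element names that are still open
    out = []
    pos = 0
    for m in TAG.finditer(text):
        out.append(text[pos:m.start()])
        pos = m.end()
        is_end_tag = m.group(1) == "/"
        name = m.group(2)
        if is_end_tag and name is None:
            # Empty end tag: it takes its name from the top of the stack.
            if not open_elements:
                raise ValueError("empty end tag with no open element")
            out.append(f"</{open_elements.pop()}>")
            continue
        if name is not None:
            if is_end_tag:
                if open_elements and open_elements[-1] == name:
                    open_elements.pop()
            else:
                open_elements.append(name)
        out.append(m.group(0))  # full tags are copied through unchanged
    out.append(text[pos:])
    return "".join(out)
```

Under these assumptions, expand_empty_end_tags("<quote type=example>like <italics>this</></>") yields "<quote type=example>like <italics>this</italics></quote>".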

Another feature is the NET (Null End Tag) construction, in which a start tag closed with a slash is ended by a second slash rather than by a full end tag.

Additionally, the SHORTTAG NETENABL IMMEDNET feature allows shortening tags surrounding an empty text value, but forbids shortening full tags: an empty element can be written with two consecutive slashes in place of the start-tag close and the end tag, wherein the first slash ( / ) stands for the NET-enabling "start-tag close" (NESTC), and the second slash stands for the NET.
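
As a rough illustration of the two slashes, the following Python sketch rewrites NET-style markup into full start and end tags. It assumes the slash is both the NET-enabling delimiter and the NET, handles only start tags without attributes, and does not distinguish IMMEDNET from the general NET feature; the function name expand_net and the element names in the usage note are hypothetical.

```python
import re

# <name/content/ : element name, NET-enabling slash, content containing
# neither "/" nor "<", then the null end tag.
NET = re.compile(r"<([A-Za-z][\w.-]*)/([^/<]*)/")

def expand_net(text):
    """Rewrite <name/content/ as <name>content</name>."""
    return NET.sub(r"<\1>\2</\1>", text)
```

With these assumptions, expand_net("<italics/this/") gives "<italics>this</italics>", and the empty form expand_net("<image//") gives "<image></image>".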

The third feature is "text on the same line", which allows a markup item to be ended by a line end. It is especially useful for headings and similar items, and requires either SHORTREF or DATATAG minimization.

For example, if the DTD maps a short reference to an entity containing an end tag (and "&#RE;&#RS;" is a short-reference delimiter in the concrete syntax), then a line end in the document instance is equivalent to that end tag.

SGML has many features that defied convenient description with the popular formal automata theory and the contemporary parser technology of the 1980s and the 1990s.

The standard warns in Annex H: The SGML model group notation was deliberately designed to resemble the regular expression notation of automata theory, because automata theory provides a theoretical foundation for some aspects of the notion of conformance to a content model.

Moreover, the structure graph is also loosely characterized as an element tree, but the ID/IDREF markup allows arbitrary arcs.
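
The difference between the element tree and the graph formed by ID/IDREF links can be seen in a small Python sketch. It uses an XML sample and the standard xml.etree.ElementTree module for convenience; because no DTD is read, it simply assumes that an attribute named id is an ID and one named ref is an IDREF. The sample document and the function name reference_arcs are hypothetical.

```python
import xml.etree.ElementTree as ET

DOC = """
<doc>
  <sec id="a"><xref ref="b"/></sec>
  <sec id="b"><xref ref="a"/></sec>
</doc>
""".strip()

def reference_arcs(root):
    """Return (owner id, target id) pairs for every ref attribute in the tree."""
    arcs = []

    def walk(elem, owner):
        # The owner is the nearest enclosing element that carries an id.
        owner = elem.get("id", owner)
        target = elem.get("ref")
        if target is not None:
            arcs.append((owner, target))
        for child in elem:
            walk(child, owner)

    walk(root, None)
    return arcs

arcs = reference_arcs(ET.fromstring(DOC))
```

Here arcs contains ("a", "b") and ("b", "a"): the two sections point at each other, a cycle that parent-child edges alone cannot express.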

The SGML standard characterizes parsing as a state machine switching between recognition modes.
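
A toy scanner gives a feel for mode-dependent recognition. The Python sketch below is only an illustration under strong simplifications: it uses three modes whose names (CON, TAG, LIT) are borrowed loosely from the standard's recognition modes, fixed delimiters, and no entity or short-reference handling; the function name scan is hypothetical.

```python
def scan(text):
    """Split text into (mode, token) pairs with a three-mode state machine."""
    mode = "CON"   # CON: content, TAG: inside a tag, LIT: inside a quoted literal
    quote = None   # the quotation mark that opened the current literal
    buf = ""
    tokens = []
    for ch in text:
        if mode == "CON":
            if ch == "<":            # "<" opens a tag only in content mode
                if buf:
                    tokens.append(("CON", buf))
                buf = ch
                mode = "TAG"
            else:
                buf += ch
        elif mode == "TAG":
            buf += ch
            if ch in "\"'":          # a quotation mark switches to literal mode
                quote = ch
                mode = "LIT"
            elif ch == ">":          # ">" closes the tag only in tag mode
                tokens.append(("TAG", buf))
                buf = ""
                mode = "CON"
        else:                        # LIT: every character except the closing quote is data
            buf += ch
            if ch == quote:
                mode = "TAG"
    if buf:
        tokens.append((mode, buf))
    return tokens
```

For the input a<b t="x>y">c, the ">" inside the literal is not treated as a tag close, so the scanner returns the content "a", the single tag <b t="x>y">, and the content "c".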

During parsing, there is a stack of maps that configure the scanner, while the tokenizer relates to the recognition modes.

It was this active use of grammars that made concrete SGML parsing difficult to formally characterize.

The XML Infoset corresponds more to the programming language notion of abstract syntax introduced by John McCarthy.

Applications of XML include XHTML, XQuery, XSLT, XForms, XPointer, JSP, SVG, RSS, Atom, XML-RPC, RDF/XML, and SOAP.

The second edition of the Oxford English Dictionary (OED) is entirely marked up with an SGML-based markup language using the LEXX text editor.

Significant open-source implementations of SGML have included SP and Jade, the associated DSSSL processors; both are maintained by the OpenJade project and are common parts of Linux distributions.

A fragment of the Oxford English Dictionary (1985), showing SGML markup