HTML describes the structure of a web page semantically and originally included cues for its appearance.
HTML can embed programs written in a scripting language such as JavaScript, which affects the behavior and content of web pages.
Except for the hyperlink tag, these were strongly influenced by SGMLguid, an in-house Standard Generalized Markup Language (SGML)-based documentation format at CERN.
Many of the text elements are mentioned in the 1988 ISO technical report TR 9537 Techniques for using SGML, which describes the features of early text formatting languages such as that used by the RUNOFF command developed in the early 1960s for the CTSS (Compatible Time-Sharing System) operating system.
It was formally defined as such by the Internet Engineering Task Force (IETF) with the mid-1993 publication of the first proposal for an HTML specification, the "Hypertext Markup Language (HTML)" Internet Draft by Berners-Lee and Dan Connolly, which included an SGML Document type definition to define the syntax.
[10][11] The draft expired after six months, but was notable for its acknowledgment of the NCSA Mosaic browser's custom tag for embedding in-line images, reflecting the IETF's philosophy of basing standards on successful prototypes.
Similarly, Dave Raggett's competing Internet Draft, "HTML+ (Hypertext Markup Format)", from late 1993, suggested standardizing already-implemented features like tables and fill-out forms.
Since 1996, the HTML specifications have been maintained, with input from commercial software vendors, by the World Wide Web Consortium (W3C).
In 2004, development of HTML5 began in the Web Hypertext Application Technology Working Group (WHATWG); it became a joint deliverable with the W3C in 2008 and was completed and standardized on 28 October 2014.
HTML tags most commonly come in pairs, such as <h1> and </h1>.
Another important component is the HTML document type declaration, which triggers standards mode rendering.
These indicate other information, such as identifiers for sections within the document, identifiers used to bind style information to the presentation of the document, and, for some tags such as the <img> tag used to embed images, a reference to the image resource, given in the src attribute.
Some elements, such as the line break <br>, do not permit any embedded content, either text or further tags.
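The distinction between the document type declaration, paired tags, and elements without content can be illustrated with a short sketch using Python's standard html.parser module. The class name StructureLogger, the helper log_structure, the sample markup, and the partial VOID_ELEMENTS set are assumptions made for illustration only; they are not part of any HTML specification.

```python
from html.parser import HTMLParser

# Partial, illustrative list of elements that take no content and no end tag.
VOID_ELEMENTS = {"br", "hr", "img", "input", "link", "meta"}


class StructureLogger(HTMLParser):
    """Record the doctype, start tags, and end tags in document order."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_decl(self, decl):
        # Called for declarations such as <!DOCTYPE html>.
        self.events.append(("doctype", decl))

    def handle_starttag(self, tag, attrs):
        kind = "void" if tag in VOID_ELEMENTS else "start"
        self.events.append((kind, tag))

    def handle_startendtag(self, tag, attrs):
        # Treat XHTML-style <br /> the same as <br>.
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


def log_structure(markup):
    """Return the list of structural events found in the markup."""
    parser = StructureLogger()
    parser.feed(markup)
    parser.close()
    return parser.events


SAMPLE = "<!DOCTYPE html><h1>Title</h1><p>One<br>Two</p>"
events = log_structure(SAMPLE)
for event in events:
    print(event)
```

In this sample, h1 and p each produce a start and an end event, while br produces a single event because it has no end tag.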
[78] There are several common attributes that may appear in many elements. The abbreviation element, abbr, can be used to demonstrate some of these attributes: for example, <abbr title="Hypertext Markup Language">HTML</abbr> displays as HTML; in most browsers, pointing the cursor at the abbreviation should display the title text "Hypertext Markup Language."
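As a rough illustration of how attributes are attached to an element as name-value pairs, the following sketch reads the attributes of an abbr element with Python's standard html.parser module. The title value comes from the example described above; the id and class values, and the names AttributeCollector and collect_attributes, are hypothetical.

```python
from html.parser import HTMLParser


class AttributeCollector(HTMLParser):
    """Collect the attributes of every start tag with a given name."""

    def __init__(self, target_tag):
        super().__init__()
        self.target_tag = target_tag
        self.found = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; a dict is easier to read.
        if tag == self.target_tag:
            self.found.append(dict(attrs))


def collect_attributes(markup, target_tag):
    """Return one attribute dict per matching element in the markup."""
    collector = AttributeCollector(target_tag)
    collector.feed(markup)
    collector.close()
    return collector.found


# The id and class values below are made up for this sketch.
SAMPLE_ABBR = (
    '<abbr id="anId" class="jargon" '
    'title="Hypertext Markup Language">HTML</abbr>'
)
abbr_attrs = collect_attributes(SAMPLE_ABBR, "abbr")
print(abbr_attrs[0]["title"])
```

The title attribute retrieved here is the text a browser would typically show when the cursor rests on the abbreviation.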
If document authors overlook the need to escape such characters, some browsers can be very forgiving and try to use context to guess their intent.
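Rather than relying on browser leniency, authors or their tools can escape markup-significant characters before placing text in a document. A minimal sketch using Python's standard html module follows; the helper name escape_text and the sample string are assumptions for illustration.

```python
import html


def escape_text(text):
    """Replace &, <, > and quote characters with character references."""
    return html.escape(text, quote=True)


ORIGINAL = 'if a < b && c > d then "ok"'
escaped = escape_text(ORIGINAL)
print(escaped)

# html.unescape reverses the substitution, recovering the original text.
restored = html.unescape(escaped)
```

Escaping in this way lets the text appear literally on the page instead of being interpreted as the start of a tag or a character reference.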
[86] In a 2001 discussion of the Semantic Web, Tim Berners-Lee and others gave examples of ways in which intelligent software "agents" may one day automatically crawl the web and find, filter, and correlate previously unrelated, published facts for the benefit of human users.
[87] Such agents are not commonplace even now, but some of the ideas of Web 2.0, mashups and price comparison websites may be coming close.
For search engine spiders to rate the significance of the text they find in HTML documents, and for those creating mashups and other hybrids, as well as for more automated agents as they are developed, the semantic structures that exist in HTML need to be widely and uniformly applied to bring out the meaning of the published text.
The majority of presentational features from previous versions of HTML are no longer allowed as they lead to poorer accessibility, higher cost of site maintenance, and larger document sizes.
A document sent with the XHTML MIME type is expected to be well-formed XML; syntax errors may cause the browser to fail to render it.
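The well-formedness requirement can be illustrated with a generic XML parser, which rejects markup that an HTML parser would tolerate. The sketch below uses Python's standard xml.etree.ElementTree module; the function name is_well_formed and both sample fragments are assumptions, and the check covers well-formedness only, not validity against an XHTML DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if the markup parses as well-formed XML."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# XHTML-style fragment: the empty br element is explicitly closed.
WELL_FORMED = "<p>Line one<br />Line two</p>"
# HTML-style fragment: br is left unclosed, which XML does not allow.
NOT_WELL_FORMED = "<p>Line one<br>Line two</p>"

print(is_well_formed(WELL_FORMED))
print(is_well_formed(NOT_WELL_FORMED))
```

The second fragment is acceptable as HTML but fails as XML because the br start tag is never closed before the p end tag.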
[92] Most graphical email clients allow the use of a subset of HTML (often ill-defined) to provide formatting and semantic markup not available with plain text.
This may include typographic information like colored headings, emphasized and quoted text, inline images and diagrams.
Many such clients include both a GUI editor for composing HTML e-mail messages and a rendering engine for displaying them.
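One common way to send such a message is to include the HTML body alongside a plain-text alternative, so that clients without HTML rendering can still display the content. The following sketch uses Python's standard email.message module; the function name build_html_email, the subject, and both bodies are made up for illustration.

```python
from email.message import EmailMessage


def build_html_email(subject, plain_text, html_body):
    """Build a message with a plain-text part and an HTML alternative."""
    message = EmailMessage()
    message["Subject"] = subject
    # The plain-text body is set first and serves as the fallback.
    message.set_content(plain_text)
    # Adding an alternative converts the message to multipart/alternative.
    message.add_alternative(html_body, subtype="html")
    return message


message = build_html_email(
    "Status report",
    "All systems normal.",
    '<h1 style="color: navy">Status</h1><p>All systems <em>normal</em>.</p>',
)
print(message.get_content_type())
```

The receiving client chooses which part to display, which is one reason the supported subset of HTML varies between clients.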
An HTA runs as a fully trusted application and therefore has more privileges, like creation/editing/removal of files and Windows Registry entries.
The latest standards surrounding HTML reflect efforts to overcome the sometimes chaotic development of the language[95] and to create a rational foundation for building both meaningful and well-presented documents.
The W3C intended XHTML 1.0 to be identical to HTML 4.01 except where the limitations of XML relative to the more complex SGML require workarounds.
The introduction of this shorthand, which is not used in the SGML declaration for HTML 4.01, may confuse earlier software unfamiliar with this new convention.
HTML 4 defined three different versions of the language: Strict, Transitional (once called Loose), and Frameset.
Likewise, someone looking for the loose (transitional) or frameset specifications will find similar extended XHTML 1.1 support (much of it is contained in the legacy or frame modules).
The editor renders the document rather than showing the code, so authors do not require extensive knowledge of HTML.
The WYSIWYG editing model has been criticized,[97][98] primarily because of the low quality of the generated code.