Numeric character reference

Since WebSGML, XML and HTML 4, numeric character references refer to the code points of the Universal Character Set (UCS) of Unicode.

Ideally, when the characters of a document written in a markup language are encoded for storage or transmission over a network as a sequence of bits, the encoding used can represent every character in the document, if not the whole of Unicode, directly as a particular bit sequence.

While the syntax of SGML does not prohibit references to invalid or unassigned code points, such as &#xFFFF;, SGML-derived markup languages such as HTML and XML can, and often do, restrict numeric character references to only those code points that are assigned to characters.

As another example, &#128;, which is a reference to another control character, is not allowed to be used or referenced in either HTML or XML. When used in HTML, however, it is usually not flagged as an error by web browsers, some of which interpret it, for compatibility reasons, as a reference to the character represented by code value 128 in the Windows-1252 encoding.
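The compatibility behaviour described here can be observed outside a browser. The following is a minimal sketch using `html.unescape` from Python's standard library, which resolves character references according to the HTML5 rules; it illustrates the remapping only and says nothing about any particular browser. The variable names are illustrative.

```python
import html

# html.unescape follows the HTML5 rules for numeric character references,
# which remap code values 128-159 through Windows-1252 for compatibility.
legacy = html.unescape("&#128;")    # reference to code value 128
proper = html.unescape("&#8364;")   # reference to U+20AC, the euro sign

print(f"U+{ord(legacy):04X}")       # code point the legacy reference resolved to
print(legacy == proper)             # whether both references yield the same character
```

Both references resolve to the euro sign here, which is why a page using the invalid form can still appear to render as the author intended.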

For example, as mentioned above, the correct numeric character reference for the euro sign "€" (U+20AC) when using Unicode is &#8364; in decimal and &#x20AC; in hexadecimal.
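As a rough illustration of how the two forms relate, the sketch below derives both references from a character's Unicode code point. The function name `numeric_character_references` is hypothetical and chosen only for this example.

```python
def numeric_character_references(char: str) -> tuple[str, str]:
    """Return the decimal and hexadecimal references for a single character.

    Hypothetical helper for illustration; it does no validation of whether
    the code point is permitted in HTML or XML.
    """
    code_point = ord(char)  # the character's UCS/Unicode code point
    return f"&#{code_point};", f"&#x{code_point:X};"


decimal_ref, hex_ref = numeric_character_references("\u20ac")  # euro sign
print(decimal_ref, hex_ref)  # &#8364; &#x20AC;
```

The decimal and hexadecimal forms name the same code point; only the notation differs.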

As another example, if some text was created originally using the MacRoman character set, the left double quotation mark “ will be represented by the code value 0xD2.

This will not display properly in a system expecting a document encoded as UTF-8, ISO 8859-1, or CP-1252, where this code point is occupied by the letter Ò.

The correct numeric character reference for “ in HTML 4 and newer is &#8220;, because U+201C is its UCS code point.
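The MacRoman case can be sketched with Python's built-in codecs. This assumes the standard `mac_roman`, `cp1252` and `latin-1` codecs; the variable names are illustrative.

```python
raw = bytes([0xD2])  # the single byte as stored in the MacRoman text

intended = raw.decode("mac_roman")   # what the author typed
as_cp1252 = raw.decode("cp1252")     # what a CP-1252 reader sees
as_latin1 = raw.decode("latin-1")    # what an ISO 8859-1 reader sees

# Build the reference from the UCS code point of the intended character,
# not from the byte value in the legacy encoding.
reference = f"&#{ord(intended)};"

print(f"U+{ord(intended):04X}", f"U+{ord(as_cp1252):04X}", reference)
```

Because the reference is derived from the code point U+201C rather than from the byte 0xD2, it identifies the same character regardless of the encoding in which the document is eventually stored.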