Unicode control characters

The standard ISO/IEC 2022 (ECMA-35) defines extension methods for ASCII, including a secondary "C1" range of 8-bit control codes from 0x80 to 0x9F, each equivalent to a two-byte 7-bit sequence consisting of ESC (0x1B) followed by a byte from 0x40 through 0x5F.
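The correspondence between the 8-bit and 7-bit forms is a fixed offset of 0x40, which can be sketched as follows. This is a minimal illustration rather than a full ISO/IEC 2022 implementation, and the function names are hypothetical.

```python
ESC = 0x1B  # ESCAPE, the C0 control that introduces the 7-bit form


def c1_to_escape_sequence(byte: int) -> bytes:
    """Return the 7-bit ESC sequence equivalent to an 8-bit C1 control byte."""
    if not 0x80 <= byte <= 0x9F:
        raise ValueError(f"not a C1 control byte: {byte:#04x}")
    # 0x80..0x9F maps onto 0x40..0x5F by subtracting 0x40.
    return bytes([ESC, byte - 0x40])


def escape_sequence_to_c1(seq: bytes) -> int:
    """Return the 8-bit C1 control byte equivalent to a two-byte ESC sequence."""
    if len(seq) != 2 or seq[0] != ESC or not 0x40 <= seq[1] <= 0x5F:
        raise ValueError(f"not a C1 escape sequence: {seq!r}")
    return seq[1] + 0x40


# CSI (0x9B) corresponds to ESC followed by "[" (0x5B).
print(c1_to_escape_sequence(0x9B))  # b'\x1b['
```

For example, the control sequence introducer CSI (0x9B) is more commonly seen in its 7-bit form, ESC followed by "[".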

[3] Unicode inherits its first and second blocks (comprising U+0000 through U+00FF) from ASCII and ISO/IEC 8859-1, thus incorporating the C0 control codes (U+0000–U+001F), DEL (U+007F) and the C1 control codes (U+0080–U+009F) as general category "Cc".
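The set of "Cc" code points in the first two blocks can be checked against a Unicode character database. The sketch below assumes Python's standard `unicodedata` module; the variable names are illustrative only.

```python
import unicodedata

# Collect every code point in the first two blocks (U+0000..U+00FF)
# whose general category is "Cc".
cc_code_points = [
    cp for cp in range(0x100) if unicodedata.category(chr(cp)) == "Cc"
]

# Split the result into the C0 range and the range starting at DEL.
c0 = [cp for cp in cc_code_points if cp <= 0x1F]
delete_and_c1 = [cp for cp in cc_code_points if cp >= 0x7F]

print(f"U+{c0[0]:04X}..U+{c0[-1]:04X}: {len(c0)} code points")
print(
    f"U+{delete_and_c1[0]:04X}..U+{delete_and_c1[-1]:04X}: "
    f"{len(delete_and_c1)} code points"
)
```

The first range contains the 32 C0 controls; the second contains 33 code points, namely DEL (U+007F) followed by the 32 C1 controls.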

[4] Most of these characters play no explicit role in Unicode text handling, and are used only by higher-level protocols such as those used by terminal emulators.

The rest of the "Cc" control codes are transparent to Unicode and their meanings are left to higher-level protocols, although interpretation as defined in ISO/IEC 6429 is suggested as a default.

[5] Furthermore, certain specialised higher-level protocols, such as transcoded Teletext, may include a different interpretation of the entire C0 control code range.

The W3C Ruby markup recommendation is an example of an alternate protocol supporting more advanced interlinear annotation.

Similarly, Unicode handles the mixture of left-to-right text alongside right-to-left text without any special characters.
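One way to see why no extra characters are needed is that each character carries its own directional class, which a renderer can consult. The following is a minimal sketch using Python's standard `unicodedata` module; the sample string is an arbitrary assumption chosen for illustration, and the sketch only inspects the classes rather than performing any reordering.

```python
import unicodedata

# Mixed Latin and Hebrew text stored in logical (reading) order,
# with no directional control characters inserted.
text = "abc \u05d0\u05d1\u05d2"

# Print the bidirectional class assigned to each character.
for ch in text:
    print(f"U+{ord(ch):04X} {unicodedata.bidirectional(ch)}")
```

The Latin letters report class "L" (left-to-right), the Hebrew letters report "R" (right-to-left), and the space reports "WS" (whitespace), whose direction is resolved from its neighbours.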

These types of glyph substitution can be determined from the context of the character alone, with no additional input required from the author.

Authors may also use special-purpose characters such as joiners and non-joiners to force an alternate form of glyph where it would not otherwise appear.
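As a rough illustration, the joiner and non-joiner are ordinary code points that an author places between letters. The sketch below assumes Python's standard `unicodedata` module; the choice of the Arabic letters beh and heh is a hypothetical example, and the visible result depends on the font and shaping engine in use.

```python
import unicodedata

ZWJ = "\u200d"   # ZERO WIDTH JOINER
ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER

# Both are invisible format characters (general category "Cf").
for ch in (ZWJ, ZWNJ):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} {unicodedata.category(ch)}")

# Hypothetical example: Arabic beh (U+0628) and heh (U+0647) would
# normally join; placing ZWNJ between them requests the non-joined forms.
joined = "\u0628\u0647"
separated = "\u0628" + ZWNJ + "\u0647"

# The non-joiner adds one code point to the stored text.
print(len(joined), len(separated))
```

The stored text differs by a single code point, so the author's choice travels with the text even though the character itself has no visible glyph.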

For other kinds of glyph substitution, however, the author's intent cannot be determined from context and may need to be encoded with the text.

This is the case with the characters or glyphs referred to as gaiji, where different glyphs are used for the same character, either in historical usage or in ideographs used for family names.