Unicode compatibility characters

Some compatibility characters are completely dispensable for text processing and display software that conforms to the Unicode standard.

In order to dispense with these compatibility characters, text software must conform to several Unicode protocols.

These include all of the compatibility characters marked with certain decomposition keywords in the Unicode Character Database.
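In the Unicode Character Database, a compatibility mapping begins with a keyword such as <wide> or <noBreak>, while a canonical mapping is a bare list of code points. The sketch below reads that keyword with Python's standard `unicodedata` module. The helper name `decomposition_tag` and the sample characters are illustrative choices. They are not the specific keywords the text refers to.

```python
import unicodedata
from typing import Optional


def decomposition_tag(ch: str) -> Optional[str]:
    """Return the keyword marking a compatibility mapping (e.g. '<wide>'),
    '' for a canonical mapping, or None if ch has no mapping at all."""
    mapping = unicodedata.decomposition(ch)
    if not mapping:
        return None
    if mapping.startswith("<"):
        # Compatibility mappings begin with a tag such as '<font>' or '<wide>'.
        return mapping.split(" ", 1)[0]
    # Canonical mappings are a bare list of hex code points, with no keyword.
    return ""


for ch in ["\uff21", "\u00a0", "\ufb01", "\u00c5", "a"]:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {decomposition_tag(ch)!r}")
```

For the five samples (FULLWIDTH LATIN CAPITAL LETTER A, NO-BREAK SPACE, the "fi" ligature, A WITH RING ABOVE, and plain "a"), the loop reports <wide>, <noBreak>, <compat>, an empty string for the canonical mapping, and None.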

These include all of the compatibility characters marked with two further decomposition keywords (except three characters listed below among the semantically distinct characters); 11 space variants, some with compatibility and some with canonical decompositions; and some of the superscript and subscript characters in the "Superscripts and Subscripts" block.
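The space variants can be inspected directly. This sketch assumes the eleven variants are the range U+2000 to U+200A, which contains exactly eleven characters. That range is an assumption, not something the text states. The helper `mapping_kind` is hypothetical. It classifies each mapping as compatibility or canonical and shows what NFKC folds each one to.

```python
import unicodedata

# Assumption: the eleven space variants are taken to be U+2000..U+200A.
SPACE_VARIANTS = [chr(cp) for cp in range(0x2000, 0x200B)]


def mapping_kind(ch: str) -> str:
    """Classify a character's decomposition mapping as compatibility or canonical."""
    mapping = unicodedata.decomposition(ch)
    if not mapping:
        return "none"
    return "compatibility" if mapping.startswith("<") else "canonical"


for ch in SPACE_VARIANTS:
    folded = unicodedata.normalize("NFKC", ch)
    print(f"U+{ord(ch):04X} {unicodedata.name(ch):<20} {mapping_kind(ch):<13} "
          f"NFKC -> U+{ord(folded):04X}")
```

EN QUAD (U+2000) and EM QUAD (U+2001) have canonical mappings to U+2002 and U+2003, so even NFC replaces them. The rest have compatibility mappings to U+0020, so all eleven become an ordinary space under NFKC.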

Unicode designates several mathematical symbols based on letters from Greek and Hebrew as compatibility characters.

However, for all practical purposes these symbols share the semantics of their compatibility-equivalent Greek or Hebrew letters.
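The following sketch uses a few such symbols, chosen for illustration. It shows that canonical normalization (NFC) keeps each symbol distinct, while compatibility normalization (NFKC) folds it to the plain Greek or Hebrew letter. The selection in `MATH_SYMBOLS` is not meant to be the complete set Unicode designates.

```python
import unicodedata

# A few letterlike math symbols (illustrative selection) and their
# compatibility-equivalent Greek or Hebrew letters.
MATH_SYMBOLS = {
    "\u2135": "\u05d0",  # ALEF SYMBOL -> HEBREW LETTER ALEF
    "\u03d1": "\u03b8",  # GREEK THETA SYMBOL -> GREEK SMALL LETTER THETA
    "\u03d5": "\u03c6",  # GREEK PHI SYMBOL -> GREEK SMALL LETTER PHI
    "\u03f5": "\u03b5",  # GREEK LUNATE EPSILON SYMBOL -> GREEK SMALL LETTER EPSILON
}

for symbol, letter in MATH_SYMBOLS.items():
    # Canonical normalization (NFC) keeps the symbol distinct...
    assert unicodedata.normalize("NFC", symbol) == symbol
    # ...while compatibility normalization (NFKC) folds it to the plain letter.
    assert unicodedata.normalize("NFKC", symbol) == letter
    print(f"{unicodedata.name(symbol)} -> {unicodedata.name(letter)}")
```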

Although Unicode does not intend to encode such measurement units, the repertoire includes six such symbols that authors should not use; the characters' decompositions should be used instead.
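The decompositions that authors should use can be read off with the standard library. The four symbols below are illustrative letterlike unit signs. The code does not claim they are the six the text counts.

```python
import unicodedata

# Illustrative letterlike unit symbols; not asserted to be the six the text counts.
UNIT_SYMBOLS = ["\u2126", "\u212a", "\u212b", "\u2103"]

for ch in UNIT_SYMBOLS:
    nfc = unicodedata.normalize("NFC", ch)    # canonical normalization
    nfkc = unicodedata.normalize("NFKC", ch)  # compatibility normalization
    print(f"{unicodedata.name(ch):<15} NFC={nfc!r} NFKC={nfkc!r}")
```

OHM SIGN, KELVIN SIGN and ANGSTROM SIGN have canonical singleton mappings, so every normalization form replaces them with U+03A9, U+004B and U+00C5. DEGREE CELSIUS has only a compatibility mapping, so only NFKC and NFKD turn it into U+00B0 followed by "C".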

In these cases, subscripts and superscripts are not merely rich text but constitute distinct characters in the writing system (130 in total).
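Because superscript and subscript characters carry <super> or <sub> compatibility mappings, NFKC erases exactly the distinction described here. The sketch below flags such characters before folding. The helper `sup_sub_chars` is a hypothetical name, and the chemical formula is only a sample input.

```python
import unicodedata


def sup_sub_chars(text: str):
    """List (index, char, tag) for characters whose compatibility mapping
    is tagged <super> or <sub>; NFKC would erase their distinctness."""
    hits = []
    for i, ch in enumerate(text):
        tag = unicodedata.decomposition(ch).split(" ", 1)[0]
        if tag in ("<super>", "<sub>"):
            hits.append((i, ch, tag))
    return hits


formula = "H\u2082O"  # H, SUBSCRIPT TWO, O
print(sup_sub_chars(formula))
print(unicodedata.normalize("NFKC", formula))  # the subscript becomes a plain digit
```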

Finally, Unicode designates Roman numerals as compatibility equivalents of the Latin letters that share the same glyphs.
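The compatibility equivalence is visible through NFKC. The sketch folds three Roman numeral characters to Latin letters and also prints their Unicode numeric values, a property the folded letters do not carry. The sample characters are illustrative.

```python
import unicodedata

# ROMAN NUMERAL FOUR, ROMAN NUMERAL TWELVE, SMALL ROMAN NUMERAL FOUR
for ch in ["\u2163", "\u216b", "\u2173"]:
    print(unicodedata.name(ch),
          repr(unicodedata.normalize("NFKC", ch)),  # folded Latin letters
          unicodedata.numeric(ch))                   # numeric value of the numeral
```

After folding, "XII" is just three letters. `unicodedata.numeric("X", None)` returns None, whereas U+216B has the numeric value 12.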

A similar situation exists for phonetic alphabet characters that use subscript- or superscript-positioned glyphs.

In the specialized fields that use phonetic alphabets, authors should be able to use these characters without resorting to rich-text protocols.
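A single modifier letter shows why these characters matter as plain text. The sample uses MODIFIER LETTER SMALL H (U+02B0), which marks aspiration in IPA. Its compatibility mapping carries the <super> keyword, so NFKC turns an aspirated "tʰ" into the unrelated sequence "th". The example word is illustrative.

```python
import unicodedata

aspirated = "t\u02b0"  # t followed by MODIFIER LETTER SMALL H

print(unicodedata.decomposition("\u02b0"))                # compatibility mapping tagged <super>
print(unicodedata.normalize("NFKC", aspirated) == "th")  # the aspiration mark is lost
```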

This approach is much more flexible and open-ended than using the finite set of circled or enclosed alphanumerics, to give just one example.
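Each circled or enclosed alphanumeric is its own precomposed code point tagged <circle>, and NFKC recovers only the plain content underneath. The sketch assumes "this approach" means applying enclosure as styling over ordinary characters. Under that reading, the folded output is what such styling would wrap. The three sample characters are illustrative.

```python
import unicodedata

# CIRCLED DIGIT ONE, CIRCLED NUMBER TWENTY, CIRCLED LATIN CAPITAL LETTER A
for ch in ["\u2460", "\u2473", "\u24b6"]:
    print(unicodedata.name(ch),
          unicodedata.decomposition(ch),              # <circle> keyword plus plain content
          repr(unicodedata.normalize("NFKC", ch)))    # enclosure removed
```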

The "Enclosed CJK Letters and Months" block contains a single non-compatibility character: the 'Korean Standard Symbol' (㉿ U+327F).
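The difference is visible in the decomposition data. The sketch compares U+327F with CIRCLED HANGUL KIYEOK (U+3260), a neighbor chosen for contrast. It does not scan the whole block.

```python
import unicodedata

ksym = "\u327f"
print(unicodedata.name(ksym), repr(unicodedata.decomposition(ksym)))  # no mapping
# A neighboring character in the same block, for contrast:
print(unicodedata.name("\u3260"), repr(unicodedata.decomposition("\u3260")))
```

Because U+327F has no mapping, it passes through every normalization form unchanged.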

[6] In any event, a normalized text should never contain both U+27EAF 𧺯 and U+FA23 﨣; these code points represent the same character, encoded twice.
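U+FA23 is one of the few characters in the CJK Compatibility Ideographs block that carry no decomposition mapping. As a result, no standard normalization form unifies it with U+27EAF, and keeping the two apart is left to the application. The sketch below checks this. It then applies a hypothetical application-level mapping (`UNIFY`, `unify_duplicates`) with `str.translate`. This is an assumed workaround, not a Unicode-defined operation.

```python
import unicodedata

EXT_B = "\U00027eaf"   # CJK Unified Ideograph U+27EAF
COMPAT = "\ufa23"      # U+FA23, the same character encoded a second time

# No normalization form unifies the pair, because U+FA23 has no mapping.
for form in ("NFC", "NFD", "NFKC", "NFKD"):
    print(form, unicodedata.normalize(form, COMPAT) == EXT_B)

# Hypothetical application-level fix: map U+FA23 to U+27EAF explicitly.
UNIFY = str.maketrans({COMPAT: EXT_B})


def unify_duplicates(text: str) -> str:
    """Replace U+FA23 with U+27EAF so that text never contains both."""
    return text.translate(UNIFY)
```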

Several other characters in these blocks have no compatibility mapping but are clearly intended for legacy support: Alphabetic Presentation Forms (1); Arabic Presentation Forms (4); CJK Compatibility Forms (2, both related to the CJK Unified Ideograph U+4E36 丶); and Enclosed Alphanumerics (21 rich-text variants).

Normalization is the process by which Unicode-conforming software first performs full compatibility decomposition (or composition) before comparing or collating text strings.
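Full compatibility decomposition followed by canonical composition is the standard form NFKC (NFKD omits the recomposition step). The sketch below normalizes strings to NFKC before comparing or sorting them. The names `compat_key` and `compat_equal` are hypothetical helpers. Sorting here uses plain code-point order as a stand-in; real collation would use the Unicode Collation Algorithm.

```python
import unicodedata


def compat_key(s: str) -> str:
    """Full compatibility decomposition followed by canonical composition (NFKC)."""
    return unicodedata.normalize("NFKC", s)


def compat_equal(a: str, b: str) -> bool:
    """Compare two strings after compatibility normalization."""
    return compat_key(a) == compat_key(b)


print("\ufb01le" == "file", compat_equal("\ufb01le", "file"))  # ligature vs. letters
# Sorting by the normalized key: code-point order, not full collation.
print(sorted(["c", "\uff42", "a"], key=compat_key))
```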