Unicode equivalence

Code point sequences that are defined as canonically equivalent are assumed to have the same appearance and meaning when printed or displayed.

For example, the code point U+006E n LATIN SMALL LETTER N followed by U+0303 ◌̃ COMBINING TILDE is defined by Unicode to be canonically equivalent to the single code point U+00F1 ñ LATIN SMALL LETTER N WITH TILDE (a letter of the Spanish alphabet).
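The equivalence can be checked with a short sketch. It assumes Python's standard `unicodedata` module; the variable names are illustrative only. The two spellings differ as raw code point sequences but agree once both are brought to the same normalization form.

```python
import unicodedata

decomposed = "n\u0303"   # U+006E followed by U+0303 COMBINING TILDE
precomposed = "\u00f1"   # U+00F1 LATIN SMALL LETTER N WITH TILDE

# A raw comparison looks only at code points, so the two spellings differ.
raw_equal = decomposed == precomposed

# After normalizing to a common form (NFC composes, NFD decomposes), they match.
nfc_equal = unicodedata.normalize("NFC", decomposed) == precomposed
nfd_equal = unicodedata.normalize("NFD", precomposed) == decomposed

print(raw_equal, nfc_equal, nfd_equal)
```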

Sequences that are defined as compatible are assumed to have possibly distinct appearances, but the same meaning in some contexts.

Thus, for example, the code point U+FB00 (the typographic ligature "ﬀ") is defined to be compatible—but not canonically equivalent—to the sequence U+0066 U+0066 (two Latin "f" letters).
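The difference between the two kinds of equivalence shows up in which normalization forms change the ligature. The sketch below assumes Python's standard `unicodedata` module: the canonical form NFC leaves U+FB00 untouched, while the compatibility form NFKC replaces it with two plain letters.

```python
import unicodedata

ligature = "\ufb00"  # U+FB00 LATIN SMALL LIGATURE FF

# Canonical normalization does not touch the ligature...
nfc = unicodedata.normalize("NFC", ligature)

# ...but compatibility normalization folds it into two "f" letters.
nfkc = unicodedata.normalize("NFKC", ligature)

print(nfc == ligature, nfkc == "ff")
```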

For compatibility or other reasons, Unicode sometimes assigns two different code points to entities that are essentially the same character.

In general, precomposed characters are defined to be canonically equivalent to the sequence of their base letter and subsequent combining diacritic marks, in whatever order these may occur.
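As an illustration, consider U+1E69 (ṩ), which carries a dot below and a dot above. The sketch assumes Python's `unicodedata` module and picks two marks with different combining classes (U+0323 and U+0307), so either input order should normalize to the same precomposed character.

```python
import unicodedata

precomposed = "\u1e69"     # LATIN SMALL LETTER S WITH DOT BELOW AND DOT ABOVE
order_a = "s\u0323\u0307"  # s + dot below + dot above
order_b = "s\u0307\u0323"  # s + dot above + dot below

# Normalizing all three spellings to NFC should leave a single distinct form.
forms = {unicodedata.normalize("NFC", t) for t in (precomposed, order_a, order_b)}

print(len(forms))
```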

However, the two sequences are not declared canonically equivalent, since the distinction has some semantic value and affects the rendering of the text.

Text-processing software that implements Unicode string search and comparison must take into account the presence of equivalent code points.
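A minimal sketch of such a search follows, assuming Python's `unicodedata` module. The helper `nfc_contains` is hypothetical, not a library function, and it ignores finer points such as matches that would split a composed character.

```python
import unicodedata

def nfc_contains(haystack: str, needle: str) -> bool:
    """Substring test after normalizing both strings to NFC (illustrative helper)."""
    return unicodedata.normalize("NFC", needle) in unicodedata.normalize("NFC", haystack)

text = "Hasta man\u0303ana"  # "mañana" typed with n + combining tilde
query = "ma\u00f1ana"        # "mañana" typed with precomposed ñ

naive_hit = query in text                  # plain code point search
normalized_hit = nfc_contains(text, query)

print(naive_hit, normalized_hit)
```

Which normalization form to use is a design choice; NFC is used here only because it keeps the strings short.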

For instance, some typographic ligatures like U+FB03 (ﬃ), Roman numerals like U+2168 (Ⅸ), and even subscripts and superscripts such as U+2075 (⁵) have their own Unicode code points.
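The compatibility mappings for these three characters can be inspected directly. The sketch assumes Python's `unicodedata` module, whose `decomposition()` function returns the raw mapping from the Unicode Character Database, including a formatting tag in angle brackets for compatibility mappings.

```python
import unicodedata

samples = ["\ufb03", "\u2168", "\u2075"]  # ligature ffi, Roman numeral nine, superscript five

rows = [
    (
        unicodedata.name(ch),               # official character name
        unicodedata.decomposition(ch),      # tagged compatibility mapping
        unicodedata.normalize("NFKC", ch),  # result of compatibility normalization
    )
    for ch in samples
]

for name, mapping, folded in rows:
    print(f"{name}: {mapping} -> {folded}")
```

Note that the folded forms lose the formatting distinction: after NFKC, a superscript five is an ordinary digit.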

For defective Unicode strings starting with a Hangul vowel or trailing conjoining jamo, concatenation can break composition.
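The Hangul case can be reproduced with two conjoining jamo. The sketch assumes Python's `unicodedata` module; `is_nfc` is an illustrative helper. Each piece is already in composed normal form on its own, yet their concatenation is not, because the leading consonant and the vowel compose into a single syllable.

```python
import unicodedata

def is_nfc(s: str) -> bool:
    """True if s is unchanged by NFC normalization (illustrative helper)."""
    return unicodedata.normalize("NFC", s) == s

lead = "\u1100"   # HANGUL CHOSEONG KIYEOK (leading consonant)
vowel = "\u1161"  # HANGUL JUNGSEONG A (vowel; defective when it starts a string)

joined = lead + vowel
recomposed = unicodedata.normalize("NFC", joined)

print(is_nfc(lead), is_nfc(vowel), is_nfc(joined), f"U+{ord(recomposed):04X}")
```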

Stable sorting is required because combining characters with the same class value are assumed to interact typographically, so their two possible orders are not considered equivalent.
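The reordering step can be sketched as follows, assuming Python's `unicodedata.combining()` for the combining class and relying on the fact that Python's `sorted()` is stable. The function `canonical_reorder` is a simplified illustration that expects already-decomposed input; it is not a full normalizer.

```python
import unicodedata

def canonical_reorder(s: str) -> str:
    """Stably sort each run of combining marks by combining class (simplified sketch)."""
    out, run = [], []
    for ch in s:
        if unicodedata.combining(ch) == 0:
            # A starter ends the current run of marks; flush the run in sorted order.
            out.extend(sorted(run, key=unicodedata.combining))
            run = []
            out.append(ch)
        else:
            run.append(ch)
    out.extend(sorted(run, key=unicodedata.combining))
    return "".join(out)

# Different classes (230 and 220): the marks are swapped into canonical order.
mixed = canonical_reorder("s\u0307\u0323")

# Same class (both 230): stable sorting keeps each input order, so they stay distinct.
acute_first = canonical_reorder("a\u0301\u0308")
diaeresis_first = canonical_reorder("a\u0308\u0301")

print(mixed == "s\u0323\u0307", acute_first != diaeresis_first)
```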

In one specific instance, OS X normalized Unicode filenames sent from the Netatalk and Samba file- and printer-sharing software.

Netatalk and Samba did not recognize the altered filenames as equivalent to the original, leading to data loss.
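The mismatch is easy to reproduce in miniature. The sketch below assumes Python's `unicodedata` module and uses plain NFD as a stand-in for whatever normalization the operating system applied, since the exact form is not stated here; the filename is hypothetical. A byte-wise comparison treats the two names as different files, while a comparison on normalized keys does not.

```python
import unicodedata

sent_name = "Espa\u00f1a.txt"                          # name as sent by the client (precomposed ñ)
stored_name = unicodedata.normalize("NFD", sent_name)  # assumed form after the OS rewrote it

sent_bytes = sent_name.encode("utf-8")
stored_bytes = stored_name.encode("utf-8")

# Software that compares raw bytes sees two unrelated names.
bytes_match = sent_bytes == stored_bytes

# Comparing under a shared normalization form recognizes them as the same name.
normalized_match = (
    unicodedata.normalize("NFC", sent_name) == unicodedata.normalize("NFC", stored_name)
)

print(bytes_match, normalized_match)
```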