Variable-width encoding

However, disks (which, unlike tapes, allow random access, so that text can be loaded on demand), increases in computer memory, and general-purpose compression algorithms have rendered such tricks largely obsolete.

Input and display software obviously needs to know about the structure of the multibyte encoding scheme, but other software generally does not need to know whether a pair of bytes represents two separate characters or just one character.

UTF-8 makes it easy for a program to identify the three sorts of units (singletons, lead units and trail units), since they fall into separate value ranges.
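
A minimal sketch in C of this classification, using the value ranges given at the end of this article; a real decoder would also check sequence lengths and reject malformed sequences:

    #include <stdio.h>

    /* Classify a single UTF-8 code unit by its value range alone:
       00-7F singleton, 80-BF trail unit, C2-F4 lead unit.          */
    const char *classify(unsigned char b)
    {
        if (b <= 0x7F)              return "singleton (ASCII)";
        if (b >= 0x80 && b <= 0xBF) return "trail unit";
        if (b >= 0xC2 && b <= 0xF4) return "lead unit";
        return "invalid in well-formed UTF-8";   /* C0, C1, F5-FF */
    }

    int main(void)
    {
        /* "a", "é" and the euro sign in UTF-8: 61 | C3 A9 | E2 82 AC */
        unsigned char text[] = {0x61, 0xC3, 0xA9, 0xE2, 0x82, 0xAC};
        for (size_t i = 0; i < sizeof text; i++)
            printf("%02X: %s\n", text[i], classify(text[i]));
        return 0;
    }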

In such encodings, one is liable to encounter false positives when searching for a string, since the bytes being sought may in fact lie in the middle of a multiunit sequence.
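
For instance, in Shift JIS the character 表 is encoded as the byte pair 95 5C, and 5C is also the value of the ASCII backslash, so a byte-oriented search reports a match where no backslash exists. A small C illustration of the effect:

    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        /* Shift JIS text: the two-byte character 95 5C ("表")
           followed by ASCII ".txt".  The trail byte 5C has the
           same value as the ASCII backslash.                    */
        const unsigned char sjis[] = {0x95, 0x5C, '.', 't', 'x', 't', 0};

        /* A byte-level search for '\\' reports a hit, even though
           no backslash character is present in the text.          */
        const char *hit = strchr((const char *)sjis, '\\');
        if (hit)
            printf("false positive at offset %ld\n",
                   (long)(hit - (const char *)sjis));   /* offset 1 */
        return 0;
    }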

There is also the danger that a single corrupted or lost unit may render the whole interpretation of a large run of multiunit sequences incorrect.
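
The sketch below illustrates the cascade using a deliberately simplified EUC-style rule (any byte of 80 or above starts a two-byte unit); this rule is an assumption for illustration, not the exact rule of any particular standard:

    #include <stdio.h>

    /* Segment a byte string under a simplified EUC-style rule:
       any byte >= 0x80 starts a two-byte unit, anything else is
       a single-byte unit.                                        */
    static void segment(const unsigned char *s, size_t n)
    {
        for (size_t i = 0; i < n; ) {
            if (s[i] >= 0x80 && i + 1 < n) {
                printf("[%02X %02X] ", s[i], s[i + 1]);
                i += 2;
            } else {
                printf("[%02X] ", s[i]);
                i += 1;
            }
        }
        printf("\n");
    }

    int main(void)
    {
        /* Three two-byte characters followed by ASCII "ok". */
        const unsigned char good[] = {0xA4, 0xA2, 0xA4, 0xA4, 0xA4, 0xA6, 'o', 'k'};
        /* The same text with its second byte lost in transit. */
        const unsigned char bad[]  = {0xA4,       0xA4, 0xA4, 0xA4, 0xA6, 'o', 'k'};

        segment(good, sizeof good);  /* [A4 A2] [A4 A4] [A4 A6] [6F] [6B]       */
        segment(bad,  sizeof bad);   /* [A4 A4] [A4 A4] [A6 6F] [6B] - every
                                        unit after the lost byte is wrong, and
                                        the ASCII 'o' is swallowed as a trail   */
        return 0;
    }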

The stateful nature of these encodings and the large overlap make them very awkward to process.
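
The sketch below assumes an ISO-2022-JP-style scheme, in which the escape sequence ESC $ B selects a double-byte character set and ESC ( B returns to ASCII; to interpret any given byte, software has to replay every escape sequence from the start of the buffer:

    #include <stdio.h>

    #define ESC 0x1B

    /* Determine which character set governs the byte at 'pos' by
       replaying all earlier escape sequences - the encoding is
       stateful, so no byte can be interpreted in isolation.       */
    static const char *charset_at(const unsigned char *s, size_t n, size_t pos)
    {
        const char *mode = "ASCII";
        for (size_t i = 0; i < pos && i + 2 < n; i++) {
            if (s[i] == ESC && s[i + 1] == '$' && s[i + 2] == 'B')
                mode = "double-byte";
            else if (s[i] == ESC && s[i + 1] == '(' && s[i + 2] == 'B')
                mode = "ASCII";
        }
        return mode;
    }

    int main(void)
    {
        /* ASCII, escape in, one two-byte character, escape out, ASCII. */
        const unsigned char text[] = "Hi \x1B$B\x24\x22\x1B(B bye";
        printf("byte 1 is %s\n", charset_at(text, sizeof text - 1, 1));  /* ASCII       */
        printf("byte 6 is %s\n", charset_at(text, sizeof text - 1, 6));  /* double-byte */
        return 0;
    }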

This overlap again made processing tricky, though at least most of the symbols had unique byte values (the backslash, strangely, did not).

Because of this bad design, similar to Shift JIS and Big5 in its overlap of values, the inventors of the Plan 9 operating system, the first to implement Unicode throughout, abandoned it and replaced it with a much better designed variable-width encoding for Unicode: UTF-8. In UTF-8, singletons have the range 00–7F, lead units have the range C0–FD (now actually C2–F4, to avoid overlong sequences and to maintain synchrony with the encoding capacity of UTF-16; see the UTF-8 article), and trail units have the range 80–BF.
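
Because the trail-unit range is disjoint from the other two, a reader that lands in the middle of a character, or encounters a damaged unit, can skip forward to the next singleton or lead unit and lose at most one character. A short C sketch of this resynchronization:

    #include <stdio.h>

    /* Trail units occupy their own range (80-BF), so a UTF-8 reader
       that starts mid-character or hits a corrupted unit can
       resynchronize by skipping trail units until it reaches the
       next singleton or lead unit.                                   */
    static size_t resync(const unsigned char *s, size_t n, size_t i)
    {
        while (i < n && s[i] >= 0x80 && s[i] <= 0xBF)
            i++;                      /* still inside some character */
        return i;                     /* start of the next character */
    }

    int main(void)
    {
        /* The euro sign followed by "42" in UTF-8: E2 82 AC 34 32 */
        const unsigned char text[] = {0xE2, 0x82, 0xAC, 0x34, 0x32};

        /* Starting from byte 1 (a trail unit of the euro sign), the
           reader skips to byte 3, the singleton '4'.  Only the one
           damaged character is lost, not the rest of the stream.    */
        printf("resync(1) = %zu\n", resync(text, sizeof text, 1));   /* 3 */
        return 0;
    }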