Prefix code

A prefix code is a uniquely decodable code: given a complete and accurate sequence, a receiver can identify each word without requiring a special marker between words.

The recipient can decode the message unambiguously, by repeatedly finding and removing sequences that form valid code words.

In practice, a message might first be compressed with a prefix code, and then encoded again with channel coding (including error correction) before transmission.

For example, ISO 8859-15 letters are always 8 bits long.

ATM cells are always 424 bits (53 bytes) long.

A fixed-length code of fixed length k bits can encode up to

It is possible to turn any code into a fixed-length code by padding fixed symbols to the shorter prefixes in order to meet the length of the longest prefixes.

Alternately, such padding codes may be employed to introduce redundancy that allows autocorrection and/or synchronisation.

However, fixed length encodings are inefficient in situations where some words are much more likely to be transmitted than others.

Truncated binary encoding is a straightforward generalization of fixed-length codes to deal with cases where the number of symbols n is not a power of two.

Source symbols are assigned codewords of length k and k+1, where k is chosen so that 2k < n ≤ 2k+1.

This is a form of lossless data compression based on entropy encoding.

The long pauses between letters, and the even longer pauses between words, help people recognize where one letter (or word) ends, and the next begins.

A suffix code is a set of words none of which is a suffix of any other; equivalently, a set of words which are the reverse of a prefix code.

As with a prefix code, the representation of a string as a concatenation of such words is unique.