Variable-length code

The equivalent concept in computer science is bit string.

With the right coding strategy, an independent and identically-distributed source may be compressed almost arbitrarily close to its entropy.

This is in contrast to fixed-length coding methods, for which data compression is only possible for large blocks of data, and any compression beyond the logarithm of the total number of possibilities comes with a finite (though perhaps arbitrarily small) probability of failure.

The extension of a code is the mapping of finite length source sequences to finite length bit strings, that is obtained by concatenating for each symbol of the source sequence the corresponding codeword produced by the original code.

Using terms from formal language theory, the precise mathematical definition is as follows: Let

be two finite sets, called the source and target alphabets, respectively.

Prefix codes are always uniquely decodable, and these in turn are always non-singular: A code is non-singular if each source symbol is mapped to a different non-empty bit string; that is, the mapping from source symbols to bit strings is injective.

A code is a prefix code if no target bit string in the mapping is a prefix of the target bit string of a different source symbol in the same mapping.

This means that symbols can be decoded instantaneously after their entire codeword is received.

Another special case of prefix codes are LEB128 and variable-length quantity (VLQ) codes, which encode arbitrarily large integers as a sequence of octets—i.e., every codeword is a multiple of 8 bits.