Shannon–Fano coding

In the field of data compression, Shannon–Fano coding, named after Claude Shannon and Robert Fano, is one of two related techniques for constructing a prefix code based on a set of symbols and their probabilities (estimated or measured).

The technique is suboptimal in the sense that it does not always achieve the lowest possible expected codeword length, as Huffman coding does.[1] However, Shannon–Fano codes have an expected codeword length within 1 bit of optimal.

Regarding the confusion over the two different codes being referred to by the same name, Krajči et al. write:[2] "Around 1948, both Claude E. Shannon (1948) and Robert M. Fano (1949) independently proposed two different source coding algorithms for an efficient description of a discrete memoryless source. Unfortunately, in spite of being different, both schemes became known under the same name Shannon–Fano coding."

For one thing, in the discussion of his coding scheme, Shannon mentions Fano’s scheme and calls it “substantially the same” (Shannon, 1948, p. 17 [reprint]).[3] For another, both Shannon's and Fano's coding schemes are similar in the sense that both are efficient but suboptimal prefix-free coding schemes with similar performance.

Shannon's (1948) method, using predefined word lengths, is called Shannon–Fano coding by Cover and Thomas,[4] Goldie and Pinch,[5] Jones and Jones,[6] and Han and Kobayashi.[8] Fano's (1949) method, using binary division of probabilities, is called Shannon–Fano coding by Salomon[9] and Gupta.[10] It is called Fano coding by Krajči et al.[2]

Shannon's method starts by deciding on the lengths of all the codewords, then picks a prefix code with those word lengths.

Given a source with symbol probabilities $p_1, p_2, \ldots, p_n$, the desired codeword lengths are $l_i = \lceil \log_2(1/p_i) \rceil$, where $\lceil x \rceil$ is the ceiling function, meaning the smallest integer greater than or equal to $x$.

One method is to pick codewords in order from most probable to least probable symbols, picking each codeword to be the lexicographically first word of the correct length that maintains the prefix-free property.
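As a concrete illustration, the following is a minimal Python sketch of this construction, assuming probabilities are given as a dict; the function name shannon_code and its interface are illustrative, not from any standard library:

```python
from math import ceil, log2

def shannon_code(probs):
    """Shannon's (1948) method: fix lengths l_i = ceil(log2(1/p_i)), then give
    each symbol the lexicographically first word of its length that keeps the
    code prefix-free."""
    items = sorted(probs.items(), key=lambda kv: -kv[1])  # most probable first
    code = {}
    for symbol, p in items:
        length = ceil(log2(1 / p))
        word = 0
        while True:  # Kraft's inequality guarantees a fitting word exists
            candidate = format(word, f'0{length}b')
            # Prefix-free check against every codeword assigned so far.
            if not any(candidate.startswith(w) or w.startswith(candidate)
                       for w in code.values()):
                break
            word += 1
        code[symbol] = candidate
    return code

# For concreteness, the five-symbol example used later in this article
# (counts 15, 7, 6, 6, 5 out of a total of 39):
probs = {'A': 15/39, 'B': 7/39, 'C': 6/39, 'D': 6/39, 'E': 5/39}
print(shannon_code(probs))
# {'A': '00', 'B': '010', 'C': '011', 'D': '100', 'E': '101'}
```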

This example shows the construction of a Shannon–Fano code for a small alphabet of five symbols, A through E, with the following frequencies (out of a total count of 39):

Symbol         A      B      C      D      E
Count          15     7      6      6      5
Probability    0.385  0.179  0.154  0.154  0.128

The formula above gives the codeword lengths: 2 bits for A (since $\lceil \log_2(39/15) \rceil = 2$) and 3 bits for each of B, C, D and E.

We can pick codewords in order, choosing the lexicographically first word of the correct length that maintains the prefix-free property. The most probable symbol, A, receives the first word of length 2, namely 00. To maintain the prefix-free property, B's codeword may not start with 00, so the lexicographically first available word of length 3 is 010.

Continuing like this, we get the following code:

Symbol     A    B    C    D    E
Codeword   00   010  011  100  101

Alternatively, we can use the cumulative probability method: with the symbols again sorted from most probable to least probable, each symbol's codeword is the first $l_i$ bits of the binary expansion of the cumulative probability of the symbols preceding it.
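A small Python sketch of this cumulative construction (the function name and dict interface are again illustrative assumptions):

```python
from math import ceil, log2

def shannon_cumulative_code(probs):
    """Assign each symbol the first l_i bits of the binary expansion of the
    cumulative probability of the symbols that precede it in sorted order."""
    items = sorted(probs.items(), key=lambda kv: -kv[1])  # most probable first
    code, cumulative = {}, 0.0
    for symbol, p in items:
        length = ceil(log2(1 / p))
        bits, frac = [], cumulative
        for _ in range(length):  # first `length` bits of the expansion
            frac *= 2
            bits.append('1' if frac >= 1 else '0')
            frac -= int(frac)
        code[symbol] = ''.join(bits)
        cumulative += p
    return code

probs = {'A': 15/39, 'B': 7/39, 'C': 6/39, 'D': 6/39, 'E': 5/39}
print(shannon_cumulative_code(probs))
# {'A': '00', 'B': '011', 'C': '100', 'D': '101', 'E': '110'}
```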

Note that although the codewords under the two methods are different, the word lengths are the same.

Since each length satisfies $l_i = \lceil \log_2(1/p_i) \rceil < \log_2(1/p_i) + 1$, the expected codeword length is less than $H(X) + 1$, where $H(X)$ is the entropy of the source; hence we see that the Shannon–Fano code is always within one bit of the optimal expected word length.
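Spelled out as a short derivation, with $L^{*}$ denoting the expected length of an optimal prefix code:

```latex
\mathbb{E}[L] \;=\; \sum_i p_i\, l_i
\;<\; \sum_i p_i \left( \log_2 \frac{1}{p_i} + 1 \right)
\;=\; H(X) + 1
\;\le\; L^{*} + 1
```

where the last step uses the source coding bound $H(X) \le L^{*}$.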

In Fano's method, the symbols are arranged in order from most probable to least probable and then divided into two sets whose total probabilities are as close as possible to being equal; symbols in the first set receive a 0 as the next digit of their codes and symbols in the second set receive a 1, and the division is repeated recursively within each set until every set contains a single symbol. The algorithm produces fairly efficient variable-length encodings; when the two smaller sets produced by a partitioning are in fact of equal probability, the one bit of information used to distinguish them is used most efficiently.
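A compact Python sketch of this recursive splitting (function name and interface assumed for illustration):

```python
def fano_code(freqs):
    """Fano's (1949) method: sort symbols by decreasing frequency, split the
    list where the two parts' totals are as nearly equal as possible, append
    0 to codes in the first part and 1 to the second, and recurse."""
    symbols = sorted(freqs, key=lambda s: -freqs[s])
    code = {s: '' for s in symbols}

    def split(group):
        if len(group) < 2:
            return
        total = sum(freqs[s] for s in group)
        running, best, cut = 0, None, 1
        for i in range(1, len(group)):  # try every dividing line
            running += freqs[group[i - 1]]
            diff = abs((total - running) - running)
            if best is None or diff < best:
                best, cut = diff, i
        for s in group[:cut]:
            code[s] += '0'
        for s in group[cut:]:
            code[s] += '1'
        split(group[:cut])
        split(group[cut:])

    split(symbols)
    return code

freqs = {'A': 15, 'B': 7, 'C': 6, 'D': 6, 'E': 5}
print(fano_code(freqs))
# {'A': '00', 'B': '01', 'C': '10', 'D': '110', 'E': '111'}
```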

Fano's version of Shannon–Fano coding is used in the IMPLODE compression method, which is part of the ZIP file format.[11]

A Shannon–Fano tree is built according to a specification designed to define an effective code table.

All symbols are sorted by frequency, from most frequent to least frequent. Putting the dividing line between symbols B and C gives a total of 22 in the left group and 17 in the right group, which minimizes the difference between the two groups' totals. Splitting recursively, the left group divides into A and B, and the right group into C and the pair D, E. In the final tree, the three symbols with the highest frequencies have all been assigned 2-bit codes, and the two symbols with lower counts have 3-bit codes, as shown in the table below:

Symbol     A    B    C    D    E
Codeword   00   01   10   110  111

This results in lengths of 2 bits for A, B and C and 3 bits for D and E, giving an average length of (2×15 + 2×7 + 2×6 + 3×6 + 3×5) / 39 ≈ 2.28 bits per symbol. We see that Fano's method, with an average length of 2.28, has outperformed Shannon's method, with an average length of 2.62.

Neither Shannon–Fano algorithm is guaranteed to generate an optimal code. Huffman coding, by contrast, always produces a prefix code with the lowest possible expected codeword length under the constraint that each symbol is represented by a whole number of bits. This is a constraint that is often unneeded, since the codes will be packed end-to-end in long sequences.

In most situations, arithmetic coding can produce greater overall compression than either Huffman or Shannon–Fano, since it can encode in fractional numbers of bits which more closely approximate the actual information content of the symbol.

However, arithmetic coding has not superseded Huffman coding the way that Huffman coding superseded Shannon–Fano, both because arithmetic coding is more computationally expensive and because it is covered by multiple patents.[12]

A few years later, David A. Huffman (1952)[13] gave a different algorithm that always produces an optimal tree for any given symbol probabilities.

While Fano's Shannon–Fano tree is created by dividing from the root to the leaves, the Huffman algorithm works in the opposite direction, merging from the leaves to the root.

Using the same frequencies as above, the two symbols with the lowest probabilities, D and E, are first allocated 0 and 1 respectively and grouped together with a combined probability of 0.282. The lowest pair now are B and C, so they're allocated 0 and 1 and grouped together with a combined probability of 0.333.

This leaves BC and DE now with the lowest probabilities, so 0 and 1 are prepended to their codes and they are combined into a set with probability 0.615. Finally, 0 and 1 are prepended to the codes of A and of the combined set BCDE, yielding the code A = 0, B = 100, C = 101, D = 110, E = 111, with an average length of (1×15 + 3×7 + 3×6 + 3×6 + 3×5) / 39 ≈ 2.23 bits per symbol, better than either Shannon–Fano code.
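A minimal Python sketch of this bottom-up merging using a binary heap (tie-breaking among equal weights may merge a different pair than the walkthrough above, yielding a different but equally optimal code):

```python
import heapq
from itertools import count

def huffman_code(freqs):
    """Huffman's (1952) method: repeatedly pop the two lowest-weight trees,
    prepend 0 to the codes in one and 1 to the other, and push the merged
    tree back until a single tree remains."""
    tiebreak = count()  # keeps heap entries comparable when weights tie
    heap = [(w, next(tiebreak), {s: ''}) for s, w in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w0, _, left = heapq.heappop(heap)
        w1, _, right = heapq.heappop(heap)
        merged = {s: '0' + c for s, c in left.items()}
        merged.update({s: '1' + c for s, c in right.items()})
        heapq.heappush(heap, (w0 + w1, next(tiebreak), merged))
    return heap[0][2]

freqs = {'A': 15, 'B': 7, 'C': 6, 'D': 6, 'E': 5}
print(huffman_code(freqs))  # codeword lengths 1, 3, 3, 3, 3: average 2.23
```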
