Kraft–McMillan inequality

In coding theory, the Kraft–McMillan inequality gives a necessary and sufficient condition for the existence of a prefix code[1] (in Leon G. Kraft's version) or a uniquely decodable code (in Brockway McMillan's version) for a given set of codeword lengths.

Its applications to prefix codes and trees often find use in computer science and information theory.

The prefix code can contain either finitely many or infinitely many codewords.

However, Kraft's paper discusses only prefix codes, and attributes the analysis leading to the inequality to Raymond Redheffer.

The result was independently discovered in McMillan (1956).

McMillan proves the result for the general case of uniquely decodable codes, and attributes the version for prefix codes to a spoken observation in 1955 by Joseph Leo Doob.

Kraft's inequality limits the lengths of codewords in a prefix code: if one takes an exponential of the length of each valid codeword, the resulting set of values must look like a probability mass function, that is, it must have total measure less than or equal to one.

Kraft's inequality can be thought of in terms of a constrained budget to be spent on codewords, with shorter codewords being more expensive.

Among the useful properties following from the inequality are the following statements: Let each source symbol from the alphabet be encoded into a uniquely decodable code over an alphabet of size

with codeword lengths Then Conversely, for a given set of natural numbers

satisfying the above inequality, there exists a uniquely decodable code over an alphabet of size

Kraft's inequality states that Here the sum is taken over the leaves of the tree, i.e. the nodes without any children.

In the tree to the right, this sum is First, let us show that the Kraft inequality holds whenever the code for

-ary alphabet corresponds to a node in this tree at depth

th word in the prefix code corresponds to a node

, we have Since the code is a prefix code, those subtrees cannot share any leaves, which means that Thus, given that the total number of nodes at depth

natural numbers, satisfying the Kraft inequality, one can construct a prefix code with codeword lengths equal to each

arbitrarily, then ruling out all words of greater length that have it as a prefix.

First choose any node from the full tree at depth

Since we are building a prefix code, all the descendants of this node (i.e., all words that have this first word as a prefix) become unsuitable for inclusion in the code.

The next iteration picks a (surviving) node at depth

Since the Kraft inequality holds, we have indeed and thus a prefix code can be built.

Note that as the choice of nodes at each step is largely arbitrary, many different suitable prefix codes can be built, in general.

If there are infinitely many codewords, then any finite subset of it is also uniquely decodable, so it satisfies the Kraft–McMillan inequality.

Taking the limit, we have the inequality for the full code.

natural numbers, satisfying the Kraft inequality, we can construct a prefix code as follows.

digits after the radix point (e.g. decimal point) in the base r representation of Note that by Kraft's inequality, this sum is never more than 1.

Hence the codewords capture the entire value of the sum.

digits of Cj form a larger number than Ci, so the code is prefix free.

by taking the limit and applying Abel's theorem.