Shannon's source coding theorem

In information theory, Shannon's source coding theorem (or noiseless coding theorem) establishes the statistical limits to possible data compression for data whose source is an independent identically-distributed random variable, and the operational meaning of the Shannon entropy.

Named after Claude Shannon, the source coding theorem shows that, in the limit, as the length of a stream of independent and identically-distributed random variable (i.i.d.)

data tends to infinity, it is impossible to compress such data such that the code rate (average number of bits per symbol) is less than the Shannon entropy of the source, without it being virtually certain that information will be lost.

However it is possible to get the code rate arbitrarily close to the Shannon entropy, with negligible probability of loss.

The source coding theorem for symbol codes places an upper and a lower bound on the minimal possible expected length of codewords as a function of the entropy of the input word (which is viewed as a random variable) and of the size of the target alphabet.

random variable), the Kolmogorov complexity, which quantifies the minimal description length of an object, is more suitable to describe the limits of data compression.

Shannon entropy takes into account only frequency regularities while Kolmogorov complexity takes into account all algorithmic regularities, so in general the latter is smaller.

On the other hand, if an object is generated by a random process in such a way that it has only frequency regularities, entropy is close to complexity with high probability (Shen et al.

random variables each with entropy H(X) can be compressed into more than N H(X) bits with negligible risk of information loss, as N → ∞; but conversely, if they are compressed into fewer than N H(X) bits it is virtually certain that information will be lost.The

coded sequence represents the compressed message in a biunivocal way, under the assumption that the decoder knows the source.

Usually, the information that characterizes the source is inserted at the beginning of the transmitted message.

Suppose that X is a random variable taking values in Σ1 and let  f  be a uniquely decodable code from Σ∗1 to Σ∗2 where |Σ2| = a.

If  f  is optimal in the sense that it has the minimal expected word length for X, then (Shannon 1948): Where

The Source coding theorem states that for any ε > 0, i.e. for any rate H(X) + ε larger than the entropy of the source, there is large enough n and an encoder that takes n i.i.d.

repetition of the source, X1:n, and maps it to n(H(X) + ε) binary bits such that the source symbols X1:n are recoverable from the binary bits with probability of at least 1 − ε.

Fix some ε > 0, and let The typical set, Aεn, is defined as follows: The asymptotic equipartition property (AEP) shows that for large enough n, the probability that a sequence generated by the source lies in the typical set, Aεn, as defined approaches one.

The encoding algorithm: the encoder checks if the input sequence lies within the typical set; if yes, it outputs the index of the input sequence within the typical set; if not, the encoder outputs an arbitrary n(H(X) + ε) digit number.

As long as the input sequence lies within the typical set (with probability at least 1 − ε), the encoder does not make any error.

Proof of converse: the converse is proved by showing that any set of size smaller than Aεn (in the sense of exponent) would cover a set of probability bounded away from 1.

Thus, on an average, Hn(X) + ε bits suffice for encoding with probability greater than 1 − δ, where ε and δ can be made arbitrarily small, by making n larger.