Noisy-channel coding theorem

In information theory, the noisy-channel coding theorem (sometimes Shannon's theorem or Shannon's limit) establishes that for any given degree of noise contamination of a communication channel, it is possible (in theory) to communicate discrete data (digital information) nearly error-free up to a computable maximum rate through the channel.

This result was presented by Claude Shannon in 1948 and was based in part on earlier work and ideas of Harry Nyquist and Ralph Hartley.

The Shannon limit or Shannon capacity of a communication channel refers to the maximum rate of error-free data that can theoretically be transferred over the channel if the link is subject to random data transmission errors, for a particular noise level.
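For the special case of a band-limited channel with additive white Gaussian noise, this limit is given in closed form by the Shannon–Hartley theorem, C = B log2(1 + S/N). A minimal sketch of that calculation follows (the 3 kHz bandwidth and 30 dB signal-to-noise ratio are illustrative values, not taken from this article):

```python
import math

def awgn_capacity_bps(bandwidth_hz, snr_linear):
    """Shannon-Hartley capacity C = B * log2(1 + S/N) of a band-limited
    additive white Gaussian noise channel, in bits per second."""
    return bandwidth_hz * math.log2(1.0 + snr_linear)

# Illustrative example: a 3 kHz channel at 30 dB SNR (a factor of 1000).
snr = 10 ** (30 / 10)
print(awgn_capacity_bps(3000, snr))   # about 29,900 bits per second
```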

Shannon's 1948 result founded the modern discipline of information theory.

Stated by Claude Shannon in 1948, the theorem describes the maximum possible efficiency of error-correcting methods versus levels of noise interference and data corruption.

Shannon's theorem has wide-ranging applications in both communications and data storage.

This theorem is of foundational importance to the modern field of information theory.

The first rigorous proof for the discrete case is given in (Feinstein 1954).

The Shannon theorem states that, given a noisy channel with channel capacity C and information transmitted at a rate R, then if R < C there exist codes that allow the probability of error at the receiver to be made arbitrarily small. This means that, theoretically, it is possible to transmit information nearly without error at any rate below the limiting rate C.

The converse is also important: if R > C, an arbitrarily small probability of error is not achievable, so information cannot be transmitted reliably at rates above the channel capacity.

The theorem does not address the rare situation in which rate and capacity are equal.

Simple schemes such as "send the message 3 times and use a best 2 out of 3 voting scheme if the copies differ" are inefficient error-correction methods, unable to asymptotically guarantee that a block of data can be communicated free of error.
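As a concrete illustration, a minimal simulation of the triple-repetition scheme over a binary symmetric channel is sketched below (the function names and the 10% flip probability are illustrative): it spends two thirds of the channel on redundancy, yet the residual bit-error rate only drops to roughly 3p^2 for small p, never to zero.

```python
import random

def send_with_repetition(bits, flip_prob, repeats=3):
    """Send each bit `repeats` times over a binary symmetric channel that
    flips a bit with probability `flip_prob`, then decode by majority vote."""
    decoded = []
    for b in bits:
        received = [b ^ (random.random() < flip_prob) for _ in range(repeats)]
        decoded.append(1 if sum(received) > repeats // 2 else 0)
    return decoded

random.seed(0)
message = [random.randint(0, 1) for _ in range(100_000)]
estimate = send_with_repetition(message, flip_prob=0.1)
residual = sum(m != e for m, e in zip(message, estimate)) / len(message)
# The rate is only 1/3, yet the residual bit-error rate is still about
# 3*p^2*(1-p) + p^3 = 2.8% for p = 0.1, not arbitrarily small.
print(residual)
```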

Using highly efficient modern codes such as low-density parity-check (LDPC) codes, and with the computing power in today's digital signal processors, it is now possible to reach very close to the Shannon limit.

In fact, it was shown that LDPC codes can reach within 0.0045 dB of the Shannon limit (for binary additive white Gaussian noise (AWGN) channels, with very long block lengths).[1]

The basic mathematical model for a communication system is the following: a message W is transmitted through a noisy channel by using encoding and decoding functions.

An encoder maps W into a pre-defined sequence of channel symbols of length n. In its most basic model, the channel distorts each of these symbols independently of the others.

The output of the channel, the received sequence, is fed into a decoder which maps the sequence into an estimate of the message.

In this setting, the probability of error is defined as

P_e = Pr{ Ŵ ≠ W },

where Ŵ is the decoder's estimate of the transmitted message W.

Theorem (Shannon, 1948) (MacKay (2003), p. 162; cf. Gallager (1968), ch. 5; Cover and Thomas (1991), p. 198; Shannon (1948)):

1. For every discrete memoryless channel, the channel capacity, defined in terms of the mutual information I(X;Y) as C = sup_{p_X} I(X;Y), has the following property: for any ε > 0 and any rate R < C, for large enough N there exists a code of length N and rate at least R, together with a decoding algorithm, such that the maximal probability of block error is at most ε.

2. If a probability of bit error p_b is acceptable, rates up to R(p_b) = C / (1 − H_2(p_b)) are achievable, where H_2 is the binary entropy function.

3. For any p_b, rates greater than R(p_b) are not achievable.
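To make the capacity expression concrete, the sketch below numerically maximizes I(X;Y) for a binary symmetric channel with crossover probability p (an assumed example channel; the grid search and variable names are illustrative) and compares the result with the known closed form 1 − H_2(p):

```python
import math

def h2(p):
    """Binary entropy function H2(p), in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def mutual_information_bsc(q, p):
    """I(X;Y) for a binary symmetric channel with crossover probability p
    and input distribution P(X = 1) = q."""
    # H(Y) depends on P(Y = 1) = q(1 - p) + (1 - q)p, while H(Y|X) = H2(p).
    return h2(q * (1 - p) + (1 - q) * p) - h2(p)

p = 0.11
capacity_numeric = max(mutual_information_bsc(q / 1000, p) for q in range(1001))
capacity_closed_form = 1 - h2(p)
print(capacity_numeric, capacity_closed_form)   # both about 0.5 bits per channel use
```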

The following outline is only one of many different proof styles available for study in information theory texts.

This particular proof of achievability follows the style of proofs that make use of the asymptotic equipartition property (AEP); another style can be found in information theory texts using error exponents.

Both types of proofs make use of a random coding argument where the codebook used across a channel is randomly constructed; this serves to make the analysis simpler while still proving the existence of a code satisfying a desired low probability of error at any data rate below the channel capacity.
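A small Monte Carlo sketch of the random-coding idea over a binary symmetric channel is given below. It is illustrative only: it uses minimum-Hamming-distance decoding rather than the typical-set decoding of the proof, and block lengths far shorter than the asymptotic regime, but it already shows rates below capacity decoding much more reliably than rates above it.

```python
import random

def random_code_block_error(n, rate, p, trials=500, seed=1):
    """Monte Carlo estimate of the block-error probability of a randomly drawn
    codebook with about 2^{n*rate} codewords of length n, sent over a binary
    symmetric channel with crossover probability p and decoded by minimum
    Hamming distance."""
    rng = random.Random(seed)
    m = max(2, round(2 ** (n * rate)))
    codebook = [[rng.randint(0, 1) for _ in range(n)] for _ in range(m)]
    errors = 0
    for _ in range(trials):
        w = rng.randrange(m)                                  # uniform message
        received = [bit ^ (rng.random() < p) for bit in codebook[w]]
        decoded = min(range(m),
                      key=lambda i: sum(a != b for a, b in zip(codebook[i], received)))
        errors += (decoded != w)
    return errors / trials

# A BSC with p = 0.11 has capacity of roughly 0.5 bits per channel use.
# Even at these short block lengths, the rate below capacity (0.25) decodes
# far more reliably than the rate above it (0.7).
for n in (12, 16):
    print(n, random_code_block_error(n, 0.25, 0.11), random_code_block_error(n, 0.7, 0.11))
```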

By an argument based on the asymptotic equipartition property, given a channel, length-n strings of input symbols X_1^n, and length-n strings of channel outputs Y_1^n, we can define a jointly typical set A_ε^(n) by the following: a pair (x^n, y^n) belongs to A_ε^(n) if each of the empirical quantities −(1/n) log p(x^n), −(1/n) log p(y^n), and −(1/n) log p(x^n, y^n) lies within ε of the corresponding entropy H(X), H(Y), and H(X, Y). We say that two sequences X_1^n and Y_1^n are jointly typical if they lie in this jointly typical set.
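A direct translation of this membership test into code might look as follows (a sketch under the assumption of a known finite joint pmf; the helper names are illustrative):

```python
import math
import random

def is_jointly_typical(x_seq, y_seq, p_xy, eps):
    """Membership test for the jointly typical set A_eps^(n): the empirical
    per-symbol log-probabilities of x^n, y^n and (x^n, y^n) must each lie
    within eps of H(X), H(Y) and H(X, Y) computed from the joint pmf p_xy."""
    n = len(x_seq)
    p_x, p_y = {}, {}
    for (x, y), prob in p_xy.items():
        p_x[x] = p_x.get(x, 0.0) + prob
        p_y[y] = p_y.get(y, 0.0) + prob
    entropy = lambda dist: -sum(v * math.log2(v) for v in dist.values() if v > 0)
    hx, hy, hxy = entropy(p_x), entropy(p_y), entropy(p_xy)

    log_px = sum(math.log2(p_x[x]) for x in x_seq)
    log_py = sum(math.log2(p_y[y]) for y in y_seq)
    log_pxy = sum(math.log2(p_xy[(x, y)]) for x, y in zip(x_seq, y_seq))
    return (abs(-log_px / n - hx) < eps and
            abs(-log_py / n - hy) < eps and
            abs(-log_pxy / n - hxy) < eps)

# Toy joint pmf: a uniform binary input sent through a BSC with crossover 0.11.
p = 0.11
p_xy = {(0, 0): 0.5 * (1 - p), (0, 1): 0.5 * p,
        (1, 0): 0.5 * p, (1, 1): 0.5 * (1 - p)}

random.seed(0)
x = [random.randint(0, 1) for _ in range(2000)]
y = [bit ^ (random.random() < p) for bit in x]
print(is_jointly_typical(x, y, p_xy, eps=0.1))   # True with high probability
```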

Steps:

1. In the style of the random coding argument, we randomly generate 2^{nR} codewords of length n from a probability distribution Q.
2. This code is revealed to both the sender and the receiver, who are also assumed to know the transition matrix p(y|x) of the channel being used.
3. A message W is chosen according to the uniform distribution on the set of codewords and sent across the channel.
4. The receiver observes a sequence drawn according to P(y^n | x^n(W)) = ∏_{i=1}^{n} p(y_i | x_i(W)).
5. The received sequence Y_1^n is decoded to message i if X_1^n(i) is the unique codeword jointly typical with it; if no codeword, or more than one, is jointly typical, an error is declared. This is typical-set decoding.

The probability of error of this scheme is divided into two parts: an error can occur if no codeword is jointly typical with the received sequence, or if an incorrect codeword is jointly typical with it. By the symmetry of the random code construction, the average probability of error does not depend on the index sent, so we may assume W = 1. From the joint AEP, the probability that the transmitted codeword and the received sequence fail to be jointly typical goes to 0 as n grows, and can be bounded by ε; also from the joint AEP, the probability that any particular incorrect codeword is jointly typical with the received sequence is at most 2^{−n(I(X;Y) − 3ε)}.

Define E_i = { (X_1^n(i), Y_1^n) ∈ A_ε^(n) }, i = 1, 2, …, 2^{nR}, the event that codeword i is jointly typical with the sequence received when message 1 is sent. Then

P(error) = P(error | W = 1) ≤ P(E_1^c) + Σ_{i=2}^{2^{nR}} P(E_i) ≤ ε + 2^{−n(I(X;Y) − R − 3ε)},

so if R < I(X;Y), the probability of error, averaged over all random codebooks, goes to 0 as n goes to infinity.
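A quick numeric sketch of the bound ε + 2^{−n(I(X;Y) − R − 3ε)} with illustrative values shows how fast the second term vanishes once R < I(X;Y):

```python
# Numeric sketch of the achievability bound  eps + 2^{-n (I(X;Y) - R - 3*eps)}
# with illustrative values I(X;Y) = 0.5 bits, R = 0.4 bits, eps = 0.02.
mutual_info, rate, eps = 0.5, 0.4, 0.02
for n in (100, 1000, 10_000):
    print(n, eps + 2 ** (-n * (mutual_info - rate - 3 * eps)))
# The second term decays exponentially in n, so the total error probability
# approaches eps, which can itself be chosen arbitrarily small.
```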

Finally, given that the average codebook is shown to be "good," we know that there exists a codebook whose performance is better than the average, and so satisfies our need for arbitrarily low error probability communicating across the noisy channel.

Weak converse: suppose a code of 2^{nR} codewords, with the message W drawn uniformly from this set. Combining Fano's inequality with the fact that the capacity is the maximized mutual information yields

P_e^{(n)} ≥ 1 − 1/(nR) − C/R.

As the block length n goes to infinity, P_e^{(n)} is bounded away from 0 if R is greater than C; we can get arbitrarily low probabilities of error only if R is less than C.

A strong converse theorem, proven by Wolfowitz in 1957,[3] states that

P_e ≥ 1 − 4A / (n (R − C)^2) − e^{−n(R − C)/2}

for some finite positive constant A. While the weak converse states that the error probability is bounded away from zero as the block length n goes to infinity, the strong converse states that the error probability goes to 1. Thus, the capacity C is a sharp threshold between perfectly reliable and completely unreliable communication.
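The weak-converse bound can be evaluated directly; the sketch below uses illustrative values C = 0.5 and R = 0.6 bits per channel use:

```python
# Weak-converse lower bound  P_e >= 1 - C/R - 1/(n*R)  for a rate above capacity.
# Illustrative values: C = 0.5 and R = 0.6 bits per channel use.
capacity, rate = 0.5, 0.6
for n in (10, 100, 1000, 10_000):
    print(n, max(0.0, 1 - capacity / rate - 1 / (n * rate)))
# The bound tends to 1 - C/R (about 0.17): the error probability stays bounded
# away from zero, while Wolfowitz's strong converse shows it in fact tends to 1.
```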

The theorem also extends to non-stationary memoryless channels: we assume that the channel is memoryless, but its transition probabilities change with time, in a fashion known at the transmitter as well as the receiver. The channel capacity is then C = lim inf_{n→∞} max over the per-time input distributions of (1/n) Σ_{i=1}^{n} I(X_i; Y_i), where the maximum is attained at the capacity-achieving distribution for each respective channel; equivalently, C = lim inf (1/n) Σ_{i=1}^{n} C_i, where C_i is the capacity of the i-th channel.

The proof runs through in almost the same way as that of the channel coding theorem.
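For illustration, if each per-time channel is assumed to be a binary symmetric channel with a known crossover probability p_i, then C_i = 1 − H_2(p_i) and the overall capacity is the limiting average of these per-time capacities. A minimal sketch (the alternating noise schedule is invented for the example):

```python
import math

def h2(p):
    """Binary entropy function H2(p), in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def average_capacity(crossover_probs):
    """(1/n) * sum of the per-time capacities C_i = 1 - H2(p_i) for a sequence
    of binary symmetric channels whose noise level changes with time."""
    return sum(1 - h2(p) for p in crossover_probs) / len(crossover_probs)

# Invented schedule: the channel alternates between a quiet and a noisy state.
schedule = [0.01 if i % 2 == 0 else 0.2 for i in range(1000)]
print(average_capacity(schedule))   # roughly the mean of 1-H2(0.01) and 1-H2(0.2)
```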

Graph showing the proportion of a channel's capacity (y-axis) that can be used for payload based on how noisy the channel is (probability of bit flips; x-axis).
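A short script can reproduce the kind of curve the caption describes, under the assumption that the channel is a binary symmetric channel, so that the usable proportion of capacity is 1 − H_2(p) (the matplotlib plotting choices are illustrative):

```python
import math
import matplotlib.pyplot as plt

def h2(p):
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# For a binary symmetric channel, the proportion of channel uses that can carry
# payload is the capacity 1 - H2(p); it falls to 0 as the flip probability
# approaches 0.5.
flip_probs = [i / 1000 for i in range(501)]
usable_fraction = [1 - h2(p) for p in flip_probs]

plt.plot(flip_probs, usable_fraction)
plt.xlabel("Probability of bit flip")
plt.ylabel("Usable proportion of channel capacity")
plt.show()
```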