Typical set

In information theory, the typical set is a set of sequences whose probability is close to $2^{-nH}$, that is, two raised to the negative power of $n$ times the entropy $H$ of their source distribution, where $n$ is the sequence length.

That this set has total probability close to one is a consequence of the asymptotic equipartition property (AEP), which is a kind of law of large numbers.

This is of great use in compression theory: it provides a theoretical means of compressing data, allowing any sequence $X^n$ to be represented using about $nH(X)$ bits on average, and hence justifies the use of entropy as a measure of the information from a source.

The AEP can also be proven for a large class of stationary ergodic processes, allowing the typical set to be defined in more general cases.

Additionally, the typical set concept is foundational in understanding the limits of data transmission and error correction in communication systems.

The properties of typical sequences underlie the proofs of Shannon's source coding theorem and channel coding theorem, which establish that near-optimal data compression and reliable transmission over noisy channels are achievable.

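If a sequence $x_1, \ldots, x_n$ is drawn from an independent identically distributed (i.i.d.) random variable $X$ defined over a finite alphabet $\mathcal{X}$, then the typical set $A_\varepsilon^{(n)} \subseteq \mathcal{X}^n$ is defined as those sequences which satisfy:

$$2^{-n(H(X)+\varepsilon)} \le p(x_1, x_2, \ldots, x_n) \le 2^{-n(H(X)-\varepsilon)}$$

where

$$H(X) = -\sum_{x \in \mathcal{X}} p(x) \log_2 p(x)$$

is the information entropy of $X$.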

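Taking the logarithm on all sides and dividing by $-n$, this definition can be equivalently stated as

$$H(X) - \varepsilon \le -\frac{1}{n} \log_2 p(x_1, x_2, \ldots, x_n) \le H(X) + \varepsilon.$$

For an i.i.d. sequence, since

$$p(x_1, x_2, \ldots, x_n) = \prod_{i=1}^{n} p(x_i),$$

we further have

$$H(X) - \varepsilon \le -\frac{1}{n} \sum_{i=1}^{n} \log_2 p(x_i) \le H(X) + \varepsilon.$$

By the law of large numbers, for sufficiently large $n$,

$$-\frac{1}{n} \sum_{i=1}^{n} \log_2 p(x_i) \to H(X).$$

An essential characteristic of the typical set is that, if one draws a large number $n$ of independent random samples from the distribution $X$, the resulting sequence $(x_1, x_2, \ldots, x_n)$ is very likely to be a member of the typical set, even though the typical set comprises only a small fraction of all the possible sequences.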
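The claim that a long i.i.d. sequence is very likely to be typical can be checked empirically. The following Python sketch assumes the weak typicality condition $|-\frac{1}{n}\log_2 p(x_1, \ldots, x_n) - H(X)| \le \varepsilon$ and estimates, by simulation, the fraction of sampled sequences that satisfy it. The function names, the example distribution, and the parameter values are illustrative assumptions, not part of the definition.

```python
import math
import random

def entropy(pmf):
    """Shannon entropy in bits of a pmf given as {symbol: probability}."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)

def empirical_rate(seq, pmf):
    """Return -(1/n) * log2 p(x_1, ..., x_n) for an i.i.d. source."""
    return -sum(math.log2(pmf[x]) for x in seq) / len(seq)

def is_weakly_typical(seq, pmf, eps):
    """Check H(X) - eps <= -(1/n) log2 p(seq) <= H(X) + eps."""
    return abs(empirical_rate(seq, pmf) - entropy(pmf)) <= eps

def typical_fraction(pmf, n, eps, trials=2000, seed=0):
    """Monte Carlo estimate of the probability that a length-n draw is typical."""
    rng = random.Random(seed)
    symbols = list(pmf)
    weights = [pmf[s] for s in symbols]
    hits = 0
    for _ in range(trials):
        seq = rng.choices(symbols, weights=weights, k=n)
        hits += is_weakly_typical(seq, pmf, eps)
    return hits / trials

# Illustrative source (an assumption): a biased binary alphabet.
pmf = {0: 0.1, 1: 0.9}
for n in (10, 100, 1000):
    print(n, typical_fraction(pmf, n, eps=0.1))
```

With a fixed $\varepsilon$, the estimated fraction should move toward one as $n$ grows, which is the behaviour the AEP predicts.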

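For any $\varepsilon > 0$, one can choose $n$ large enough such that the following properties hold.

The probability that a sequence drawn from the source lies in the typical set is at least $1 - \varepsilon$, i.e. $\Pr[x^{(n)} \in A_\varepsilon^{(n)}] \ge 1 - \varepsilon$.

The size of the typical set is bounded above: $|A_\varepsilon^{(n)}| \le 2^{n(H(X)+\varepsilon)}$.

The size of the typical set is bounded below: $|A_\varepsilon^{(n)}| \ge (1-\varepsilon)\, 2^{n(H(X)-\varepsilon)}$.

If the distribution over $\mathcal{X}$ is not uniform, so that $H(X) < \log_2 |\mathcal{X}|$, the fraction of all sequences that are typical is at most $2^{-n(\log_2 |\mathcal{X}| - H(X) - \varepsilon)}$, which tends to 0 as $n$ grows whenever $\varepsilon < \log_2 |\mathcal{X}| - H(X)$.

For a general stochastic process $\{X(t)\}$ with the AEP, the (weakly) typical set can be defined similarly, with $p(x_1, x_2, \ldots, x_n)$ replaced by $p(x_0^\tau)$ (i.e. the probability of the sample limited to the time interval $[0, \tau]$), $n$ being the number of degrees of freedom of the process in that time interval and $H(X)$ being the entropy rate.

If the process is continuous-valued, differential entropy is used instead.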

Counter-intuitively, the most likely sequence is often not a member of the typical set.

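For example, suppose that $X$ is an i.i.d. Bernoulli random variable with $p(0) = 0.1$ and $p(1) = 0.9$.

In $n$ independent trials, since $p(1) > p(0)$, the most likely outcome is the all-ones sequence $(1, 1, \ldots, 1)$.

Here the entropy of $X$ is $H(X) = 0.469$, while

$$-\frac{1}{n} \log_2 p\left(x^{(n)} = (1, 1, \ldots, 1)\right) = -\frac{1}{n} \log_2 (0.9^n) = 0.152.$$

So this sequence is not in the typical set, because its average logarithmic probability cannot come arbitrarily close to the entropy of $X$ no matter how large $n$ is.

For Bernoulli random variables, the typical set consists of sequences in which the proportions of 0s and 1s over $n$ independent trials are close to their expected values.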

If $p(0) = p(1) = 0.5$, then every possible binary sequence belongs to the typical set, since all sequences of length $n$ are equally likely.
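The numbers in the Bernoulli example can be reproduced directly. The sketch below computes the entropy of the source with $p(0) = 0.1$, the per-symbol log-probability of the all-ones sequence, and, for a given $n$ and $\varepsilon$, which counts of zeros make a sequence weakly typical. The helper names and the values $n = 100$, $\varepsilon = 0.05$ are assumptions chosen for illustration.

```python
import math

p0, p1 = 0.1, 0.9

# Entropy of the source in bits (about 0.469).
H = -(p0 * math.log2(p0) + p1 * math.log2(p1))

# -(1/n) log2 p(1, 1, ..., 1) = -log2(0.9), about 0.152, for every n.
all_ones_rate = -math.log2(p1)

def rate(k, n):
    """-(1/n) log2 p(x) for any length-n sequence containing k zeros."""
    return -(k * math.log2(p0) + (n - k) * math.log2(p1)) / n

def typical_zero_counts(n, eps):
    """Numbers of zeros k for which a length-n sequence is weakly typical."""
    return [k for k in range(n + 1) if abs(rate(k, n) - H) <= eps]

print(round(H, 3), round(all_ones_rate, 3))
print(typical_zero_counts(100, 0.05))
```

The gap between the two printed rates does not depend on $n$, which is why the all-ones sequence stays outside the typical set for any $\varepsilon$ smaller than that gap.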

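If a sequence $x_1, \ldots, x_n$ is drawn from some specified joint distribution defined over a finite or an infinite alphabet $\mathcal{X}$, then the strongly typical set $A_{\varepsilon,\text{strong}}^{(n)} \subseteq \mathcal{X}^n$ is defined as the set of sequences which satisfy

$$\left| \frac{N(x_i)}{n} - p(x_i) \right| < \frac{\varepsilon}{|\mathcal{X}|}$$

where $N(x_i)$ is the number of occurrences of a specific symbol in the sequence.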

Strong typicality is often easier to work with in proving theorems for memoryless channels.

However, as is apparent from the definition, this form of typicality is only defined for random variables having finite support.
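A membership test for strong typicality is short to write down. The sketch below assumes the condition $|N(x)/n - p(x)| < \varepsilon / |\mathcal{X}|$ for every symbol $x$ of a finite alphabet, with the distribution passed as a dictionary. The function name and the example values are illustrative.

```python
from collections import Counter

def is_strongly_typical(seq, pmf, eps):
    """Check |N(x)/n - p(x)| < eps/|alphabet| for every symbol x in pmf."""
    n = len(seq)
    counts = Counter(seq)
    # A symbol outside the alphabet rules the sequence out immediately.
    if any(x not in pmf for x in counts):
        return False
    bound = eps / len(pmf)
    # Counter returns 0 for symbols that never occur in seq.
    return all(abs(counts[x] / n - p) < bound for x, p in pmf.items())

pmf = {0: 0.1, 1: 0.9}
print(is_strongly_typical([1] * 9 + [0], pmf, eps=0.1))
print(is_strongly_typical([1] * 10, pmf, eps=0.1))
```

Because the test divides by the alphabet size, it only makes sense when the support is finite, in line with the remark above.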

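Two sequences $x^n$ and $y^n$ are jointly ε-typical if the pair $(x^n, y^n)$ is ε-typical with respect to the joint distribution $p(x^n, y^n) = \prod_{i=1}^{n} p(x_i, y_i)$ and both $x^n$ and $y^n$ are ε-typical with respect to their marginal distributions $p(x^n)$ and $p(y^n)$.

The set of all such pairs of sequences is denoted by $A_\varepsilon^n(X, Y)$.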

Jointly ε-typical n-tuple sequences are defined similarly.

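Let $\tilde{X}^n$ and $\tilde{Y}^n$ be two independent sequences of random variables with the same marginal distributions $p(x^n)$ and $p(y^n)$.

Then for any $\varepsilon > 0$, for sufficiently large $n$, jointly typical sequences satisfy the following properties.

A pair drawn from the joint distribution is jointly typical with high probability: $P\left[(X^n, Y^n) \in A_\varepsilon^n(X, Y)\right] \ge 1 - \varepsilon$.

The size of the jointly typical set is bounded above: $\left|A_\varepsilon^n(X, Y)\right| \le 2^{n(H(X,Y)+\varepsilon)}$.

The size of the jointly typical set is bounded below: $\left|A_\varepsilon^n(X, Y)\right| \ge (1-\varepsilon)\, 2^{n(H(X,Y)-\varepsilon)}$.

An independent pair is rarely jointly typical: $P\left[(\tilde{X}^n, \tilde{Y}^n) \in A_\varepsilon^n(X, Y)\right] \le 2^{-n(I(X;Y)-3\varepsilon)}$.

The same probability is bounded below: $P\left[(\tilde{X}^n, \tilde{Y}^n) \in A_\varepsilon^n(X, Y)\right] \ge (1-\varepsilon)\, 2^{-n(I(X;Y)+3\varepsilon)}$.

In information theory, typical set encoding encodes only the sequences in the typical set of a stochastic source with fixed-length block codes.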

Since the size of the typical set is about $2^{nH(X)}$, only about $nH(X)$ bits are required for the coding, while the probability of an encoding error is kept below $\varepsilon$.

Asymptotically, by the AEP, this scheme is lossless and achieves the minimum rate, equal to the entropy rate of the source.
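For a small block length, typical set encoding can be illustrated by brute force. The sketch below enumerates every sequence of length $n$, keeps the weakly typical ones, and assigns each a fixed-length binary index; atypical sequences are reported as encoding errors. The function names, the example source, and the choice $n = 16$, $\varepsilon = 0.2$ are assumptions, and full enumeration is only feasible for small $n$.

```python
import itertools
import math

def build_typical_codebook(pmf, n, eps):
    """Enumerate the weakly typical set and index it with fixed-length codes."""
    H = -sum(p * math.log2(p) for p in pmf.values() if p > 0)
    typical = []
    for seq in itertools.product(sorted(pmf), repeat=n):
        rate = -sum(math.log2(pmf[x]) for x in seq) / n
        if abs(rate - H) <= eps:
            typical.append(seq)
    if not typical:
        raise ValueError("typical set is empty for this n and eps")
    bits = max(1, math.ceil(math.log2(len(typical))))
    index = {seq: i for i, seq in enumerate(typical)}
    return typical, index, bits

def encode(seq, index, bits):
    """Return a fixed-length bit string, or None for an atypical sequence."""
    i = index.get(tuple(seq))
    return None if i is None else format(i, "0{}b".format(bits))

def decode(word, typical):
    """Invert encode for any bit string that it produced."""
    return typical[int(word, 2)]

pmf = {0: 0.1, 1: 0.9}
typical, index, bits = build_typical_codebook(pmf, n=16, eps=0.2)
print(len(typical), "typical sequences,", bits, "bits instead of 16")
```

At such a short block length the typical set still misses a noticeable share of the probability mass; the guarantee that the error probability falls below $\varepsilon$ only applies for sufficiently large $n$.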

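In information theory, typical set decoding is used in conjunction with random coding to estimate the transmitted message as the one with a codeword that is jointly ε-typical with the observation.

That is, the decoder outputs $\hat{w} = w$ if there exists a message $w$ such that

$$(x_1^n(w), y_1^n) \in A_\varepsilon^n(X, Y),$$

where $\hat{w}$, $x_1^n(w)$ and $y_1^n$ are the message estimate, the codeword of message $w$ and the observation, respectively.

$A_\varepsilon^n(X, Y)$ is defined with respect to the joint distribution $p(x_1^n)\, p(y_1^n \mid x_1^n)$, where $p(y_1^n \mid x_1^n)$ is the transition probability that characterizes the channel statistics, and $p(x_1^n)$ is some input distribution used to generate the codewords in the random codebook.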
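A typical set decoder can be sketched for a toy channel. The code below assumes a memoryless channel described by a per-symbol joint pmf with strictly positive entries, tests joint weak typicality of each codeword with the observation, and returns a message only when exactly one codeword passes. The uniqueness requirement, the binary symmetric channel with crossover probability 0.1, the two-word codebook, and all names are assumptions made for illustration.

```python
import math

def entropy(pmf):
    """Shannon entropy in bits of a pmf given as {outcome: probability}."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)

def weak_rate(seq, pmf):
    """-(1/n) log2 of the product probability of seq under pmf."""
    return -sum(math.log2(pmf[s]) for s in seq) / len(seq)

def jointly_typical(x, y, joint, eps):
    """Check weak typicality of x, y and the pair sequence (x_i, y_i)."""
    px, py = {}, {}
    for (a, b), p in joint.items():
        px[a] = px.get(a, 0.0) + p  # marginal of X
        py[b] = py.get(b, 0.0) + p  # marginal of Y
    pairs = list(zip(x, y))
    return (abs(weak_rate(x, px) - entropy(px)) <= eps
            and abs(weak_rate(y, py) - entropy(py)) <= eps
            and abs(weak_rate(pairs, joint) - entropy(joint)) <= eps)

def typical_set_decode(y, codebook, joint, eps):
    """Return the unique message whose codeword is jointly typical with y."""
    matches = [w for w, x in enumerate(codebook)
               if jointly_typical(x, y, joint, eps)]
    return matches[0] if len(matches) == 1 else None

# Assumed toy setup: uniform inputs through a binary symmetric channel.
f = 0.1
joint = {(a, b): 0.5 * ((1 - f) if a == b else f)
         for a in (0, 1) for b in (0, 1)}
codebook = [(0,) * 20, (1,) * 20]
received = (1, 1) + (0,) * 18  # codeword 0 with two flipped bits
print(typical_set_decode(received, codebook, joint, eps=0.2))
```

A noiseless copy of a codeword is rejected here, since a sequence pair with no flips is the most likely outcome but is not jointly typical for this channel.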