In mathematics and information theory, Sanov's theorem gives a bound on the probability of observing an atypical sequence of samples from a given probability distribution.
In the language of large deviations theory, Sanov's theorem identifies the rate function for large deviations of the empirical measure of a sequence of i.i.d. random variables.
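Let A be a set of probability distributions over a finite alphabet X, and let q be an arbitrary distribution over X (where q may or may not be in A).
Suppose we draw n i.i.d. samples from q, represented by the vector $x^n = (x_1, x_2, \ldots, x_n)$.
Then we have the following bound on the probability that the empirical measure $\hat{p}_{x^n}$ of the samples falls within the set A:
$$q^n(\hat{p}_{x^n} \in A) \le (n+1)^{|X|} \, 2^{-n D_{\mathrm{KL}}(p^* \| q)},$$
where $q^n$ is the joint probability distribution on $X^n$, $p^* = \arg\min_{p \in A} D_{\mathrm{KL}}(p \| q)$ is the information projection of q onto A, and $D_{\mathrm{KL}}$ is the KL divergence, measured in bits.
In words, the probability of drawing an atypical distribution is bounded by a function of the KL divergence between the atypical distribution and the true one; when we consider a set of possible atypical distributions, one of them dominates, namely the information projection $p^*$.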
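As a rough numerical check of the bound, the following sketch compares the exact probability with the Sanov bound for a binary alphabet. The setup is an illustrative assumption, not part of the theorem: q is a fair coin, A is the set of distributions that put mass at least 0.75 on the symbol 1, and the function names are hypothetical. For this choice of A the information projection is the Bernoulli(0.75) distribution.

```python
import math

def kl_bits(p, q):
    """KL divergence D(p || q) in bits for two distributions given as sequences."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def exact_prob(n, q1, a):
    """Exact probability that the fraction of ones in n Bernoulli(q1) draws is >= a."""
    k_min = math.ceil(a * n - 1e-12)  # small slack guards against float rounding
    return sum(math.comb(n, k) * q1**k * (1 - q1)**(n - k)
               for k in range(k_min, n + 1))

def sanov_bound(n, q1, a):
    """Upper bound (n+1)^{|X|} * 2^{-n D(p* || q)} with |X| = 2."""
    # Assumes a > q1, so the information projection onto
    # A = {p : p(1) >= a} is Bernoulli(a).
    d = kl_bits((1 - a, a), (1 - q1, q1))
    return (n + 1) ** 2 * 2.0 ** (-n * d)

for n in (20, 100, 500):
    p = exact_prob(n, 0.5, 0.75)
    b = sanov_bound(n, 0.5, 0.75)
    print(n, p, b, p <= b)
```

For small n the polynomial factor makes the bound exceed 1, so it only becomes informative once the exponential term dominates.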
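Furthermore, if A is a closed set (more precisely, if A is the closure of its interior), then the exponent is asymptotically tight for the empirical measure $\hat{p}_{x^n}$ and the information projection $p^*$ of q onto A:
$$\lim_{n \to \infty} \frac{1}{n} \log_2 q^n(\hat{p}_{x^n} \in A) = -D_{\mathrm{KL}}(p^* \| q).$$
Define: $\Sigma$, a finite set (the alphabet); $\Delta(\Sigma)$, the simplex of probability distributions on $\Sigma$; $L_n$, the empirical measure of n i.i.d. samples drawn from a distribution $\mu \in \Delta(\Sigma)$, viewed as a random variable taking values in $\Delta(\Sigma)$; and $D(\nu \| \mu)$, the KL divergence of $\nu$ from $\mu$, measured in nats.
Then, Sanov's theorem states:[1] for every measurable subset $S \subseteq \Delta(\Sigma)$,
$$-\inf_{\nu \in \operatorname{int}(S)} D(\nu \| \mu) \le \liminf_{n \to \infty} \frac{1}{n} \ln P_\mu(L_n \in S) \le \limsup_{n \to \infty} \frac{1}{n} \ln P_\mu(L_n \in S) \le -\inf_{\nu \in \operatorname{cl}(S)} D(\nu \| \mu).$$
Here, $\operatorname{int}(S)$ denotes the interior of S and $\operatorname{cl}(S)$ its closure, both taken in $\Delta(\Sigma)$.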
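To illustrate the large-deviation rate numerically, the sketch below computes $-\frac{1}{n} \log_2$ of the exact tail probability for a binary alphabet and compares it with the KL divergence of the information projection. The example (a fair coin and the set of distributions with mass at least 0.75 on the symbol 1) and the function names are assumptions chosen for illustration; the computation is done in log space because the probabilities underflow for large n.

```python
import math

def log2_binom_pmf(n, k, q1):
    """log2 of the Binomial(n, q1) probability of exactly k ones (via lgamma)."""
    log_comb = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return (log_comb + k * math.log(q1) + (n - k) * math.log(1 - q1)) / math.log(2)

def empirical_rate(n, q1, a):
    """-(1/n) log2 P(fraction of ones >= a) for n i.i.d. Bernoulli(q1) samples."""
    k_min = math.ceil(a * n - 1e-12)  # small slack guards against float rounding
    logs = [log2_binom_pmf(n, k, q1) for k in range(k_min, n + 1)]
    m = max(logs)
    # Base-2 log-sum-exp keeps the tiny tail probability representable.
    log2_p = m + math.log2(sum(2.0 ** (v - m) for v in logs))
    return -log2_p / n

def kl_bits_bernoulli(a, q1):
    """KL divergence in bits of Bernoulli(a) from Bernoulli(q1), for 0 < a, q1 < 1."""
    return a * math.log2(a / q1) + (1 - a) * math.log2((1 - a) / (1 - q1))

target = kl_bits_bernoulli(0.75, 0.5)
for n in (100, 1000, 10000):
    print(n, empirical_rate(n, 0.5, 0.75), target)
```

The printed rates should approach the target from above as n grows, with a gap that shrinks roughly like $(\log n)/n$.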