In probability theory, a Chernoff bound is an exponentially decreasing upper bound on the tail of a random variable based on its moment generating function.
We may therefore combine the two infima and define the two-sided Chernoff bound C(a) := inf_t M(t) e^(−ta), where M(t) = E[e^(tX)] is the moment generating function of X. This quantity provides an upper bound on the folded cumulative distribution function of X (folded at the mean, not the median).
The logarithm of the two-sided Chernoff bound is known as the rate function (or Cramér transform) I = −log C. It is equivalent to the Legendre–Fenchel transform or convex conjugate of the cumulant generating function K(t) = log M(t), defined as I(a) = sup_t (at − K(t)).
The moment generating function is log-convex, so by a property of the convex conjugate, the Chernoff bound must be log-concave.
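For a concrete feel of the optimization over t (a sketch of our own, not part of the article's derivation): for a standard normal variable, M(t) = exp(t²/2), the infimum of M(t) e^(−ta) is attained at t = a and equals exp(−a²/2), which the snippet below checks numerically against the true tail probability.

```python
import math

def chernoff_bound_normal(a, t_grid=None):
    """Numerically minimize M(t)*exp(-t*a) over t > 0 for X ~ N(0,1),
    where M(t) = exp(t^2/2) is the moment generating function."""
    if t_grid is None:
        t_grid = [i / 1000 for i in range(1, 10001)]  # t in (0, 10]
    return min(math.exp(t * t / 2 - t * a) for t in t_grid)

a = 2.0
bound = chernoff_bound_normal(a)               # numerical infimum over the grid
closed_form = math.exp(-a * a / 2)             # exact infimum, attained at t = a
true_tail = 0.5 * math.erfc(a / math.sqrt(2))  # Pr(X >= a) for N(0,1)
print(bound, closed_form, true_tail)
```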
Individual moments can provide tighter bounds, at the cost of greater analytical complexity.
A lower bound on the tail probabilities can also be obtained from the moment generating function alone; unlike the Chernoff bound, however, this result is not exponentially tight.
Theodosopoulos[9] constructed a tight(er) MGF-based lower bound using an exponential tilting procedure.
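As a rough numerical illustration of the earlier claim that individual moments can give tighter bounds (a sketch of our own, not drawn from the cited works): compare the best Markov-type moment bound min_k E[X^k]/a^k with the Chernoff bound for a small binomial tail.

```python
import math

n, p, a = 20, 0.5, 15  # Pr(X >= a) for X ~ Binomial(n, p); illustrative values

def binom_pmf(k):
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def moment(k):
    """E[X^k] computed directly from the binomial pmf."""
    return sum(binom_pmf(x) * x**k for x in range(n + 1))

def mgf(t):
    """Moment generating function of Binomial(n, p)."""
    return (1 - p + p * math.exp(t))**n

# Best Markov-type bound based on a single integer moment.
moment_bound = min(moment(k) / a**k for k in range(1, 41))

# Chernoff bound: minimize M(t) * exp(-t a) over a grid of t > 0.
chernoff = min(mgf(t) * math.exp(-t * a) for t in [i / 100 for i in range(1, 501)])

exact = sum(binom_pmf(x) for x in range(a, n + 1))
print(moment_bound, chernoff, exact)
```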
When X is the sum of n i.i.d. copies of a variable with moment generating function M1(t), the moment generating function of X is M1(t)^n, so that inf_t M1(t)^n e^(−tna) = (inf_t M1(t) e^(−ta))^n. That is, the Chernoff bound for the average of n i.i.d. variables is equivalent to the nth power of the Chernoff bound on a single variable (see Cramér's theorem).
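A short numerical check of this statement (our own sketch, with Bernoulli(p) summands chosen for illustration): the bound for the sum at level na coincides with the nth power of the single-variable bound at level a, since the minimization over t factors through the nth power.

```python
import math

p, n, a = 0.3, 10, 0.5                       # Bernoulli(p) summands, per-variable level a
ts = [i / 1000 for i in range(1, 20001)]     # grid over t > 0

def single_bound():
    # inf_t E[exp(t X1)] * exp(-t a) for X1 ~ Bernoulli(p)
    return min((1 - p + p * math.exp(t)) * math.exp(-t * a) for t in ts)

def sum_bound():
    # inf_t E[exp(t X)] * exp(-t n a) for X = X1 + ... + Xn (the MGF is the nth power)
    return min((1 - p + p * math.exp(t))**n * math.exp(-t * n * a) for t in ts)

print(sum_bound(), single_bound()**n)        # the two values agree (up to grid error)
```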
Chernoff bounds may also be applied to general sums of independent, bounded random variables, regardless of their distribution; this is known as Hoeffding's inequality. The proof follows a similar approach to the other Chernoff bounds, but applies Hoeffding's lemma to bound the moment generating functions.
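To make the Hoeffding route concrete, here is a minimal sketch (our own; the per-variable ranges are arbitrary) that uses Hoeffding's lemma, E[e^(t(X − EX))] ≤ exp(t²(b − a)²/8), to bound each moment generating function and assemble the familiar tail bound exp(−2s²/Σ(b_i − a_i)²).

```python
import math

def hoeffding_tail_bound(s, ranges):
    """Upper bound on Pr(sum_i (X_i - E X_i) >= s) for independent X_i with
    a_i <= X_i <= b_i, obtained by bounding each MGF via Hoeffding's lemma
    and optimizing the resulting exponential bound over t."""
    span_sq = sum((b - a)**2 for a, b in ranges)
    # With the lemma, the bound is exp(t^2 * span_sq / 8 - t * s);
    # the optimal t is 4 s / span_sq, giving exp(-2 s^2 / span_sq).
    return math.exp(-2 * s**2 / span_sq)

# Example: ten variables each confined to [0, 1], deviation s = 3.
print(hoeffding_tail_bound(3.0, [(0.0, 1.0)] * 10))
```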
Suppose X1, ..., Xn are independent random variables taking values in {0, 1}. Let X denote their sum and let μ = E[X] denote the sum's expected value. Then for any δ > 0,

Pr(X ≥ (1 + δ)μ) ≤ (e^δ / (1 + δ)^(1+δ))^μ.

A similar proof strategy can be used to show that for 0 < δ < 1,

Pr(X ≤ (1 − δ)μ) ≤ (e^(−δ) / (1 − δ)^(1−δ))^μ.

The above formula is often unwieldy in practice, so the following looser but more convenient bounds[10] are often used, which follow from the inequality 2δ/(2 + δ) ≤ log(1 + δ) from the list of logarithmic inequalities:

Pr(X ≥ (1 + δ)μ) ≤ exp(−δ²μ/(2 + δ)),   0 ≤ δ,
Pr(X ≤ (1 − δ)μ) ≤ exp(−δ²μ/2),   0 < δ < 1,
Pr(|X − μ| ≥ δμ) ≤ 2 exp(−δ²μ/3),   0 < δ < 1.

Notice that the bounds are trivial for δ = 0.
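As a quick numerical sanity check (our own sketch, with an arbitrary Binomial(n, p) example), one can compare the exact upper-tail probability with the tight and loose multiplicative bounds:

```python
import math

n, p, delta = 100, 0.2, 0.5
mu = n * p                                   # expected value of the sum X

def binom_pmf(k):
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

exact_upper = sum(binom_pmf(k) for k in range(math.ceil((1 + delta) * mu), n + 1))

tight = (math.exp(delta) / (1 + delta)**(1 + delta))**mu   # (e^d / (1+d)^(1+d))^mu
loose = math.exp(-delta**2 * mu / (2 + delta))             # exp(-d^2 mu / (2+d))

print(exact_upper, tight, loose)   # exact tail <= tight bound <= loose bound
```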
For i.i.d. Bernoulli(p) random variables X1, ..., Xn, the additive (Chernoff-Hoeffding) form of the bound states that

Pr((1/n) Σ Xi ≥ p + ε) ≤ e^(−n D(p + ε || p)),

where D(p + ε || p) is the Kullback-Leibler divergence between Bernoulli(p + ε) and Bernoulli(p). A simpler bound follows by relaxing the theorem using D(p + ε || p) ≥ 2ε², which follows from the convexity of D(p + ε || p) and the fact that

∂²/∂ε² D(p + ε || p) = 1/((p + ε)(1 − p − ε)) ≥ 4 = ∂²/∂ε² (2ε²).

Thus,

Pr((1/n) Σ Xi ≥ p + ε) ≤ e^(−2nε²).

This result is a special case of Hoeffding's inequality.
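The relaxation can likewise be checked numerically; in this small sketch (example values are ours), the relative-entropy bound is never larger than the relaxed bound:

```python
import math

def kl(q, p):
    """Relative entropy D(q || p) between Bernoulli(q) and Bernoulli(p)."""
    return q * math.log(q / p) + (1 - q) * math.log((1 - q) / (1 - p))

n, p, eps = 50, 0.4, 0.1
tight = math.exp(-n * kl(p + eps, p))   # Chernoff-Hoeffding bound
loose = math.exp(-2 * n * eps**2)       # relaxed bound, since D(p+eps||p) >= 2 eps^2
print(tight, loose)                     # tight <= loose
```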
Chernoff bounds have very useful applications in set balancing and packet routing in sparse networks.
The set balancing problem arises while designing statistical experiments.
Typically, when designing a statistical experiment given the features of each participant, we need to know how to divide the participants into two disjoint groups such that each feature is balanced as closely as possible between them.[13]
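A minimal simulation of this idea (our own sketch; the instance is random and the threshold comes from applying a Chernoff-Hoeffding bound to each feature and a union bound over features): assign each participant a random group and measure the worst imbalance over all features.

```python
import math
import random

random.seed(0)

n_features, m_participants = 40, 500

# features[i][j] = 1 if participant j has feature i (random instance for illustration)
features = [[random.randint(0, 1) for _ in range(m_participants)]
            for _ in range(n_features)]

# Random assignment: +1 for group A, -1 for group B.
signs = [random.choice((-1, 1)) for _ in range(m_participants)]

# Imbalance of a feature = |sum of signs over participants having that feature|.
imbalance = max(abs(sum(f * s for f, s in zip(row, signs))) for row in features)

# Hoeffding plus a union bound over the features says this threshold is exceeded
# with probability at most 2 / n_features.
threshold = math.sqrt(4 * m_participants * math.log(n_features))
print(imbalance, threshold)
```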
Chernoff bounds are also used to obtain tight bounds for permutation routing problems, which reduces network congestion while routing packets in sparse networks.[13]
Chernoff bounds are used in computational learning theory to prove that a learning algorithm is probably approximately correct, i.e. with high probability the algorithm has small error on a sufficiently large training data set.[14]
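As an illustration of how such a proof typically goes (our own sketch using the standard Hoeffding-plus-union-bound argument for a finite hypothesis class, not a result quoted from this article's references): the sample size below guarantees that every hypothesis has empirical error within eps of its true error, with probability at least 1 − delta.

```python
import math

def pac_sample_size(h_size, eps, delta):
    """Number of i.i.d. training examples m such that, by a Chernoff-Hoeffding
    bound on each hypothesis and a union bound over the h_size hypotheses,
    Pr(some hypothesis's empirical error deviates from its true error by more
    than eps) <= 2 * h_size * exp(-2 m eps^2) <= delta."""
    return math.ceil(math.log(2 * h_size / delta) / (2 * eps**2))

print(pac_sample_size(h_size=10_000, eps=0.05, delta=0.01))
```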
Chernoff bounds can be effectively used to evaluate the "robustness level" of an application/algorithm by exploring its perturbation space with randomization.
The robustness level can, in turn, be used either to validate or reject a specific algorithmic choice, a hardware implementation, or the appropriateness of a solution whose structural parameters are affected by uncertainties.
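One way this can be organized in code (a hypothetical sketch; run_with_perturbation and the toy predicate are stand-ins for application-specific routines): sample random perturbations, estimate the fraction for which the algorithm still meets its specification, and choose the number of trials with a Chernoff-Hoeffding bound so the estimate is within eps of the true robustness level with probability at least 1 − delta.

```python
import math
import random

def estimate_robustness(run_with_perturbation, trials):
    """Fraction of random perturbations under which the algorithm still meets
    its specification; run_with_perturbation is an application-specific stub."""
    return sum(run_with_perturbation(random.random()) for _ in range(trials)) / trials

def trials_needed(eps, delta):
    # Hoeffding: Pr(|estimate - true level| >= eps) <= 2 exp(-2 * trials * eps^2)
    return math.ceil(math.log(2 / delta) / (2 * eps**2))

# Toy stand-in: the "algorithm" tolerates perturbations below 0.8.
toy = lambda perturbation: perturbation < 0.8

t = trials_needed(eps=0.02, delta=0.01)
print(t, estimate_robustness(toy, t))
```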
A simple and common use of Chernoff bounds is for "boosting" of randomized algorithms. If one has an algorithm that outputs the desired answer with probability p > 1/2, one can obtain a higher success rate by running the algorithm n times independently and outputting the answer returned by more than n/2 of the runs. The probability that this majority answer is wrong is at most the probability that a sum of n independent Bernoulli variables, each equal to 1 with probability p, is at most n/2, which decays exponentially in n via the multiplicative Chernoff bound (Corollary 13.3 in Sinclair's class notes, μ = np).
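A minimal sketch of this boosting scheme (our own illustration; noisy_algorithm is a hypothetical stand-in): run the algorithm n times, take the majority answer, and bound the failure probability with the lower-tail multiplicative bound Pr(X ≤ (1 − δ)μ) ≤ exp(−δ²μ/2), where μ = np and δ = 1 − 1/(2p).

```python
import math
import random

random.seed(1)
p = 0.6          # single-run success probability (> 1/2)
n = 101          # number of independent repetitions (odd, so the majority is unambiguous)

def noisy_algorithm():
    """Stand-in for a randomized algorithm: returns the correct answer with prob p."""
    return "correct" if random.random() < p else "wrong"

def boosted():
    votes = [noisy_algorithm() for _ in range(n)]
    return max(set(votes), key=votes.count)    # majority answer

# Multiplicative Chernoff bound on the failure probability of the majority vote:
mu = n * p
delta = 1 - 1 / (2 * p)
failure_bound = math.exp(-delta**2 * mu / 2)   # Pr(at most n/2 correct runs)

print(boosted(), failure_bound)
```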
Rudolf Ahlswede and Andreas Winter introduced a Chernoff bound for matrix-valued random variables.[16]
A logarithmic dependence on the dimension d is inevitable: take, for example, a diagonal random sign matrix of dimension d × d.
The operator norm of the sum of t independent samples is precisely the maximum deviation among d independent random walks of length t. In order to achieve a fixed bound on the maximum deviation with constant probability, it is easy to see that t should grow logarithmically with d in this scenario.[19]
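The diagonal example is easy to simulate (a sketch of our own): each diagonal entry of the sum of t samples is an independent ±1 random walk of length t, and the operator norm of the diagonal sum is simply the largest absolute entry.

```python
import random

random.seed(2)

def operator_norm_of_sum(d, t):
    """Sum t independent d x d diagonal random sign matrices; since the sum is
    diagonal, its operator norm is the maximum |walk| over the d diagonal walks."""
    walks = [sum(random.choice((-1, 1)) for _ in range(t)) for _ in range(d)]
    return max(abs(w) for w in walks)

t = 200
for d in (10, 100, 1000, 10000):
    # The maximum deviation grows roughly like sqrt(2 t log d) as d increases,
    # which is why t must grow with log d to keep the normalized deviation fixed.
    print(d, operator_norm_of_sum(d, t))
```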
The following theorem can be obtained by assuming M has low rank, in order to avoid the dependency on the dimensions.
Following the conditions of the multiplicative Chernoff bound, let X1, ..., Xn be independent Bernoulli random variables, whose sum is X, each having probability pi of being equal to 1.
In the identically distributed case, where Pr(Xi = 1) = p for every i, let q = p + ε. Taking a = nq in (1), we obtain:

Pr((1/n) Σ Xi ≥ q) ≤ inf_{t>0} E[exp(t Σ Xi)] e^(−tnq) = inf_{t>0} ( E[exp(t X1)] e^(−tq) )^n.

Now, knowing that Pr(Xi = 1) = p, Pr(Xi = 0) = 1 − p, we have

E[exp(t X1)] e^(−tq) = (p e^t + (1 − p)) e^(−tq) = p e^((1−q)t) + (1 − p) e^(−qt).

Therefore, we can easily compute the infimum, using calculus:

d/dt ( p e^((1−q)t) + (1 − p) e^(−qt) ) = (1 − q) p e^((1−q)t) − q (1 − p) e^(−qt).

Setting the equation to zero and solving, we have

(1 − q) p e^((1−q)t) = q (1 − p) e^(−qt)
(1 − q) p e^t = q (1 − p),

so that

e^t = q(1 − p) / (p(1 − q)), that is, t = log( q(1 − p) / (p(1 − q)) ).

As q = p + ε > p, we see that t > 0, so our bound is satisfied on t. Having solved for t, we can plug back into the equations above to find that

log Pr((1/n) Σ Xi ≥ q) ≤ −n [ q log(q/p) + (1 − q) log((1 − q)/(1 − p)) ] = −n D(q || p).

We now have our desired result, that

Pr((1/n) Σ Xi ≥ p + ε) ≤ e^(−n D(p + ε || p)).

To complete the proof for the symmetric case, we simply define the random variable Yi = 1 − Xi, apply the same proof, and plug it into our bound.
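As a numerical cross-check of the derivation (our own sketch, example values arbitrary): the closed-form optimizer t should match a grid minimization of p e^((1−q)t) + (1 − p) e^(−qt), and the minimized value should equal e^(−D(q || p)).

```python
import math

p, eps = 0.3, 0.15
q = p + eps

def objective(t):
    # p * e^{(1-q) t} + (1-p) * e^{-q t}, the per-variable factor in the proof
    return p * math.exp((1 - q) * t) + (1 - p) * math.exp(-q * t)

t_closed = math.log(q * (1 - p) / (p * (1 - q)))       # optimizer from the proof
t_grid = min((i / 10000 for i in range(1, 50001)), key=objective)

kl = q * math.log(q / p) + (1 - q) * math.log((1 - q) / (1 - p))
print(t_closed, t_grid)                      # should agree to grid resolution
print(objective(t_closed), math.exp(-kl))    # should agree up to rounding
```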