Matrix Chernoff bound

For certain applications in linear algebra, it is useful to know properties of the probability distribution of the largest eigenvalue of a finite sum of random matrices.

Suppose $\{X_k\}$ is a finite sequence of random matrices. Analogous to the well-known Chernoff bound for sums of scalars, a bound on the following is sought for a given parameter $t$:

$$\Pr\left\{\lambda_{\max}\left(\sum_k X_k\right) \geq t\right\}$$

The following theorems answer this general question under various assumptions; these assumptions are named below by analogy to their classical, scalar counterparts.

All of these theorems can be found in (Tropp 2010), as the specific application of a general result which is derived below.

The classical Chernoff bounds concern the sum of independent, nonnegative, and uniformly bounded random variables. In the matrix setting, the analogous theorem concerns a sum of positive-semidefinite random matrices subjected to a uniform eigenvalue bound.

Consider a sequence $\{X_k : k = 1, 2, \ldots, n\}$ of independent, random, self-adjoint matrices with dimension $d$ that satisfy $X_k \succeq 0$ and $\lambda_{\max}(X_k) \leq 1$ almost surely.

Compute the minimum and maximum eigenvalues of the average expectation,

$$\bar{\mu}_{\min} = \lambda_{\min}\left(\frac{1}{n}\sum_k \mathbb{E}\,X_k\right), \qquad \bar{\mu}_{\max} = \lambda_{\max}\left(\frac{1}{n}\sum_k \mathbb{E}\,X_k\right).$$

Then

$$\Pr\left\{\lambda_{\min}\left(\frac{1}{n}\sum_k X_k\right) \leq \alpha\right\} \leq d\,e^{-n\,D(\alpha\,\|\,\bar{\mu}_{\min})} \quad \text{for } 0 \leq \alpha \leq \bar{\mu}_{\min},$$

$$\Pr\left\{\lambda_{\max}\left(\frac{1}{n}\sum_k X_k\right) \geq \alpha\right\} \leq d\,e^{-n\,D(\alpha\,\|\,\bar{\mu}_{\max})} \quad \text{for } \bar{\mu}_{\max} \leq \alpha \leq 1.$$

The binary information divergence is defined as

$$D(a\,\|\,u) = a\left(\log a - \log u\right) + (1-a)\left(\log(1-a) - \log(1-u)\right)$$

for $a, u \in [0, 1]$.
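As a worked numerical instance (the values here are chosen only for illustration and are not taken from the sources): if $\bar{\mu}_{\max} = 1/2$ and $\alpha = 3/4$, then

$$D\!\left(\tfrac{3}{4}\,\middle\|\,\tfrac{1}{2}\right) = \tfrac{3}{4}\log\tfrac{3}{2} + \tfrac{1}{4}\log\tfrac{1}{2} \approx 0.131,$$

so the upper-tail bound reads $d\,e^{-0.131\,n}$, which becomes nontrivial once $n \gtrsim \log d \,/\, 0.131$.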

In the scalar setting, Bennett and Bernstein inequalities describe the upper tail of a sum of independent, zero-mean random variables that are either bounded or subexponential.

In the matrix case, the analogous results concern a sum of zero-mean random matrices.

Consider a finite sequence $\{X_k\}$ of independent, random, self-adjoint matrices with dimension $d$, each satisfying $\mathbb{E}\,X_k = 0$ and $\lambda_{\max}(X_k) \leq R$ almost surely. Compute the norm of the total variance,

$$\sigma^2 = \left\|\sum_k \mathbb{E}\left(X_k^2\right)\right\|.$$

Then, the following chain of inequalities holds for all $t \geq 0$:

$$\Pr\left\{\lambda_{\max}\left(\sum_k X_k\right) \geq t\right\} \leq d \cdot \exp\left(-\frac{\sigma^2}{R^2}\, h\!\left(\frac{Rt}{\sigma^2}\right)\right) \leq d \cdot \exp\left(\frac{-t^2/2}{\sigma^2 + Rt/3}\right) \leq \begin{cases} d \cdot \exp\left(-3t^2/8\sigma^2\right) & \text{for } t \leq \sigma^2/R, \\ d \cdot \exp\left(-3t/8R\right) & \text{for } t \geq \sigma^2/R, \end{cases}$$

where $h(u) = (1+u)\log(1+u) - u$ for $u \geq 0$.
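The final simplification in the chain can be checked directly from the Bernstein-type expression (a short verification, not a statement from the sources): if $t \leq \sigma^2/R$ then $\sigma^2 + Rt/3 \leq \tfrac{4}{3}\sigma^2$, so

$$\frac{t^2/2}{\sigma^2 + Rt/3} \geq \frac{3t^2}{8\sigma^2},$$

while if $t \geq \sigma^2/R$ then $\sigma^2 + Rt/3 \leq \tfrac{4}{3}Rt$, so the exponent is at least $3t/8R$.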

Consider a finite sequence $x_1, \ldots, x_n$ of independent and identically distributed random column vectors in $\mathbb{C}^d$.

Compute the variance parameter $\sigma^2$; a corresponding chain of inequalities then holds for all $t \geq 0$.

The scalar version of Azuma's inequality states that a scalar martingale exhibits normal concentration about its mean value, and the scale for deviations is controlled by the total maximum squared range of the difference sequence.
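For reference, one standard form of the scalar inequality (stated here for a martingale with bounded increments; this statement is supplied for context and is not quoted from the sources) is: if $M_0, M_1, \ldots, M_n$ is a martingale with $|M_k - M_{k-1}| \leq c_k$ almost surely, then for all $t \geq 0$,

$$\Pr\left\{M_n - M_0 \geq t\right\} \leq \exp\left(\frac{-t^2}{2\sum_k c_k^2}\right).$$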

In the matrix setting, the analogous bound applies to a finite sequence $\{X_k\}$ of random self-adjoint matrices with dimension $d$ and a fixed sequence $\{A_k\}$ of self-adjoint matrices satisfying $\mathbb{E}\,X_k = 0$ and $X_k^2 \preceq A_k^2$ almost surely: for all $t \geq 0$,

$$\Pr\left\{\lambda_{\max}\left(\sum_k X_k\right) \geq t\right\} \leq d \cdot e^{-t^2/8\sigma^2},$$

where $\sigma^2 = \left\|\sum_k A_k^2\right\|$. An improvement of this result was established in (Mackey et al. 2012): for all $t \geq 0$,

$$\Pr\left\{\lambda_{\max}\left(\sum_k X_k\right) \geq t\right\} \leq d \cdot e^{-t^2/2\sigma^2},$$

where $\sigma^2 = \frac{1}{2}\left\|\sum_k \left(A_k^2 + \mathbb{E}\,X_k^2\right)\right\|$. In the scalar setting, McDiarmid's inequality provides one common way of bounding the differences by applying Azuma's inequality to a Doob martingale.
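For comparison, the scalar bounded differences inequality (again supplied here for context, not quoted from the sources) states: if $Z_1, \ldots, Z_n$ are independent and $f$ satisfies $|f(z_1, \ldots, z_k, \ldots, z_n) - f(z_1, \ldots, z_k', \ldots, z_n)| \leq c_k$ for each coordinate $k$, then for all $t \geq 0$,

$$\Pr\left\{f(Z_1, \ldots, Z_n) - \mathbb{E}\,f(Z_1, \ldots, Z_n) \geq t\right\} \leq \exp\left(\frac{-2t^2}{\sum_k c_k^2}\right).$$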

A version of the bounded differences inequality holds in the matrix setting.

Recall the theorem above for self-adjoint matrix Gaussian and Rademacher bounds: let $\{A_k\}$ be a finite sequence of fixed, self-adjoint matrices with dimension $d$, and let $\{\xi_k\}$ be

a finite sequence of independent standard normal or independent Rademacher random variables; then, for all $t \geq 0$,

$$\Pr\left\{\lambda_{\max}\left(\sum_k \xi_k A_k\right) \geq t\right\} \leq d \cdot e^{-t^2/2\sigma^2}, \qquad \text{where } \sigma^2 = \left\|\sum_k A_k^2\right\|.$$

Ahlswede and Winter would give the same result, except with $\sigma_{AW}^2 = \sum_k \lambda_{\max}\left(A_k^2\right)$. By comparison, the $\sigma^2$ in the theorem above is the largest eigenvalue of the sum, $\sigma^2 = \lambda_{\max}\left(\sum_k A_k^2\right)$, rather than a sum of largest eigenvalues.

It is never larger than the Ahlswede–Winter value (by the norm triangle inequality), but can be much smaller.
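To illustrate how large the gap can be (an example chosen here for concreteness, not taken from the cited papers): if $A_k = e_k e_k^{\mathsf T}$ for $k = 1, \ldots, d$, the coordinate projectors in dimension $d$, then $\sum_k A_k^2 = I$ and $\sigma^2 = 1$, whereas $\sigma_{AW}^2 = \sum_k \lambda_{\max}\left(A_k^2\right) = d$.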

The chief contribution of (Ahlswede & Winter 2003) was the extension of the Laplace-transform method used to prove the scalar Chernoff bound (see Chernoff bound#Additive form (absolute error)) to the case of self-adjoint matrices.

Suppose one wished to vary the length of the series (n) and the dimensions of the matrices (d) while keeping the right-hand side approximately constant.

Then n must vary approximately as the log of d. Several papers have attempted to establish a bound without a dependence on dimensions.
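To see the scaling (a back-of-the-envelope consequence of the form of the bounds above, not a computation from the cited papers): in the i.i.d. bound above, the right-hand side has the form $d\,e^{-nD}$ for a fixed deviation level, so holding it at a constant $\varepsilon$ forces

$$n = \frac{1}{D}\log\frac{d}{\varepsilon},$$

that is, $n$ grows like $\log d$.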

(Magen & Zouzias 2010) provide a result without the dimensional dependence for low rank matrices.

The Laplace transform argument found in (Ahlswede & Winter 2003) is a significant result in its own right: let $Y$ be a random self-adjoint matrix. Then, for all $t \in \mathbb{R}$,

$$\Pr\left\{\lambda_{\max}(Y) \geq t\right\} \leq \inf_{\theta > 0}\left\{e^{-\theta t} \cdot \mathbb{E}\left[\operatorname{tr} e^{\theta Y}\right]\right\}.$$
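A short justification (the standard Markov-inequality argument, condensed here): for any fixed $\theta > 0$,

$$\Pr\left\{\lambda_{\max}(Y) \geq t\right\} = \Pr\left\{e^{\lambda_{\max}(\theta Y)} \geq e^{\theta t}\right\} \leq e^{-\theta t}\,\mathbb{E}\left[e^{\lambda_{\max}(\theta Y)}\right] = e^{-\theta t}\,\mathbb{E}\left[\lambda_{\max}\left(e^{\theta Y}\right)\right] \leq e^{-\theta t}\,\mathbb{E}\left[\operatorname{tr} e^{\theta Y}\right],$$

where the last inequality holds because $e^{\theta Y}$ is positive definite, so its largest eigenvalue is at most its trace; taking the infimum over $\theta > 0$ gives the stated bound.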

The task is then to bound the matrix moment generating function $\mathbb{E}\left[\operatorname{tr} e^{\theta \sum_k X_k}\right]$ for a sum of independent random matrices. This is where the methods of (Ahlswede & Winter 2003) and (Tropp 2010) diverge.
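A condensed sketch of the Ahlswede–Winter route, which combines the Golden–Thompson inequality $\operatorname{tr}\,e^{A+B} \leq \operatorname{tr}\left(e^{A} e^{B}\right)$ with independence of the summands (the intermediate steps are paraphrased here, not quoted):

$$\mathbb{E}\left[\operatorname{tr} e^{\theta\sum_{k=1}^{n} X_k}\right] \leq \mathbb{E}\left[\operatorname{tr}\left(e^{\theta\sum_{k=1}^{n-1} X_k}\,\mathbb{E}\left[e^{\theta X_n}\right]\right)\right] \leq \lambda_{\max}\left(\mathbb{E}\left[e^{\theta X_n}\right]\right)\,\mathbb{E}\left[\operatorname{tr} e^{\theta\sum_{k=1}^{n-1} X_k}\right] \leq \cdots \leq d\,\prod_{k=1}^{n}\lambda_{\max}\left(\mathbb{E}\left[e^{\theta X_k}\right]\right) = d\,\exp\left(\sum_{k=1}^{n}\lambda_{\max}\left(\log\mathbb{E}\left[e^{\theta X_k}\right]\right)\right).$$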

At any rate, one can see how the Ahlswede–Winter bound arises as the sum of largest eigenvalues.

The major contribution of (Tropp 2010) is the application of Lieb's theorem where (Ahlswede & Winter 2003) had applied the Golden–Thompson inequality.

Lieb's theorem states that, for a fixed self-adjoint matrix $H$, the function $A \mapsto \operatorname{tr}\exp\left(H + \log A\right)$ is concave on the cone of positive-definite matrices. Tropp applies it with $A = e^{X}$ for a random self-adjoint matrix $X$; the final step is to use Jensen's inequality to move the expectation inside the function:

$$\mathbb{E}\left[\operatorname{tr}\exp\left(H + X\right)\right] = \mathbb{E}\left[\operatorname{tr}\exp\left(H + \log e^{X}\right)\right] \leq \operatorname{tr}\exp\left(H + \log\mathbb{E}\left[e^{X}\right]\right).$$

This gives us the major result of the paper: the subadditivity of the log of the matrix generating function.

Let $\{X_k\}$ be a finite sequence of independent, random self-adjoint matrices. Then, for all $\theta \in \mathbb{R}$,

$$\operatorname{tr}\,\mathbb{E}\left[\exp\left(\sum_k \theta X_k\right)\right] \leq \operatorname{tr}\exp\left(\sum_k \log\mathbb{E}\left[e^{\theta X_k}\right]\right).$$

It suffices to take $\theta = 1$. Expanding the definitions, we need to show that

$$\mathbb{E}\left[\operatorname{tr}\exp\left(\sum_k X_k\right)\right] \leq \operatorname{tr}\exp\left(\sum_k \log\mathbb{E}\left[e^{X_k}\right]\right).$$

To complete the proof, we use the law of total expectation: let $\mathbb{E}_m$ denote the expectation conditioned on $X_1, \ldots, X_m$; since the $X_k$ are independent, $\mathbb{E}_{m-1}\left[e^{X_m}\right] = \mathbb{E}\left[e^{X_m}\right]$.

Finally, we have

$$\mathbb{E}\left[\operatorname{tr}\exp\left(\sum_{k=1}^{n} X_k\right)\right] = \mathbb{E}_0\cdots\mathbb{E}_{n-1}\left[\operatorname{tr}\exp\left(\sum_{k=1}^{n-1} X_k + X_n\right)\right] \leq \mathbb{E}_0\cdots\mathbb{E}_{n-2}\left[\operatorname{tr}\exp\left(\sum_{k=1}^{n-1} X_k + \log\mathbb{E}_{n-1}\left[e^{X_n}\right]\right)\right] \leq \cdots \leq \operatorname{tr}\exp\left(\sum_{k=1}^{n}\log\mathbb{E}\left[e^{X_k}\right]\right),$$

where at every step $m$ we use Tropp's corollary with $H_m = \sum_{k=1}^{m-1} X_k + \sum_{k=m+1}^{n}\log\mathbb{E}\left[e^{X_k}\right]$. The following master tail bound is immediate from the previous result:

$$\Pr\left\{\lambda_{\max}\left(\sum_k X_k\right) \geq t\right\} \leq \inf_{\theta > 0}\left\{e^{-\theta t}\,\operatorname{tr}\exp\left(\sum_k \log\mathbb{E}\left[e^{\theta X_k}\right]\right)\right\}.$$

All of the theorems given above are derived from this bound; the theorems consist in various ways to bound the infimum.
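As one worked instance of bounding the infimum (a condensed sketch using standard facts, not a verbatim derivation from the sources): for a Gaussian series $\sum_k \xi_k A_k$ with independent standard normal $\xi_k$ and fixed self-adjoint $A_k$, one has $\mathbb{E}\left[e^{\theta\xi_k A_k}\right] = e^{\theta^2 A_k^2/2}$, so $\log\mathbb{E}\left[e^{\theta\xi_k A_k}\right] = \theta^2 A_k^2/2$ and

$$\operatorname{tr}\exp\left(\frac{\theta^2}{2}\sum_k A_k^2\right) \leq d\,\exp\left(\frac{\theta^2\sigma^2}{2}\right), \qquad \sigma^2 = \left\|\sum_k A_k^2\right\|.$$

Minimizing $e^{-\theta t + \theta^2\sigma^2/2}$ over $\theta > 0$ at $\theta = t/\sigma^2$ recovers the Gaussian series bound $d\,e^{-t^2/2\sigma^2}$ quoted earlier.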