Sub-Gaussian distribution

Often in analysis, we divide an object (such as a random variable) into two parts, a central bulk and a distant tail, then analyze each separately.

In probability, this usually takes the form of an informal argument: "The tail event is so rare that we may safely ignore it."

Subgaussian distributions are worthy of study because the Gaussian distribution is well understood, and so we can give sharp bounds on the rarity of the tail event.

Similarly, the subexponential distributions, whose tails decay at least as fast as those of an exponential distribution, are also worthy of study.

A random variable $X$ is subgaussian with variance proxy $s^2$ if its moment generating function satisfies $\operatorname{E}\!\left[e^{\lambda (X - \operatorname{E}[X])}\right] \le e^{s^2 \lambda^2 / 2}$ for all $\lambda \in \mathbb{R}$. The smallest such $s^2$ is called the optimal variance proxy and denoted by $\|X\|_{\mathrm{vp}}^2$.
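As a numerical illustration (not part of the article), the sketch below estimates the optimal variance proxy of a Rademacher variable by maximizing $2\ln\operatorname{E}[e^{\lambda X}]/\lambda^2$ over a grid of $\lambda$; since $\operatorname{E}[e^{\lambda X}] = \cosh\lambda \le e^{\lambda^2/2}$, the answer is 1, which equals the variance.

```python
import numpy as np

# Optimal variance proxy of a centered random variable X:
#   ||X||_vp^2 = sup_{lambda != 0} 2 * ln E[exp(lambda * X)] / lambda^2.
# For a Rademacher variable (+1 or -1 with probability 1/2 each),
# E[exp(lambda * X)] = cosh(lambda) <= exp(lambda^2 / 2), so the proxy is 1.

def optimal_variance_proxy(mgf, lambdas):
    """Estimate the supremum of 2 * ln(mgf(lambda)) / lambda^2 over a grid."""
    return float(np.max(2.0 * np.log(mgf(lambdas)) / lambdas**2))

grid = np.linspace(1e-3, 10.0, 100_000)        # avoid lambda = 0 (removable singularity)
print(optimal_variance_proxy(np.cosh, grid))   # approximately 1.0 = Var(X)
```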

So, for example, given a random variable satisfying properties (1) and (2), the minimal constants appearing in the two properties differ from each other by at most an absolute multiplicative factor.

From the proof, we can extract a cycle of three inequalities relating these constants; in particular, the constant connecting any two of the properties can be taken to be absolute, not depending on the distribution of $X$.

The proof splits the integral defining the MGF into two halves, one over the bulk of the distribution and one over the tail, and bounds each separately.

Suppose $X_1, \dots, X_n$ (with $n \ge 2$) are independent random variables with the same upper subgaussian tail: $\Pr(X_i \ge t) \le e^{-t^2/(2s^2)}$ for all $t \ge 0$. Then $\operatorname{E}\!\left[\max_i X_i\right] \le s\sqrt{2\ln n} + \frac{s}{\sqrt{2\ln n}}$, so the expected maximum grows at most like $s\sqrt{2\ln n}$.
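A Monte Carlo sketch of this growth rate (an illustration, not from the article; the sample sizes are arbitrary), using standard Gaussian variables, for which $s = 1$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Expected maximum of n independent standard Gaussians (s = 1),
# compared with the subgaussian prediction sqrt(2 * ln n).
for n in [10, 100, 1_000, 10_000]:
    samples = rng.standard_normal((2_000, n))        # 2,000 Monte Carlo trials
    empirical = samples.max(axis=1).mean()           # estimate of E[max_i X_i]
    print(f"n = {n:6d}   E[max] ~ {empirical:.3f}   sqrt(2 ln n) = {np.sqrt(2*np.log(n)):.3f}")
```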

Gaussian concentration inequality for Lipschitz functions (Tao 2012, Theorem 2.1.12). Let $X = (X_1, \dots, X_n)$ be a vector of independent standard Gaussian variables and let $f : \mathbb{R}^n \to \mathbb{R}$ be $L$-Lipschitz. Then for every $t > 0$, $\Pr\!\left(|f(X) - \operatorname{E}[f(X)]| \ge t\right) \le C e^{-c t^2 / L^2}$ for some absolute constants $C, c > 0$.

By shifting and scaling, it suffices to prove the case where $f$ is 1-Lipschitz and $\operatorname{E}[f(X)] = 0$.

Now, by the Chernoff method, it remains to bound the cumulant generating function $\ln \operatorname{E}\!\left[e^{\lambda f(X)}\right]$.

By the circular symmetry of Gaussian variables, we introduce an independent copy $Y$ of $X$ and the path $X_\theta := Y\cos\theta + X\sin\theta$; for each fixed $\theta$, the pair $(X_\theta, \tfrac{d}{d\theta}X_\theta)$ has the same distribution as $(Y, X)$, so $X_\theta$ and its derivative are independent standard Gaussian vectors.
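As a numerical illustration of the theorem (not part of the article; the dimensions and sample sizes are arbitrary), take the 1-Lipschitz function $f(x) = \|x\|_2$: the fluctuations of $f(X)$ stay of size $O(1)$ as the dimension grows, even though $\operatorname{E}[f(X)]$ itself grows like $\sqrt{n}$.

```python
import numpy as np

rng = np.random.default_rng(1)

# f(x) = ||x||_2 is 1-Lipschitz.  Gaussian concentration predicts dimension-free
# fluctuations of f(X) around its mean, even though the mean grows like sqrt(n).
for n in [10, 100, 1_000, 10_000]:
    X = rng.standard_normal((2_000, n))
    f = np.linalg.norm(X, axis=1)
    print(f"n = {n:6d}   E[f(X)] ~ {f.mean():8.2f}   std(f(X)) ~ {f.std():.3f}")
```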

Since the optimal variance proxy can never be smaller than the variance, at the edge of possibility we define a random variable $X$ to be strictly subgaussian if its optimal variance proxy equals its variance: $\|X\|_{\mathrm{vp}}^2 = \operatorname{Var}[X]$.

The optimal variance proxy $\|X\|_{\mathrm{vp}}^2$ is known for many standard probability distributions, including the beta, Bernoulli, Dirichlet[6], Kumaraswamy, triangular[7], truncated Gaussian, and truncated exponential distributions.
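As a numerical sketch (not from the article; the grid of $\lambda$ values is an arbitrary choice), one can estimate the optimal variance proxy of a centered Bernoulli($p$) variable directly from its MGF and compare it with the variance $p(1-p)$; the two coincide at $p = 1/2$ (a strictly subgaussian case) and separate as $p$ moves away from $1/2$.

```python
import numpy as np

# Estimate the optimal variance proxy of a centered Bernoulli(p) variable:
#   ||X||_vp^2 = sup_{lambda != 0} 2 * ln E[exp(lambda * (X - p))] / lambda^2,
# and compare it with the variance p * (1 - p).
# For p <= 1/2 the heavy deviation is upward, so a positive-lambda grid suffices.
def bernoulli_variance_proxy(p, lambdas=np.linspace(1e-3, 60.0, 200_000)):
    mgf = p * np.exp(lambdas * (1 - p)) + (1 - p) * np.exp(-lambdas * p)
    return float(np.max(2.0 * np.log(mgf) / lambdas**2))

for p in [0.5, 0.3, 0.1, 0.01]:
    print(f"p = {p:<4}   variance = {p*(1-p):.4f}   optimal proxy ~ {bernoulli_variance_proxy(p):.4f}")
```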

A bounded random variable taking values in an interval $[a, b]$ is subgaussian with variance proxy $(b-a)^2/4$ (Hoeffding's lemma); Hoeffding's inequality is the Chernoff bound obtained using this fact.
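As a sketch of that Chernoff computation (under the mean-zero convention, with $X_1, \dots, X_n$ independent and $X_i \in [a_i, b_i]$), Markov's inequality applied to $e^{\lambda \sum_i (X_i - \operatorname{E}[X_i])}$ together with Hoeffding's lemma gives, for any $\lambda \ge 0$,

$$\Pr\!\Bigl(\sum_{i=1}^n (X_i - \operatorname{E}[X_i]) \ge t\Bigr) \le e^{-\lambda t}\prod_{i=1}^n \operatorname{E}\!\bigl[e^{\lambda (X_i - \operatorname{E}[X_i])}\bigr] \le \exp\!\Bigl(-\lambda t + \frac{\lambda^2}{8}\sum_{i=1}^n (b_i - a_i)^2\Bigr),$$

and choosing $\lambda = 4t/\sum_i (b_i - a_i)^2$ yields Hoeffding's bound $\exp\!\bigl(-2t^2/\sum_i (b_i - a_i)^2\bigr)$.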

The purpose of subgaussianity is to make the tails decay fast, so we generalize accordingly: a subgaussian random vector is a random vector $X$ in $\mathbb{R}^n$ such that the one-dimensional marginal $\langle X, v \rangle$ is subgaussian for every unit vector $v$, with the tail bound uniform over $v$.

(over a convex polytope) Fix a finite set of vectors $v_1, \dots, v_n$. If $X$ is a random vector such that each $\langle X, v_i \rangle$ is a mean-zero subgaussian variable with variance proxy $s^2$, then, since a linear functional attains its maximum over the polytope $\operatorname{conv}(v_1, \dots, v_n)$ at a vertex, $\operatorname{E}\bigl[\max_{v \in \operatorname{conv}(v_1, \dots, v_n)} \langle X, v \rangle\bigr] = \operatorname{E}\bigl[\max_{i} \langle X, v_i \rangle\bigr] \le s\sqrt{2\ln n}$.

(sums of independent subgaussians) (Proposition 2.6.1 [2]) There exists a positive constant $C$ such that given any number of independent mean-zero subgaussian random variables $X_1, \dots, X_n$, their sum is also subgaussian, with $\bigl\|\sum_{i=1}^n X_i\bigr\|_{\psi_2}^2 \le C \sum_{i=1}^n \|X_i\|_{\psi_2}^2$.

(Hoeffding's inequality) (Theorem 2.6.3 [2]) There exists a positive constant $c$ such that given any number of independent mean-zero subgaussian random variables $X_1, \dots, X_n$, for every $t \ge 0$,

$$\Pr\!\left(\left|\sum_{i=1}^n X_i\right| \ge t\right) \le 2\exp\!\left(-\frac{c\, t^2}{\sum_{i=1}^n \|X_i\|_{\psi_2}^2}\right).$$
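A Monte Carlo sketch of this tail bound (an illustration, not from the article), using Rademacher summands so that the explicit variance-proxy constant $2\exp(-t^2/(2n))$ can be used in place of the unspecified absolute constant $c$:

```python
import numpy as np

rng = np.random.default_rng(2)

# Tail of a sum of n independent Rademacher variables.  Each summand has
# variance proxy 1, so the sum has variance proxy n and
#   P(|S_n| >= t) <= 2 * exp(-t^2 / (2 * n)).
n, trials = 100, 1_000_000
S = 2 * rng.binomial(n, 0.5, size=trials) - n    # exact law of a Rademacher sum

for t in [10, 20, 30, 40]:
    empirical = np.mean(np.abs(S) >= t)
    bound = 2 * np.exp(-t**2 / (2 * n))
    print(f"t = {t:3d}   empirical tail = {empirical:.2e}   subgaussian bound = {bound:.2e}")
```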

(Bernstein's inequality) (Theorem 2.8.1 [2]) There exists a positive constant $c$ such that given any number of independent mean-zero subexponential random variables $X_1, \dots, X_n$, for every $t \ge 0$,

$$\Pr\!\left(\left|\sum_{i=1}^n X_i\right| \ge t\right) \le 2\exp\!\left(-c\,\min\!\left(\frac{t^2}{\sum_{i=1}^n \|X_i\|_{\psi_1}^2},\; \frac{t}{\max_i \|X_i\|_{\psi_1}}\right)\right).$$
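The mixed Gaussian/exponential shape of the bound reflects the tails of the summands. As an illustration (not from the article; the sample size is arbitrary), a squared standard Gaussian is subexponential but not subgaussian; its tail decays like $e^{-t/2}$ rather than like $e^{-ct^2}$:

```python
import numpy as np

rng = np.random.default_rng(3)

# A squared standard Gaussian Z^2 is subexponential but not subgaussian:
# P(Z^2 >= t) = P(|Z| >= sqrt(t)) decays like exp(-t/2), not like exp(-c t^2).
# Such exponential tails in the summands are why Bernstein's bound
# interpolates between a Gaussian regime (small t) and an exponential one.
Z2 = rng.standard_normal(10_000_000) ** 2
for t in [2, 5, 10, 15]:
    print(f"t = {t:2d}   P(Z^2 >= t) = {np.mean(Z2 >= t):.2e}   exp(-t/2) = {np.exp(-t/2):.2e}")
```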

(Khinchine inequality) (Exercise 2.6.5 [2]) There exists a positive constant $C$ such that given any number of independent mean-zero, variance-one subgaussian random variables $X_1, \dots, X_n$, any $p \ge 2$, and any real coefficients $a_1, \dots, a_n$,

$$\left(\sum_{i=1}^n a_i^2\right)^{1/2} \le \left\|\sum_{i=1}^n a_i X_i\right\|_{L^p} \le C K \sqrt{p}\,\left(\sum_{i=1}^n a_i^2\right)^{1/2},$$

where $K = \max_i \|X_i\|_{\psi_2}$.
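A Monte Carlo sketch (an illustration, not from the article; the coefficients and sample sizes are arbitrary) with Rademacher signs, whose $L^p$ norms of $\sum_i a_i X_i$ stay within a factor of order $\sqrt{p}$ of $\|a\|_2$:

```python
import numpy as np

rng = np.random.default_rng(4)

# Khinchine-type comparison:  ||sum_i a_i X_i||_{L^p}  versus  ||a||_2,
# for independent Rademacher signs X_i and fixed coefficients a_i.
a = rng.standard_normal(50)                             # arbitrary fixed coefficients
X = rng.choice([-1.0, 1.0], size=(200_000, a.size))     # independent +-1 signs
S = X @ a                                               # one value of sum_i a_i X_i per trial

l2 = np.linalg.norm(a)
for p in [2, 4, 8]:
    lp = np.mean(np.abs(S) ** p) ** (1 / p)             # Monte Carlo L^p norm
    print(f"p = {p}   ||sum a_i X_i||_p ~ {lp:.3f}   ||a||_2 = {l2:.3f}   ratio = {lp/l2:.3f}")
```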

The Hanson-Wright inequality states that if a random vector $X$ is subgaussian in a suitable sense, then any quadratic form $X^\mathsf{T} A X$ of this vector is subexponential; moreover, the upper bound on the tail of $X^\mathsf{T} A X$ is uniform, depending on $A$ only through its Frobenius norm and operator norm.

A weak version of the following theorem was proved in (Hanson, Wright, 1971).

The purpose is to take a subgaussian vector and uniformly bound its quadratic forms.

More precisely: there exists a constant $c > 0$ such that if $X = (X_1, \dots, X_n)$ has independent mean-zero coordinates with $\|X_i\|_{\psi_2} \le K$ and $A$ is an $n \times n$ matrix, then for every $t \ge 0$,

$$\Pr\!\left(\bigl|X^\mathsf{T} A X - \operatorname{E}[X^\mathsf{T} A X]\bigr| > t\right) \le 2\exp\!\left(-c\,\min\!\left(\frac{t^2}{K^4 \|A\|_F^2},\; \frac{t}{K^2 \|A\|}\right)\right).$$

Informally, the quadratic form $X^\mathsf{T} A X$ has its tail uniformly bounded by an exponential, or a Gaussian, whichever is larger.
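A Monte Carlo sketch of this concentration (an illustration, not from the article; the Gaussian coordinates, the matrix, and the sample sizes are arbitrary choices): the quadratic form concentrates around $\operatorname{tr}(A)$ with fluctuations on the order of $\|A\|_F$.

```python
import numpy as np

rng = np.random.default_rng(5)

# Concentration of a quadratic form X^T A X for X with i.i.d. standard
# Gaussian coordinates:  mean = trace(A), fluctuations of order ||A||_F.
n = 200
A = rng.standard_normal((n, n)) / np.sqrt(n)     # a fixed matrix
X = rng.standard_normal((50_000, n))             # 50,000 independent copies of X (rows)

quad = np.sum((X @ A) * X, axis=1)               # X^T A X, one value per copy
print("mean of X^T A X :", quad.mean(), "   trace(A) :", np.trace(A))
print("std  of X^T A X :", quad.std(),  "   ||A||_F  :", np.linalg.norm(A))
```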

The constant $c$ in the Hanson-Wright bound is absolute: a fixed mathematical constant, much like $\pi$ and $e$, that does not depend on $n$, on $A$, or on the distribution of the $X_i$.

Theorem (subgaussian concentration). There exists a constant $c > 0$ such that the following holds. Let $X_1, \dots, X_n$ be independent mean-zero, unit-variance random variables with $\|X_i\|_{\psi_2} \le K$, and let $A$ be an $m \times n$ real matrix. Then for every $t \ge 0$,

$$\Pr\!\left(\bigl|\,\|AX\|_2 - \|A\|_F\,\bigr| > t\right) \le 2\exp\!\left(-\frac{c\, t^2}{K^4 \|A\|^2}\right).$$
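A Monte Carlo sketch of the theorem (an illustration, not from the article; the matrix and the Gaussian coordinates are arbitrary choices): $\|AX\|_2$ concentrates around $\|A\|_F$ with fluctuations on the order of the operator norm $\|A\|$.

```python
import numpy as np

rng = np.random.default_rng(6)

# Concentration of ||A X||_2 around ||A||_F for X with independent mean-zero,
# unit-variance (here standard Gaussian) coordinates; the spread is governed
# by the operator norm ||A||, not by ||A||_F.
m, n = 300, 500
A = rng.standard_normal((m, n)) / np.sqrt(n)
X = rng.standard_normal((n, 20_000))             # 20,000 independent copies of X (columns)

norms = np.linalg.norm(A @ X, axis=0)            # ||A X||_2 for each copy
print("||A||_F        :", np.linalg.norm(A))
print("mean ||A X||_2 ~", norms.mean())
print("std  ||A X||_2 ~", norms.std(), "   ||A||_op :", np.linalg.norm(A, 2))
```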

Figure: Some commonly used bounded distributions.
Figure: Density of a mixture of three normal distributions ($\mu = 5, 10, 15$, $\sigma = 2$) with equal weights. Each component is shown as a weighted density (each integrating to 1/3).