Often in analysis, we divide an object (such as a random variable) into two parts, a central bulk and a distant tail, and then analyze each separately.
The tail event is so rare that we may safely ignore it."
Subgaussian distributions are worthy of study because the gaussian distribution is well understood, so we can give sharp bounds on the rarity of the tail event.
Similarly, subexponential distributions are worthy of study.
is called the optimal variance proxy and denoted by
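Concretely, writing \(\sigma_{\mathrm{opt}}^2(X)\) for this quantity (the symbol here is only a placeholder; notation varies across sources), the definition reads
\[
\sigma_{\mathrm{opt}}^{2}(X) \;=\; \inf\left\{\, s^{2} > 0 \;:\; \mathbb{E}\, e^{\lambda (X - \mathbb{E}X)} \le e^{\lambda^{2} s^{2}/2} \ \text{ for all } \lambda \in \mathbb{R} \,\right\}.
\]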
So for example, given a random variable satisfying (1) and (2), the minimal constants
From the proof, we can extract a cycle of three inequalities: In particular, the constant
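For reference, one common trio of equivalent subgaussian conditions (cf. [2], Proposition 2.5.2; the numbering (1)-(2) above may correspond to a different selection) is: for constants \(K_1, K_2, K_3 > 0\),
\[
\mathbb{P}(|X| \ge t) \le 2\, e^{-t^{2}/K_1^{2}} \ \ (t \ge 0), \qquad
\bigl(\mathbb{E}|X|^{p}\bigr)^{1/p} \le K_2 \sqrt{p} \ \ (p \ge 1), \qquad
\mathbb{E}\, e^{\lambda X} \le e^{\lambda^{2} K_3^{2}} \ \ (\lambda \in \mathbb{R},\ \mathbb{E}X = 0),
\]
and the minimal constants \(K_1, K_2, K_3\) differ from one another only by absolute multiplicative factors.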
The proof splits the integral of the MGF into two halves, one with
are independent random variables with the same upper subgaussian tail:
Gaussian concentration inequality for Lipschitz functions (Tao 2012, Theorem 2.1.12)
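In one standard formulation (the constants \(C, c > 0\) are absolute; the precise form in the source may differ slightly): if \(X \sim N(0, I_n)\) and \(F : \mathbb{R}^n \to \mathbb{R}\) is 1-Lipschitz, then for every \(\lambda \ge 0\),
\[
\mathbb{P}\bigl(\,|F(X) - \mathbb{E}F(X)| \ge \lambda\,\bigr) \;\le\; C\, e^{-c\lambda^{2}}.
\]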
By shifting and scaling, it suffices to prove the case where
Now it remains to bound the cumulant generating function.
By the circular symmetry of gaussian variables, we introduce
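A sketch of this step, under the reduction above (with \(Y\) an independent copy of the standard gaussian vector \(X\) and \(F\) a 1-Lipschitz function; this parametrization is one standard choice, not necessarily the source's): set
\[
X_\theta := Y\cos\theta + X\sin\theta, \qquad X_\theta' := \tfrac{d}{d\theta}X_\theta = -Y\sin\theta + X\cos\theta, \qquad \theta \in [0, \tfrac{\pi}{2}],
\]
so that for each fixed \(\theta\) the pair \((X_\theta, X_\theta')\) is again a pair of independent standard gaussian vectors, and
\[
F(X) - F(Y) \;=\; \int_{0}^{\pi/2} \nabla F(X_\theta)\cdot X_\theta' \, d\theta .
\]
Applying Jensen's inequality to the average over \(\theta\), and then the gaussian MGF conditionally on \(X_\theta\), bounds \(\mathbb{E}\, e^{\lambda(F(X)-F(Y))}\) by \(e^{\pi^{2}\lambda^{2}/8}\).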
At the edge of possibility, we say that a random variable
is known for many standard probability distributions, including the beta, Bernoulli, Dirichlet[6], Kumaraswamy, triangular[7], truncated Gaussian, and truncated exponential.
Hoeffding's inequality is the Chernoff bound obtained using this fact.
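As a quick sanity check on that bound, here is a small Monte Carlo experiment (the choice of uniform \([-1,1]\) summands, \(n = 100\), and \(t = 10\) is purely illustrative). The classical one-sided form is \(\mathbb{P}\bigl(\sum_i (X_i - \mathbb{E}X_i) \ge t\bigr) \le \exp\bigl(-2t^{2}/\sum_i (b_i - a_i)^{2}\bigr)\) for \(X_i \in [a_i, b_i]\).

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials, t = 100, 200_000, 10.0          # illustrative parameters only

# X_i uniform on [-1, 1]: bounded and mean zero, so Hoeffding applies with a_i = -1, b_i = 1.
S = rng.uniform(-1.0, 1.0, size=(trials, n)).sum(axis=1)

empirical = np.mean(S >= t)
hoeffding = np.exp(-2 * t**2 / (4 * n))    # sum of (b_i - a_i)^2 equals 4n here

print(f"empirical tail P(S >= {t}): {empirical:.5f}")
print(f"Hoeffding bound:            {hoeffding:.5f}")
```

The bound is loose for these parameters, but the empirical tail frequency stays below it, as it must.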
The purpose of subgaussianity is to make the tails decay fast, so we generalize accordingly: a subgaussian random vector is a random vector whose tail decays fast in every direction.
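One standard way to make this precise (cf. [2], Definition 3.4.1) is through one-dimensional marginals: a random vector \(X\) in \(\mathbb{R}^n\) is subgaussian if \(\langle X, x\rangle\) is a subgaussian random variable for every fixed \(x \in \mathbb{R}^n\), with
\[
\|X\|_{\psi_2} \;=\; \sup_{\|x\|_2 = 1} \|\langle X, x\rangle\|_{\psi_2},
\]
where \(\|\cdot\|_{\psi_2}\) denotes the subgaussian (Orlicz) norm.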
(over a convex polytope) Fix a finite set of vectors
such that given any number of independent mean-zero subgaussian random variables
(Hoeffding's inequality) (Theorem 2.6.3 [2]) There exists a positive constant
such that given any number of independent mean-zero subgaussian random variables
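One common form of the conclusion (with \(c > 0\) an absolute constant and \(\|\cdot\|_{\psi_2}\) the subgaussian norm; the exact statement in [2] may differ slightly): for independent mean-zero subgaussian \(X_1, \dots, X_N\) and every \(t \ge 0\),
\[
\mathbb{P}\left( \Bigl| \sum_{i=1}^{N} X_i \Bigr| \ge t \right) \;\le\; 2 \exp\!\left( - \frac{c\, t^{2}}{\sum_{i=1}^{N} \|X_i\|_{\psi_2}^{2}} \right).
\]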
(Bernstein's inequality) (Theorem 2.8.1 [2]) There exists a positive constant
such that given any number of independent mean-zero subexponential random variables
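One common form of the conclusion (with \(c > 0\) an absolute constant and \(\|\cdot\|_{\psi_1}\) the subexponential norm; the exact statement in [2] may differ slightly): for independent mean-zero subexponential \(X_1, \dots, X_N\) and every \(t \ge 0\),
\[
\mathbb{P}\left( \Bigl| \sum_{i=1}^{N} X_i \Bigr| \ge t \right) \;\le\; 2 \exp\!\left( - c \min\!\left( \frac{t^{2}}{\sum_{i=1}^{N} \|X_i\|_{\psi_1}^{2}}, \; \frac{t}{\max_i \|X_i\|_{\psi_1}} \right) \right).
\]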
(Khinchine inequality) (Exercise 2.6.5 [2]) There exists a positive constant
such that given any number of independent mean-zero variance-one subgaussian random variables
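One common form of the conclusion (following the statement of Exercise 2.6.5 in [2], with \(C > 0\) an absolute constant and \(K = \max_i \|X_i\|_{\psi_2}\)): for any coefficients \(a = (a_1, \dots, a_N) \in \mathbb{R}^N\) and any \(p \in [2, \infty)\),
\[
\Bigl(\sum_{i=1}^{N} a_i^{2}\Bigr)^{1/2} \;\le\; \Bigl\| \sum_{i=1}^{N} a_i X_i \Bigr\|_{L^p} \;\le\; C K \sqrt{p}\, \Bigl(\sum_{i=1}^{N} a_i^{2}\Bigr)^{1/2}.
\]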
The Hanson-Wright inequality states that if a random vector
A weak version of the following theorem was proved by Hanson and Wright (1971). The purpose of the inequality is to take a subgaussian vector and uniformly bound its quadratic forms.
has its tail uniformly bounded by an exponential, or a gaussian, whichever is larger.
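In the form given in [2] (Theorem 6.2.1), with \(c > 0\) an absolute constant: if \(X = (X_1, \dots, X_n) \in \mathbb{R}^n\) has independent mean-zero subgaussian coordinates with \(K = \max_i \|X_i\|_{\psi_2}\), and \(A\) is an \(n \times n\) matrix, then for every \(t \ge 0\),
\[
\mathbb{P}\bigl(\, |X^{\mathsf T} A X - \mathbb{E}\, X^{\mathsf T} A X| \ge t \,\bigr) \;\le\; 2 \exp\!\left( - c \min\!\left( \frac{t^{2}}{K^{4} \|A\|_F^{2}}, \; \frac{t}{K^{2} \|A\|} \right) \right),
\]
where \(\|A\|_F\) is the Frobenius norm and \(\|A\|\) the operator norm. The gaussian term dominates for small \(t\) and the exponential term for large \(t\), which is exactly the "whichever is larger" behavior described above.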
It is a mathematical constant, much like pi and e.
Theorem (subgaussian concentration).