Sub-Gaussian distributions are a large class of distributions used in a wide variety of applications. Intuitively, they behave “like or better than” Gaussian distributions, meaning their tails decay at the same rate or faster than Gaussian tails. Consequently, sub-Gaussian random variables are nice to work with.

Some facts about this class:

  • Sub-Gaussian distributions are a nonparametric class. There is no common dominating measure: it contains both discrete and continuous random variables.
  • If we are told that $X$ is $\sigma$-sub-Gaussian (see below), we cannot estimate $\sigma$ from the data. (This is similar to how one cannot estimate the bounds of a bounded random variable.)
  • There is no test martingale for this class.
  • They specify an e-value.

Scalar case

There are several equivalent ways to define them. The easiest is to place a specific bound on the MGF. A real-valued random variable $X$ with mean $\mu$ is $\sigma$-sub-Gaussian if, for all $\lambda \in \mathbb{R}$,

$$\mathbb{E}[\exp(\lambda(X - \mu))] \leq \exp\left(\frac{\lambda^2 \sigma^2}{2}\right).$$
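As a quick sanity check of this MGF bound (a sketch of mine, not part of the note): a Rademacher variable $X$ with $\Pr(X = \pm 1) = 1/2$ has mean $0$ and MGF $\cosh(\lambda)$, and is $1$-sub-Gaussian since $\cosh(\lambda) \leq \exp(\lambda^2/2)$.

```python
import math

# Sanity check of the MGF definition: a Rademacher variable X
# (P(X = +1) = P(X = -1) = 1/2, mean 0) is 1-sub-Gaussian, since
# E[exp(lambda * X)] = cosh(lambda) <= exp(lambda^2 / 2).
def rademacher_mgf(lam):
    return math.cosh(lam)

def sub_gaussian_bound(lam, sigma=1.0):
    return math.exp(lam ** 2 * sigma ** 2 / 2)

# Verify the bound on a grid of lambda values.
for lam in [x / 10 for x in range(-50, 51)]:
    assert rademacher_mgf(lam) <= sub_gaussian_bound(lam) + 1e-12
print("Rademacher satisfies the 1-sub-Gaussian MGF bound")
```

The inequality follows from comparing Taylor coefficients: $(2k)! \geq 2^k k!$ term by term.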
Equivalent definitions (up to constant factors in $\sigma$) include:

  • A tail bound: $\Pr(|X - \mu| \geq t) \leq 2\exp(-t^2/(2\sigma^2))$ for all $t \geq 0$.
  • A moment bound: $\mathbb{E}[|X - \mu|^p]^{1/p} \leq C\sigma\sqrt{p}$ for all $p \geq 1$ and a universal constant $C$.

An e-value for $\sigma$-sub-Gaussian distributions with mean $\mu$ is

$$E_\lambda = \exp\left(\lambda(X - \mu) - \frac{\lambda^2 \sigma^2}{2}\right),$$

which follows immediately from the definition above: dividing the MGF bound through by $\exp(\lambda^2\sigma^2/2)$ gives $\mathbb{E}[E_\lambda] \leq 1$. The product of such e-values across observations naturally gives a nonnegative supermartingale, which gives rise to confidence sequences for sub-Gaussian random variables.
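A minimal numerical sketch of this e-value (my illustration, with a standard Gaussian standing in as the sub-Gaussian distribution): for $X \sim N(\mu, \sigma^2)$ the MGF bound holds with equality, so $\mathbb{E}[E_\lambda] = 1$ exactly, and a Monte Carlo average should land near 1.

```python
import math
import random

# The sub-Gaussian e-value: E_lambda = exp(lambda*(X - mu) - lambda^2*sigma^2/2).
# For any sigma-sub-Gaussian X with mean mu, E[E_lambda] <= 1; the running
# product over i.i.d. observations is therefore a nonnegative supermartingale.
def e_value(x, mu, sigma, lam):
    return math.exp(lam * (x - mu) - lam ** 2 * sigma ** 2 / 2)

random.seed(0)
mu, sigma, lam = 0.0, 1.0, 0.5
# Standard Gaussian: the bound is tight, so the Monte Carlo mean should be ~1.
n = 200_000
avg = sum(e_value(random.gauss(mu, sigma), mu, sigma, lam) for _ in range(n)) / n
print(f"Monte Carlo estimate of E[e-value]: {avg:.3f}")
```

Taking the running product of `e_value` over a data stream, and applying Ville's inequality to it, is what yields the confidence sequences mentioned above.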

Multivariate case

If $X \in \mathbb{R}^d$ is a random vector with mean $\mu$, we say it is $\Sigma$-sub-Gaussian for some PSD matrix $\Sigma$ if

$$\mathbb{E}[\exp(\langle \lambda, X - \mu \rangle)] \leq \exp\left(\frac{\lambda^\top \Sigma \lambda}{2}\right)$$

for all $\lambda \in \mathbb{R}^d$. In the isotropic case (see isotropic distributions), we have $\Sigma = \sigma^2 I$ for some $\sigma > 0$, and the condition above becomes

$$\mathbb{E}[\exp(\langle \lambda, X - \mu \rangle)] \leq \exp\left(\frac{\sigma^2 \|\lambda\|^2}{2}\right),$$
which is perhaps the more common definition in the literature. But if we want to allow for anisotropy (see anisotropic distribution) then the first definition is preferred.
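A short Monte Carlo sketch of the anisotropic condition (my example, not from the note): for a mean-zero Gaussian $X$ with diagonal covariance $\Sigma = \mathrm{diag}(1, 4)$, the bound $\mathbb{E}[\exp(\langle \lambda, X \rangle)] \leq \exp(\lambda^\top \Sigma \lambda / 2)$ holds with equality, so the simulated MGF should match the bound.

```python
import math
import random

# Check the Sigma-sub-Gaussian condition by Monte Carlo for a 2-d Gaussian
# with independent components and covariance Sigma = diag(1, 4). Here the
# bound exp(lambda^T Sigma lambda / 2) is attained exactly.
random.seed(1)
sigmas = [1.0, 2.0]   # component standard deviations (Sigma = diag(1, 4))
lam = [0.3, 0.2]      # a test vector lambda

def sample():
    return [random.gauss(0.0, s) for s in sigmas]

n = 200_000
mc = sum(math.exp(sum(l * x for l, x in zip(lam, sample()))) for _ in range(n)) / n
# lambda^T Sigma lambda / 2 for a diagonal Sigma reduces to sum(l^2 * s^2) / 2.
bound = math.exp(sum(l * l * s * s for l, s in zip(lam, sigmas)) / 2)
print(f"Monte Carlo MGF: {mc:.3f}, sub-Gaussian bound: {bound:.3f}")
```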

Note that this definition extends easily to infinite-dimensional spaces as long as they are endowed with an inner product. E.g., in an infinite-dimensional Hilbert space $\mathcal{H}$, $\Sigma$ is interpreted as an operator $\Sigma : \mathcal{H} \to \mathcal{H}$, and we replace $\lambda^\top \Sigma \lambda$ with the inner product $\langle \lambda, \Sigma \lambda \rangle$.

References