Bounded distributions
Bernstein bounds
Bounds in $\mathbb{R}^{d \times d}$: Let $X_1, \dots, X_n$ be independent symmetric random matrices that obey $\mathbb{E}[X_i] = 0$ and $\|X_i\|_\mathrm{op} \leq b$ almost surely. Let $V = \big\|\sum_{i=1}^n \mathbb{E}[X_i^2]\big\|_\mathrm{op}$. Using the martingale-variance inequality (martingale concentration:Variance bound), we obtain
$$\mathbb{P}\left(\Big\|\sum_{i=1}^n X_i\Big\|_\mathrm{op} \geq t\right) \leq 2d \exp\left(\frac{-t^2/2}{V + bt/3}\right),$$
for all $t \geq 0$. A bound of this form was first given by David Gross here.
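As a quick numerical sanity check (my sketch, not from the text), the following simulates i.i.d. Rademacher multiples of a fixed symmetric matrix, for which the almost-sure bound $b$ and the variance proxy $V = \|\sum_i \mathbb{E}[X_i^2]\|_\mathrm{op}$ are exact, and compares the empirical tail against the standard matrix Bernstein expression $2d\exp\!\big(\tfrac{-t^2/2}{V + bt/3}\big)$:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, trials = 4, 400, 5000

# Increments X_i = eps_i * M with Rademacher eps_i and fixed symmetric M:
# mean zero, ||X_i||_op = ||M||_op exactly, and sum E[X_i^2] = n M^2.
G = rng.standard_normal((d, d))
M = (G + G.T) / 2
b = np.linalg.norm(M, 2)             # a.s. bound on ||X_i||_op
V = n * np.linalg.norm(M @ M, 2)     # || sum_i E[X_i^2] ||_op

# || sum_i X_i ||_op = |sum_i eps_i| * ||M||_op for this construction.
eps_sums = rng.choice([-1.0, 1.0], size=(trials, n)).sum(axis=1)
norms = np.abs(eps_sums) * b

t = 3.0 * np.sqrt(V)                 # a deviation level of order sqrt(V)
empirical = np.mean(norms >= t)
bernstein = 2 * d * np.exp(-(t**2 / 2) / (V + b * t / 3))
print(f"empirical tail {empirical:.4f} <= Bernstein bound {bernstein:.4f}")
```

The bound is loose on this toy example, as expected; its value lies in the explicit dimension and variance dependence.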
Note that a weaker form of this bound can be obtained by appealing to the Azuma-Hoeffding inequality (martingale concentration:Azuma-Hoeffding inequality). The difference is analogous to the difference between the Hoeffding bound and the Bernstein/Bennett bound in the scalar case (see bounded scalar concentration).
Bounds in more general spaces: There are dimension-free Bernstein bounds that hold in Banach spaces: see concentration in Banach spaces. There are also dimension-free empirical Bernstein bounds, see empirical Bernstein bounds.
Sub-Gaussian distributions
See sub-Gaussian distributions for definitions in multivariate settings.
Sub-Gaussian coordinates: If $X = (X_1, \dots, X_n)$ has independent, sub-Gaussian coordinates with $\mathbb{E}[X_i^2] = 1$, then the norm $\|X\|_2$ is concentrated around $\sqrt{n}$. In particular, for some universal constant $c > 0$,
$$\mathbb{P}\left(\big|\|X\|_2 - \sqrt{n}\big| \geq t\right) \leq 2\exp\left(-\frac{ct^2}{K^4}\right) \quad \text{for all } t \geq 0,$$
where $K = \max_i \|X_i\|_{\psi_2}$ is the maximum sub-Gaussian Orlicz norm.
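A minimal Monte Carlo illustration (my sketch; standard normal coordinates are one instance of independent, unit-second-moment sub-Gaussian coordinates): the norm of an $n$-dimensional vector sits within $O(1)$ of $\sqrt{n}$, even though the vector has $n$ fluctuating coordinates:

```python
import numpy as np

rng = np.random.default_rng(1)
n, trials = 1000, 2000

# Independent sub-Gaussian coordinates with E[X_i^2] = 1 (standard normal).
X = rng.standard_normal((trials, n))
norms = np.linalg.norm(X, axis=1)

# ||X||_2 concentrates around sqrt(n): typical deviations are O(1), not O(sqrt(n)).
dev = np.abs(norms - np.sqrt(n))
print(f"sqrt(n) = {np.sqrt(n):.2f}, mean norm = {norms.mean():.2f}, "
      f"95th pct of |deviation| = {np.quantile(dev, 0.95):.2f}")
```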
Sub-Gaussian random vectors. Hsu et al. (2012) give state-of-the-art sub-Gaussian concentration. They show that for a fixed matrix $A$, if $x$ is $\sigma$-sub-Gaussian with mean 0, then with probability $1 - \delta$,
$$\|Ax\|^2 \leq \sigma^2\left(\mathrm{tr}(\Sigma) + 2\sqrt{\mathrm{tr}(\Sigma^2)\log(1/\delta)} + 2\|\Sigma\|_\mathrm{op}\log(1/\delta)\right),$$
where $\Sigma = A^\top A$. This can be translated into a bound on $\|x\|$ as follows: if $x$ is $\Sigma$-sub-Gaussian, then $\Sigma^{-1/2}x$ is $1$-sub-Gaussian. Therefore, the above gives a bound for $\Sigma$-sub-Gaussian $x$ where $\Sigma$ can be decomposed as $\Sigma = AA^\top$ (e.g., $A = \Sigma^{1/2}$). For such vectors, we get the bound
$$\|x\|^2 \leq \mathrm{tr}(\Sigma) + 2\sqrt{\mathrm{tr}(\Sigma^2)\log(1/\delta)} + 2\|\Sigma\|_\mathrm{op}\log(1/\delta).$$
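For Gaussian vectors (which are $1$-sub-Gaussian), the quantile bound $\mathrm{tr}(\Sigma) + 2\sqrt{\mathrm{tr}(\Sigma^2)\log(1/\delta)} + 2\|\Sigma\|_\mathrm{op}\log(1/\delta)$ can be checked directly by simulation; this is my sketch, not code from Hsu et al.:

```python
import numpy as np

rng = np.random.default_rng(2)
n, trials, delta = 8, 20000, 0.05

A = rng.standard_normal((n, n))
Sigma = A.T @ A

# Standard normal x is 1-sub-Gaussian; the bound says that with prob 1 - delta,
# ||Ax||^2 <= tr(Sigma) + 2 sqrt(tr(Sigma^2) log(1/delta)) + 2 ||Sigma||_op log(1/delta).
hkz = (np.trace(Sigma)
       + 2 * np.sqrt(np.trace(Sigma @ Sigma) * np.log(1 / delta))
       + 2 * np.linalg.norm(Sigma, 2) * np.log(1 / delta))

x = rng.standard_normal((trials, n))
q = np.quantile(np.sum((x @ A.T) ** 2, axis=1), 1 - delta)
print(f"empirical 95% quantile {q:.1f} <= bound {hkz:.1f}")
```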
There is also a time-uniform bound that allows for martingale dependence which is nearly as tight, but not quite. See this paper for details. It applies to $S_t = \sum_{i \leq t} X_i$ where each $X_i$ is conditionally $\sigma$-sub-Gaussian (and still mean 0), and it holds with probability $1 - \delta$ simultaneously for all $t$, with a width depending on a predictable sequence $(\lambda_t)$. When instantiated with the optimal choice of $(\lambda_t)$, the bound grows at rate $\sqrt{t}$ up to logarithmic factors, and it can be made into a bound with optimal iterated logarithm dependence (laws of the iterated logarithm) using stitching (stitching for LIL rates). At a fixed time $t$, the optimal tuning gives a bound that is worse than Hsu et al.'s bound above by a constant factor, but not in the leading term, so the asymptotics are the same. This bound is proved with the variational approach to concentration.
Sub-psi distributions
Using confidence sequences for convex functionals, Manole and Ramdas give a bound for iid data with mean $\mu$ whose one-dimensional projections obey a sub-$\psi$ condition. Note this makes the partial-sum process a multivariate sub-psi process. They show that with probability $1 - \delta$, simultaneously for all $t$, the deviation $\|\bar{X}_t - \mu\|$ is bounded by an explicit boundary involving $N(\epsilon)$, the $\epsilon$-covering number of the unit ball of the dual norm, and $\psi^*$, the convex conjugate of $\psi$. For isotropic sub-exponential distributions, this gives a bound that scales as
$$\sqrt{\frac{d + \log(1/\delta)}{t}} + \frac{d + \log(1/\delta)}{t}.$$
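The covering number enters through a standard net argument: for an $\epsilon$-net $N_\epsilon$ of the unit sphere, $\|v\| \leq (1-\epsilon)^{-1}\max_{\theta \in N_\epsilon}\langle\theta, v\rangle$, so a union bound over the net controls the norm. A minimal check of this fact in $d = 2$ (my example, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(3)
eps = 0.2

# An eps-net of the unit circle: points spaced by arc length eps leave every
# point of the circle within chord distance < eps of the net.
angles = np.arange(0, 2 * np.pi, eps)
net = np.stack([np.cos(angles), np.sin(angles)], axis=1)

# Net lemma: ||v|| <= (1 - eps)^(-1) * max_{theta in net} <theta, v>.
ok = True
for _ in range(100):
    v = rng.standard_normal(2)
    ok = ok and np.linalg.norm(v) <= (net @ v).max() / (1 - eps) + 1e-9
print("net lemma held on all draws:", ok)
```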
Other conditions
Log-concave distributions. This comes from this paper and is proved with the variational approach to concentration. Let $(X_t)$ be conditionally log-concave, meaning that for all $t$,
$$\mathbb{E}\left[\exp\left(\langle \lambda, X_t - \mu\rangle\right) \mid \mathcal{F}_{t-1}\right] \leq \exp\left(c\,\lambda^\top \Sigma \lambda\right)$$
for all $\lambda$ in a ball around the origin, some $c > 0$ and PSD matrix $\Sigma$. Let $\hat{\mu}_t = \frac{1}{t}\sum_{i \leq t} X_i$. Then with probability $1 - \delta$, for all $t$, $\|\hat{\mu}_t - \mu\|$ admits an explicit bound in terms of any predictable sequence $(\lambda_t)$ taking values in the region where the condition above holds. For a fixed time $n$, a constant choice of $\lambda_t$ gives a width of order $1/\sqrt{n}$ (up to dimension and $\log(1/\delta)$ factors). In the sequential setting, a decaying choice of $\lambda_t$ gives a width with an additional logarithmic factor in $t$.
Stitching can be used to get LIL rates (stitching for LIL rates).