Bounded distributions

Bernstein bounds

Bounds in : Let obey and . Let . Using the martingale-variance inequality (martingale concentration:Variance bound), we obtain

for . A bound of this form was first given by David Gross here.

Note that a weaker form of this bound can be obtained by appealing to the Azuma-Hoeffding inequality (martingale concentration:Azuma-Hoeffding inequality). The difference is analogous to that between the Hoeffding bound and the Bernstein/Bennett bound in the scalar case (see bounded scalar concentration).
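To see the scalar gap concretely, here is a small sketch comparing the two tail bounds for a sum of n variables bounded in [0, 1]. The constants used are the standard Hoeffding and Bernstein forms (my choice of normalization, not formulas quoted from this page); when the per-variable variance is small, the Bernstein bound is dramatically tighter.

```python
import math

def hoeffding_tail(n, t):
    # Hoeffding: P(sum_i (X_i - E X_i) >= t) <= exp(-2 t^2 / n) for X_i in [0, 1].
    # Uses only the range of the X_i, not their variance.
    return math.exp(-2 * t**2 / n)

def bernstein_tail(n, t, var):
    # Bernstein: P(sum_i (X_i - E X_i) >= t) <= exp(-t^2 / (2 (n var + t / 3)))
    # for |X_i - E X_i| <= 1 with Var(X_i) <= var. Adapts to small variance.
    return math.exp(-(t**2) / (2 * (n * var + t / 3)))

n, t = 1000, 50
print(hoeffding_tail(n, t))        # variance-oblivious: exp(-5) ~ 6.7e-3
print(bernstein_tail(n, t, 0.01))  # far smaller when the variance is small
```

With var = 0.01 the Bernstein exponent is roughly -46.9 versus Hoeffding's -5, illustrating why the matrix Bernstein bound above improves on the Azuma-Hoeffding route.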

Bounds in more general spaces: There are dimension-free Bernstein bounds that hold in Banach spaces (see concentration in Banach spaces), as well as dimension-free empirical Bernstein bounds (see empirical Bernstein bounds).

Sub-Gaussian distributions

See sub-Gaussian distributions for definitions in multivariate settings.

Sub-Gaussian coordinates: If are independent and sub-Gaussian with , then the norm is concentrated around . In particular, for some ,

where is the maximum sub-Gaussian Orlicz norm.
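A quick numerical sanity check of this phenomenon: the following sketch uses independent standard Gaussian coordinates (a canonical sub-Gaussian example with unit second moment) and verifies that the deviation of the norm from its typical value stays O(1) even as the dimension grows.

```python
import math
import random

def mean_norm_deviation(n, trials, seed=0):
    """Average of | ||X||_2 - sqrt(n) | over independent draws of X with
    n iid standard Gaussian coordinates. Concentration of the norm says
    this stays bounded (O(1)) as n grows."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        total += abs(math.sqrt(sum(v * v for v in x)) - math.sqrt(n))
    return total / trials

print(mean_norm_deviation(100, 200))    # roughly the same size...
print(mean_norm_deviation(10000, 200))  # ...despite 100x the dimension
```

The average deviation is comparable at n = 100 and n = 10000, reflecting the dimension-free fluctuation scale of the norm.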

Sub-Gaussian random vectors. Hsu et al. (2012) give state-of-the-art sub-Gaussian concentration. They show that for a fixed matrix , if is -sub-Gaussian with mean 0, then with probability ,

where . This can be translated into a sub-Gaussian as follows: if is -sub-Gaussian, then is -sub-Gaussian. Therefore, the above gives a bound for -sub-Gaussian where can be decomposed as . For such vectors, we get the bound

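As a concrete check, here is a small simulation of a bound of this shape. The threshold d + 2*sqrt(d*t) + 2*t with t = log(1/delta) is my reconstruction of the standard Hsu-Kakade-Zhang quadratic-form bound specialized to A equal to the identity and sigma = 1 (so the covariance proxy is the identity); the exact display in the source is not quoted here, so treat the formula as an assumption.

```python
import math
import random

def hkz_threshold(d, delta):
    # Hsu-Kakade-Zhang-style bound for A = I_d and sigma = 1:
    # ||x||^2 <= d + 2 sqrt(d t) + 2 t  with probability >= 1 - delta,
    # where t = log(1/delta). (Reconstructed standard form, an assumption.)
    t = math.log(1.0 / delta)
    return d + 2.0 * math.sqrt(d * t) + 2.0 * t

def empirical_exceed_rate(d, delta, trials, seed=0):
    """Fraction of standard Gaussian draws whose squared norm exceeds the
    threshold; should be at most delta if the bound is valid."""
    rng = random.Random(seed)
    thresh = hkz_threshold(d, delta)
    exceed = 0
    for _ in range(trials):
        sq = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(d))
        if sq > thresh:
            exceed += 1
    return exceed / trials

print(empirical_exceed_rate(20, 0.05, 2000))  # well below delta = 0.05
```

For d = 20 and delta = 0.05 the empirical exceedance rate is far below 0.05, consistent with the bound being conservative at moderate dimension.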
There is also a time-uniform bound that allows for martingale dependence and is nearly, but not quite, as tight. See this paper for details. For where is conditionally -sub-Gaussian (and still mean 0), we have

where is a predictable sequence. When instantiated with the optimal and , we get the rate

This can be made into a bound with optimal iterated-logarithm dependence (laws of the iterated logarithm) using stitching (stitching for LIL rates). At a fixed time , the optimal choices of and give the bound

which is worse than Hsu et al.'s bound above, since (but not by the leading factor, so the asymptotics are the same). This bound is proved with the variational approach to concentration.
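The claim that the asymptotics agree can be grounded in an elementary inequality. Assuming the two widths differ as a combined term of the form sqrt(a + b) versus a split term sqrt(a) + sqrt(b) (an assumption about the shape of the missing displays, e.g. a = d and b = log(1/delta)), Cauchy-Schwarz gives, for any nonnegative a and b:

```latex
% For a, b >= 0: the left inequality is immediate (square both sides),
% and the right follows from Cauchy-Schwarz applied to (sqrt(a), sqrt(b)) and (1, 1).
\[
  \sqrt{a + b} \;\le\; \sqrt{a} + \sqrt{b} \;\le\; \sqrt{2}\,\sqrt{a + b},
  \qquad a, b \ge 0.
\]
```

So the two widths differ by at most a factor of sqrt(2): the constant is worse but the rate is the same.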

Sub- distributions

Using confidence sequences for convex functionals, Manole and Ramdas give a bound for iid data with mean that obeys

Note that this is a multivariate sub-psi process with . They show that with probability , for all ,

where is the covering number of the dual norm, and is the convex conjugate of . For isotropic sub-exponential distributions, this gives a bound that scales as

Other conditions

Log-concave distributions. This comes from this paper and is proved with the variational approach to concentration. Let be conditionally log-concave, meaning that for all ,

for all , some and PSD . Let . Then with probability , for all ,

where is any predictable sequence taking values in . For a fixed time , taking gives a width bounded by . In the sequential setting, taking gives a width of

Stitching can be used to get LIL rates (stitching for LIL rates).