Bounded distributions
Bernstein bounds
Bounds in $\mathbb{R}^{d \times d}$: Let $X_1, \dots, X_n$ be independent symmetric random matrices that obey $\mathbb{E}[X_i] = 0$ and $\|X_i\|_\mathrm{op} \leq b$ almost surely. Let $V = \big\|\sum_{i=1}^n \mathbb{E}[X_i^2]\big\|_\mathrm{op}$. Using the martingale-variance inequality (martingale concentration:Variance bound), we obtain
$$\mathbb{P}\left(\Big\|\sum_{i=1}^n X_i\Big\|_\mathrm{op} \geq t\right) \leq 2d \exp\left(\frac{-t^2/2}{V + bt/3}\right),$$
for all $t \geq 0$. A bound of this form was first given by David Gross here.
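As a quick numerical sanity check (my sketch, not from the text), the following simulates i.i.d. Rademacher multiples of a fixed symmetric matrix, for which the almost-sure bound $b$ and the variance proxy $V = \|\sum_i \mathbb{E}[X_i^2]\|_\mathrm{op}$ are exact, and compares the empirical tail against the standard matrix Bernstein expression $2d\exp\!\big(\tfrac{-t^2/2}{V + bt/3}\big)$:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, trials = 4, 400, 5000

# Increments X_i = eps_i * M with Rademacher eps_i and fixed symmetric M:
# mean zero, ||X_i||_op = ||M||_op exactly, and sum E[X_i^2] = n M^2.
G = rng.standard_normal((d, d))
M = (G + G.T) / 2
b = np.linalg.norm(M, 2)             # a.s. bound on ||X_i||_op
V = n * np.linalg.norm(M @ M, 2)     # || sum_i E[X_i^2] ||_op

# || sum_i X_i ||_op = |sum_i eps_i| * ||M||_op for this construction.
eps_sums = rng.choice([-1.0, 1.0], size=(trials, n)).sum(axis=1)
norms = np.abs(eps_sums) * b

t = 3.0 * np.sqrt(V)                 # a deviation level of order sqrt(V)
empirical = np.mean(norms >= t)
bernstein = 2 * d * np.exp(-(t**2 / 2) / (V + b * t / 3))
print(f"empirical tail {empirical:.4f} <= Bernstein bound {bernstein:.4f}")
```

The bound is loose on this toy example, as expected; its value lies in the explicit dimension and variance dependence.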
Note that a weaker form of this bound can be obtained by appealing to the Azuma-Hoeffding inequality (martingale concentration:Azuma-Hoeffding inequality). The difference is analogous to the difference between the Hoeffding bound and the Bernstein/Bennett bound in the scalar case (see bounded scalar concentration).
Bounds in more general spaces: There are dimension-free Bernstein bounds that hold in Banach spaces: see concentration in Banach spaces. There are also dimension-free empirical Bernstein bounds, see empirical Bernstein bounds.
Sub-Gaussian distributions
See sub-Gaussian distributions for definitions in multivariate settings.
Sub-Gaussian coordinates: If $X = (X_1, \dots, X_n)$ has independent, sub-Gaussian coordinates with $\mathbb{E}[X_i^2] = 1$, then the norm $\|X\|_2$ is concentrated around $\sqrt{n}$. In particular, for some universal constant $c > 0$,
$$\mathbb{P}\left(\big|\|X\|_2 - \sqrt{n}\big| \geq t\right) \leq 2\exp\left(-\frac{ct^2}{K^4}\right) \quad \text{for all } t \geq 0,$$
where $K = \max_i \|X_i\|_{\psi_2}$ is the maximum sub-Gaussian Orlicz norm.
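A minimal Monte Carlo illustration (my sketch; standard normal coordinates are one instance of independent, unit-second-moment sub-Gaussian coordinates): the norm of an $n$-dimensional vector sits within $O(1)$ of $\sqrt{n}$, even though the vector has $n$ fluctuating coordinates:

```python
import numpy as np

rng = np.random.default_rng(1)
n, trials = 1000, 2000

# Independent sub-Gaussian coordinates with E[X_i^2] = 1 (standard normal).
X = rng.standard_normal((trials, n))
norms = np.linalg.norm(X, axis=1)

# ||X||_2 concentrates around sqrt(n): typical deviations are O(1), not O(sqrt(n)).
dev = np.abs(norms - np.sqrt(n))
print(f"sqrt(n) = {np.sqrt(n):.2f}, mean norm = {norms.mean():.2f}, "
      f"95th pct of |deviation| = {np.quantile(dev, 0.95):.2f}")
```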
Sub-Gaussian random vectors. Hsu et al. (2012) give state-of-the-art sub-Gaussian concentration. They show that for a fixed matrix $A$, if $x$ is $\sigma$-sub-Gaussian with mean 0, then with probability $1 - \delta$,
$$\|Ax\|^2 \leq \sigma^2\left(\mathrm{tr}(\Sigma) + 2\sqrt{\mathrm{tr}(\Sigma^2)\log(1/\delta)} + 2\|\Sigma\|_\mathrm{op}\log(1/\delta)\right),$$
where $\Sigma = A^\top A$. This can be translated into a bound on $\|x\|$ as follows: if $x$ is $\Sigma$-sub-Gaussian, then $\Sigma^{-1/2}x$ is $1$-sub-Gaussian. Therefore, the above gives a bound for $\Sigma$-sub-Gaussian $x$ where $\Sigma$ can be decomposed as $\Sigma = AA^\top$ (e.g., $A = \Sigma^{1/2}$). For such vectors, we get the bound
$$\|x\|^2 \leq \mathrm{tr}(\Sigma) + 2\sqrt{\mathrm{tr}(\Sigma^2)\log(1/\delta)} + 2\|\Sigma\|_\mathrm{op}\log(1/\delta).$$
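For Gaussian vectors (which are $1$-sub-Gaussian), the quantile bound $\mathrm{tr}(\Sigma) + 2\sqrt{\mathrm{tr}(\Sigma^2)\log(1/\delta)} + 2\|\Sigma\|_\mathrm{op}\log(1/\delta)$ can be checked directly by simulation; this is my sketch, not code from Hsu et al.:

```python
import numpy as np

rng = np.random.default_rng(2)
n, trials, delta = 8, 20000, 0.05

A = rng.standard_normal((n, n))
Sigma = A.T @ A

# Standard normal x is 1-sub-Gaussian; the bound says that with prob 1 - delta,
# ||Ax||^2 <= tr(Sigma) + 2 sqrt(tr(Sigma^2) log(1/delta)) + 2 ||Sigma||_op log(1/delta).
hkz = (np.trace(Sigma)
       + 2 * np.sqrt(np.trace(Sigma @ Sigma) * np.log(1 / delta))
       + 2 * np.linalg.norm(Sigma, 2) * np.log(1 / delta))

x = rng.standard_normal((trials, n))
q = np.quantile(np.sum((x @ A.T) ** 2, axis=1), 1 - delta)
print(f"empirical 95% quantile {q:.1f} <= bound {hkz:.1f}")
```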
There is also a time-uniform bound that allows for martingale dependence which is nearly as tight, but not quite. See this paper for details. It applies to $S_t = \sum_{i \leq t} X_i$ where each $X_i$ is conditionally $\sigma$-sub-Gaussian (and still mean 0), and it holds with probability $1 - \delta$ simultaneously for all $t$, with a width depending on a predictable sequence $(\lambda_t)$. When instantiated with the optimal choice of $(\lambda_t)$, the bound grows at rate $\sqrt{t}$ up to logarithmic factors, and it can be made into a bound with optimal iterated logarithm dependence (laws of the iterated logarithm) using stitching (stitching for LIL rates). At a fixed time $t$, the optimal tuning gives a bound that is worse than Hsu et al.'s bound above by a constant factor, but not in the leading term, so the asymptotics are the same. This bound is proved with the variational approach to concentration.
Sub-psi distributions
Using confidence sequences for convex functionals, Manole and Ramdas give a bound for iid data with mean $\mu$ whose one-dimensional projections obey a sub-$\psi$ condition. Note this makes the partial-sum process a multivariate sub-psi process. They show that with probability $1 - \delta$, simultaneously for all $t$, the deviation $\|\bar{X}_t - \mu\|$ is bounded by an explicit boundary involving $N(\epsilon)$, the $\epsilon$-covering number of the unit ball of the dual norm, and $\psi^*$, the convex conjugate of $\psi$. For isotropic sub-exponential distributions, this gives a bound that scales as
$$\sqrt{\frac{d + \log(1/\delta)}{t}} + \frac{d + \log(1/\delta)}{t}.$$
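The covering number enters through a standard net argument: for an $\epsilon$-net $N_\epsilon$ of the unit sphere, $\|v\| \leq (1-\epsilon)^{-1}\max_{\theta \in N_\epsilon}\langle\theta, v\rangle$, so a union bound over the net controls the norm. A minimal check of this fact in $d = 2$ (my example, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(3)
eps = 0.2

# An eps-net of the unit circle: points spaced by arc length eps leave every
# point of the circle within chord distance < eps of the net.
angles = np.arange(0, 2 * np.pi, eps)
net = np.stack([np.cos(angles), np.sin(angles)], axis=1)

# Net lemma: ||v|| <= (1 - eps)^(-1) * max_{theta in net} <theta, v>.
ok = True
for _ in range(100):
    v = rng.standard_normal(2)
    ok = ok and np.linalg.norm(v) <= (net @ v).max() / (1 - eps) + 1e-9
print("net lemma held on all draws:", ok)
```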
Other conditions
Log-concave distributions. This comes from this paper and is proved with the variational approach to concentration. Let $(X_t)$ be conditionally log-concave, meaning that for all $t$,
$$\mathbb{E}\left[\exp\left(\langle \lambda, X_t - \mu\rangle\right) \mid \mathcal{F}_{t-1}\right] \leq \exp\left(c\,\lambda^\top \Sigma \lambda\right)$$
for all $\lambda$ in a ball around the origin, some $c > 0$ and PSD matrix $\Sigma$. Let $\hat{\mu}_t = \frac{1}{t}\sum_{i \leq t} X_i$. Then with probability $1 - \delta$, for all $t$, $\|\hat{\mu}_t - \mu\|$ admits an explicit bound in terms of any predictable sequence $(\lambda_t)$ taking values in the region where the condition above holds. For a fixed time $n$, a constant choice of $\lambda_t$ gives a width of order $1/\sqrt{n}$ (up to dimension and $\log(1/\delta)$ factors). In the sequential setting, a decaying choice of $\lambda_t$ gives a width with an additional logarithmic factor in $t$.
Stitching can be used to get LIL rates (stitching for LIL rates).