Here we list concentration inequalities for scalar-valued random variables that are bounded with probability 1 (sometimes only bounded on one side). This is in contrast to light-tailed, unbounded scalar concentration, which does not assume boundedness, and to heavy-tailed scalar concentration, where we assume only a few finite moments.
Many of these concentration inequalities are proven using corresponding exponential inequalities. There are also time-uniform versions of many of these inequalities which stem from the sub-psi process given by the corresponding exponential inequality.
Bounded random variables
Hoeffding’s bound
Let $X_1, \dots, X_n$ be independent and set $S_n = \sum_{i=1}^n X_i$. Suppose $a_i \le X_i \le b_i$ almost surely. Hoeffding showed that, for any $t > 0$,
$$\mathbb{P}(S_n - \mathbb{E}[S_n] \ge t) \le \exp\left(-\frac{2t^2}{\sum_{i=1}^n (b_i - a_i)^2}\right).$$
The natural two-sided version also exists. This is proved with the Chernoff method. Hoeffding’s bound is generalized by McDiarmid’s inequality (bounded difference inequalities).
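As a sanity check, the bound can be compared against a Monte Carlo estimate of the tail. The uniform observations and parameter values below are illustrative choices, not from the text.

```python
import math
import random

def hoeffding_bound(t, ranges):
    # Hoeffding: P(S_n - E[S_n] >= t) <= exp(-2 t^2 / sum_i (b_i - a_i)^2)
    denom = sum((b - a) ** 2 for a, b in ranges)
    return math.exp(-2 * t ** 2 / denom)

# Monte Carlo estimate of the tail for Uniform[0,1] observations (illustrative).
random.seed(0)
n, t, trials = 100, 5.0, 20_000
hits = sum(
    sum(random.random() for _ in range(n)) - n / 2 >= t for _ in range(trials)
)
print(f"empirical tail  ~ {hits / trials:.4f}")
print(f"Hoeffding bound = {hoeffding_bound(t, [(0.0, 1.0)] * n):.4f}")
```

The empirical tail is far below the bound here, consistent with the suboptimality discussed next.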
Note that this is suboptimal. To see this, let $\sigma_n^2 = \mathrm{Var}(S_n)$ and note that $\mathbb{P}(S_n - \mathbb{E}[S_n] \ge t\sigma_n)$ is roughly $1 - \Phi(t)$ by the Lindeberg–Lévy CLT, and $1 - \Phi(t) \le \frac{1}{t\sqrt{2\pi}} e^{-t^2/2}$. So there is a missing factor of order $1/t$ in the bound. This is well-known and is called the missing factor in Hoeffding’s bound. Hoeffding’s bound can therefore be improved with a more careful analysis (see Talagrand’s bound below).
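The missing factor is easy to see numerically: the Chernoff-style tail $e^{-t^2/2}$ exceeds the Gaussian tail bound $\varphi(t)/t$ by exactly a factor of $t\sqrt{2\pi}$, which grows with $t$. A small illustrative script:

```python
import math

def chernoff_gaussian_tail(t):
    # Sub-Gaussian (Hoeffding/Chernoff-style) tail bound for a standard Gaussian.
    return math.exp(-t * t / 2)

def mills_tail(t):
    # Sharper bound: 1 - Phi(t) <= phi(t) / t for t > 0 (Mills-ratio bound).
    return math.exp(-t * t / 2) / (t * math.sqrt(2 * math.pi))

for t in (2.0, 4.0, 8.0):
    ratio = chernoff_gaussian_tail(t) / mills_tail(t)
    print(f"t = {t}: Chernoff / Mills = {ratio:.2f}")  # equals t * sqrt(2*pi)
```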
Hoeffding’s bound is also suboptimal in a more straightforward way: if we take $t > \sum_{i=1}^n (b_i - \mathbb{E}[X_i])$, then $\mathbb{P}(S_n - \mathbb{E}[S_n] \ge t) = 0$ since $S_n$ can never exceed $\sum_{i=1}^n b_i$, but the bound remains strictly positive and so does not capture this behavior.
Bennett’s inequality
Hoeffding’s bound doesn’t use any information besides the boundedness of the observations. It therefore must (implicitly) assume a worst-case bound on the variance. If we know a bound on the variance, we can do better. Both Bennett’s inequality and Bernstein’s inequality use such information to tighten the bound.
Let $X_1, \dots, X_n$ be independent with finite variance and one-sided boundedness, i.e., $X_i - \mathbb{E}[X_i] \le b$ for some $b > 0$. If $S_n = \sum_{i=1}^n (X_i - \mathbb{E}[X_i])$ and $v = \sum_{i=1}^n \mathrm{Var}(X_i)$, then for all $t \ge 0$,
$$\mathbb{P}(S_n \ge t) \le \exp\left(-\frac{v}{b^2} h\!\left(\frac{bt}{v}\right)\right),$$
where $h(u) = (1+u)\log(1+u) - u$. If we assume that $|X_i - \mathbb{E}[X_i]| \le b$ then we can also get a bound on $\mathbb{P}(|S_n| \ge t)$. This trend of first presenting a result using only the boundedness of the observations and then giving a variance-adaptive result is a common one; see from boundedness to variance adaptivity.
If the $X_i$ are not independent but have conditional means, then we can replace $\mathbb{E}[X_i]$ with $\mathbb{E}_{i-1}[X_i]$, where $\mathbb{E}_{i-1}$ is the expectation conditional on the first $i-1$ observations.
Like Hoeffding’s bound, Bennett’s inequality also does not recover the missing factor of order $1/t$ that we expect from the central limit theorem. This is because it is also based on the Chernoff method, but it bounds the exponential function a little differently than Hoeffding’s bound does, in order to take advantage of the variance information.
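A quick numerical comparison (a sketch; the parameter values are arbitrary) shows how much Bennett’s bound gains over the first Hoeffding bound when the variance is small relative to the range:

```python
import math

def bennett_bound(t, v, b):
    # Bennett: P(S_n >= t) <= exp(-(v / b^2) * h(b t / v)), h(u) = (1+u)log(1+u) - u,
    # for independent mean-zero X_i <= b with total variance v = sum_i Var(X_i).
    u = b * t / v
    h = (1 + u) * math.log1p(u) - u
    return math.exp(-(v / b ** 2) * h)

def hoeffding_two_sided_range(t, n, b):
    # First Hoeffding bound when each X_i lies in [-b, b]: exp(-2 t^2 / (n (2b)^2)).
    return math.exp(-2 * t ** 2 / (n * (2 * b) ** 2))

# With small per-observation variance (0.01 versus worst case 1), Bennett is far tighter.
n, b, t = 100, 1.0, 5.0
v = n * 0.01
print(f"Bennett   = {bennett_bound(t, v, b):.2e}")
print(f"Hoeffding = {hoeffding_two_sided_range(t, n, b):.4f}")
```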
Bernstein bound 1
There are several bounds that go under the name “Bernstein bound”. The most common is perhaps a relaxation of Bennett’s bound, and uses that
$$h(u) \ge \frac{u^2}{2(1 + u/3)}$$
to show that
$$\mathbb{P}(S_n \ge t) \le \exp\left(-\frac{t^2}{2(v + bt/3)}\right),$$
where $v$ and $b$ are as above. We can of course obtain two-sided versions of Bennett’s and Bernstein’s bounds if we assume two-sided boundedness ($|X_i - \mathbb{E}[X_i]| \le b$), apply the bound twice, and use a union bound.
There are two regimes in Bernstein’s bound: a sub-Gaussian regime where the tail decays at a sub-Gaussian rate, and a sub-exponential regime where it decays at a sub-exponential rate. The former occurs when the contribution of $bt/3$ is small relative to $v$, so the tail decays as $\exp(-t^2/(2v))$. When $bt$ is large relative to $v$, the bound decays as $\exp(-3t/(2b))$. Therefore, if we have good a priori knowledge that the variance is small, Bernstein’s (and Bennett’s) bound can be a big improvement over Hoeffding’s, with deviations scaling roughly as $\sigma\sqrt{n}$ instead of $b\sqrt{n}$.
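The two regimes can be read off numerically (a sketch with arbitrary parameters): for small $t$ the exponent tracks $t^2/(2v)$, while for large $t$ it approaches $3t/(2b)$.

```python
import math

def bernstein_bound(t, v, b):
    # Bernstein: P(S_n >= t) <= exp(-t^2 / (2 (v + b t / 3))).
    return math.exp(-t ** 2 / (2 * (v + b * t / 3)))

def bernstein_exponent(t, v, b):
    # Negative log of the Bernstein bound: t^2 / (2 (v + b t / 3)).
    return t ** 2 / (2 * (v + b * t / 3))

v, b = 100.0, 1.0
# Sub-Gaussian regime (b t / 3 << v): exponent is nearly t^2 / (2 v).
print(bernstein_exponent(3.0, v, b), 3.0 ** 2 / (2 * v))
# Sub-exponential regime (b t / 3 >> v): exponent approaches 3 t / (2 b).
print(bernstein_exponent(30000.0, v, b), 3 * 30000.0 / (2 * b))
```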
Hoeffding’s bound remastered
It is somewhat of a shame that Hoeffding’s name came to be associated with the first bound above, since this is much weaker than the following more general result that he proved in his famous 1963 paper, which recovers both Bennett’s and Bernstein’s bounds.
If $X_1, \dots, X_n$ are iid with mean zero, variance $\sigma^2$, and lying in $(-\infty, b]$, then for all $0 \le t \le b$,
$$\mathbb{P}(\bar{X}_n \ge t) \le \exp\left(-n\, d\!\left(\frac{\sigma^2 + bt}{\sigma^2 + b^2} \,\Big\|\, \frac{\sigma^2}{\sigma^2 + b^2}\right)\right),$$
where $d(p \| q) = p\log\frac{p}{q} + (1-p)\log\frac{1-p}{1-q}$ denotes the binary relative entropy. If $t > b$, we interpret $d$ as infinity, meaning the bound cuts off (equals zero) at that point: indeed, $\bar{X}_n$ can never exceed $b$. Thus, this bound circumvents one of the drawbacks mentioned above.
This is the sharpest bound using the Chernoff method. We can recover the first Hoeffding bound by assuming worst-case variance.
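One can check numerically that the relative-entropy exponent dominates Bennett’s (a sketch; the parameter values are arbitrary and the helper names are my own):

```python
import math

def kl_bernoulli(p, q):
    # Binary relative entropy d(p || q).
    out = 0.0
    if p > 0:
        out += p * math.log(p / q)
    if p < 1:
        out += (1 - p) * math.log((1 - p) / (1 - q))
    return out

def hoeffding_kl_bound(t, sigma2, b, n):
    # Sharp Chernoff bound for iid mean-zero X_i <= b with variance sigma2:
    # P(mean >= t) <= exp(-n * d((sigma2 + b t)/(sigma2 + b^2) || sigma2/(sigma2 + b^2))).
    if t > b:
        return 0.0  # the bound cuts off: the sample mean cannot exceed b
    q = sigma2 / (sigma2 + b * b)
    p = (sigma2 + b * t) / (sigma2 + b * b)
    return math.exp(-n * kl_bernoulli(p, q))

def bennett_bound_mean(t, sigma2, b, n):
    # Bennett's bound on P(mean >= t), i.e., P(S_n >= n t) with v = n * sigma2.
    u = b * t / sigma2
    h = (1 + u) * math.log1p(u) - u
    return math.exp(-(n * sigma2 / b ** 2) * h)

n, b, sigma2, t = 100, 1.0, 0.04, 0.1
print(f"KL bound      = {hoeffding_kl_bound(t, sigma2, b, n):.2e}")
print(f"Bennett bound = {bennett_bound_mean(t, sigma2, b, n):.2e}")
print(hoeffding_kl_bound(1.5, sigma2, b, n))  # t > b: cuts off to 0
```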
Bentkus’ inequality
Bentkus showed that, for bounded observations, the tail of $S_n$ is controlled by the corresponding binomial tail up to a small absolute constant. Since it is not obtained by bounding a moment generating function, it can beat every bound derived from the Chernoff method.
Empirical Bernstein bounds
Empirical Bernstein bounds replace the oracle variance with an estimated variance. This is useful because the true variance is not always known. These bounds deserve their own page: empirical Bernstein bounds.
References
- Concentration inequalities by Boucheron, Lugosi, and Massart.