Here we list concentration inequalities for scalar-valued random variables under various light-tailed conditions (which typically means that the MGF exists in some neighborhood). This is contrast to heavy-tailed scalar concentration, where we assume only a few moments.

For rvs let .

Many of these concentration inequalities are proven using corresponding exponential inequalities. There are also time-uniform versions of these inequalities which stem from the sub-psi process given by the corresponding exponential inequality.

Bounded random variables

Hoeffding bound

Let be independent with . Hoeffding showed that

The natural one sided versions also exist. This is generalized by McDiarmid’s inequality (bounded difference inequalities).

Bennett’s inequality

Hoeffding’s bound doesn’t use any information beside boundedness of the observations. It therefore must (implicitly) assume a worse case bound on the variance. If we know a bound on the variance, we can do better. Both Bennett’s inequality and Bernstein’s inequality use such information to tighten the bound.

Let be independent with finite variance and one-sided boundedness, i.e., for some . If and then

where . If we assume that then we can get a bound on . This trend of first presenting a result using only the boundedness of observations and then giving a variance-adaptive result is a common one, see from boundedness to variance adaptivity.

If the have conditional mean , then we can replace with where is the expectation conditional on the first observations.

Bernstein bound 1

There are several bounds that go under the name “Bernstein bound”. The most common is perhaps a relaxation of Bennett’s bound and uses that

to show that

where and are as above. Both Bennett’s bound and Bernstein’s bound are proven using the Chernoff method.

There are two regimes in Bernstein’s bound: A sub-Gaussian regime where the tail decays at a sub-Gaussian rate, and a sub-exponential regime where it decays at a sub-exponential rate. The former occurs when the contribution of is small, so the tail decays as . When is large, the bound decays as .

Empirical Bernstein bounds

Empirical Bernstein bounds replace the oracle variance with an estimated variance. This is useful because the true variance is not always known. These bounds deserve their own page: empirical Bernstein bounds.

todo non-bounded stuff

References

  • Concentration inequalities by Boucheron, Lugosi, and Massart.