Chaining is a generic method of generating maximal inequalities on the suprema of random processes.
Suppose we have a collection of random variables indexed by some set which we assume is a centered (zero-mean) sub-Gaussian process in the sense that
for some metric . The ultimate goal is to bound but this is done via bounding
If is mean-zero, then for any so we often focus on the quantity .
Overview
The idea is to decompose as
where and for all large enough (so that the sum is really a finite sum). This is the “chain”. Here is a map from where
So is the approximation of within . We enforce that is finite, so that union bound type arguments will apply. In particular we set , which looks ridiculous I know, but we will want exponential growth in log-space. We let be the closest point to in , so that
The intuition is the following:
- Bound using that is a sub-Gaussian process.
- Union bound over all elements of (both are finite).
- Union bound over all (again finite).
- Use the chaining decomposition above to provide a bound on .
Chapter 2.2 in Talagrand’s book below provides the details. We end up with a bound of the form:
where
where we’ve used the triangle inequality . This gives .
There are two main ways of dealing with . One is generic chaining, and one is Dudley chaining. The latter is looser than the former, but sometimes easier to compute.
References
- Upper and lower bounds for stochastic processes, Talagrand.