Chaining is a generic method for proving maximal inequalities, i.e. bounds on the suprema of random processes.

Suppose we have a collection of random variables $(X_{\theta})_{\theta \in \Theta}$ indexed by a set $\Theta$, which we assume forms a centered (zero-mean) sub-Gaussian process in the sense that

$P(|X_{\theta}-X_{\phi}| \geq \epsilon) \leq 2\exp\left(-\frac{\epsilon^{2}}{2\rho^{2}(\theta,\phi)}\right),$ for some metric $\rho$. The ultimate goal is to bound $E\sup_{\theta}X_{\theta}$, but this is done via bounding

$P\left(\sup_{\theta} X_{\theta} \geq \epsilon\right).$ If $X_{\theta}$ is mean-zero, then $E\sup_{\theta}X_{\theta}=E\sup_{\theta}(X_{\theta}-X_{\theta_{0}})$ for any $\theta_{0}$, so we often focus on the quantity $X_{\theta}-X_{\theta_{0}}$.
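This reduction is a one-line computation: since $X_{\theta_{0}}$ does not depend on $\theta$, we have $\sup_{\theta}(X_{\theta}-X_{\theta_{0}})=\sup_{\theta}X_{\theta}-X_{\theta_{0}}$, and taking expectations kills the second term because $EX_{\theta_{0}}=0$:

$E\sup_{\theta}(X_{\theta}-X_{\theta_{0}})=E\sup_{\theta}X_{\theta}-EX_{\theta_{0}}=E\sup_{\theta}X_{\theta}.$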

# Overview

The idea is to decompose $X_{θ}−X_{θ_{0}}$ as

$X_{\theta}-X_{\theta_{0}}=\sum_{n\geq 1}\left(X_{\pi_{n}(\theta)}-X_{\pi_{n-1}(\theta)}\right),$ where $\pi_{0}(\theta)=\theta_{0}$ and $\pi_{N}(\theta)=\theta$ for all large enough $N$ (so that the sum is really a finite sum). This is the “chain”. Here $\pi_{n}$ is a map from $\Theta$ to $\Theta_{n}$, where

$\{\theta_{0}\}=\Theta_{0}\subset\Theta_{1}\subset\cdots\subset\Theta_{N}=\Theta.$ So $\pi_{n}(\theta)\in\Theta_{n}$ is the approximation of $\theta$ within $\Theta_{n}$. We enforce that each $\Theta_{n}$ is finite, so that union-bound type arguments will apply. In particular we set $|\Theta_{n}|=2^{2^{n}}$, which looks ridiculous I know, but we will want exponential growth in log-space. We let $\pi_{n}(\theta)$ be the closest point to $\theta$ in $\Theta_{n}$, so that
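To make the construction concrete, here is a minimal sketch, assuming $\Theta$ is a finite subset of the real line with metric $\rho(x,y)=|x-y|$. The function name `build_nets` and the farthest-point heuristic are illustrative choices for producing nested approximating sets, not Talagrand's optimal nets:

```python
def build_nets(points, theta0, N):
    """Greedily build nested sets Theta_0 subset ... subset Theta_N with
    |Theta_0| = 1 and |Theta_n| <= 2^(2^n), by farthest-point insertion
    (a simple heuristic for choosing good approximating sets)."""
    nets = []
    current = [theta0]
    for n in range(N + 1):
        target = 1 if n == 0 else 2 ** (2 ** n)  # admissible cardinality
        # grow the net by repeatedly adding the point farthest from it
        while len(current) < min(target, len(points)):
            far = max(points, key=lambda p: min(abs(p - q) for q in current))
            current.append(far)
        nets.append(list(current))
    return nets
```

Each `nets[n]` then plays the role of $\Theta_{n}$, and $\pi_{n}(\theta)$ is the nearest point of `nets[n]` to $\theta$.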

$\rho(\theta,\pi_{n}(\theta))=\inf_{\phi\in\Theta_{n}}\rho(\theta,\phi).$ The intuition is the following:

- Bound $P(|X_{\pi_{n}(\theta)}-X_{\pi_{n-1}(\theta)}|\geq\epsilon)$ using that $X_{\theta}$ is a sub-Gaussian process.
- Union bound over all elements of $Θ_{n}×Θ_{n−1}$ (both are finite).
- Union bound over all $n≤N$ (again finite).
- Use the chaining decomposition above to provide a bound on $X_{θ}−X_{θ_{0}}$.
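The telescoping identity behind the last step can be checked numerically. Below is a minimal Python sketch (the index set, process values, and helper names are illustrative) in which the increments along the chain sum exactly to $X_{\theta}-X_{\theta_{0}}$ once the finest net contains $\theta$:

```python
def pi(theta, net):
    """The map pi_n: the closest point to theta within the finite net."""
    return min(net, key=lambda p: abs(p - theta))

def chain_sum(X, theta, nets):
    """Sum the chain increments X[pi_n(theta)] - X[pi_{n-1}(theta)].
    Telescopes to X[theta] - X[theta_0] when the last net contains theta."""
    total = 0.0
    prev = pi(theta, nets[0])
    for net in nets[1:]:
        cur = pi(theta, net)
        total += X[cur] - X[prev]
        prev = cur
    return total

# Illustrative process values on a small index set Theta = {0, 0.25, ..., 1}
X = {0.0: 0.0, 0.25: 0.3, 0.5: -0.1, 0.75: 0.8, 1.0: 0.2}
nets = [[0.0], [0.0, 1.0], [0.0, 0.25, 0.5, 0.75, 1.0]]
assert abs(chain_sum(X, 0.75, nets) - (X[0.75] - X[0.0])) < 1e-12
```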

Section 2.2 of Talagrand’s book (see References) provides the details. We end up with a bound of the form:

$P\left(\sup_{\theta}|X_{\theta}-X_{\theta_{0}}|\geq tS\right)\leq 2\sum_{n\leq N}2^{2^{n+1}}\exp\left(-t^{2}2^{n-1}\right)\lesssim\exp(-t^{2}/2),$ where

$S=\sup_{\theta\in\Theta}\sum_{n\leq N}2^{n/2}\rho(\pi_{n}(\theta),\pi_{n-1}(\theta))\lesssim\sup_{\theta\in\Theta}\sum_{0\leq n\leq N}2^{n/2}\rho(\theta,\Theta_{n})=A,$ where we’ve used the triangle inequality $\rho(\pi_{n}(\theta),\pi_{n-1}(\theta))\leq\rho(\pi_{n}(\theta),\theta)+\rho(\theta,\pi_{n-1}(\theta))$. This gives $E\sup_{\theta}X_{\theta}\lesssim A$.
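To see where the factors in the tail bound come from (a standard computation, sketched here): at level $n$ the pair $(\pi_{n}(\theta),\pi_{n-1}(\theta))$ takes at most $|\Theta_{n}|\cdot|\Theta_{n-1}|=2^{2^{n}}2^{2^{n-1}}\leq 2^{2^{n+1}}$ values, and applying the sub-Gaussian tail with $\epsilon=t\,2^{n/2}\rho(\pi_{n}(\theta),\pi_{n-1}(\theta))$ gives, for each pair,

$P\left(|X_{\pi_{n}(\theta)}-X_{\pi_{n-1}(\theta)}|\geq t\,2^{n/2}\rho(\pi_{n}(\theta),\pi_{n-1}(\theta))\right)\leq 2\exp\left(-t^{2}2^{n-1}\right).$

Union bounding over pairs and over $n$ then yields the stated probability bound, while summing the thresholds along the chain produces the quantity $S$.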

There are two main ways of bounding $A$. One is generic chaining, and one is Dudley’s entropy bound. The latter is looser than the former, but often easier to compute.
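One way to see the comparison (a standard sketch under the cardinality convention $|\Theta_{n}|\leq 2^{2^{n}}$, not the book’s exact statement): let $e_{n}(\Theta)=\inf\{\epsilon:N(\Theta,\rho,\epsilon)\leq 2^{2^{n}}\}$ be the entropy numbers, where $N(\Theta,\rho,\epsilon)$ is the $\epsilon$-covering number. Choosing each $\Theta_{n}$ to be a best approximating set of its cardinality gives $\rho(\theta,\Theta_{n})\leq e_{n}(\Theta)$ uniformly in $\theta$, so

$A\leq\sum_{n\geq 0}2^{n/2}e_{n}(\Theta)\lesssim\int_{0}^{\infty}\sqrt{\log N(\Theta,\rho,\epsilon)}\,d\epsilon,$

which is Dudley’s entropy integral. Generic chaining instead keeps the supremum over $\theta$ outside the sum (Talagrand’s $\gamma_{2}$ functional), which can be strictly smaller.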

# References

*Upper and lower bounds for stochastic processes*, Talagrand.