Suppose we have a stochastic process $(S_{t})$ evolving in some space. What techniques are available for bounding $|S_{t}|$? If $S_{t}\in\mathbb{R}$, then we usually argue about $S_{t}$ and $-S_{t}$ separately and take a union bound to get a bound on $|S_{t}|$ (see bounded scalar concentration and heavy-tailed scalar concentration). However, in higher-dimensional spaces (in $\mathbb{R}^{d}$, $d\ge 2$, say) a union bound doesn't suffice: there are infinitely many directions one has to worry about.
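As a quick numerical sanity check of the scalar reduction, the following sketch (all variable names are illustrative) compares the two-sided tail of a Gaussian sum with the union bound over the two one-sided tails; since the two events are disjoint, the bound is in fact an equality here.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy scalar process: S_n is a sum of n i.i.d. standard Gaussians.
n, t, trials = 50, 10.0, 100_000
S = rng.standard_normal((trials, n)).sum(axis=1)

two_sided = np.mean(np.abs(S) >= t)          # P(|S_n| >= t)
union = np.mean(S >= t) + np.mean(S <= -t)   # P(S_n >= t) + P(-S_n >= t)
assert two_sided <= union + 1e-12            # union bound over the two tails
```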

There are several techniques for dealing with this problem.

# Working directly with the norm

The idea is to form a Doob martingale from the increments of the process and then apply well-known martingale bounds. Set

$$Z_{t}=E[\,|S_{n}|\mid\mathcal{F}_{t}\,]-E[\,|S_{n}|\,].$$

Then $(Z_{t})$ is a martingale with $Z_{0}=0$. Let $D_{t}=Z_{t}-Z_{t-1}$ be the increments of the process. Various martingale concentration bounds are stated as conditions on these increments. For instance:

- If the increments $D_{t}$ are bounded for each $t$, then we can use the Azuma-Hoeffding inequality (martingale concentration:Azuma-Hoeffding inequality).
- If a bound on the variance of $D_{t}$ is known, then we can use a martingale analogue of the Bernstein inequality (martingale concentration:Variance bound).
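For reference, the Azuma-Hoeffding bound in this notation: if $|D_{t}|\le c_{t}$ almost surely for each $t$, then for all $s>0$,

$$P(|Z_{n}|\ge s)\le 2\exp\!\left(-\frac{s^{2}}{2\sum_{t=1}^{n}c_{t}^{2}}\right).$$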

The Pinelis approach to concentration is another method which works directly with the norm.

# Covering arguments

See concentration via covering for an overview. The idea is to cover the unit sphere with a finite net and then take a union bound over the net. The price is a factor depending on the covering number (covering and packing) of the underlying space in the bound.

This means covering arguments lead to dimension-dependent bounds, because covering numbers typically grow geometrically with the dimension of the underlying space. You also have to be able to compute the covering number.
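A minimal sketch of the net reduction in $\mathbb{R}^2$ (all names illustrative): for an $\varepsilon$-net $N_{\varepsilon}$ of the unit sphere, the standard comparison lemma gives $\sup_{\theta\in S^{d-1}}\langle\theta,x\rangle\le(1-\varepsilon)^{-1}\max_{\theta\in N_{\varepsilon}}\langle\theta,x\rangle$, so the supremum over infinitely many directions is controlled by a finite maximum.

```python
import numpy as np

rng = np.random.default_rng(1)
eps = 0.1
# An eps-net of the unit circle: angular spacing <= eps guarantees every
# point of the circle is within (chordal) distance eps of some net point.
m = int(np.ceil(2 * np.pi / eps))
angles = 2 * np.pi * np.arange(m) / m
net = np.stack([np.cos(angles), np.sin(angles)], axis=1)  # shape (m, 2)

x = rng.standard_normal(2) * 5.0
sup_true = np.linalg.norm(x)   # sup over the whole sphere: <theta, x> maximized at x/|x|
sup_net = (net @ x).max()      # sup over the finite net only
# Net comparison lemma: sup_net <= sup_sphere <= sup_net / (1 - eps)
assert sup_net <= sup_true <= sup_net / (1 - eps)
```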

# Chaining

See chaining for an overview. Chaining provides high-probability guarantees of the form

$$P\left(\sup_{\theta\in S^{d-1}}\langle\theta,S_{n}\rangle\ge tS\right)\lesssim\exp(-t^{2}),$$

where

$$S=\sup_{\theta\in S^{d-1}}\sum_{n\le N}2^{n/2}\rho(\theta,\Theta_{n}),$$

and $\Theta_{0}\subset\Theta_{1}\subset\cdots\subset\Theta_{N}$ are a chosen sequence of sets which slowly "approximate" $\Theta=S^{d-1}$. The difference between generic chaining and Dudley chaining lies in how they handle $S$: the latter typically bounds it using covering numbers and metric entropy.
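For reference, the Dudley-style bound replaces the sum over the approximating sets by an entropy integral: if $N(\Theta,\rho,\varepsilon)$ denotes the covering number of $\Theta$ at scale $\varepsilon$, then (up to absolute constants)

$$S\lesssim\int_{0}^{\operatorname{diam}(\Theta)}\sqrt{\log N(\Theta,\rho,\varepsilon)}\,d\varepsilon.$$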

# The variational approach

See variational approach to concentration. The idea is to use PAC-Bayes inequalities, which convert high-probability guarantees holding over all probability measures into a simultaneous guarantee over all directions.

In some scenarios this allows you to escape dimension dependence (e.g. for sub-Gaussian random vectors), which neither chaining nor covering arguments allow you to do.
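The engine behind these PAC-Bayes arguments is the Donsker-Varadhan variational formula: for a reference measure $\pi$ and a bounded measurable $f$,

$$\log E_{\theta\sim\pi}\,e^{f(\theta)}=\sup_{\rho\ll\pi}\left(E_{\theta\sim\rho}f(\theta)-\mathrm{KL}(\rho\,\|\,\pi)\right),$$

which turns one exponential moment bound under $\pi$ into a guarantee holding simultaneously for every posterior $\rho$, and hence (by choosing $\rho$ concentrated near each direction) over all directions at once.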