Suppose we have a stochastic process evolving in some space. What techniques are available for bounding ? If , then we usually argue about and and take a union bound to get a bound on (see eg bounded scalar concentration and light-tailed, unbounded scalar concentration). However, in higher-dimensional spaces (in , , say) a union bound doesn’t suffice: there are infinitely many directions one has to worry about.

There are several techniques for dealing with this problem.

Working directly with the norm

There are roughly two approaches that I’m aware of: The Doob decomposition approach and the Pinelis approach.

The idea for the first is to form a Doob martingale from the increments of the process and then apply well-known martingale bounds. Set

So is a martingale with . Let , be the increments of the process. Various martingale concentration bounds are stated as conditions on the increments of the process. For instance:

The Pinelis approach to concentration is another method which works directly with the norm. We might also label this the Yurinskii approach, since he does something similar in an earlier 1976 paper.

For multivariate light-tailed concentration of sub-Gaussian distributions, the technique of Hsu et al is another example of working directly with the norm. Their technique seems specific to sub-Gaussian distributions however; it doesn’t seem to constitute a general approach.

Covering arguments

See concentration via covering for an overview. The idea is decompose into where the supremum is over the unit ball, cover the the unit ball with an finite net, bound for each in this finite set, and then take a union bound. You suffer the covering number (covering and packing) of the underlying space in the bound.

This means covering arguments lead to dimension-dependent bounds because covering numbers will depend geometrically on the underlying space. Obviously, you also have to be able to calculate the covering number.

Chaining

This also works by decomposing . See chaining for a more detailed overview. Chaining provides high probability guarantees of the form

where

and are a selected sequence of sets which slowly “approximate” . The difference between generic chaining and Dudley chaining is how they choose to handle . The latter typically bounds it using covering numbers and metric entropy.

The variational approach

See variational approach to concentration. The idea is to use PAC-Bayes inequalities which are high probability guarantees over all probabilities measures into a simultaneous guarantee over all directions.

In some scenarios they allow you to escape-dimension dependence (eg for sub-Gaussian random vectors) which neither chaining nor covering arguments allow you to do.

Reading