Techniques for Multivariate Concentration

Suppose we have a stochastic process $(S_{t})$ evolving in some space. What techniques are available for bounding $∥ S_{t} ∥$ ? If $S_{t} \in R$ , then we usually argue about $S_{t}$ and $- S_{t}$ and take a union bound to get a bound on $∣ S_{t} ∣$ (see eg bounded scalar concentration and light-tailed, unbounded scalar concentration). However, in higher-dimensional spaces (in $R^{d}$ , $d \geq 2$ , say) a union bound doesn’t suffice: there are infinitely many directions one has to worry about.

There are several techniques for dealing with this problem.

Working directly with the norm

The idea is to form a Doob martingale from the increments of the process and then apply well-known martingale bounds. Set

Z_{t} = E [∥ S_{n} ∥ ∣ F_{t}] - E [∥ S_{n} ∥] .

So $(Z_{t})$ is a martingale with $Z_{0} = 0$ . Let $D_{t} = Z_{t} - Z_{t - 1}$ , be the increments of the process. Various martingale concentration bounds are stated as conditions on the increments of the process. For instance:

If $S_{t}$ is bounded for each $t$ , then we can use the Azuma-Hoeffding inequality (martingale concentration:Azuma-Hoeffding inequality)
If a bound on the variance of $D_{t}$ is known, then we can use the equivalent of a Bernstein inequality (martingale concentration:Variance bound).

The Pinelis approach to concentration is another method which works directly with the norm.

Covering arguments

See concentration via covering for an overview. The idea is to bound cover the unit ball with an finite net, and then take a union bound. You suffer the covering number (covering and packing) of the underlying space in the bound.

This means covering arguments lead to dimension-dependent bounds because covering numbers will depend geometrically on the underlying space. You also have to be able to calculate the covering number.

Chaining

See chaining for an overview of chaining. Chaining provides high probability guarantees of the form

P (θ \in S^{d - 1} sup ⟨ θ, S_{n} ⟩ \geq tS) ≲ exp (- t^{2}),

where

S = θ \in S^{d - 1} sup n \leq N \sum 2^{n /2} ρ (θ, Θ_{n}),

and $Θ_{0} \subset Θ_{1} \subset \dots \subset Θ_{N}$ are a selected sequence of sets which slowly “approximate” $Θ = S^{d - 1}$ . The difference between generic chaining and Dudley chaining is how they choose to handle $S$ . The latter typically bounds it using covering numbers and metric entropy.

The variational approach

See variational approach to concentration. The idea is to use PAC-Bayes inequalities which are high probability guarantees over all probabilities measures into a simultaneous guarantee over all directions.

In some scenarios they allow you to escape-dimension dependence (eg for sub-Gaussian random vectors) which neither chaining nor covering arguments allow you to do.

The Stats Map

Explore

techniques for multivariate concentration

Working directly with the norm

Covering arguments

Chaining

The variational approach

Table of Contents

Graph View

Backlinks

Explore