Concentration in Banach Spaces

Pinelis (see his 1992 and 1994 papers) is responsible for both a Hoeffding and a Bernstein bound in smooth and separable Banach spaces. At the core of his proof is the construction of a supermartingale (see Pinelis approach to concentration). His results can therefore be made time-uniform by applying Ville’s inequality instead of Markov’s inequality, though they’re usually stated as fixed-time bounds.

Chebyshev

The sample mean doesn’t necessarily concentrate in Banach spaces (see discussion in Banach space). One needs a smoothness assumption. But once such an assumption is added, we can get the following behavior of the sample mean. This result is from Mean Estimation in Banach Spaces Under Infinite Variance and Martingale Dependence.

Let $(X_{n})_{n \geq 1}$ be random variables in a $(2, β)$-smooth Banach space which have conditional mean $μ$ and conditional $p$-th moment, $p > 1$, bounded by $v$ (i.e., $E [∥ X_{n} - μ ∥^{p} ∣ F_{n - 1}] \leq v$). Then, for any $δ \in (0, 1)$ and $n \geq 1$, we have

$$P \left( ∥ μ_{n} - μ ∥ \leq \frac{2 β (v / δ)^{1/p}}{n^{(p - 1) / p}} \right) \geq 1 - δ,$$

or equivalently,

$$P \left( ∥ μ_{n} - μ ∥ \geq t \right) \leq \frac{2^{p} β^{p} v}{n^{p - 1} t^{p}},$$

where $μ_{n} := n^{- 1} \sum_{m = 1}^{n} X_{m}$ denotes the usual sample mean. So we see that $β$ pops up in the bound.
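To make the scaling concrete, here is a minimal Python sketch that evaluates the tail bound and the deviation obtained by setting its right-hand side equal to $δ$ and solving for $t$. The function names and the parameter values are illustrative assumptions, not taken from the paper.

```python
import math


def chebyshev_tail(n, p, v, beta, t):
    """Right-hand side of the tail bound, 2^p beta^p v / (n^(p-1) t^p), capped at 1."""
    return min(1.0, (2.0 * beta) ** p * v / (n ** (p - 1) * t ** p))


def chebyshev_radius(n, p, v, beta, delta):
    """Deviation t at which the tail bound equals delta."""
    return 2.0 * beta * (v / delta) ** (1.0 / p) / n ** ((p - 1) / p)


# Hypothetical parameter values, chosen only for illustration.
n, p, v, beta, delta = 10_000, 1.5, 1.0, 1.0, 0.05
t = chebyshev_radius(n, p, v, beta, delta)
print(t, chebyshev_tail(n, p, v, beta, t))
```

The radius shrinks at rate $n^{-(p - 1)/p}$ and grows like $δ^{-1/p}$ as $δ \to 0$, which is the polynomial dependence on the failure probability one expects from a Chebyshev-type bound.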

Marcinkiewicz–Zygmund inequality

The proof of Chebyshev’s inequality above actually relies on a Marcinkiewicz–Zygmund-style inequality in smooth Banach spaces. Let $(X_{n})_{n \geq 1}$ be random variables in a $(2, β)$-smooth Banach space such that $E_{n - 1} X_{n} = 0$ and $E ∥ X_{n} ∥^{p} < \infty$ for all $n \geq 1$ and some $p \in (1, 2]$, where $E_{n - 1}$ denotes expectation conditional on $F_{n - 1}$. Then, letting $S_{n} := X_{1} + \dots + X_{n}$ (with $S_{0} := 0$), we have

$$E ∥ S_{n} ∥^{p} \leq 2^{p} β^{p} E \left( \sum_{m = 1}^{n} ∥ X_{m} ∥^{2} \right)^{p / 2} \leq 2^{p} β^{p} \sum_{m = 1}^{n} E ∥ X_{m} ∥^{p} .$$
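
One way to see how the Chebyshev-type tail bound follows from this inequality is the short calculation below. It is a sketch, assuming $p \in (1, 2]$ and the conditional moment bound $E [∥ X_{m} - μ ∥^{p} ∣ F_{m - 1}] \leq v$, and applying the inequality to the centered increments $X_{m} - μ$. The first inequality is Markov's inequality and the third uses the tower property.

$$
\begin{aligned}
P \left( ∥ μ_{n} - μ ∥ \geq t \right)
&= P \left( ∥ \sum_{m = 1}^{n} (X_{m} - μ) ∥^{p} \geq n^{p} t^{p} \right)
\leq \frac{E ∥ \sum_{m = 1}^{n} (X_{m} - μ) ∥^{p}}{n^{p} t^{p}} \\
&\leq \frac{2^{p} β^{p} \sum_{m = 1}^{n} E ∥ X_{m} - μ ∥^{p}}{n^{p} t^{p}}
\leq \frac{2^{p} β^{p} n v}{n^{p} t^{p}}
= \frac{2^{p} β^{p} v}{n^{p - 1} t^{p}} .
\end{aligned}
$$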

Hoeffding

Consider a $(2, D)$-smooth separable Banach space with norm $∥ \cdot ∥$. Let $X_{1}, X_{2}, \dots$ have conditional mean $0$ with $∥ X_{i} ∥ \leq B$ for all $i$. Then, for any $u > 0$:

$$P \left( ∥ \sum_{i \leq n} X_{i} ∥ \geq u \right) \leq 2 \exp \left( - \frac{u^{2}}{2 n D^{2} B^{2}} \right) .$$

Note the similarity to the usual Hoeffding bound (bounded scalar concentration), but with an extra factor of $D^{2}$ in the denominator of the exponent. Hilbert spaces are $(2, 1)$-smooth Banach spaces, so the extra factor disappears there ($D = 1$).
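
As a rough illustration of how $D$ enters, the following Python sketch evaluates the bound and the deviation $u$ at which it equals a target failure probability $δ$, namely $u = D B \sqrt{2 n \log (2 / δ)}$. The function names and numerical values are assumptions made for the example.

```python
import math


def hoeffding_tail(u, n, D, B):
    """Bound 2 * exp(-u^2 / (2 n D^2 B^2)) on the deviation of the sum, capped at 1."""
    return min(1.0, 2.0 * math.exp(-u ** 2 / (2.0 * n * D ** 2 * B ** 2)))


def hoeffding_radius(n, D, B, delta):
    """Deviation u at which the bound equals delta: D * B * sqrt(2 n log(2 / delta))."""
    return D * B * math.sqrt(2.0 * n * math.log(2.0 / delta))


# Hypothetical values: n = 1000 increments bounded by B = 1, failure probability 0.05.
for D in (1.0, 2.0):
    u = hoeffding_radius(1000, D, 1.0, 0.05)
    print(D, u, hoeffding_tail(u, 1000, D, 1.0))
```

The deviation at a fixed failure probability is linear in $D$, so doubling the smoothness constant doubles the width of the bound.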

Bernstein

Consider a $(2, D)$-smooth separable Banach space with norm $∥ \cdot ∥$. Let $X_{1}, X_{2}, \dots$ have conditional mean $0$ with $∥ X_{i} ∥ \leq B$ for all $i$. Then, for any $u > 0$,

$$P \left( ∥ \sum_{i \leq n} X_{i} ∥ \geq u \right) \leq 2 \exp \left( - \frac{u^{2} / 2}{D^{2} \sum_{i \leq n} E_{i - 1} ∥ X_{i} - μ ∥^{2}_{\infty} + u B / 3} \right) .$$

As with Bernstein’s/Bennett’s inequality in the scalar setting (Bennett’s inequality), if the random variables are iid then $E_{i - 1} ∥ X_{i} - μ ∥^{2}$ is simply the variance.
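
To illustrate the role of the variance term, here is a small Python sketch of the bound in the iid case, where the sum of conditional second moments is taken to be $n σ^{2}$ for a common variance $σ^{2}$. The function name, the symbol `sigma2`, and the numerical values are assumptions for the example.

```python
import math


def bernstein_tail(u, n, D, B, sigma2):
    """2 * exp(-(u^2 / 2) / (D^2 * n * sigma2 + u * B / 3)), capped at 1.

    Assumes iid increments, so the sum of conditional second moments is n * sigma2.
    """
    exponent = (u ** 2 / 2.0) / (D ** 2 * n * sigma2 + u * B / 3.0)
    return min(1.0, 2.0 * math.exp(-exponent))


# Hypothetical values: same n, D, B and deviation u, with decreasing variance.
n, D, B, u = 1000, 1.0, 1.0, 50.0
for sigma2 in (1.0, 0.1, 0.01):
    print(sigma2, bernstein_tail(u, n, D, B, sigma2))
```

With everything else fixed, a smaller variance gives a smaller tail bound, which is the advantage over a Hoeffding-type bound that only uses the almost-sure bound $B$.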

Empirical-Bernstein

See discussion on the work of Martinez-Taboada and Ramdas in empirical Bernstein bounds:

Random vectors

Using elements of the Pinelis approach to concentration in addition to some game-theoretic techniques (game-theoretic statistics), Martinez-Taboada and Ramdas gave a time-uniform empirical Bernstein bound in $(2, D)$-smooth Banach spaces. They show that if $X_{1}, X_{2}, \dots$ have conditional mean $μ$ and satisfy $∥ X_{i} ∥ \leq 1/2$, then, with probability at least $1 - α$, simultaneously for all $t$:
$$∥ \frac{\sum_{i \leq t} λ_{i} X_{i}}{\sum_{i \leq t} λ_{i}} - μ ∥ \leq D \frac{\frac{1}{2} \sum_{i \leq t} ψ_{E} (λ_{i}) ∥ X_{i} - \overline{μ}_{i - 1} ∥^{2} + 2 \log (1 / α)}{\sum_{i \leq t} λ_{i}},$$
where $\overline{μ}_{t} = \sum_{i \leq t} λ_{i} X_{i} / \sum_{i \leq t} λ_{i}$ is the weighted sample mean, $ψ_{E} (λ) = - \log (1 - λ) - λ$, and $(λ_{t})$ is a predictable sequence with values in $(0, 0.8]$.
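
As a hedged sketch, the right-hand side of the displayed bound can be evaluated sequentially in Euclidean space, which is a Hilbert space and so has $D = 1$. The code below makes several assumptions that are not in the source: the centering term is taken to be the previous weighted mean with $\overline{μ}_{0} := 0$, the weights are a constant sequence (which is trivially predictable), and the names `psi_E`, `eb_radii`, and `sample_point` are hypothetical.

```python
import math
import random


def psi_E(lam):
    """psi_E(lambda) = -log(1 - lambda) - lambda."""
    return -math.log1p(-lam) - lam


def eb_radii(xs, lams, alpha, D=1.0):
    """Return (weighted means, radii) at each time t, evaluating the displayed bound.

    xs   : list of points in R^d (lists of floats) with Euclidean norm <= 1/2
    lams : list of weights in (0, 0.8], fixed in advance here and hence predictable
    """
    d = len(xs[0])
    weighted_sum = [0.0] * d  # running sum of lambda_i * X_i
    lam_sum = 0.0             # running sum of lambda_i
    var_term = 0.0            # running sum of psi_E(lambda_i) * ||X_i - previous mean||^2
    prev_mean = [0.0] * d     # assumption: the weighted mean at time 0 is set to 0
    means, radii = [], []
    for x, lam in zip(xs, lams):
        sq_dev = sum((xj - mj) ** 2 for xj, mj in zip(x, prev_mean))
        var_term += psi_E(lam) * sq_dev
        lam_sum += lam
        weighted_sum = [s + lam * xj for s, xj in zip(weighted_sum, x)]
        prev_mean = [s / lam_sum for s in weighted_sum]
        means.append(prev_mean)
        radii.append(D * (0.5 * var_term + 2.0 * math.log(1.0 / alpha)) / lam_sum)
    return means, radii


def sample_point(d=3):
    """Hypothetical data: a random direction scaled to norm at most 1/2 (mean zero)."""
    z = [random.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(c * c for c in z))
    r = 0.5 * random.random()
    return [r * c / norm for c in z]


random.seed(0)
xs = [sample_point() for _ in range(500)]
means, radii = eb_radii(xs, [0.1] * 500, alpha=0.05)
print(radii[0], radii[-1])
```

The constant weight is used only to keep the example short; how to choose $(λ_{t})$ well is not covered here.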

We also give an empirical-Bernstein bound for random vectors in $R^{d}$ in Time-uniform confidence spheres for means of random vectors, using the variational approach to concentration, but it has explicit dependence on the dimension.

Heavy-tailed concentration

In 2015, Minsker proposed the geometric median-of-means for Banach spaces. This is a general method for boosting weak (polynomial-rate) estimators into an estimator with an exponential rate. The idea is similar to the Lugosi-Mendelson median-of-means (see multivariate heavy-tailed mean estimation), but the weak estimators are aggregated using the geometric median. In $R^{d}$, the geometric median can be computed in polynomial time using Weiszfeld’s algorithm, since the objective is convex. This estimator was proposed simultaneously by Hsu and Sabato.
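
The construction just described can be sketched in Python for $R^{d}$: split the sample into blocks, average each block, and aggregate the block means with Weiszfeld's iteration. This is a minimal illustration under stated assumptions, not Minsker's exact procedure: the number of blocks `k`, the iteration cap, the tolerance, and the function names are all choices made for the example.

```python
import math
import random


def weiszfeld(points, iters=200, tol=1e-9):
    """Approximate the geometric median of points in R^d by Weiszfeld's iteration."""
    d = len(points[0])
    # Start from the coordinate-wise mean of the points.
    y = [sum(p[j] for p in points) / len(points) for j in range(d)]
    for _ in range(iters):
        num = [0.0] * d
        denom = 0.0
        for p in points:
            # Inverse-distance weight, guarded against division by zero.
            w = 1.0 / max(math.dist(p, y), 1e-12)
            denom += w
            for j in range(d):
                num[j] += w * p[j]
        y_new = [c / denom for c in num]
        if math.dist(y_new, y) < tol:
            return y_new
        y = y_new
    return y


def geometric_median_of_means(xs, k):
    """Split xs into k contiguous blocks, average each, and take the geometric median."""
    n, d = len(xs), len(xs[0])
    block_means = []
    for b in range(k):
        block = xs[b * n // k:(b + 1) * n // k]
        block_means.append([sum(x[j] for x in block) / len(block) for j in range(d)])
    return weiszfeld(block_means)


def heavy():
    """Hypothetical heavy-tailed coordinate: a symmetrized Pareto draw with mean 0."""
    return random.choice((-1.0, 1.0)) * random.paretovariate(2.1)


random.seed(0)
xs = [[heavy(), heavy()] for _ in range(3000)]
print(geometric_median_of_means(xs, k=30))
```

Contiguous blocks are adequate when the observations are iid; if more than half of the block means sit near the true mean, the geometric median cannot be dragged far away by the remaining blocks.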

In 2022, Yun and Park extended geometric median-of-means to Polish spaces, which include separable Banach spaces. They seem to get the same rates as Minsker, which are not quite sub-Gaussian in $R^{d}$ .

Whitehouse et al. (2024) propose extending Catoni and Giulini’s truncation-based estimator (truncation-based estimators) to Banach spaces. This estimator can handle infinite variance.
