A Bernstein bound is a concentration inequality that exploits the variance of the distribution. Such bounds are usually presented as part of a progression of bounds for random variables with bounded range; see from boundedness to variance adaptivity.

Empirical-Bernstein bounds replace knowledge of the variance with data-driven estimates of the variance. This is useful since we often don’t know what the variance is. Ideally, such bounds can be shown to converge in the limit to a bound with width depending on the true variance — i.e., they scale asymptotically with the true variance.

Some examples:

Scalar random variables

The first empirical Bernstein bound was given by Maurer and Pontil in 2009. A second, tighter bound was given by Waudby-Smith and Ramdas using the betting approach to concentration (see estimating means by betting). Both are reminiscent of Bennett’s bound (Bennett’s inequality), but have a data-driven variance term.

Let $X_{1}, X_{2}, \dots, X_{n}$ be iid random variables in $[0, 1]$. The Maurer and Pontil bound states that, with probability at least $1 - δ$,

$$\frac{1}{n} \sum_{i} X_{i} - E X_{1} \leq \sqrt{\frac{2 σ_{n} \log(2/δ)}{n}} + \frac{7 \log(2/δ)}{3 (n - 1)},$$

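where $σ_{n} = \frac{1}{n (n - 1)} \sum_{i < j} (X_{i} - X_{j})^{2}$ is the sample variance. The width $W_{n}$ of the two-sided version of the bound (the RHS above with $δ$ replaced by $δ/2$) has the limiting behavior $\sqrt{n} W_{n} \to σ \sqrt{2 \log(4/δ)}$, where $σ$ is the true standard deviation.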
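To make the two terms of the Maurer and Pontil bound concrete, here is a minimal sketch that evaluates its right-hand side on simulated data. The function name `maurer_pontil_width` and the Beta-distributed sample are illustrative assumptions, and the sketch uses $\log(2/δ)$ in both terms, following the statement in Maurer and Pontil's paper.

```python
import numpy as np


def maurer_pontil_width(x, delta):
    """One-sided Maurer-Pontil width for observations in [0, 1].

    Hypothetical helper for illustration; not taken from any library.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    if n < 2:
        raise ValueError("need at least two observations")
    # Unbiased sample variance; this equals
    # (1 / (n (n - 1))) * sum_{i<j} (x_i - x_j)^2.
    var_n = x.var(ddof=1)
    variance_term = np.sqrt(2.0 * var_n * np.log(2.0 / delta) / n)
    range_term = 7.0 * np.log(2.0 / delta) / (3.0 * (n - 1))
    return variance_term + range_term


# Simulated low-variance data in [0, 1] (an assumption for the demo).
rng = np.random.default_rng(0)
x = rng.beta(2.0, 5.0, size=2000)
delta = 0.05

mp_width = maurer_pontil_width(x, delta)
# One-sided Hoeffding width for comparison; it ignores the variance.
hoeffding_width = np.sqrt(np.log(1.0 / delta) / (2.0 * x.size))
print(mp_width, hoeffding_width)
```

For low-variance data such as this, the variance term is small and the $O(1/n)$ range term is the price paid for estimating the variance; for small $n$ that second term can dominate.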

For the same setting, the bound of Waudby-Smith and Ramdas states that for any fixed $λ_{n} \in (0, 1)$, with probability at least $1 - α$,

$$\frac{1}{n} \sum_{i} X_{i} - E X_{1} \leq \frac{\log(1/α) + ψ_{E}(λ_{n}) \sum_{i \leq n} v_{i}}{λ_{n} n},$$

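where $v_{i} = (X_{i} - μ_{i - 1})^{2}$, $ψ_{E}(λ) = -\log(1 - λ) - λ$, and $μ_{i - 1}$ is any function of $X_{1}, \dots, X_{i - 1}$ (though one should think of it as roughly the running mean $\frac{1}{i - 1} \sum_{j \leq i - 1} X_{j}$). For the appropriate choice of $λ_{n}$, the width of the two-sided version (the RHS above with $α$ replaced by $α/2$) scales as $\sqrt{n} W_{n} \to σ \sqrt{2 \log(2/α)}$, which is optimal: it matches the asymptotic width of the Bernstein bound that uses the true variance.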
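The role of the predictable plug-in $μ_{i-1}$ is easier to see in code. The sketch below evaluates the right-hand side of the displayed bound with a single fixed $λ$, taking $v_i = (X_i - μ_{i-1})^2$, $ψ_E(λ) = -\log(1-λ) - λ$, and the running mean of past observations as the plug-in. The starting value $μ_0 = 1/2$, the helper names, and the rule used to pick $λ$ from a prior variance guess are all assumptions made for illustration, not the tuning recommended by Waudby-Smith and Ramdas.

```python
import numpy as np


def psi_e(lam):
    """psi_E(lam) = -log(1 - lam) - lam, defined for 0 < lam < 1."""
    if not 0.0 < lam < 1.0:
        raise ValueError("lam must lie in (0, 1)")
    return -np.log1p(-lam) - lam


def betting_eb_width(x, alpha, lam):
    """One-sided width with a fixed lam (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    # Predictable plug-in: mu_{i-1} uses only x_1, ..., x_{i-1}.
    # mu_0 = 1/2 is an arbitrary starting value (assumption).
    mu_prev = np.empty(n)
    mu_prev[0] = 0.5
    mu_prev[1:] = np.cumsum(x)[:-1] / np.arange(1, n)
    v = (x - mu_prev) ** 2
    return (np.log(1.0 / alpha) + psi_e(lam) * v.sum()) / (lam * n)


rng = np.random.default_rng(0)
x = rng.beta(2.0, 5.0, size=2000)
alpha = 0.05

# lam must be fixed before seeing the data. Since psi_E(lam) ~ lam^2 / 2,
# a guess var_guess at the variance suggests the heuristic choice below.
var_guess = 0.03  # hypothetical prior guess, not estimated from x
lam = min(np.sqrt(2.0 * np.log(1.0 / alpha) / (x.size * var_guess)), 0.75)
eb_width = betting_eb_width(x, alpha, lam)
print(lam, eb_width)
```

Because $λ$ is chosen without looking at the sample, a poor variance guess only loosens the width; it does not invalidate the bound.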

Finally, betting techniques can yield other empirical-Bernstein bounds depending on the choice of predictable plug-in. These do not have closed-form solutions, but are often tighter. See estimating means by betting.

Random vectors

Using elements of the Pinelis approach to concentration together with some game-theoretic techniques (game-theoretic statistics), Martinez-Taboada and Ramdas gave a time-uniform empirical Bernstein bound in $(2, D)$-smooth Banach spaces. They show that if $X_{1}, X_{2}, \dots$ have conditional mean $μ$ and satisfy $∥ X_{i} ∥ \leq 1/2$, then, with probability $1 - α$, simultaneously for all $t$:

$$\left\| \frac{\sum_{i \leq t} λ_{i} X_{i}}{\sum_{i \leq t} λ_{i}} - μ \right\| \leq D \, \frac{\frac{1}{2} \sum_{i \leq t} ψ_{E}(λ_{i}) \| X_{i} - μ_{i - 1} \|^{2} + 2 \log(1/α)}{\sum_{i \leq t} λ_{i}},$$

where $\overline{μ}_{t} = \sum_{i \leq t} λ_{i} X_{i} / \sum_{i \leq t} λ_{i}$ is the weighted empirical mean appearing on the left-hand side, $ψ_{E}(λ) = -\log(1 - λ) - λ$, and $(λ_{t})$ is a predictable sequence with values in $(0, 0.8]$.
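As a rough illustration of how the right-hand side of this bound is assembled, the sketch below evaluates it for vectors in $R^{d}$ with the Euclidean norm. It assumes $D = 1$ (Hilbert spaces satisfy the $(2, D)$-smoothness condition with $D = 1$ by the parallelogram law), a constant and therefore trivially predictable $λ_{i}$, and the mean of past observations as $μ_{i-1}$ with $μ_{0} = 0$. The function name and the simulated data are hypothetical, and the code only evaluates the displayed expression; it is not the authors' implementation.

```python
import numpy as np


def vector_eb_radius(xs, alpha, lam, smoothness_d=1.0):
    """Evaluate the displayed right-hand side at time t = len(xs).

    xs: array of shape (t, d) whose rows have Euclidean norm <= 1/2.
    lam: constant weight in (0, 0.8], fixed in advance (assumption).
    """
    xs = np.asarray(xs, dtype=float)
    t = xs.shape[0]
    if not 0.0 < lam <= 0.8:
        raise ValueError("lam must lie in (0, 0.8]")
    lams = np.full(t, lam)
    psi = -np.log1p(-lams) - lams  # psi_E(lam_i)
    # Plug-in centers built from past rows only; mu_0 = 0 (assumption).
    centers = np.zeros_like(xs)
    centers[1:] = np.cumsum(xs, axis=0)[:-1] / np.arange(1, t)[:, None]
    sq_dist = np.sum((xs - centers) ** 2, axis=1)
    numerator = 0.5 * np.sum(psi * sq_dist) + 2.0 * np.log(1.0 / alpha)
    return smoothness_d * numerator / lams.sum()


# Simulated data: points rescaled so that every row has norm <= 1/2.
rng = np.random.default_rng(0)
raw = rng.normal(size=(5000, 3))
xs = 0.5 * raw / np.maximum(1.0, np.linalg.norm(raw, axis=1, keepdims=True))
radius = vector_eb_radius(xs, alpha=0.05, lam=0.1)
print(radius)
```

With a constant $λ$ the variance term does not vanish as $t$ grows, which is why the weights are allowed to vary predictably over time.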

We also give an empirical-Bernstein bound for random vectors in $\mathbb{R}^{d}$ in Time-uniform confidence spheres for means of random vectors, using the variational approach to concentration, but it has explicit dependence on the dimension.

Random matrices

Wang and Ramdas develop an empirical Bernstein bound for random matrices. TODO
