A *Bernstein* bound is a kind of concentration inequality that takes advantage of the variance of the distribution. Such bounds are usually part of a progression of bounds for random variables with bounded range; see from boundedness to variance adaptivity.

*Empirical-Bernstein bounds* replace knowledge of the variance with data-driven estimates of the variance, which is useful since we rarely know the variance in advance. Ideally, such bounds can be shown to converge in the limit to a bound whose width depends on the true variance, i.e., they scale asymptotically with the true variance.

Some examples:

# Scalar random variables

The first empirical-Bernstein bound was given by Maurer and Pontil in 2009. A second, tighter bound was given by Waudby-Smith and Ramdas using the betting approach to concentration (see estimating means by betting). Both are very reminiscent of Bennett’s bound (light-tailed scalar concentration:Bennett’s bound), but have a data-driven variance term.

Let $X_{1},X_{2},\dots,X_{n}$ be iid random variables in $[0,1]$. The Maurer–Pontil bound states that, with probability $1-\delta$,

$$\frac{1}{n}\sum_{i} X_{i}-\mathbb{E}X_{1} \le \sqrt{\frac{2\hat{\sigma}_{n}^{2}\log(2/\delta)}{n}} + \frac{7\log(2/\delta)}{3(n-1)},$$

where $\hat{\sigma}_{n}^{2}=\frac{1}{n(n-1)}\sum_{i<j}(X_{i}-X_{j})^{2}$ is the sample variance. The width of the bound, $W_{n}$ (the RHS above), has the limiting behavior $\sqrt{n}\,W_{n}\to\sigma\sqrt{2\log(2/\delta)}$.
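As a minimal numerical sketch of the Maurer–Pontil bound: the function name below is my own, and the example data (a Beta distribution, which has small variance on $[0,1]$) is chosen only to illustrate that the variance term, not the range, drives the width.

```python
import numpy as np

def maurer_pontil_bound(x, delta):
    """One-sided empirical-Bernstein width in the style of Maurer and
    Pontil (2009): sqrt(2 * var * log(2/delta) / n) + 7 log(2/delta) / (3(n-1)).
    Assumes x lies in [0, 1]."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # The pairwise form (1/(n(n-1))) * sum_{i<j} (x_i - x_j)^2 equals the
    # usual unbiased sample variance, so we can use np.var with ddof=1.
    var_hat = np.var(x, ddof=1)
    log_term = np.log(2.0 / delta)
    return np.sqrt(2.0 * var_hat * log_term / n) + 7.0 * log_term / (3.0 * (n - 1))

rng = np.random.default_rng(0)
x = rng.beta(2, 5, size=10_000)   # low-variance data in [0, 1]
w = maurer_pontil_bound(x, delta=0.05)
```

The resulting width is much narrower than the corresponding Hoeffding width, since the sample variance of this data is far below the worst case of $1/4$.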

For the same setting, the bound by Waudby-Smith and Ramdas gives that for any $\lambda_{n}\in(0,1)$, with probability $1-\delta$,

$$\frac{1}{n}\sum_{i} X_{i}-\mathbb{E}X_{1} \le \frac{\log(1/\delta)+\psi_{E}(\lambda_{n})\sum_{i\le n}v_{i}}{\lambda_{n}n},$$

where $v_{i}=(X_{i}-\hat{\mu}_{i-1})^{2}$, $\hat{\mu}_{i-1}$ is *any* function of $X_{1},\dots,X_{i-1}$ (though one should think of it as roughly $\frac{1}{i}\sum_{j\le i}X_{j}$), and $\psi_{E}(\lambda)=-\log(1-\lambda)-\lambda$. For the appropriate choice of $\lambda_{n}$ the width scales as $\sqrt{n}\,W_{n}\to\sigma\sqrt{2\log(1/\delta)}$, which is optimal: it has the same form as the width of the Hoeffding bound (light-tailed scalar concentration:Hoeffding bound), but with the true standard deviation $\sigma$ in place of the worst-case value $1/2$.
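A sketch of the fixed-$\lambda$ bound above. The function name, the initial guess $\hat{\mu}_{0}=1/2$, and the heuristic for choosing $\lambda$ (mirroring the optimal scaling $\lambda \approx \sqrt{2\log(1/\delta)/(n\sigma^{2})}$, with the sample variance standing in for $\sigma^2$) are all my own choices, not prescriptions from the paper.

```python
import numpy as np

def wsr_empirical_bernstein(x, delta, lam):
    """Fixed-lambda empirical-Bernstein width in the style of
    Waudby-Smith and Ramdas: (log(1/delta) + psi_E(lam) * sum v_i) / (lam * n),
    with v_i = (x_i - muhat_{i-1})^2. Assumes x in [0, 1], lam in (0, 1)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    psi_e = -np.log1p(-lam) - lam  # psi_E(lam) = -log(1 - lam) - lam
    # Predictable mean estimates: muhat_{i-1} uses only x_1, ..., x_{i-1};
    # before any data is seen we start at 1/2 (arbitrary but valid).
    cummean = np.concatenate(([0.5], np.cumsum(x)[:-1] / np.arange(1, n)))
    v = (x - cummean) ** 2
    return (np.log(1.0 / delta) + psi_e * v.sum()) / (lam * n)

rng = np.random.default_rng(1)
x = rng.beta(2, 5, size=10_000)
# Heuristic lambda on the order of sqrt(2 log(1/delta) / (n * sigma^2)).
lam = np.sqrt(2 * np.log(1 / 0.05) / (len(x) * x.var()))
w = wsr_empirical_bernstein(x, delta=0.05, lam=lam)
```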

Finally, betting techniques can yield other empirical-Bernstein bounds depending on the choice of predictable plug-in. These do not have closed-form solutions, but are often tighter. See estimating means by betting.

# Random vectors

Using elements of the Pinelis approach to concentration in addition to some game-theoretic techniques (game-theoretic statistics), Martinez-Taboada and Ramdas gave a time-uniform empirical-Bernstein bound in (2,$D$)-smooth Banach spaces. They show that for $X_{1},X_{2},\dots$ with conditional mean $\mu$ and $\|X_{i}\|\le 1/2$, with probability $1-\alpha$, simultaneously for all $t$:

$$\left\|\frac{\sum_{i\le t}\lambda_{i}X_{i}}{\sum_{i\le t}\lambda_{i}}-\mu\right\| \le D\,\frac{\frac{1}{2}\sum_{i\le t}\psi_{E}(\lambda_{i})\|X_{i}-\hat{\mu}_{i-1}\|^{2}+2\log(1/\alpha)}{\sum_{i\le t}\lambda_{i}},$$

where $\hat{\mu}_{t}=\sum_{i\le t}\lambda_{i}X_{i}/\sum_{i\le t}\lambda_{i}$, $\psi_{E}(\lambda)=-\log(1-\lambda)-\lambda$, and $(\lambda_{t})$ is a predictable sequence with values in $(0,0.8]$.
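A sketch of computing the radii in the displayed bound, specialized to $\mathbb{R}^{d}$ with the Euclidean norm (a Hilbert space, hence (2,$D$)-smooth with $D=1$). The function name, the constant weight sequence, and the choice $\hat{\mu}_{0}=0$ are my own; this computes the formula as stated above, not a verified transcription of the paper's theorem.

```python
import numpy as np

def eb_radius_sequence(xs, alpha, lam):
    """Centers and radii of the time-uniform empirical-Bernstein confidence
    spheres, for xs of shape (T, d) with ||x_t|| <= 1/2 and a predictable
    weight sequence lam in (0, 0.8] of shape (T,). Uses D = 1 (Hilbert space)."""
    xs = np.asarray(xs, dtype=float)
    lam = np.asarray(lam, dtype=float)
    T, d = xs.shape
    psi_e = -np.log1p(-lam) - lam                     # psi_E(lam_t)
    weighted = np.cumsum(lam[:, None] * xs, axis=0)   # sum_{i<=t} lam_i x_i
    lam_sum = np.cumsum(lam)                          # sum_{i<=t} lam_i
    centers = weighted / lam_sum[:, None]             # muhat_t
    # Predictable centers muhat_{t-1}: the origin before any data is seen.
    prev = np.vstack([np.zeros(d), centers[:-1]])
    dev2 = np.sum((xs - prev) ** 2, axis=1)           # ||x_t - muhat_{t-1}||^2
    num = 0.5 * np.cumsum(psi_e * dev2) + 2.0 * np.log(1.0 / alpha)
    return centers, num / lam_sum

rng = np.random.default_rng(2)
xs = rng.uniform(-0.2, 0.2, size=(5000, 3))   # ||x|| <= sqrt(3)*0.2 < 1/2
lam = np.full(5000, 0.05)                     # constant predictable weights
centers, radii = eb_radius_sequence(xs, alpha=0.05, lam=lam)
```

Since the data here is centered, the running centers shrink toward the origin while the radii shrink at the rate governed by the empirical variance.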

We also give an empirical-Bernstein bound for random vectors in $\mathbb{R}^{d}$ in Time-uniform confidence spheres for means of random vectors, using the variational approach to concentration, but it has explicit dependence on the dimension.

# Random matrices

Wang and Ramdas develop an empirical-Bernstein bound for random matrices. todo