The Chernoff method, also called the Cramer-Chernoff method, applies Markov’s inequality to the nonnegative random variable $e_{λX}$, and then chooses $λ$ to optimize the bound. This often results in tighter bounds and concentration inequalities than working with the random variable directly.

In particular, for a random variable $X$, Markov’s inequality gives that

$P(X≥t)=P(e_{λX}≥e_{λt})≤e_{−λt}E[e_{λX}].$This holds for all $λ>0$, so we obtain the bound

$P(X≥t)≤λ>0f e_{−λt}E[e_{λX}]=λ>0f e_{−λt}M_{X}(λ),$where $M_{X}$ is the MGF of $X$. If we place distributional assumptions on $X$ (eg bounded, sub-Gaussianity), then we can solve for the optimal value of $λ$. If $X=∑_{i≤n}X_{i}$ is the sum of iid observations, then $E[e_{λX}]=∏_{i}E[e_{λX_{i}}]=nE[e_{λX_{1}}]$. This makes the Chernoff particularly attractive under independence assumptions, even though it is not optimal (see below).

The Chernoff method is often written in terms of the Cramer transform of $X$ (which is why it’s often called the Cramer-Chernoff method), which is

$ψ_{X}(t)=λ≥0sup (λt−ψ_{X}(λ)),whereψ_{X}(λ)=gEe_{λZ}.$This is also often called the Fenchel-Legendre transform. This gives,

$P(X≥t)≤exp(−ψ_{X}(t)),$which leads us to studying and bounding the function $ψ_{X}(t)$.

## Optimality

The Chernoff method is very convenient hence widely deployed, but it’s not optimal. We can see this in the case of Hoeffding’s bound, which is missing a factor of $1/t$ in the exponent from what we expect from the CLT. (See The missing factor in Hoeffding’s inequalities by Talagrand).

More generally, the method of moments for concentration beats the Chernoff bound (when all moments exist), and the optimization approach to concentration (concentration via convex optimization) will give the tightest results if the convex program can be solved. See also interpolating between Markov and Chernoff, which is tighter than the Chernoff method but also less computationally convenient.

However, the Chernoff method is optimal is one sense. If $X_{1},…,X_{n}$ are iid mean zero, $S_{n}=∑_{i}X_{i}$, and for all $t≥0$,

$P(S_{n}≥nt)≤Q_{n}(t),$then

$Q(t)≥λ≥0f E[exp(λ(X−t))].$That, we have a bound which *uses the product structure of the observations* (hence the power of $n$ on $Q(t)$), then the Chernoff method is optimal.