The Chernoff method, also called the Cramér-Chernoff method, applies Markov's inequality to the nonnegative random variable $e^{\lambda X}$, and then chooses $\lambda$ to optimize the bound. This often results in tighter bounds and concentration inequalities than working with $X$ directly.
In particular, for a random variable $X$ and any $\lambda > 0$, Markov's inequality gives that
$$\Pr(X \geq t) = \Pr\left(e^{\lambda X} \geq e^{\lambda t}\right) \leq e^{-\lambda t}\,\mathbb{E}\left[e^{\lambda X}\right].$$
This holds for all $\lambda > 0$, so we obtain the bound
$$\Pr(X \geq t) \leq \inf_{\lambda > 0} e^{-\lambda t} M_X(\lambda),$$
where $M_X(\lambda) = \mathbb{E}[e^{\lambda X}]$ is the MGF of $X$. If we place distributional assumptions on $X$ (e.g., boundedness, sub-Gaussianity), then we can solve for the optimal value of $\lambda$. If $X = \sum_{i=1}^n X_i$ is a sum of iid observations, then $M_X(\lambda) = \prod_{i=1}^n M_{X_i}(\lambda) = M_{X_1}(\lambda)^n$. This makes the Chernoff method particularly attractive under independence assumptions, even though it is not optimal (see below).
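For example (a standard calculation; the sub-Gaussian parameter $\sigma^2$ is introduced here just for illustration): if $X$ is $\sigma^2$-sub-Gaussian, so that $M_X(\lambda) \leq e^{\lambda^2 \sigma^2 / 2}$ for all $\lambda$, then
$$\Pr(X \geq t) \leq \inf_{\lambda > 0} \exp\left(\frac{\lambda^2 \sigma^2}{2} - \lambda t\right) = \exp\left(-\frac{t^2}{2\sigma^2}\right),$$
with the infimum attained at $\lambda = t/\sigma^2$.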
The Chernoff method is often written in terms of the Cramér transform of $X$ (which is why it's often called the Cramér-Chernoff method). Writing $\psi_X(\lambda) = \log \mathbb{E}[e^{\lambda X}]$ for the cumulant generating function, the Cramér transform is
$$\psi_X^*(t) = \sup_{\lambda \geq 0} \left(\lambda t - \psi_X(\lambda)\right).$$
This is also often called the Fenchel-Legendre transform. This gives
$$\Pr(X \geq t) \leq \exp\left(-\psi_X^*(t)\right),$$
which leads us to studying and bounding the function $\psi_X^*$.
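As a worked example (a standard computation; the parameter $p$ is introduced here only for illustration): if $X \sim \mathrm{Bernoulli}(p)$, then $\psi_X(\lambda) = \log(1 - p + p e^{\lambda})$, and for $t \in (p, 1)$ the supremum is attained at $e^{\lambda} = \frac{t(1-p)}{p(1-t)}$, giving
$$\psi_X^*(t) = t \log\frac{t}{p} + (1-t)\log\frac{1-t}{1-p} = \mathrm{KL}(t \,\|\, p).$$
Applied to a sum of $n$ iid Bernoulli($p$) observations, this yields the Chernoff-Hoeffding bound $\Pr\left(\sum_{i=1}^n X_i \geq nt\right) \leq e^{-n\,\mathrm{KL}(t \| p)}$.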
Optimality
The Chernoff method is very convenient and hence widely deployed, but it's not optimal. We can see this in the case of Hoeffding's bound which, compared with the Gaussian tail approximation suggested by the CLT, is missing a polynomial (Mills-ratio) prefactor in front of the exponential term. (See The missing factor in Hoeffding's inequalities by Talagrand).
More generally, the method of moments for concentration beats the Chernoff bound (when all moments exist), and the optimization approach to concentration (concentration via convex optimization) will give the tightest results if the convex program can be solved. See also interpolating between Markov and Chernoff, which is tighter than the Chernoff method but also less computationally convenient.
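To see this looseness concretely, here is a minimal numerical sketch (not from the original note; it assumes scipy is available, and the function names are mine) comparing the Chernoff bound $e^{-n\,\mathrm{KL}(t\|p)}$ for a Binomial$(n, p)$ sum against the exact tail. The exponent matches, but the ratio grows polynomially in $n$, which is the missing prefactor discussed above.

```python
# Sketch: compare the Chernoff bound exp(-n * KL(t || p)) on P(S_n >= n t),
# where S_n ~ Binomial(n, p), against the exact binomial tail.
import numpy as np
from scipy.stats import binom


def chernoff_bound(n: int, p: float, t: float) -> float:
    """Chernoff bound exp(-n * KL(t || p)) on P(S_n >= n t), for t in (p, 1)."""
    kl = t * np.log(t / p) + (1 - t) * np.log((1 - t) / (1 - p))
    return float(np.exp(-n * kl))


def exact_tail(n: int, p: float, t: float) -> float:
    """Exact P(S_n >= ceil(n t)) for S_n ~ Binomial(n, p)."""
    k = int(np.ceil(n * t))
    return float(binom.sf(k - 1, n, p))  # sf(k - 1) = P(S_n >= k)


if __name__ == "__main__":
    p, t = 0.5, 0.6
    for n in [10, 100, 1000]:
        exact = exact_tail(n, p, t)
        bound = chernoff_bound(n, p, t)
        # The ratio bound / exact grows (roughly like sqrt(n)): the exponent is
        # right, but the Chernoff bound is loose by a polynomial prefactor.
        print(f"n={n:5d}  exact={exact:.3e}  chernoff={bound:.3e}  ratio={bound / exact:.1f}")
```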
However, the Chernoff method is optimal in one sense. If $X_1, X_2, \dots$ are iid with mean zero, $S_n = \sum_{i=1}^n X_i$, and for all $n \geq 1$ and $t > 0$,
$$\Pr(S_n \geq nt) \leq A(t)^n,$$
then
$$A(t) \geq e^{-\psi_{X_1}^*(t)} = \inf_{\lambda \geq 0} \mathbb{E}\left[e^{\lambda (X_1 - t)}\right].$$
That is, among bounds which use the product structure of the observations (hence the power of $n$ on $A(t)$), the Chernoff method is optimal.
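One way to see this (a sketch, glossing over regularity conditions): by the lower bound in Cramér's theorem, $\frac{1}{n}\log \Pr(S_n \geq nt) \to -\psi_{X_1}^*(t)$ at continuity points of $\psi_{X_1}^*$, while the assumed bound gives $\log A(t) \geq \frac{1}{n}\log \Pr(S_n \geq nt)$ for every $n$; letting $n \to \infty$ yields $\log A(t) \geq -\psi_{X_1}^*(t)$.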