Some basic probabilistic inequalities. Used mostly under the hood to build more sophisticated concentration inequalities, but also useful in other scenarios.
Markov’s inequality
For a nonnegative random variable $X$ and parameter $a > 0$, Markov’s inequality states that
$$\Pr(X \geq a) \leq \frac{\mathbb{E}[X]}{a}.$$
Clearly the condition that $X \geq 0$ is necessary: consider any random variable which has negative mean but some positive mass above $a$, so that the left-hand side is positive while the right-hand side is negative.
The proof of Markov’s is easy:
$$\mathbb{E}[X] \geq \mathbb{E}[X \mathbf{1}\{X \geq a\}] \geq a \Pr(X \geq a).$$
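As a quick sanity check, here is a minimal numerical sketch (assuming numpy is available; the exponential distribution and the thresholds are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=1_000_000)  # nonnegative X with E[X] = 1

for a in [1.0, 2.0, 5.0]:
    empirical_tail = np.mean(x >= a)  # estimate of P(X >= a)
    markov_bound = x.mean() / a       # E[X] / a
    print(f"a={a}: P(X >= a) ~ {empirical_tail:.4f} <= {markov_bound:.4f}")
```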
Markov’s inequality is tight in the sense that there are distributions for which it holds with equality. But it is not very interesting on its own, in the sense that it doesn’t provide strong concentration behavior. However, it’s a cornerstone of more sophisticated concentration techniques, which often rely on applying Markov’s inequality to the random variable $e^{\lambda X}$.
(See the Chernoff method below, which is precisely this strategy.)
You can strengthen Markov by writing
$$\Pr(X \geq a) \leq \frac{\mathbb{E}[\varphi(X)]}{\varphi(a)}$$
for any nondecreasing, nonnegative function $\varphi$ with $\varphi(a) > 0$ and any (not necessarily nonnegative) random variable $X$. To prove this, just apply expectations to the inequality $\varphi(a) \mathbf{1}\{x \geq a\} \leq \varphi(x)$, which holds for all $x$ and all $a$.
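Here is a small sketch of this strengthened form (numpy; I take $\varphi(x) = e^x$, which is nondecreasing and nonnegative, and a standard normal $X$, which is not itself nonnegative; both are just illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(1_000_000)  # X need not be nonnegative here

phi = np.exp                        # any nondecreasing, nonnegative function
for a in [1.0, 2.0]:
    empirical_tail = np.mean(x >= a)
    bound = phi(x).mean() / phi(a)  # E[phi(X)] / phi(a)
    print(f"a={a}: P(X >= a) ~ {empirical_tail:.4f} <= {bound:.4f}")
```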
Some notes and pointers:
- For achievability and optimality see optimality of Markov and Chebyshev.
- Markov’s inequality can also be derived using convex optimization arguments (concentration via convex optimization).
- Ville’s inequality is a time-uniform extension of Markov’s inequality for nonnegative supermartingales (or reverse submartingales, as the case may be). Just as Markov’s inequality is used as a foundation for concentration inequalities, Ville’s inequality is used as a foundation for time-uniform concentration inequalities.
- There is also a randomized Markov’s inequality (randomized inequalities:Markov) which uses external randomization.
- For an intuitive explanation of Markov’s inequality, see the optimization perspective on Markov’s inequality.
Chebyshev’s inequality
Chebyshev’s inequality is simply Markov’s inequality applied to $(X - \mathbb{E}[X])^2$. For $a > 0$,
$$\Pr(|X - \mathbb{E}[X]| \geq a) = \Pr\big((X - \mathbb{E}[X])^2 \geq a^2\big) \leq \frac{\mathbb{E}[(X - \mathbb{E}[X])^2]}{a^2} = \frac{\operatorname{Var}(X)}{a^2}.$$
Chebyshev’s inequality gives the first flavor of concentration. If $X_1, \dots, X_n$ are iid with mean $\mu$, then Chebyshev gives
$$\Pr\left(\left|\frac{1}{n}\sum_{i=1}^n X_i - \mu\right| \geq a\right) \leq \frac{\sigma^2}{n a^2},$$
where $\sigma^2 = \operatorname{Var}(X_1)$. Making the RHS equal to $\delta$ gives, with probability $1 - \delta$,
$$\left|\frac{1}{n}\sum_{i=1}^n X_i - \mu\right| \leq \sigma \sqrt{\frac{1}{n\delta}}.$$
The rate of concentration here is actually not too bad: it scales as $1/\sqrt{n}$, which is what we expect in many scenarios. However, the dependence on $\delta$ is not great. We’d rather have $\log(1/\delta)$ in place of $1/\delta$, which is an exponential improvement (a deviation width scaling as $\sqrt{\log(1/\delta)/n}$ rather than $\sqrt{1/(n\delta)}$). The Chernoff method gives us this sort of dependence.
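To make the $\delta$-dependence concrete, here is a small sketch (numpy) comparing the Chebyshev width $\sigma\sqrt{1/(n\delta)}$ above with a Hoeffding-type width $\sqrt{\log(2/\delta)/(2n)}$, which is the kind of $\sqrt{\log(1/\delta)}$ rate the Chernoff method yields for $[0,1]$-valued variables; the choice of $n$ is arbitrary:

```python
import numpy as np

n = 10_000
sigma = 0.5  # worst-case standard deviation of a [0, 1]-valued variable

for delta in [1e-1, 1e-2, 1e-4, 1e-8]:
    chebyshev_width = sigma * np.sqrt(1 / (n * delta))      # from the bound above
    hoeffding_width = np.sqrt(np.log(2 / delta) / (2 * n))  # Chernoff-type rate
    print(f"delta={delta:.0e}: Chebyshev {chebyshev_width:.4f}  vs  Hoeffding {hoeffding_width:.4f}")
```

The Chebyshev width blows up as $\delta \to 0$, while the Hoeffding-type width grows only logarithmically.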
Note that the strict iid assumption is not necessary for the above bounds. If we have $\mathbb{E}[X_i] = \mu$ for all $i$ and $\operatorname{Cov}(X_i, X_j) \leq 0$ (uncorrelated or negatively correlated) for all $i \neq j$, then $\operatorname{Var}\left(\frac{1}{n}\sum_{i=1}^n X_i\right) \leq \frac{1}{n^2}\sum_{i=1}^n \operatorname{Var}(X_i)$, so the same discussion applies after swapping $\sigma^2$ with $\frac{1}{n}\sum_{i=1}^n \operatorname{Var}(X_i)$. See negative correlation can improve concentration for more.
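For a concrete instance of the negative-correlation case, here is a sketch (numpy) comparing the variance of a sample mean under sampling with replacement (iid) versus without replacement (negatively correlated draws) from a fixed finite population; the population and sizes are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(2)
population = rng.exponential(size=1_000)  # arbitrary fixed finite population
n, reps = 100, 20_000

with_repl = [rng.choice(population, size=n, replace=True).mean() for _ in range(reps)]
without_repl = [rng.choice(population, size=n, replace=False).mean() for _ in range(reps)]

# Sampling without replacement makes the draws negatively correlated, so the
# sample mean has smaller variance and Chebyshev gives a better bound.
print("var of mean, with replacement:   ", np.var(with_repl))
print("var of mean, without replacement:", np.var(without_repl))
```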
Notes:
- Chebyshev’s inequality is a specific instance of the method of moments for concentration.
- The time-uniform extension of Chebyshev’s inequality is the Dubins-Savage inequality.
- For achievability and optimality see optimality of Markov and Chebyshev.
Cantelli’s inequality
Cantelli’s inequality is a one-sided improvement to Chebyshev’s inequality. For any random variable $X$ with mean $\mu$ and variance $\sigma^2$, and any $a > 0$ and $\lambda \geq 0$, write
$$\Pr(X - \mu \geq a) = \Pr(X - \mu + \lambda \geq a + \lambda) \leq \Pr\big((X - \mu + \lambda)^2 \geq (a + \lambda)^2\big),$$
so by Chebyshev (more precisely, Markov applied to the square, as above),
$$\Pr(X - \mu \geq a) \leq \frac{\mathbb{E}[(X - \mu + \lambda)^2]}{(a + \lambda)^2} = \frac{\sigma^2 + \lambda^2}{(a + \lambda)^2}.$$
Minimizing the right-hand side over $\lambda \geq 0$ (the minimizer is $\lambda = \sigma^2/a$) gives
$$\Pr(X - \mu \geq a) \leq \frac{\sigma^2}{\sigma^2 + a^2},$$
which is Cantelli’s inequality. The optimization over $\lambda$ can be carried out as in concentration via convex optimization.
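As a quick numerical comparison (numpy; a standard normal $X$ just for illustration), here are the one-sided Cantelli bound, the two-sided Chebyshev bound, and the empirical one-sided tail:

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.standard_normal(1_000_000)  # mean 0, variance 1
mu, var = x.mean(), x.var()

for a in [0.5, 1.0, 2.0]:
    empirical = np.mean(x - mu >= a)  # one-sided tail P(X - mu >= a)
    cantelli = var / (var + a**2)     # one-sided Cantelli bound
    chebyshev = var / a**2            # two-sided Chebyshev bound (vacuous for small a)
    print(f"a={a}: tail ~ {empirical:.4f}, Cantelli {cantelli:.4f}, Chebyshev {chebyshev:.4f}")
```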
Chernoff method
The Chernoff method applies Markov’s inequality to the nonnegative random variable $e^{\lambda X}$, giving $\Pr(X \geq a) \leq e^{-\lambda a}\,\mathbb{E}[e^{\lambda X}]$ for every $\lambda > 0$, and then chooses $\lambda$ to optimize the bound. More detail is given in Chernoff method (also called the Cramér-Chernoff method).
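As a preview, here is a minimal sketch of the method (numpy only, with a crude grid search over $\lambda$; the sum-of-Bernoullis setup and the grid are illustrative choices, not part of the linked note):

```python
import numpy as np

# Chernoff method for S = X_1 + ... + X_n, X_i iid Bernoulli(p):
# P(S >= a) <= inf_{lam > 0} exp(-lam * a) * E[exp(lam * S)]
#            = inf_{lam > 0} exp(-lam * a) * (1 - p + p * exp(lam))**n
n, p, a = 100, 0.5, 70
lams = np.linspace(1e-3, 5, 5_000)  # grid search over lambda
bounds = np.exp(-lams * a) * (1 - p + p * np.exp(lams)) ** n
print("Chernoff bound:", bounds.min(), "at lambda =", lams[np.argmin(bounds)])

# Monte Carlo estimate of the true tail probability, for comparison.
rng = np.random.default_rng(4)
samples = rng.binomial(n, p, size=1_000_000)
print("empirical P(S >= a):", np.mean(samples >= a))
```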