Some basic probabilistic inequalities. Used mostly under the hood to build more sophisticated concentration inequalities, but also useful in other scenarios.

Markov’s inequality

For a nonnegative random variable and parameter , Markov’s inequality states that

Clearly the condition that is necessary: consider any random variable which has negative mean but some positive mass.

The proof of Markov’s is easy:

Markov’s inequality is tight in the sense that there are distributions for which the inequality holds with equality. But it is not very interesting in the sense that it doesn’t provide interesting concentration behavior. However, it’s a cornerstone in more sophisticated concentration techniques, which often rely on applying Markov’s inequality to the random variable

(See the Chernoff method below, which is precisely this strategy.)

You can strengthen Markov by writing

for any (not necessarily nonnegative) random variable . To prove this just apply expectations to the inequality which holds for all and all .

Some notes and pointers:

Paley-Zygmund inequality

A lower bound, with a similar flavor to Markov’s inequality. If is non-negative, then for all ,

This is useful for proving lower bounds. Similarly to Markov’s inequality, it is rarely applied to directly, but typically some transformation of .

Chebyshev’s inequality

Chebyshev’s inequality is simply Markov’s inequality applied to . For ,

Chebyshev’s inequality gives the first flavor of concentration. If are iid with mean , then Chebyshev gives

where . Making the RHS equal to gives with probability ,

The rate of concentration here is actually not too bad, it scales with which is what we expect in many scenarios. However, the dependence on is not great. We’d rather have , which is an exponential improvement over . The Chernoff method gives us this sort of dependence.

Note that the strict iid assumption is not necessary for the above bounds. If we have for all and (uncorrelated or reverse-correlated) for all , then so the same discussion applies after swapping with . See negative correlation can improve concentration for more.

Notes:

Cantelli’s inequality

Cantelli’s inequality is a one-sided improvement to Chebyshev’s inequality. For any random variable with mean and variance , for any , write

so by Chebyshev,

which is Cantelli’s inequality. This was proved using concentration via convex optimization.

Chernoff method

The Chernoff method applies Markov’s inequality to the nonnegative random variable , and then chooses to optimize the bound. More detail is given in Chernoff method (also called the Cramer-Chernoff method).