Some basic probabilistic inequalities. Used mostly under the hood to build more sophisticated concentration inequalities, but also useful in other scenarios.

## Markov’s inequality

For a nonnegative random variable $X$ and parameter $a>0$, Markov’s inequality states that

$P(X≥a)≤aE[X] .$Clearly the condition that $X≥0$ is necessary: consider any random variable which has negative mean but some positive mass.

The proof of Markov’s is easy:

$EX=∫_{0}xdP≥∫_{a}xdP≥a∫_{a}dP=aP(X≥a).$Markov’s inequality is tight in the sense that there are distributions for which the inequality holds with equality. But it is not very interesting in the sense that it doesn’t provide interesting concentration behavior. However, it’s a cornerstone in more sophisticated concentration techniques, which often rely on applying Markov’s inequality to the random variable

$exp(λi≤n∑ X_{i}).$(See the Chernoff method below, which is precisely this strategy.)

You can strengthen Markov by writing

$P(X≥a)≤aE[X1{X≥t}] ,$for any (not necessarily nonnegative) random variable $X$. To prove this just apply expectations to the inequality $1{x≥t}≤tx 1{x≥t}$ which holds for all $x$ and all $t≥0$.

Some notes and pointers:

- For achievability and optimality see optimality of Markov and Chebyshev.
- Markov’s inequality can also be derived using convex optimization arguments (concentration via convex optimization).
- Ville’s inequality is a time-uniform extension of Markov’s inequality for nonnegative supermartingales (or reverse submartingales, as the case may be). Just as Markov’s inequality is used as a foundation for concentration inequalities, Ville’s inequality is used as a foundation for time-uniform concentration inequalities.
- There is also a randomized Markov’s inequality (randomized inequalities:Markov) which uses external randomization.
- An intuitive explanation of Markov’s inequality is the optimization perspective on Markov’s inequality.

## Chebyshev’s inequality

Chebyshev’s inequality is simply Markov’s inequality applied to $(X−μ)_{2}$. For $a>0$,

$P(X−μ≥a)≤P(∣X−μ∣≥a)=P(∣X−μ∣_{2}≥a_{2})≤a_{2}V(X) .$Chebyshev’s inequality gives the first flavor of concentration. If $X_{1},X_{2},…,X_{n}$ are iid with mean $μ$, then Chebyshev gives

$P(∣X_{n}−μ∣≥a)≤a_{2}V(X_{n} ) =na_{2}V(X) ,$where $X_{n}=n_{−1}∑_{i}X_{i}$. Making the RHS equal to $δ$ gives with probability $1−δ$,

$∣X_{n}−μ∣≤V(X)/(δn) .$The *rate* of concentration here is actually not too bad, it scales with $n $ which is what we expect in many scenarios. However, the dependence on $1/δ $ is not great. We’d rather have $g(1/δ)$, which is an exponential improvement over $1/δ $. The Chernoff method gives us this sort of dependence.

Note that the strict iid assumption is not necessary for the above bounds. If we have $V(X_{i})≤v_{2}$ for all $i$ and $Cov(X_{i},X_{j})≤0$ (uncorrelated or reverse-correlated) for all $i=j$, then $V(X_{n})≤σ_{2}/n$ so the same discussion applies after swapping $V(X_{n})$ with $v_{2}$. See negative correlation can improve concentration for more.

Notes:

- Chebyshev’s inequality is a specific instance of the method of moments for concentration.
- The time-uniform extension of Chebyshev’s inequality is the Dubins-Savage inequality.
- For achievability and optimality see optimality of Markov and Chebyshev.

## Cantelli’s inequality

Cantelli’s inequality is a one-sided improvement to Chebyshev’s inequality. For any random variable $X$ with mean $μ$ and variance $σ_{2}$, for any $t≥0$, write

$P(X−μ≥a)≤P((X−μ+t)_{2}≥(t+a)_{2}),$so by Chebyshev,

$P(X−μ≥a)≤t≥0f (t+a)_{2}σ_{2}+t_{2} =σ_{2}+a_{2}σ_{2} .$which is Cantelli’s inequality. This was proved using concentration via convex optimization.

## Chernoff method

The Chernoff method applies Markov’s inequality to the nonnegative random variable $e_{λX}$, and then chooses $λ$ to optimize the bound. More detail is given in Chernoff method (also called the Cramer-Chernoff method).