Here’s an intuitive explanation of Markov’s inequality. Suppose we fix the probability $p = \mathbb{P}(X \geq a)$ and ask the question: How small can we make $\mathbb{E}[X]$ subject to the constraint that $X \geq 0$ a.s.?

That is, we want to solve

$$\min_X \; \mathbb{E}[X] \quad \text{subject to} \quad \mathbb{P}(X \geq a) = p, \quad X \geq 0 \ \text{a.s.}$$
This is easy. If we could, we’d put all of $X$’s mass at 0, which would minimize $\mathbb{E}[X]$ subject to $X \geq 0$. But we have to place at least mass $p$ at $a$ or beyond. So we’ll put mass $p$ at $a$, and mass $1-p$ at 0. It should be clear that any other alteration would simply increase $\mathbb{E}[X]$. With these choices, we have $\mathbb{E}[X] = pa$. Since this minimized $\mathbb{E}[X]$, we have $pa \leq \mathbb{E}[X]$ for all other $X$, i.e., $\mathbb{P}(X \geq a) \leq \mathbb{E}[X]/a$, which is precisely Markov’s inequality.
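The extremal distribution can be checked directly. Here's a minimal sketch (the values of $a$ and $p$ are arbitrary choices for illustration) confirming that the two-point distribution makes Markov's bound an equality:

```python
# Two-point distribution: mass p at a, mass 1 - p at 0.
a, p = 3.0, 0.2                   # threshold and target probability (arbitrary)

mean = p * a + (1 - p) * 0.0      # E[X] = pa
markov_bound = mean / a           # E[X] / a
prob = p                          # P(X >= a): only the point mass at a qualifies

# Markov's inequality holds with equality for this distribution.
assert abs(prob - markov_bound) < 1e-12
```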

This also proves that (i) any distribution that attains Markov’s inequality takes only two values, $0$ and $a$, and (ii) a distribution that attains Markov’s inequality is tight only at that particular threshold $a$.

Note that even though we can construct distributions that attain Markov’s inequality, this does not contradict the fact that we can improve Markov’s inequality via external randomization (see randomized inequalities). The important point is that the randomization happens independently of the random variable; we’re effectively working in an enlarged probability space. To attain equality in the randomized version, the random variable would have to depend on the threshold, which is itself a randomized quantity in this larger space.
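To make this concrete, here is a Monte Carlo sketch of one common form of the randomized improvement, $\mathbb{P}(X \geq aU) \leq \mathbb{E}[X]/a$ for $U \sim \mathrm{Unif}(0,1)$ independent of $X$ (the choices of $X \sim \mathrm{Exp}(1)$ and $a = 3$ are assumptions for illustration). The randomized event $\{X \geq aU\}$ contains $\{X \geq a\}$, yet its probability still sits below $\mathbb{E}[X]/a$ because $U$ is drawn independently of $X$:

```python
import random

random.seed(0)
a = 3.0
n = 200_000

hits = 0
total = 0.0
for _ in range(n):
    x = random.expovariate(1.0)   # X ~ Exp(1), so E[X] = 1
    u = random.uniform(0.0, 1.0)  # external randomization, independent of X
    hits += (x >= a * u)          # randomized event {X >= aU}
    total += x

p_rand = hits / n                 # estimate of P(X >= aU)
bound = (total / n) / a           # estimate of E[X] / a

# The randomized probability stays below the Markov bound (up to MC noise),
# even though {X >= aU} is a larger event than {X >= a}.
assert p_rand < bound
```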