What can we say about the concentration of the empirical CDF about the true CDF?
Scalar DKW inequality
For real valued observations with , the DKW (Dvoretzky-Kiefer-Wolfowitz) inequality states that
Note this statement is uniform in which, depending on your expectations, is kind of remarkable. The intuition is that if sharply increases (it can’t decrease, of course) at some , then these are precisely the places where we expect to see many observations, so won’t be too far from .
The original DKW inequality was an asymptotic statement: it bounded the probability as . In 1990, Paul Massart proved that the constant on the right hand side was 2, and he proved that this was tight.
Multivariate DKW inequality
In 2021, Naaman gave a multivariate extension of DKW. Namely,
For sufficiently large , can be replaced by in which case the constant in front of the exponential is , which is optimal. Here the empirical CDF is as above, but ) is interpreted component-wise: to be true each component of must be at most .
In 1977, Devroye gave a similar result but with replacing .
Both the univariate and multivariate DKW inequalities hold under slightly weaker assumptions than the usual iid assumption.
Glivenko-Cantelli theorem
Glivenko and Cantelli proved that, in the scalar case,
The DKW inequality implies almost sure convergence (via the Borel-Cantelli) lemma, so the Glivenko-Cantelli theorem is strictly weaker than the DKW inequality, which provides explicit rates of convergence. But the Glivenko-Cantelli theorem is part of a broader conversation about Glivenko-Cantelli classes in empirical process theory.