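Wasserstein distances are a class of metrics between probability distributions $P$ and $Q$ over some common space $\mathcal{X}$. They come from considering the optimal transport cost when the cost function is a metric. In particular, if $d$ is a metric on $\mathcal{X}$ and $p \geq 1$, set
$$W_p(P, Q) = \left( \inf_{\pi \in \Pi(P, Q)} \mathbb{E}_{(X, Y) \sim \pi}\left[ d(X, Y)^p \right] \right)^{1/p},$$
where $\Pi(P, Q)$ is the set of joint distributions on $\mathcal{X} \times \mathcal{X}$ whose marginals are $P$ and $Q$, that is,
$$\Pi(P, Q) = \{ \pi : \pi(A \times \mathcal{X}) = P(A) \text{ and } \pi(\mathcal{X} \times B) = Q(B) \text{ for all measurable } A, B \subseteq \mathcal{X} \}.$$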
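As a minimal sketch of the infimum over couplings, consider the special case where both distributions are uniform over the same number of points. An optimal coupling can then be taken to be a permutation (a standard consequence of Birkhoff's theorem on doubly stochastic matrices), so for tiny inputs the minimum can be found by brute force. The function name `wasserstein_p_uniform` and the default metric `abs(x - y)` are illustrative assumptions, not part of the text.

```python
from itertools import permutations


def wasserstein_p_uniform(xs, ys, p=1, d=lambda x, y: abs(x - y)):
    """p-Wasserstein distance between the uniform distributions on xs and ys.

    Assumes len(xs) == len(ys) and that d is a metric. This is a brute-force
    search over permutations, so it is only suitable for a handful of points.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    # Each permutation is a coupling that puts mass 1/n on (xs[i], ys[perm[i]]).
    best = min(
        sum(d(xs[i], ys[j]) ** p for i, j in enumerate(perm)) / n
        for perm in permutations(range(n))
    )
    return best ** (1.0 / p)
```

For example, `wasserstein_p_uniform([0, 1, 3], [5, 6, 8])` evaluates to 5, since every point must travel 5 units under the sorted matching. General discrete distributions with unequal weights would need a linear-programming solver instead of this enumeration.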
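Wasserstein distances are an attractive class of distances because they take advantage of the underlying geometry of the space (as induced by the metric $d$). This is in contrast to, e.g., the KL divergence and the total variation distance, which compare only the probabilities assigned to each point or event. For example, two point masses at distinct points $x$ and $y$ are at total variation distance 1 no matter how close $x$ and $y$ are, whereas their Wasserstein distance is $d(x, y)$.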
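For distributions $P$ and $Q$ over $\mathbb{R}$, the 1-Wasserstein distance (with $d(x, y) = |x - y|$) is
$$W_1(P, Q) = \int_{-\infty}^{\infty} |F(x) - G(x)| \, dx,$$
where $F$ and $G$ are the cdfs of $P$ and $Q$.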
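The cdf formula can be evaluated exactly for empirical distributions, because both empirical cdfs are step functions. The sketch below assumes two finite samples on the real line (possibly of different sizes); the name `wasserstein_1_cdf` is a hypothetical helper chosen for illustration.

```python
def wasserstein_1_cdf(xs, ys):
    """1-Wasserstein distance between the empirical distributions of xs and ys.

    Integrates |F(x) - G(x)| exactly: both empirical cdfs are constant
    between consecutive points of the pooled sample.
    """
    if not xs or not ys:
        raise ValueError("both samples must be non-empty")
    xs, ys = sorted(xs), sorted(ys)
    grid = sorted(set(xs) | set(ys))
    total = 0.0
    i = j = 0  # number of xs / ys that are <= the current left endpoint
    for left, right in zip(grid, grid[1:]):
        while i < len(xs) and xs[i] <= left:
            i += 1
        while j < len(ys) and ys[j] <= left:
            j += 1
        # On [left, right) the cdfs equal i / len(xs) and j / len(ys).
        total += abs(i / len(xs) - j / len(ys)) * (right - left)
    return total
```

For instance, `wasserstein_1_cdf([0], [0, 1])` gives 0.5: half of the mass at 0 has to move one unit to reach 1.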
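The 1-Wasserstein distance is an ideal metric of order 1 in the sense of Zolotarev. Writing $W_1(X, Y)$ for the distance between the laws of random variables $X$ and $Y$, this means that $W_1(X + Z, Y + Z) \leq W_1(X, Y)$ whenever $Z$ is independent of $X$ and $Y$, and that $W_1(cX, cY) = |c| \, W_1(X, Y)$ for every constant $c \neq 0$. These two properties make it useful for proving central limit theorems.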