Optimal transport is the general area of study that deals with how to move abstract “stuff” from one place to another in the most efficient way, where efficiency is measured relative to a given cost function. It is usually framed in terms of a distance between distributions: move mass from one distribution to another while minimizing the total cost.
The problem requires introducing new classes of divergences, since typical divergences do not take advantage of the underlying geometry. For example, consider three density functions $p_1$, $p_2$, and $p_3$ defined on the real line with pairwise disjoint supports, for instance the uniform densities on $[0, 1]$, $[2, 3]$, and $[10, 11]$. Intuitively, it seems like $p_1$ is “closer” to $p_2$ than it is to $p_3$. Unfortunately, several common divergences do not capture this relationship. The total variation distance has $\mathrm{TV}(P_1, P_2) = \mathrm{TV}(P_1, P_3) = 1$ (here $P_i$ is the measure with density $p_i$), and the KL divergence is $\infty$ between any two of these distributions, since each puts zero mass on places where the others place positive mass.
The problem is that our intuition that $p_1$ and $p_2$ are closer than $p_1$ and $p_3$ comes from the underlying geometry of $\mathbb{R}$. This motivates constructing optimal transport costs (a special case of which is the Wasserstein distance), which take advantage of the underlying geometry of the sample space.
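
The contrast can be checked numerically. The following is a minimal sketch, not a reference implementation: it assumes three uniform densities on $[0, 1]$, $[2, 3]$, and $[10, 11]$ (an illustrative choice), discretizes them on a grid, and uses the one-dimensional identity that the 1-Wasserstein distance equals the integral of the absolute difference between the two CDFs. All function names are hypothetical helpers written for this example.

```python
import math

# Illustrative assumption: a midpoint grid on [-1, 12] with cell width 0.001.
LO, HI, N = -1.0, 12.0, 13000
DX = (HI - LO) / N
GRID = [LO + (i + 0.5) * DX for i in range(N)]

def uniform_mass(a, b):
    """Probability mass per grid cell for the uniform density on [a, b]."""
    mass = [DX / (b - a) if a <= x <= b else 0.0 for x in GRID]
    total = sum(mass)
    return [m / total for m in mass]  # renormalize away discretization error

def total_variation(p, q):
    """TV distance: half the L1 distance between the cell masses."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def kl_divergence(p, q):
    """KL(p || q); infinite as soon as p has mass where q has none."""
    total = 0.0
    for a, b in zip(p, q):
        if a > 0.0:
            if b == 0.0:
                return math.inf
            total += a * math.log(a / b)
    return total

def wasserstein_1(p, q):
    """1-Wasserstein distance in 1D: integral of |F_p - F_q| over the grid."""
    cdf_p = cdf_q = total = 0.0
    for a, b in zip(p, q):
        cdf_p += a
        cdf_q += b
        total += abs(cdf_p - cdf_q) * DX
    return total

p1 = uniform_mass(0.0, 1.0)
p2 = uniform_mass(2.0, 3.0)
p3 = uniform_mass(10.0, 11.0)

for name, q in (("p2", p2), ("p3", p3)):
    print(name, total_variation(p1, q), kl_divergence(p1, q), wasserstein_1(p1, q))
```

Total variation and KL return the same value for both pairs (1 and infinity respectively), while the Wasserstein distance is approximately 2 for the first pair and approximately 10 for the second, reflecting how far the mass has to travel.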
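The Monge formulation is the strictest formulation of the optimal transport problem: it requires a deterministic transport map that sends all the mass at each source point to a single destination, which gives rise to an optimization problem that may not have a solution. The Kantorovich formulation is a relaxation of Monge’s formulation that allows the mass at a point to be split across several destinations; its optimal value defines the optimal transport cost.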
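
For reference, the two formulations are commonly written as follows. The notation here is an assumption introduced for this sketch rather than taken from the text above: $\mu$ and $\nu$ are the source and target probability measures on a space $\mathcal{X}$, $c(x, y)$ is the cost of moving a unit of mass from $x$ to $y$, $T_{\#}\mu$ is the pushforward of $\mu$ under a map $T$, and $\Pi(\mu, \nu)$ is the set of joint distributions on $\mathcal{X} \times \mathcal{X}$ with marginals $\mu$ and $\nu$.

$$
\text{(Monge)} \qquad \inf_{T \,:\, T_{\#}\mu = \nu} \int_{\mathcal{X}} c\big(x, T(x)\big) \, d\mu(x)
$$

$$
\text{(Kantorovich)} \qquad \inf_{\gamma \in \Pi(\mu, \nu)} \int_{\mathcal{X} \times \mathcal{X}} c(x, y) \, d\gamma(x, y)
$$

A small case shows why the relaxation matters. Take $\mu = \delta_0$ and $\nu = \tfrac{1}{2}\delta_{-1} + \tfrac{1}{2}\delta_{1}$. Any map $T$ sends all the mass at $0$ to the single point $T(0)$, so no $T$ satisfies $T_{\#}\mu = \nu$ and the Monge problem has no feasible solution. The Kantorovich problem is still feasible: the coupling $\gamma = \delta_0 \otimes \nu$ splits the mass in half, and with $c(x, y) = |x - y|$ its cost is $\tfrac{1}{2} \cdot 1 + \tfrac{1}{2} \cdot 1 = 1$. When $c(x, y) = |x - y|^p$, the $p$-th root of the Kantorovich optimum is the $p$-Wasserstein distance.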