Let be the set of joint probability distributions over a space whose martingales are and . Given a (nonnegative) cost function , the optimal transport cost between and is
If we take where is a metric, we recover the Wasserstein distance. We call this an optimal transport cost because it’s a formulation of the optimal transport problem. The optimization problem can be seen as a relaxation of Monge formulation, often called the Kantorovich formulation (see below).
Optimal transport costs are convex divergences. We can thus build confidence sequences for them using confidence sequences for convex functionals.
The Kantorovich Dual
admits the following dual representation, known as the Kantorovich dual:
where is the set of pairs of functions such that for all (almost surely). If is lower semi-continuous then we have equality above. This might be considered the equivalent of the variational representation of e.g., f-divergences.
Motivation: The Kantorovich Relaxation
We can recover optimal transport costs by relaxing the Monge formulation, which does not always admit a solution. To remedy this, we might allow ourselves to transport dirt from one pile to multiples holes. Let denote the amount of dirt transported from pile to hole . Then we are looking to minimize the total transportation cost such that the total amount of dirt transported from pile is , and the total amount of dirt transported to hole is . That is, we are looking to solve
Again, looking at and as discrete probability distribution, we are searching for a map between and which minimizes total transportation cost who marginal distributions are and . This suggests the continuous version of the above optimization problem:
where
is the set of joint distributions on whose marginals are and . This is known as the Kantorovich relaxation.