Optimal Transport Costs

Let $Γ(ρ, ν)$ be the set of joint probability distributions over a space $X \times X$ whose marginals are $ρ$ and $ν$. Given a (nonnegative) cost function $c : X \times X \to \mathbb{R}_{\geq 0}$, the optimal transport cost between $ρ$ and $ν$ is

$$L_{c}(ρ, ν) = \inf_{π \in Γ(ρ, ν)} E_{(X, Y) \sim π} [c(X, Y)] = \inf_{π \in Γ(ρ, ν)} \int c(x, y) \, dπ(x, y).$$

If we take $c = d^{p}$, where $d$ is a metric and $p \geq 1$, then $L_{c}$ is the $p$-th power of the $p$-Wasserstein distance, i.e. $W_{p}(ρ, ν) = L_{d^{p}}(ρ, ν)^{1/p}$. We call $L_{c}$ an optimal transport cost because it is the optimal value of the optimal transport problem. This optimization problem, often called the Kantorovich formulation, can be seen as a relaxation of the Monge formulation (see below).
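As a small illustration of the case $c = d^{p}$, the sketch below computes the optimal transport cost between two uniform empirical measures on the real line with the same number of atoms. It relies on the standard fact that, in one dimension with $c(x, y) = |x - y|^{p}$ and $p \geq 1$, matching sorted samples is optimal. The function names are hypothetical, and the brute-force check is only feasible for tiny inputs.

```python
from itertools import permutations


def ot_cost_1d(xs, ys, p=1):
    """Transport cost L_c for c(x, y) = |x - y|**p between two uniform
    empirical measures on the real line (assumes equal sample sizes, p >= 1)."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    # In 1D the monotone (sorted-to-sorted) matching is optimal for p >= 1.
    return sum(abs(x - y) ** p for x, y in zip(sorted(xs), sorted(ys))) / len(xs)


def ot_cost_bruteforce(xs, ys, p=1):
    """Same quantity by exhaustive search over one-to-one matchings.
    For uniform weights and equal sizes, an optimal coupling can always be
    taken to be a permutation, so this search finds the optimal value."""
    n = len(xs)
    return min(
        sum(abs(xs[i] - ys[j]) ** p for i, j in enumerate(perm)) / n
        for perm in permutations(range(n))
    )


xs = [0.0, 1.0, 3.0]
ys = [5.0, 2.0, 4.0]

cost = ot_cost_1d(xs, ys, p=2)  # L_c with c = d**2
w2 = cost ** 0.5                # the 2-Wasserstein distance W_2
```

Here `cost` is the transport cost itself, and taking the $1/p$ power gives the Wasserstein distance.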

Optimal transport costs are convex divergences. We can thus build confidence sequences for them using confidence sequences for convex functionals.

The Kantorovich Dual

$L_{c}$ admits the following dual representation, known as the Kantorovich dual:

$$L_{c}(ρ, ν) \geq \sup_{(f, g) \in M_{c}} \left\{ E_{X \sim ρ} [f(X)] + E_{Y \sim ν} [g(Y)] \right\},$$

where $M_{c}$ is the set of pairs of functions $(f, g)$ such that $f(x) + g(y) \leq c(x, y)$ for all $x, y \in X$ (almost surely). If $c$ is lower semi-continuous, then the two sides are equal. The Kantorovich dual can be viewed as the analogue of the variational representation of, e.g., $f$-divergences.
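To make the dual concrete, here is a minimal sketch on a two-point example. Because $f(x) + g(y) \leq c(x, y)$ pointwise, integrating against any coupling shows that the dual value of a feasible pair never exceeds the cost of that coupling. When the two values coincide, both the pair and the coupling are optimal. The helper names, the potential `f`, and the numbers are illustrative assumptions, not taken from the text.

```python
def c_transform(f, c):
    """Largest g such that f[i] + g[j] <= c[i][j] for all i, j."""
    return [min(c[i][j] - f[i] for i in range(len(c))) for j in range(len(c[0]))]


def is_dual_feasible(f, g, c, tol=1e-12):
    """Check the constraint defining M_c on a finite space."""
    return all(f[i] + g[j] <= c[i][j] + tol
               for i in range(len(f)) for j in range(len(g)))


def dual_value(f, g, rho, nu):
    """E_rho[f] + E_nu[g] for discrete distributions rho and nu."""
    return sum(fi * r for fi, r in zip(f, rho)) + sum(gj * n for gj, n in zip(g, nu))


def primal_value(pi, c):
    """Expected cost of a coupling pi, given as a matrix of joint masses."""
    return sum(pi[i][j] * c[i][j] for i in range(len(c)) for j in range(len(c[0])))


# rho lives on the points {0, 1}, nu on the points {0, 2}; the cost is |x - y|.
rho, nu = [0.5, 0.5], [0.25, 0.75]
c = [[abs(x - y) for y in (0.0, 2.0)] for x in (0.0, 1.0)]

f = [1.0, 0.0]          # a hand-picked potential (assumption for illustration)
g = c_transform(f, c)   # the best partner for f under the constraint

independent = [[r * n for n in nu] for r in rho]  # product coupling, always in Γ
candidate = [[0.25, 0.25], [0.0, 0.5]]            # another coupling of rho and nu

lower_bound = dual_value(f, g, rho, nu)
```

In this example `lower_bound` matches `primal_value(candidate, c)`, which certifies that `candidate` is an optimal coupling, while the independent coupling has strictly larger cost.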

Motivation: The Kantorovich Relaxation

We can recover optimal transport costs by relaxing the Monge formulation, which requires each pile of dirt to be moved in its entirety to a single hole and therefore does not always admit a solution. To remedy this, we allow ourselves to transport dirt from one pile to multiple holes. Let $π(i, j)$ denote the amount of dirt transported from pile $i$ to hole $j$. Then we are looking to minimize the total transportation cost $\sum_{i, j} π(i, j) c(i, j)$ such that the total amount of dirt transported from pile $i$ is $α(i)$, and the total amount of dirt transported to hole $j$ is $β(j)$. That is, we are looking to solve

$$\min_{π \geq 0} \sum_{i, j} π(i, j) c(i, j) \quad \text{s.t.} \quad \sum_{j} π(i, j) = α(i), \quad \sum_{i} π(i, j) = β(j).$$
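The discrete problem can be sketched for the smallest nontrivial case of two piles and two holes. With the marginals fixed, every feasible plan is determined by the single number $t = π(0, 0)$, and the cost is linear in $t$, so a minimizer sits at an endpoint of the feasible interval. The function names and the numbers below are hypothetical, and the closed form only applies to the 2-by-2 case. Larger problems need a linear programming solver.

```python
def is_coupling(pi, alpha, beta, tol=1e-12):
    """Check nonnegativity and the row/column sum constraints."""
    rows_ok = all(abs(sum(row) - a) <= tol for row, a in zip(pi, alpha))
    cols_ok = all(abs(sum(col) - b) <= tol for col, b in zip(zip(*pi), beta))
    nonneg = all(p >= -tol for row in pi for p in row)
    return rows_ok and cols_ok and nonneg


def transport_cost(pi, c):
    """Total cost sum_{i,j} pi(i, j) * c(i, j)."""
    return sum(p * cij for row, crow in zip(pi, c) for p, cij in zip(row, crow))


def plan_2x2(t, alpha, beta):
    """The unique plan with pi(0, 0) = t and the given marginals."""
    return [[t, alpha[0] - t],
            [beta[0] - t, alpha[1] - beta[0] + t]]


def solve_2x2(alpha, beta, c):
    """Exact minimizer for two piles and two holes (assumes equal total mass)."""
    lo = max(0.0, beta[0] - alpha[1])  # keeps pi(1, 1) >= 0
    hi = min(alpha[0], beta[0])        # keeps pi(0, 1) and pi(1, 0) >= 0
    # The cost is linear in t, so it suffices to compare the two endpoints.
    return min((plan_2x2(t, alpha, beta) for t in (lo, hi)),
               key=lambda pi: transport_cost(pi, c))


alpha = [0.75, 0.25]            # dirt available in each pile
beta = [0.5, 0.5]               # dirt required by each hole
c = [[1.0, 2.0], [3.0, 1.0]]    # c[i][j]: unit cost from pile i to hole j

pi_star = solve_2x2(alpha, beta, c)
best_cost = transport_cost(pi_star, c)
```

In this example the optimal plan sends dirt from pile 0 to both holes, which is the kind of mass splitting that the relaxation permits.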

Viewing $α$ and $β$ as discrete probability distributions, we are searching for a joint distribution $π$ whose marginals are $α$ and $β$ and which minimizes the total transportation cost. This suggests the continuous version of the above optimization problem:

$$L_{c}(α, β) = \inf_{π \in Γ(α, β)} E_{(X, Y) \sim π} [c(X, Y)],$$

where

$$Γ(α, β) = \left\{ π \in \mathcal{P}(X \times X) : \int π(x, y) \, dy = α(x), \ \int π(x, y) \, dx = β(y) \right\}$$

is the set of joint distributions on $X \times X$ whose marginals are $α$ and $β$. This is known as the Kantorovich relaxation.
