Functions that capture “distances” between probability distributions. They need not be metrics in the formal sense (metric space); e.g., the KL divergence is not symmetric.
Examples:
- KL divergence
- Wasserstein distance
- Kolmogorov-Smirnov (KS) distance
- total variation distance
- Hellinger distance
- chi-squared divergence
- Fisher information distance
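To make a few of the listed quantities concrete, here is a minimal NumPy sketch for discrete distributions on a common finite support $\{0, 1, \dots, n-1\}$. The function names are hypothetical. The Hellinger distance uses the convention $H^2 = 1 - \sum_i \sqrt{p_i q_i} = \tfrac12 \sum_i (\sqrt{p_i} - \sqrt{q_i})^2$ (some authors drop the $\tfrac12$). The KS and Wasserstein formulas assume the support is the ordered integers with unit spacing. The Fisher information distance is omitted because it needs a parametric family.

```python
import numpy as np

# Discrete distributions are NumPy arrays of probabilities on the support {0, 1, ..., n-1}.

def kl(p, q):
    """KL(p || q) = sum_i p_i log(p_i / q_i); assumes q_i > 0 wherever p_i > 0."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def total_variation(p, q):
    """TV(p, q) = (1/2) sum_i |p_i - q_i|."""
    return 0.5 * float(np.sum(np.abs(p - q)))

def hellinger(p, q):
    """H(p, q) = sqrt(1 - sum_i sqrt(p_i q_i)), which lies in [0, 1]."""
    bc = float(np.sum(np.sqrt(p * q)))  # Bhattacharyya coefficient
    return float(np.sqrt(max(0.0, 1.0 - bc)))  # clip tiny negative rounding error

def chi_squared(p, q):
    """chi^2(p || q) = sum_i (p_i - q_i)^2 / q_i; assumes every q_i > 0."""
    return float(np.sum((p - q) ** 2 / q))

def ks(p, q):
    """KS distance: max_k |F_p(k) - F_q(k)| over the ordered support."""
    return float(np.max(np.abs(np.cumsum(p) - np.cumsum(q))))

def wasserstein1(p, q):
    """W1 on {0, ..., n-1} with unit spacing: sum_k |F_p(k) - F_q(k)|."""
    return float(np.sum(np.abs(np.cumsum(p) - np.cumsum(q))))

p = np.array([0.7, 0.2, 0.1])
q = np.array([0.4, 0.4, 0.2])

print(kl(p, q), kl(q, p))     # ≈ 0.184 vs ≈ 0.192: KL is not symmetric
print(total_variation(p, q))  # ≈ 0.3
print(hellinger(p, q), chi_squared(p, q), ks(p, q), wasserstein1(p, q))
```

Note that TV and KS coincide here only by accident of this example; in general KS is bounded above by TV.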
General families of divergences include f-divergences, alpha-divergences, and integral probability metrics.
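As a sketch of how one family generates several of the examples: an f-divergence is $D_f(P \,\|\, Q) = \sum_i q_i\, f(p_i / q_i)$ for convex $f$ with $f(1) = 0$, and the standard generators below (normalizations vary between authors) recover KL, total variation, chi-squared, and squared Hellinger. Total variation is also an integral probability metric, $\sup_{f \in \mathcal F} |\mathbb E_P f - \mathbb E_Q f|$, here with $\mathcal F$ taken to be the indicators of events. The brute-force helper `tv_as_ipm` (hypothetical, like the other names) checks that the two descriptions agree. Discrete distributions with every $q_i > 0$ are assumed.

```python
import itertools
import numpy as np

def f_divergence(p, q, f):
    """D_f(P || Q) = sum_i q_i f(p_i / q_i); assumes every q_i > 0 and f convex with f(1) = 0."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(q * f(p / q)))

def xlogx(t):
    """t log t, with the convention 0 log 0 = 0."""
    t = np.asarray(t, dtype=float)
    out = np.zeros_like(t)
    pos = t > 0
    out[pos] = t[pos] * np.log(t[pos])
    return out

# Generator functions f for some familiar members of the family.
GENERATORS = {
    "kl": xlogx,                                              # KL(P || Q)
    "tv": lambda t: 0.5 * np.abs(t - 1.0),                    # total variation
    "chi2": lambda t: (t - 1.0) ** 2,                         # chi-squared(P || Q)
    "hellinger_sq": lambda t: 0.5 * (np.sqrt(t) - 1.0) ** 2,  # squared Hellinger, 1/2 convention
}

def tv_as_ipm(p, q):
    """TV as an IPM over event indicators: max over subsets A of |P(A) - Q(A)|.
    Enumerates all 2^n subsets, so only suitable for tiny supports."""
    n = len(p)
    best = 0.0
    for r in range(n + 1):
        for A in itertools.combinations(range(n), r):
            best = max(best, abs(sum(p[i] for i in A) - sum(q[i] for i in A)))
    return best

p = [0.7, 0.2, 0.1]
q = [0.4, 0.4, 0.2]
for name, f in GENERATORS.items():
    print(name, f_divergence(p, q, f))
print("tv via IPM", tv_as_ipm(p, q))
```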
Some definitions
Different authors use different notation: some write distances as functions of the distributions themselves, e.g. $d(P, Q)$, and some write them as functions of random variables, e.g. $d(X, Y)$ with $X \sim P$ and $Y \sim Q$.
A metric $d$ is regular if $d(X + Z, Y + Z) \le d(X, Y)$ for any random variable $Z$ independent of $X$ and $Y$. This captures the notion that blurring observations with independent noise makes them harder to distinguish, i.e., it can only decrease the distance between them.
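A quick numerical illustration of regularity, assuming total variation as the metric and integer-valued random variables: the law of $X + Z$ for independent $Z$ is the convolution of the probability mass functions, so `np.convolve` blurs both distributions with the same noise. The helper names are hypothetical, and the random trial is a sanity check, not a proof.

```python
import numpy as np

def total_variation(p, q):
    """TV(p, q) = (1/2) sum_i |p_i - q_i| for pmfs of equal length."""
    return 0.5 * float(np.sum(np.abs(p - q)))

def add_independent_noise(p, noise):
    """Pmf of X + Z when X ~ p on {0..n-1} and Z ~ noise on {0..m-1} are independent."""
    return np.convolve(p, noise)

# Point masses at 0 and 1 are perfectly distinguishable: TV = 1.
p = np.array([1.0, 0.0, 0.0, 0.0])
q = np.array([0.0, 1.0, 0.0, 0.0])
noise = np.full(4, 0.25)  # Z uniform on {0, 1, 2, 3}

before = total_variation(p, q)
after = total_variation(add_independent_noise(p, noise), add_independent_noise(q, noise))
print(before, after)  # 1.0 0.25

def check_regularity(trials=1000, seed=0):
    """Check TV(X + Z, Y + Z) <= TV(X, Y) on random pmfs, up to floating-point slack."""
    rng = np.random.default_rng(seed)
    for _ in range(trials):
        p, q = rng.dirichlet(np.ones(6)), rng.dirichlet(np.ones(6))
        z = rng.dirichlet(np.ones(4))
        blurred = total_variation(add_independent_noise(p, z), add_independent_noise(q, z))
        if blurred > total_variation(p, q) + 1e-12:
            return False
    return True

print(check_regularity())  # True
```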
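Regularity is equivalent to sub-additivity:
$$d(X_1 + X_2,\ Y_1 + Y_2) \le d(X_1, Y_1) + d(X_2, Y_2) \quad \text{for independent } X_1, X_2, Y_1, Y_2.$$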
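Both directions of this equivalence are short calculations. A sketch, assuming (as is standard for these probability metrics, though not stated above) that $d$ satisfies the triangle inequality and depends only on the laws of its arguments, so that $d(Z, Z') = 0$ when $Z'$ is an independent copy of $Z$:

```latex
% Regularity => sub-additivity (X_1, X_2, Y_1, Y_2 independent)
\begin{aligned}
d(X_1 + X_2,\ Y_1 + Y_2)
  &\le d(X_1 + X_2,\ Y_1 + X_2) + d(Y_1 + X_2,\ Y_1 + Y_2)
     && \text{(triangle inequality)} \\
  &\le d(X_1, Y_1) + d(X_2, Y_2)
     && \text{(regularity with } Z = X_2\text{, then with } Z = Y_1\text{)}
\end{aligned}

% Sub-additivity => regularity: set X_2 = Z and Y_2 = Z', an independent copy of Z
d(X_1 + Z,\ Y_1 + Z') \le d(X_1, Y_1) + d(Z, Z') = d(X_1, Y_1)
```

Since $Y_1 + Z'$ has the same law as $Y_1 + Z$, the last line is exactly the regularity inequality.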
A metric $d$ is homogeneous of order $s$ if $d(cX, cY) = |c|^s \, d(X, Y)$ for every constant $c \neq 0$.
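As a numerical check of homogeneity, here is a sketch with two examples whose orders are easy to verify. 1D Wasserstein-1 between empirical distributions with equally many, equally weighted atoms (where it reduces to the mean absolute difference of sorted samples) is homogeneous of order 1. Total variation between finitely supported laws is homogeneous of order 0, since $x \mapsto cx$ only relabels the atoms. Helper names are hypothetical.

```python
import numpy as np

def w1_empirical(x, y):
    """W1 between 1D empirical laws with equally many, equally weighted atoms:
    the mean absolute difference of the sorted samples."""
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    if x.shape != y.shape:
        raise ValueError("this shortcut needs samples of equal size")
    return float(np.mean(np.abs(x - y)))

def tv_discrete(p, q):
    """TV between finitely supported laws given as {atom: probability} dicts."""
    atoms = set(p) | set(q)
    return 0.5 * sum(abs(p.get(a, 0.0) - q.get(a, 0.0)) for a in atoms)

def scale_law(p, c):
    """Law of cX when X has law p (c != 0, so no atoms merge)."""
    return {c * a: w for a, w in p.items()}

rng = np.random.default_rng(1)
x = rng.normal(size=500)
y = rng.exponential(size=500)
p = {0: 0.5, 1: 0.3, 2: 0.2}
q = {0: 0.2, 1: 0.3, 3: 0.5}

for c in (0.5, -2.0, 3.0):
    # Order 1: W1(cX, cY) = |c| W1(X, Y)
    print(c, w1_empirical(c * x, c * y), abs(c) * w1_empirical(x, y))
    # Order 0: TV(cX, cY) = TV(X, Y)
    print(c, tv_discrete(scale_law(p, c), scale_law(q, c)), tv_discrete(p, q))
```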
Ideal metrics of order $s$ are those that are simultaneously regular and homogeneous of order $s$. These come up in the study of central limit theorems (see quantitative CLT template with ideal metrics).
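The reason ideal metrics suit central limit theorems is a short scaling argument. A sketch, assuming $d$ is ideal of order $s$ and depends only on laws, $X_1, \dots, X_n$ are i.i.d. copies of $X$, $Z_1, \dots, Z_n$ are i.i.d. copies of $Z \sim N(0, 1)$ independent of the $X_i$, and $d(X, Z) < \infty$ (for the usual ideal metrics this finiteness assumption restricts which $X$ are allowed):

```latex
% Uses n^{-1/2}(Z_1 + \dots + Z_n) \overset{d}{=} Z, homogeneity with c = n^{-1/2},
% and sub-additivity extended to n summands by induction.
\begin{aligned}
d\Big(\tfrac{1}{\sqrt n}\textstyle\sum_{i=1}^n X_i,\ Z\Big)
  &= d\Big(\tfrac{1}{\sqrt n}\textstyle\sum_{i=1}^n X_i,\ \tfrac{1}{\sqrt n}\textstyle\sum_{i=1}^n Z_i\Big) \\
  &= n^{-s/2}\, d\Big(\textstyle\sum_{i=1}^n X_i,\ \textstyle\sum_{i=1}^n Z_i\Big) \\
  &\le n^{-s/2} \textstyle\sum_{i=1}^n d(X_i, Z_i)
   = n^{1 - s/2}\, d(X, Z).
\end{aligned}
```

For $s > 2$ the right-hand side tends to $0$ as $n \to \infty$.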