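A broad class of distributional distances. For a convex function $f$ with $f(1) = 0$ and two probability measures $P$ and $Q$ defined on a measurable space $(\mathcal{X}, \mathcal{F})$, with $P$ absolutely continuous with respect to $Q$, set

$$D_f(P \,\|\, Q) = \int_{\mathcal{X}} f\!\left(\frac{dP}{dQ}\right) dQ.$$

With different choices of $f$ we can obtain, for example, the KL divergence ($f(t) = t \log t$) and the TV distance ($f(t) = \tfrac{1}{2}|t - 1|$).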
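$f$-divergences are largely distinct from integral probability metrics (IPMs): the two classes intersect only at the TV distance. Taking

$$f_\alpha(t) = \frac{t^\alpha - \alpha t - (1 - \alpha)}{\alpha(\alpha - 1)}, \qquad \alpha \notin \{0, 1\},$$

results in the alpha-divergence (this is one common parametrization; conventions vary). There is a large literature on fixed-time plug-in estimators for $f$-divergences. In the sequential setting, because $f$-divergences are convex by construction (jointly convex in $(P, Q)$), they can be estimated using reverse submartingales; see confidence sequences for convex functionals.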
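For distributions on a finite set, the integral in the definition reduces to a sum, and a fixed-sample plug-in estimate simply substitutes empirical frequencies for the true probabilities. The following is a minimal sketch under that finite-support assumption; the function names (`f_divergence`, `plug_in_estimate`) and the example distributions are illustrative and not taken from any particular library.

```python
import math
from collections import Counter


def f_divergence(p, q, f):
    """D_f(P || Q) = sum_x q(x) * f(p(x) / q(x)) on a finite support.

    Assumes q(x) > 0 for every x, so that the ratio p/q is well defined.
    """
    if any(qx <= 0 for qx in q):
        raise ValueError("this sketch assumes q(x) > 0 everywhere")
    return sum(qx * f(px / qx) for px, qx in zip(p, q))


def f_kl(t):
    # Generator of the KL divergence, with the convention 0 * log 0 = 0.
    return t * math.log(t) if t > 0 else 0.0


def f_tv(t):
    # Generator of the total variation distance.
    return 0.5 * abs(t - 1.0)


def plug_in_estimate(samples_p, samples_q, support, f):
    """Plug-in estimate: replace P and Q by their empirical frequencies."""
    counts_p, counts_q = Counter(samples_p), Counter(samples_q)
    p_hat = [counts_p[x] / len(samples_p) for x in support]
    q_hat = [counts_q[x] / len(samples_q) for x in support]
    return f_divergence(p_hat, q_hat, f)


# Two example distributions on a three-point support.
p = [0.5, 0.3, 0.2]
q = [0.25, 0.25, 0.5]
kl_value = f_divergence(p, q, f_kl)
tv_value = f_divergence(p, q, f_tv)

# Plug-in estimate of the TV distance from two small samples.
tv_hat = plug_in_estimate([0, 0, 1, 2], [0, 1, 2, 2], support=[0, 1, 2], f=f_tv)
```

Note that the plug-in estimate raises an error whenever some support point is missing from the sample drawn from $Q$; a practical estimator would need to handle that case, for instance by smoothing.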
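$f$-divergences admit the following variational inequality: for all measurable $g : \mathcal{X} \to \mathbb{R}$ and all distributions $P$, $Q$ over $\mathcal{X}$,

$$D_f(P \,\|\, Q) \ge \mathbb{E}_{P}[g(X)] - \mathbb{E}_{Q}[f^*(g(X))],$$

where $f^*$ is the convex conjugate of $f$: $f^*(y) = \sup_{x} \{xy - f(x)\}$. Equality holds for any $g$ in the subdifferential $\partial f(dP/dQ)$. For the KL divergence, the Donsker-Varadhan variational formula strengthens this variational inequality.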
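As a numerical illustration, take the KL divergence, where $f(t) = t \log t$ has conjugate $f^*(y) = e^{y - 1}$ and the Donsker-Varadhan formula reads $\mathrm{KL}(P \,\|\, Q) = \sup_g \{\mathbb{E}_P[g] - \log \mathbb{E}_Q[e^g]\}$. The sketch below, which assumes a finite support and uses illustrative function names, checks on random test functions $g$ that the generic bound never exceeds the Donsker-Varadhan bound, that both stay below the KL divergence, and that the generic bound is attained at $g = f'(dP/dQ) = \log(dP/dQ) + 1$.

```python
import math
import random


def kl(p, q):
    # KL(P || Q) on a finite support, with the convention 0 * log 0 = 0.
    return sum(px * math.log(px / qx) for px, qx in zip(p, q) if px > 0)


def generic_bound(p, q, g):
    # E_P[g] - E_Q[f*(g)] with f(t) = t log t, so f*(y) = exp(y - 1).
    mean_p = sum(px * gx for px, gx in zip(p, g))
    return mean_p - sum(qx * math.exp(gx - 1.0) for qx, gx in zip(q, g))


def donsker_varadhan_bound(p, q, g):
    # E_P[g] - log E_Q[exp(g)].
    mean_p = sum(px * gx for px, gx in zip(p, g))
    return mean_p - math.log(sum(qx * math.exp(gx) for qx, gx in zip(q, g)))


p = [0.5, 0.3, 0.2]
q = [0.25, 0.25, 0.5]
kl_value = kl(p, q)

# Random test functions g: generic bound <= Donsker-Varadhan bound <= KL.
rng = random.Random(0)
tol = 1e-12
ordering_holds = True
for _ in range(1000):
    g = [rng.uniform(-3.0, 3.0) for _ in p]
    lower = generic_bound(p, q, g)
    dv = donsker_varadhan_bound(p, q, g)
    if not (lower <= dv + tol and dv <= kl_value + tol):
        ordering_holds = False

# The generic bound is tight at g = f'(p/q) = log(p/q) + 1.
g_star = [math.log(px / qx) + 1.0 for px, qx in zip(p, q)]
gap_at_optimum = kl_value - generic_bound(p, q, g_star)
```

The pointwise ordering of the two bounds follows from $\log u \le u / e$ for all $u > 0$, applied to $u = \mathbb{E}_Q[e^g]$.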