Conformal prediction is a popular and practical tool in uncertainty quantification. We’re in a supervised learning setting and have a black-box model $f$. Given a new observation $X_{n+1}$, we want to develop a confidence set for its label $Y_{n+1}$.

It turns out we can do this simply by having access to predictions $f(X_1), \dots, f(X_n)$ and outcomes $Y_1, \dots, Y_n$. We need only assume that the data are exchangeable, nothing else. There are several flavors of conformal prediction, split conformal being the easiest both conceptually and in terms of implementation.

Split conformal

Introduce an arbitrary score function $s(x, y)$ which maps observation-label pairs to a nonnegative number. We should think of $s(x, y)$ as quantifying a heuristic notion of uncertainty, but it can be anything. If $f$ is a regressor then a popular choice is $s(x, y) = |y - f(x)|$. For classification we might take $s(x, y) = 1 - f_y(x)$ if $f_y(x)$ is the probability that $f$ assigns to $x$ being in class $y$.
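As a rough illustration of two common score functions, the absolute residual for regression and one minus the predicted class probability for classification, here is a minimal sketch. The function names `residual_score` and `probability_score` and the toy inputs are made up for this example; they are not part of any library.

```python
import numpy as np


def residual_score(y, y_pred):
    """Regression score: absolute residual |y - f(x)|."""
    return np.abs(np.asarray(y, dtype=float) - np.asarray(y_pred, dtype=float))


def probability_score(labels, probs):
    """Classification score: 1 minus the probability assigned to the given label.

    probs has shape (n_samples, n_classes); labels holds integer class indices.
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=int)
    return 1.0 - probs[np.arange(len(labels)), labels]


# Toy inputs, invented purely for illustration.
reg_scores = residual_score([1.0, 2.0, 3.5], [1.5, 2.0, 3.0])
toy_probs = np.array([[0.7, 0.2, 0.1],
                      [0.1, 0.3, 0.6]])
clf_scores = probability_score([0, 1], toy_probs)
print(reg_scores, clf_scores)
```

In both cases a larger score means the candidate label agrees less with the model, which is the only property the rest of the construction relies on.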

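Suppose we have some validation data $(X_1, Y_1), \dots, (X_n, Y_n)$, and we compute the scores $S_i = s(X_i, Y_i)$. Given a new covariate $X_{n+1}$, our uncertainty set is based on the conformal p-value

$$p(y) = \frac{1 + \sum_{i=1}^{n} \mathbf{1}\{ S_i \ge s(X_{n+1}, y) \}}{n + 1}.$$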

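Let $Y_{n+1}$ be the true label for $X_{n+1}$. If $(X_1, Y_1), \dots, (X_{n+1}, Y_{n+1})$ are exchangeable, then $p(Y_{n+1})$, the conformal p-value evaluated at the true label, is a p-value: $P(p(Y_{n+1}) \le \alpha) \le \alpha$ for every $\alpha \in (0, 1)$. Therefore, given $\alpha \in (0, 1)$, our confidence set is

$$C(X_{n+1}) = \{ y : p(y) > \alpha \},$$

which, by definition of a p-value, satisfies $P(Y_{n+1} \in C(X_{n+1})) \ge 1 - \alpha$. Intuitively, if $y$ is an implausible label for $X_{n+1}$ then $s(X_{n+1}, y)$ should be large, making $p(y)$ small and ensuring that $y \notin C(X_{n+1})$.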
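The p-value construction can be sketched in a few lines for a classification problem. This assumes the usual form of the conformal p-value, (1 + number of validation scores at least as large as the candidate's score) / (n + 1), and uses made-up validation scores and class probabilities; `conformal_p_value` and `prediction_set` are hypothetical helper names.

```python
import numpy as np


def conformal_p_value(cal_scores, test_score):
    """(1 + #{i : S_i >= test_score}) / (n + 1)."""
    cal_scores = np.asarray(cal_scores, dtype=float)
    return (1 + np.sum(cal_scores >= test_score)) / (len(cal_scores) + 1)


def prediction_set(cal_scores, candidate_scores, alpha):
    """Indices of the candidate labels whose p-value exceeds alpha."""
    return [k for k, s in enumerate(candidate_scores)
            if conformal_p_value(cal_scores, s) > alpha]


# Invented validation scores S_1, ..., S_9.
cal_scores = np.array([0.05, 0.10, 0.25, 0.30, 0.40, 0.50, 0.60, 0.70, 0.90])
# Invented class probabilities for one test point; score is 1 - probability.
test_probs = np.array([0.80, 0.15, 0.05])
candidate_scores = 1.0 - test_probs

p_values = [conformal_p_value(cal_scores, s) for s in candidate_scores]
label_set = prediction_set(cal_scores, candidate_scores, alpha=0.15)
print(p_values, label_set)
```

Here the third class has a score larger than every validation score, so its p-value is the minimum possible value 1/(n + 1) and it is excluded from the set.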

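An alternative way to describe the same algorithm is as follows. Compute the $\lceil (1 - \alpha)(n + 1) \rceil / n$ empirical quantile of $S_1, \dots, S_n$, that is, the $\lceil (1 - \alpha)(n + 1) \rceil$-th smallest score. Call this $\hat{q}$. Then our confidence set for a new observation $X_{n+1}$ is

$$C(X_{n+1}) = \{ y : s(X_{n+1}, y) \le \hat{q} \}.$$

It can be shown to obey

$$P(Y_{n+1} \in C(X_{n+1})) \ge 1 - \alpha,$$

where we note that the probability is over both the new test point and the validation data.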
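To see the marginal guarantee in action, here is a small simulation sketch of the quantile version for regression with the absolute-residual score. Everything in it is an assumption made for illustration: the synthetic data-generating process, the deliberately imperfect stand-in model `predict`, and the helper `conformal_quantile`, which takes the k-th smallest validation score with k = ceil((1 - alpha)(n + 1)) and returns infinity when k exceeds n.

```python
import numpy as np


def conformal_quantile(cal_scores, alpha):
    """k-th smallest score, k = ceil((1 - alpha)(n + 1)); +inf if k > n."""
    scores = np.sort(np.asarray(cal_scores, dtype=float))
    n = len(scores)
    k = int(np.ceil((1 - alpha) * (n + 1)))
    return scores[k - 1] if k <= n else np.inf


def predict(x):
    # Hypothetical stand-in for the black-box model f (deliberately misspecified).
    return 1.8 * x


rng = np.random.default_rng(0)
alpha, n, trials = 0.1, 200, 2000

covered = 0
for _ in range(trials):
    # Fresh validation set plus one test point, all drawn i.i.d. (hence exchangeable).
    x = rng.uniform(-1.0, 1.0, size=n + 1)
    y = 2.0 * x + rng.normal(size=n + 1)
    scores = np.abs(y - predict(x))
    q_hat = conformal_quantile(scores[:n], alpha)
    # The test label is covered exactly when its score is at most q_hat.
    covered += int(scores[n] <= q_hat)

coverage = covered / trials
print(f"empirical coverage: {coverage:.3f}")
```

With this score the set for a new point is the interval from `predict(x) - q_hat` to `predict(x) + q_hat`. Averaged over many draws of both the validation set and the test point, the empirical coverage should sit close to the nominal level even though the stand-in model is wrong, since the guarantee does not depend on model quality.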

The intuition for the quantile version is very straightforward: we’re just letting the holdout set tell us what the most extreme values of $s$ are empirically. If we had access to the full distribution of $s(X, Y)$, then we could compute the precise $1 - \alpha$ quantile and our set would be an exact confidence set. The extra factor of $(n + 1)/n$ is a finite sample correction.
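A quick worked example shows how large the correction is. It assumes the standard quantile level $\lceil (1 - \alpha)(n + 1) \rceil / n$, with $\alpha = 0.1$ and a few illustrative holdout sizes $n$:

$$
\begin{aligned}
n = 99 &: \quad \lceil 0.9 \cdot 100 \rceil = 90, \quad \text{level } 90/99 \approx 0.909, \\
n = 9 &: \quad \lceil 0.9 \cdot 10 \rceil = 9, \quad \text{level } 9/9 = 1 \ \text{(the largest holdout score)}, \\
n = 8 &: \quad \lceil 0.9 \cdot 9 \rceil = \lceil 8.1 \rceil = 9 > n, \quad \text{so the threshold is } +\infty.
\end{aligned}
$$

So with a moderately sized holdout set the corrected level is only slightly above $1 - \alpha$, while with fewer than $1/\alpha - 1$ holdout points the only set that keeps the guarantee is the trivial one containing every label.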

Extensions

Conformal inference is a huge research area now, and I don’t stand the slightest chance of keeping up with all the developments. But here are a few:

Beyond marginal guarantees. The marginal guarantee given by split conformal is quite weak. One would ideally like to give coverage guarantees that are conditional on some properties of the new covariates $X_{n+1}$. Unfortunately, fully conditional coverage (i.e., conditional on $X_{n+1}$ itself) is too high a bar: Lei and Wasserman showed that, absent any further assumptions, an algorithm that achieves conditional coverage will also be trivial, in the sense that its sets must be uninformatively large. But there are weaker notions of conditional coverage that can be achieved.

Beyond exchangeability

  • The conditional coverage methods of Gibbs et al. and Kandinsky conformal prediction allow guarantees to be given under distribution shift.
  • Conformal Prediction Under Covariate Shift, Tibshirani et al.
  • Conformal prediction beyond exchangeability, Barber et al.

Reading