Given a distribution $\mathcal{D}$ over feature-label pairs $(x, y)$, the goal of mean calibration is to produce a model $f$ such that

$$\mathbb{E}[y \mid f(x) = v] = v \quad \text{for all values } v \text{ in the range of } f,$$

and the goal of quantile calibration, for a target quantile $q \in (0, 1)$, is to produce a model $f$ such that

$$\Pr[y \le f(x) \mid f(x) = v] = q \quad \text{for all values } v \text{ in the range of } f.$$
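To make the definitions concrete, here is a minimal sketch of the corresponding empirical quantities. It assumes $f$ takes finitely many distinct values (so level sets are exact rather than binned), and all function and variable names are my own illustration, not anything from the post:

```python
import numpy as np

def mean_calibration_error(preds, labels):
    """Empirical average mean-calibration error:
    sum over values v of Pr[f(x) = v] * |E[y | f(x) = v] - v|."""
    return sum(
        (preds == v).mean() * abs(labels[preds == v].mean() - v)
        for v in np.unique(preds)
    )

def quantile_calibration_error(preds, labels, q):
    """Empirical average quantile-calibration error:
    sum over values v of Pr[f(x) = v] * |Pr[y <= v | f(x) = v] - q|."""
    return sum(
        (preds == v).mean() * abs((labels[preds == v] <= v).mean() - q)
        for v in np.unique(preds)
    )
```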
The most interesting setting for calibration is online calibration.

Patching

Most research into calibration is not trying to generate a new calibrated model from scratch, but instead supposes we are given an uncalibrated model $f$ and asks whether we can “fix” it to make it more calibrated. Yes, yes we can.

We have a target average calibration error of $\alpha$. Let

$$\Delta_v = \mathbb{E}[y \mid f(x) = v] - v$$

denote the miscalibration of $f$ at the value $v$, and suppose some value $v$ violates the target, i.e. $\Pr[f(x) = v] \cdot |\Delta_v| > \alpha$. Then we simply set $f'(x) = \mathbb{E}[y \mid f(x) = v]$ if $f(x) = v$ and $f'(x) = f(x)$ otherwise. Repeating this procedure until no violating value remains leads to a model with at most $\alpha$ calibration error on every level set. Moreover, it also reduces squared error: each patch decreases $\mathbb{E}[(y - f(x))^2]$ by exactly $\Pr[f(x) = v] \cdot \Delta_v^2 \ge (\Pr[f(x) = v] \cdot |\Delta_v|)^2 > \alpha^2$. Since the initial squared error is at most $1$ (for $y \in [0, 1]$) and squared error can never go negative, you can show that this runs for $T$ iterations where $T < 1/\alpha^2$.
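Here is a small sketch of the patching loop on empirical data, under the same finitely-many-values assumption as before (names again my own). On a finite sample it terminates by the same potential argument: each patch drops the empirical squared error by more than $\alpha^2$.

```python
import numpy as np

def patch(preds, labels, alpha):
    """Repeatedly patch any level set whose weighted miscalibration
    Pr[f(x) = v] * |Delta_v| exceeds alpha; each patch lowers the
    empirical squared error by more than alpha**2, so the loop ends."""
    preds = preds.astype(float).copy()
    while True:
        for v in np.unique(preds):
            mask = preds == v
            delta = labels[mask].mean() - v          # Delta_v
            if mask.mean() * abs(delta) > alpha:     # violating value
                preds[mask] = v + delta              # f'(x) = E[y | f(x) = v]
                break                                # level sets changed; rescan
        else:
            return preds                             # no violations remain
```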

There’s also a one-shot algorithm that accomplishes this goal. If $f$ is our model, then simply let $f'(x) = \mathbb{E}[y \mid f(x)]$. Then $f'$ has zero calibration error: conditioning on $f'(x) = p$ averages over level sets of $f$ whose conditional means are all $p$, so $\mathbb{E}[y \mid f'(x) = p] = p$ by the tower rule.
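In empirical form the one-shot patch is just a group-by-and-average, sketched below under the same assumptions as the earlier snippets:

```python
import numpy as np

def one_shot_calibrate(preds, labels):
    """Replace each prediction with the empirical mean of y on its level set."""
    out = np.empty_like(preds, dtype=float)
    for v in np.unique(preds):
        mask = preds == v
        out[mask] = labels[mask].mean()   # f'(x) = E[y | f(x)]
    return out
```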

Obviously, these procedures assume access to the true distribution. One can approximate the conditional means with the empirical distribution over a sample and then obtain bounds on the calibration error using standard concentration inequalities.

Similar arguments apply if we’re interested in quantile calibration: instead of moving each level set to its conditional mean, we move it to the conditional $q$-th quantile of $y$. There’s also a one-shot version, where we let $f'(x)$ be the $q$-th quantile of $y$ conditional on $f(x)$.
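For completeness, here is the quantile analogue of the one-shot patch as a sketch, swapping the conditional mean for an empirical conditional quantile (same assumptions and hypothetical names as above):

```python
import numpy as np

def one_shot_quantile_calibrate(preds, labels, q):
    """Replace each prediction with the empirical q-th quantile of y
    on its level set (the quantile analogue of the one-shot patch)."""
    out = np.empty_like(preds, dtype=float)
    for v in np.unique(preds):
        mask = preds == v
        out[mask] = np.quantile(labels[mask], q)   # conditional q-quantile
    return out
```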