“Probably approximately correct” (PAC) learning theory was introduced by Leslie Valiant in 1984 (A theory of the learnable). While Valiant and early results in PAC learning mostly focused on simple and finite model classes, the theory extends to more general settings.
Let $\mathcal{H}$ be a class of functions from a covariate space $\mathcal{X}$ to some label space $\mathcal{Y}$. We observe data $(x_1, y_1), \dots, (x_n, y_n)$ drawn from a distribution $\mathcal{D}$ over $\mathcal{X} \times \mathcal{Y}$ and choose some $\hat{h} \in \mathcal{H}$ as our model (perhaps via empirical risk minimization, or perhaps some other way). The question is: how well does $\hat{h}$ generalize?
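To make the setup concrete, here is a minimal sketch (a hypothetical example, not from Valiant's paper): $\mathcal{H}$ is a finite class of threshold classifiers on $[0, 1]$, the data are synthetic, and $\hat{h}$ is chosen by empirical risk minimization. The gap between its empirical risk and an estimate of its true risk is exactly what a generalization bound is meant to control.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: a finite class of threshold classifiers
# h_t(x) = 1{x >= t} on [0, 1], with labels generated by a true
# threshold plus label noise.
thresholds = np.linspace(0.0, 1.0, 101)   # the hypothesis class H
true_t, noise = 0.3, 0.1

def sample(n):
    x = rng.uniform(0.0, 1.0, size=n)
    y = (x >= true_t).astype(int)
    flip = rng.uniform(size=n) < noise
    return x, np.where(flip, 1 - y, y)

def empirical_risk(t, x, y):
    # 0-1 loss averaged over the sample
    return np.mean((x >= t).astype(int) != y)

# Empirical risk minimization: pick the hypothesis with lowest training error.
x_train, y_train = sample(200)
h_hat = thresholds[np.argmin([empirical_risk(t, x_train, y_train) for t in thresholds])]

# Estimate the true risk on a large fresh sample and compare it to the
# empirical risk; the gap is the generalization error.
x_test, y_test = sample(100_000)
print("empirical risk:", empirical_risk(h_hat, x_train, y_train))
print("approx. true risk:", empirical_risk(h_hat, x_test, y_test))
```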
To answer this question, Valiant proposed that we look for bounds of the following form: for all $\epsilon, \delta > 0$, there exists some sample size $n(\epsilon, \delta)$ such that, for $n \ge n(\epsilon, \delta)$,

$$\mathbb{P}\Big( \big| R(\hat{h}) - \hat{R}_n(\hat{h}) \big| \le \epsilon \Big) \ge 1 - \delta,$$

where $R(\hat{h}) = \mathbb{E}_{(x, y) \sim \mathcal{D}}\big[\ell(\hat{h}(x), y)\big]$ is the true risk and $\hat{R}_n(\hat{h}) = \frac{1}{n} \sum_{i=1}^n \ell(\hat{h}(x_i), y_i)$ is the empirical risk for a loss $\ell$.
Thus, the algorithm is probably (with high probability, at least $1 - \delta$), approximately (off by at most $\epsilon$) correct (the true risk matches the empirical risk).
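As a standard illustration of how such a bound can arise (a classical result stated here for context, assuming a loss bounded in $[0, 1]$): if $\mathcal{H}$ is finite, Hoeffding's inequality combined with a union bound over $\mathcal{H}$ gives, with probability at least $1 - \delta$,

$$\sup_{h \in \mathcal{H}} \big| R(h) - \hat{R}_n(h) \big| \le \sqrt{\frac{\log(2 |\mathcal{H}| / \delta)}{2n}},$$

so any $n(\epsilon, \delta) \ge \frac{\log(2 |\mathcal{H}| / \delta)}{2 \epsilon^2}$ makes the right-hand side at most $\epsilon$, which is exactly a guarantee of the form above, and it holds for $\hat{h}$ in particular, however it was chosen.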
PAC bounds are often proved using uniform convergence bounds (though not always; see, e.g., here). This results in bounds that depend on the complexity of $\mathcal{H}$ in some way, for example through its VC dimension or covering and packing numbers.
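For example (again a classical statement rather than anything specific to this post), if $\mathcal{H}$ has VC dimension $d$, a typical uniform convergence bound says that with probability at least $1 - \delta$,

$$\sup_{h \in \mathcal{H}} \big| R(h) - \hat{R}_n(h) \big| \le C \sqrt{\frac{d \log(n / d) + \log(1 / \delta)}{n}}$$

for a universal constant $C$ (the exact logarithmic factor varies between references), so the finite-class term $\log |\mathcal{H}|$ is replaced by a combinatorial measure of complexity.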
An alternative way to think about generalization bounds is via PAC-Bayes bounds.