We are in the setting of nonparametric regression. Suppose we have training data $(x_1, y_1), \dots, (x_n, y_n)$ and a loss $L$. Consider empirical risk minimization where we add a regularization term based on a Mercer kernel $K$:

$$\hat{f} = \operatorname*{arg\,min}_{f \in \mathcal{H}_K} \; \sum_{i=1}^n L\big(y_i, f(x_i)\big) + \lambda \|f\|_{\mathcal{H}_K}^2,$$

where $\mathcal{H}_K$ is the reproducing kernel Hilbert space (RKHS) associated with $K$ and $\lambda > 0$.
Then $\hat{f}$ can be represented as $\hat{f}(x) = \sum_{i=1}^n \alpha_i K(x, x_i)$ (where the $x_i$ are the training points) for some coefficients $\alpha_1, \dots, \alpha_n \in \mathbb{R}$.
This is an amazing fact: it lets us boil the gigantic, infinite-dimensional search space of nonparametric regression down to a finite-dimensional problem in the coefficients $\alpha \in \mathbb{R}^n$, which for squared loss is a quadratic program that we can solve by hand (see RKHS regression).
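To see the reduction explicitly: writing $f = \sum_j \alpha_j K(\cdot, x_j)$ gives $f(x_i) = (\mathbf{K}\alpha)_i$ and, by the reproducing property, $\|f\|_{\mathcal{H}_K}^2 = \alpha^\top \mathbf{K} \alpha$, where $\mathbf{K}_{ij} = K(x_i, x_j)$ is the Gram matrix. For squared loss the objective becomes $\|\mathbf{y} - \mathbf{K}\alpha\|^2 + \lambda\, \alpha^\top \mathbf{K} \alpha$, minimized at $\alpha = (\mathbf{K} + \lambda I)^{-1}\mathbf{y}$. Here is a minimal NumPy sketch of that computation, assuming squared loss and a Gaussian kernel; the helper names (`rbf_kernel`, `fit_krr`, `predict_krr`) are illustrative, not from any library:

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Gaussian (RBF) kernel matrix between rows of A and rows of B."""
    sq_dists = (np.sum(A**2, axis=1)[:, None]
                + np.sum(B**2, axis=1)[None, :]
                - 2 * A @ B.T)
    return np.exp(-gamma * sq_dists)

def fit_krr(X, y, lam=1e-2, gamma=1.0):
    """Representer-theorem coefficients: solve (K + lam*I) alpha = y."""
    K = rbf_kernel(X, X, gamma)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def predict_krr(X_train, alpha, X_new, gamma=1.0):
    """Evaluate f_hat(x) = sum_i alpha_i K(x, x_i) at each row of X_new."""
    return rbf_kernel(X_new, X_train, gamma) @ alpha

# Toy usage: fit a noisy sine curve.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(50, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(50)
alpha = fit_krr(X, y, lam=0.1, gamma=0.5)
X_grid = np.linspace(-3, 3, 5)[:, None]
print(predict_krr(X, alpha, X_grid, gamma=0.5))
```

For a non-quadratic convex loss the same finite-dimensional representation holds, but the $\alpha$ must be found numerically rather than by a single linear solve.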
This theorem is actually more general: it works for any loss $L$ which is a function of the values $f(x_1), \dots, f(x_n)$ (and the $y_i$) and any monotone increasing $\Omega : [0, \infty) \to \mathbb{R}$, in which case some $\hat{f}$ of the form $\hat{f} = \sum_{i=1}^n \alpha_i K(\cdot, x_i)$ minimizes

$$L\big(f(x_1), \dots, f(x_n)\big) + \Omega\big(\|f\|_{\mathcal{H}_K}\big).$$
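The reason monotonicity of $\Omega$ suffices is a short orthogonality argument; the standard proof sketch (following the generalized representer theorem of Schölkopf, Herbrich, and Smola) goes as follows. Decompose any candidate $f \in \mathcal{H}_K$ as

$$f = \sum_{i=1}^n \alpha_i K(\cdot, x_i) + f_\perp, \qquad \langle f_\perp, K(\cdot, x_i) \rangle_{\mathcal{H}_K} = 0 \;\text{ for all } i.$$

By the reproducing property, $f(x_j) = \langle f, K(\cdot, x_j) \rangle_{\mathcal{H}_K}$, so dropping $f_\perp$ leaves every $f(x_j)$, and hence the loss term, unchanged. By Pythagoras,

$$\|f\|_{\mathcal{H}_K}^2 = \Big\|\sum_{i=1}^n \alpha_i K(\cdot, x_i)\Big\|_{\mathcal{H}_K}^2 + \|f_\perp\|_{\mathcal{H}_K}^2,$$

so dropping $f_\perp$ can only shrink $\|f\|_{\mathcal{H}_K}$, and monotonicity of $\Omega$ means the objective never increases. Hence some minimizer lies in the span of the $K(\cdot, x_i)$.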