We are in the setting of nonparametric regression. Suppose we have training data $(x_1, y_1), \dots, (x_n, y_n)$ and a loss $\ell$. Consider empirical risk minimization where we add a regularization term based on a Mercer kernel $K$, with associated RKHS $\mathcal{H}_K$:

$$\hat{f} = \operatorname*{arg\,min}_{f \in \mathcal{H}_K} \; \sum_{i=1}^n \ell(y_i, f(x_i)) + \lambda \|f\|_{\mathcal{H}_K}^2.$$
Then $\hat{f}$ can be represented as $\hat{f}(x) = \sum_{i=1}^n \alpha_i K(x, x_i)$ (where the $x_i$ are the training points) for some coefficients $\alpha_1, \dots, \alpha_n \in \mathbb{R}$.

This is an amazing fact: it boils the gigantic (infinite-dimensional) search space of nonparametric regression down to a finite-dimensional problem in the coefficients $\alpha_i$, and for squared loss that problem is a quadratic program we can solve by hand (see RKHS regression). The reason is that any component of $f$ orthogonal to the span of the $K(\cdot, x_i)$ leaves the values $f(x_i)$ unchanged, by the reproducing property, while strictly increasing the penalty $\lambda \|f\|_{\mathcal{H}_K}^2$.
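As a concrete illustration, here is a minimal NumPy sketch of the squared-loss case. The Gaussian (RBF) kernel, the bandwidth, the regularization strength `lam`, and the helper names are assumptions made for this example, not part of the theorem.

```python
import numpy as np

def rbf_kernel(X1, X2, bandwidth=1.0):
    """Gram matrix K[i, j] = exp(-||x1_i - x2_j||^2 / (2 * bandwidth^2))."""
    sq_dists = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2 * bandwidth ** 2))

def fit(X, y, lam=0.1, bandwidth=1.0):
    """Solve the finite-dimensional problem the representer theorem leaves us:
    min_alpha ||y - K alpha||^2 + lam * alpha^T K alpha,
    whose first-order condition is solved by (K + lam I) alpha = y
    (taking K invertible)."""
    K = rbf_kernel(X, X, bandwidth)
    alpha = np.linalg.solve(K + lam * np.eye(len(y)), y)
    return alpha

def predict(X_train, alpha, X_new, bandwidth=1.0):
    """Evaluate f_hat(x) = sum_i alpha_i K(x, x_i)."""
    return rbf_kernel(X_new, X_train, bandwidth) @ alpha

# Example: recover a smooth function from noisy samples.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(50, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=50)
alpha = fit(X, y, lam=0.1)
y_hat = predict(X, alpha, X)
```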

This theorem is actually more general: it works for any loss $L$ which is a function of the pairs $(y_i, f(x_i))_{i=1}^n$ and any monotone increasing $\Omega : [0, \infty) \to \mathbb{R}$, in which case $\hat{f}(x) = \sum_{i=1}^n \alpha_i K(x, x_i)$ minimizes

$$L\big((y_1, f(x_1)), \dots, (y_n, f(x_n))\big) + \Omega\big(\|f\|_{\mathcal{H}_K}\big).$$
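To see what this buys us, one can substitute the representer form $f = \sum_j \alpha_j K(\cdot, x_j)$, so that $f(x_i) = (K\alpha)_i$ with $K_{ij} = K(x_i, x_j)$, and use the reproducing property $\|f\|_{\mathcal{H}_K}^2 = \alpha^\top K \alpha$. The objective then becomes finite-dimensional in $\alpha$ (the notation here is my own, matching the symbols above):

$$\min_{\alpha \in \mathbb{R}^n} \; L\big((y_1, (K\alpha)_1), \dots, (y_n, (K\alpha)_n)\big) + \Omega\!\left(\sqrt{\alpha^\top K \alpha}\right).$$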