Kernel Trick

Let $K$ be a Mercer kernel such that $\sup_{x, y} K (x, y) < \infty$. Then there exist an orthonormal basis $\{ψ_{i}\}$ for $L_{2} (\mathcal{X})$ and eigenvalues $λ_{1} \geq λ_{2} \geq \dots \geq 0$ such that

$$K (x, y) = \sum_{i = 1}^{\infty} λ_{i} ψ_{i} (x) ψ_{i} (y) .$$

The kernel is thus “ordering” the basis, in the sense that it assigns at least as much weight to $ψ_{1}$ as to $ψ_{2}$, and so on. If we keep only the first $r$ eigenfunctions, those with the largest eigenvalues, then we can think of this as dimensionality reduction.
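
One way to make the truncation precise, given here as a sketch rather than as part of the original argument: write $K_{r}$ for the kernel built from the first $r$ terms (the name $K_{r}$ is introduced only for illustration). Assuming $\sum_{i} λ_{i}^{2} < \infty$, and with integrals taken with respect to the measure that defines $L_{2} (\mathcal{X})$, orthonormality of the $ψ_{i}$ gives the squared $L_{2}$ error of the truncation as the sum of the squared discarded eigenvalues:

$$K_{r} (x, y) = \sum_{i = 1}^{r} λ_{i} ψ_{i} (x) ψ_{i} (y), \qquad \iint \big( K (x, y) - K_{r} (x, y) \big)^{2} \, dx \, dy = \sum_{i > r} λ_{i}^{2} .$$

So the faster the eigenvalues decay, the less is lost by keeping only $r$ eigenfunctions.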

However, we can do something better by appealing to the kernel trick. For each $x \in \mathcal{X}$, define the map

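$$Φ (x) = \big( \sqrt{λ_{i}} \, ψ_{i} (x) \big)_{i = 1}^{\infty},$$

which is an infinite-dimensional feature map for $x$ (the square roots are well defined because the eigenvalues of a Mercer kernel are nonnegative). Then,

$$⟨ Φ (x), Φ (y)⟩ = \sum_{i = 1}^{\infty} λ_{i} ψ_{i} (x) ψ_{i} (y) = K (x, y) .$$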
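
The same identity can be checked numerically in a finite-dimensional analogue. The sketch below is an illustration, not the Mercer expansion itself: it assumes the homogeneous degree-2 polynomial kernel $K (x, y) = (x \cdot y)^{2}$, whose explicit feature map lists all ordered products $x_{i} x_{j}$. The function names `poly2_kernel`, `poly2_features`, and `inner` are hypothetical helpers chosen for this example.

```python
import math
from itertools import product


def poly2_kernel(x, y):
    """Homogeneous degree-2 polynomial kernel K(x, y) = (x . y)^2."""
    return sum(a * b for a, b in zip(x, y)) ** 2


def poly2_features(x):
    """Explicit feature map: all ordered pairwise products x_i * x_j."""
    return [a * b for a, b in product(x, repeat=2)]


def inner(u, v):
    """Euclidean inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


x = [1.0, 2.0, 3.0]
y = [0.5, -1.0, 2.0]

# Explicit route: build both feature vectors (d^2 entries each), then take
# their inner product.
explicit = inner(poly2_features(x), poly2_features(y))

# Implicit route: a single kernel evaluation on the original d-dimensional inputs.
implicit = poly2_kernel(x, y)

print(explicit, implicit)
assert math.isclose(explicit, implicit)
```

Here the explicit feature vectors have $d^{2}$ entries for $d$-dimensional inputs, while the kernel evaluation costs only $O (d)$. For a Mercer kernel with infinitely many nonzero eigenvalues the explicit route is not available at all, which is the situation the kernel trick addresses.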

In other words, the kernel encodes the inner products of these infinite-dimensional sequences. Thus, if we can structure our computation so that it only requires evaluations of $K (x, y)$, then we can effectively use the entire feature maps (sans dimensionality reduction) without having to represent them explicitly. This is known as the kernel trick.
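
As a small illustration of a computation that only needs kernel evaluations, the distance between two points in feature space expands as $\| Φ (x) - Φ (y) \|^{2} = K (x, x) - 2 K (x, y) + K (y, y)$. The sketch below assumes a Gaussian (RBF) kernel with an arbitrarily chosen bandwidth parameter `gamma`. The names `rbf_kernel` and `feature_distance` are hypothetical helpers for this example.

```python
import math


def rbf_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel exp(-gamma * ||x - y||^2); gamma is an assumed value."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def feature_distance(kernel, x, y):
    """Distance ||Phi(x) - Phi(y)|| computed from kernel evaluations only."""
    d2 = kernel(x, x) - 2.0 * kernel(x, y) + kernel(y, y)
    # Clamp tiny negative values caused by floating-point rounding.
    return math.sqrt(max(d2, 0.0))


x = [0.0, 1.0]
y = [1.0, 3.0]

dist = feature_distance(rbf_kernel, x, y)
print(dist)
```

The feature map of the Gaussian kernel is infinite-dimensional, yet the distance is obtained from three kernel evaluations and $Φ$ is never constructed.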
