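Let $K : X \times X \to \mathbb{R}$ be a Mercer kernel such that $\int_X \int_X K(x, y)^2 \, dx \, dy < \infty$. Then there exists an orthonormal basis $\{\phi_i\}_{i=1}^{\infty}$ for $L^2(X)$, and eigenvalues $\lambda_1 \geq \lambda_2 \geq \dots \geq 0$ such that
$$K(x, y) = \sum_{i=1}^{\infty} \lambda_i \, \phi_i(x) \, \phi_i(y).$$
The kernel is thus "ordering" the basis, in the sense that it assigns higher weight to $\phi_1$ than to $\phi_2$ and so on. If we gather the first $k$ eigenfunctions $\phi_1, \dots, \phi_k$, then we can think of this as dimensionality reduction.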
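A finite analogue can make the "ordering" concrete. The sketch below is only an illustration under stated assumptions: it replaces the kernel by its Gram matrix on a grid of sample points, uses a Gaussian (RBF) kernel as an example choice, and treats the eigenvectors of that matrix as stand-ins for the eigenfunctions. The names `rbf_gram` and `truncated_expansion` are made up for this example.

```python
import numpy as np

def rbf_gram(points, length_scale=1.0):
    """Gram matrix G[i, j] = exp(-(x_i - x_j)^2 / (2 * length_scale^2))."""
    diff = points[:, None] - points[None, :]
    return np.exp(-diff**2 / (2.0 * length_scale**2))

def truncated_expansion(gram, k):
    """Sum of the k leading terms lambda_i * v_i v_i^T of the eigendecomposition."""
    eigvals, eigvecs = np.linalg.eigh(gram)   # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]         # reorder so the largest comes first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    leading = eigvecs[:, :k]
    return (leading * eigvals[:k]) @ leading.T

# Assumed setup: 50 evenly spaced sample points on [-3, 3].
points = np.linspace(-3.0, 3.0, 50)
gram = rbf_gram(points)

# Frobenius-norm error of keeping only the first k terms of the expansion.
errors = {k: np.linalg.norm(gram - truncated_expansion(gram, k))
          for k in (1, 5, 10, 50)}
for k, err in errors.items():
    print(f"k = {k:2d}  reconstruction error = {err:.3e}")
```

Keeping more leading terms shrinks the reconstruction error, and keeping all 50 recovers the Gram matrix up to floating-point error. This is the discrete counterpart of truncating the expansion to the first few eigenfunctions.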
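However, we can do something better by appealing to the kernel trick. For each $x \in X$, define the map
$$\Phi(x) = \left( \sqrt{\lambda_1} \, \phi_1(x), \; \sqrt{\lambda_2} \, \phi_2(x), \; \dots \right),$$
which is an infinite dimensional feature map for $x$, built from the eigenvalues $\lambda_i$ and basis functions $\phi_i$ of the kernel $K$. Then,
$$\langle \Phi(x), \Phi(y) \rangle = \sum_{i=1}^{\infty} \lambda_i \, \phi_i(x) \, \phi_i(y) = K(x, y).$$
That is, the kernel embeds the information about these infinite dimensional sequences. Thus, if we can structure our computation to only require computations of $\langle \Phi(x), \Phi(y) \rangle$, then we can essentially use the entire feature maps (sans dimensionality reduction), without having to represent them explicitly. This is known as the kernel trick.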
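As a rough numerical check of this idea, the sketch below compares one kernel evaluation against inner products of explicitly truncated feature vectors. It rests on assumptions that are not part of the text above: the kernel is the one-dimensional Gaussian kernel $\exp(-(x - y)^2 / 2)$, and the feature map is the one given by its Taylor series, with coordinates $e^{-x^2/2} x^n / \sqrt{n!}$, rather than the Mercer eigenfunction map. The function names are made up for this example.

```python
import math
import numpy as np

def gaussian_kernel(x, y):
    """K(x, y) = exp(-(x - y)^2 / 2), evaluated directly."""
    return math.exp(-0.5 * (x - y) ** 2)

def truncated_feature_map(x, n_terms):
    """First n_terms coordinates of Phi(x)_n = exp(-x^2 / 2) * x^n / sqrt(n!)."""
    powers = np.arange(n_terms)
    factorials = np.array([float(math.factorial(n)) for n in range(n_terms)])
    return math.exp(-0.5 * x**2) * x**powers / np.sqrt(factorials)

# Arbitrary example inputs.
x, y = 0.7, -1.2
exact = gaussian_kernel(x, y)

# Inner products of truncated feature vectors approach the kernel value.
approx = {n: float(truncated_feature_map(x, n) @ truncated_feature_map(y, n))
          for n in (2, 5, 30)}
for n, value in approx.items():
    print(f"{n:2d} coordinates: <Phi(x), Phi(y)> = {value:.12f}  (K = {exact:.12f})")
```

The explicit route needs more and more coordinates to match a value that a single call to `gaussian_kernel` gives exactly. That is the practical appeal of arranging a computation so that it only ever asks for inner products.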