A Bayesian (see Bayesian statistics) approach to nonparametric regression; see also Bayesian nonparametrics.
We have training data $(x_1, y_1), \dots, (x_n, y_n)$ and want to model the regression function $f(x) = \mathbb{E}[Y \mid X = x]$. GP regression assumes that $f$ is a Gaussian process. That is, there exists a Mercer kernel $K$ such that for any finite collection $x_1, \dots, x_n$,
$$(f(x_1), \dots, f(x_n)) \sim N(\mu_n, K_n),$$
where $\mu_n = (\mu(x_1), \dots, \mu(x_n))$ for a mean function $\mu$ and $(K_n)_{ij} = K(x_i, x_j)$. Typically we suppose that $\mu \equiv 0$ since we can always subtract the mean from the data.
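To make the finite-dimensional statement concrete, here is a minimal NumPy sketch of drawing one realization of $(f(x_1), \dots, f(x_n))$ from the prior. The squared-exponential kernel, the input grid, and the names `sq_exp_kernel`, `K_n`, `mu_n` are illustrative choices, not anything specified above.

```python
import numpy as np

def sq_exp_kernel(a, b, length_scale=0.5):
    """Squared-exponential (RBF) kernel, one common choice of Mercer kernel."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 50)            # a finite collection of input points
K_n = sq_exp_kernel(x, x)            # (K_n)_{ij} = K(x_i, x_j)
mu_n = np.zeros(len(x))              # mean function taken to be zero

# One draw of (f(x_1), ..., f(x_n)) ~ N(mu_n, K_n); the tiny "jitter" added
# to the diagonal keeps the covariance numerically positive definite.
f_sample = rng.multivariate_normal(mu_n, K_n + 1e-10 * np.eye(len(x)))
```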
Note that a GP is a distribution over functions $f$. The shape and smoothness of these functions is determined by the kernel $K$. By choosing $K$, we are thus choosing a prior over functions $f$. Assuming $\mu \equiv 0$, the prior has density
$$p(f) \propto \exp\!\left(-\tfrac{1}{2}\|f\|_K^2\right),$$
where $\|\cdot\|_K$ is the norm of the reproducing kernel Hilbert space induced by $K$. Note that the prior favors those $f$ such that $\|f\|_K$ is small. If $\lambda_j$ is an eigenvalue corresponding to the eigenfunction $\phi_j$ of $K$, i.e., $\int K(x, y)\,\phi_j(y)\,dy = \lambda_j \phi_j(x)$, then writing $f = \sum_j \beta_j \phi_j$ gives $\|f\|_K^2 = \sum_j \beta_j^2 / \lambda_j$, meaning a small $\|f\|_K$ forces $f$ to concentrate on eigenfunctions with large eigenvalues, which correspond to smooth functions. GP regression is self-regularizing in this way.
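As a quick numerical illustration of this self-regularization, the following snippet continues the sketch above (it reuses `x`, `K_n`, `f_sample`, and `rng` from that block): at a finite set of points the prior penalty is the quadratic form $f^\top K_n^{-1} f$, and a smooth vector drawn from the GP scores far lower than an equally scaled rough one.

```python
# Continues the sketch above (reuses x, K_n, f_sample, and rng from that block).
# At the finite set of points x, the prior penalty is the quadratic form
# f^T K_n^{-1} f; a smooth vector (a GP draw) pays far less than a rough one.
K_inv = np.linalg.inv(K_n + 1e-10 * np.eye(len(x)))

smooth = f_sample                                    # drawn from N(0, K_n)
rough = rng.standard_normal(len(x)) * smooth.std()   # same scale, no smoothness

print(smooth @ K_inv @ smooth)   # small penalty: favored by the prior
print(rough @ K_inv @ rough)     # large penalty: heavily down-weighted
```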
Now, we’re given a new test point $x_*$. Doing some math and using some properties of the Gaussian pdf, one can show that, conditional on $x_*$, $X = (x_1, \dots, x_n)$, and $Y = (y_1, \dots, y_n)$, $f(x_*)$ is distributed as $N(\hat{\mu}_*, \hat{\sigma}_*^2)$, where
$$\hat{\mu}_* = k_*^\top (K_n + \sigma^2 I)^{-1} Y, \qquad \hat{\sigma}_*^2 = K(x_*, x_*) - k_*^\top (K_n + \sigma^2 I)^{-1} k_*,$$
where $k_* = (K(x_*, x_1), \dots, K(x_*, x_n))^\top$, $K_n$ is the covariance defined by the kernel $K$ on the training data, and $\sigma^2$ is the noise inherent to the model, i.e., we assume that $y_i = f(x_i) + \varepsilon_i$ where $\varepsilon_i$ has variance $\sigma^2$. Our prediction is taken to be $\hat{\mu}_*$, but writing down the predictive distribution lets us quantify uncertainty.
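Putting the formulas for $\hat{\mu}_*$ and $\hat{\sigma}_*^2$ together, here is a minimal, self-contained sketch of GP prediction. The squared-exponential kernel, the noise variance, the synthetic sine data, and the helper `gp_predict` are all illustrative assumptions, not anything prescribed above.

```python
import numpy as np

def sq_exp_kernel(a, b, length_scale=0.5):
    """Squared-exponential kernel; an illustrative choice of K."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def gp_predict(x_train, y_train, x_star, noise_var=0.1):
    """Posterior mean and variance of f(x_star) given noisy observations."""
    K_n = sq_exp_kernel(x_train, x_train)            # covariance on training data
    k_star = sq_exp_kernel(x_train, x_star)          # cross-covariances k_*
    A = K_n + noise_var * np.eye(len(x_train))       # K_n + sigma^2 I
    alpha = np.linalg.solve(A, y_train)
    mean = k_star.T @ alpha                          # k_*^T (K_n + sigma^2 I)^{-1} Y
    cov = sq_exp_kernel(x_star, x_star) - k_star.T @ np.linalg.solve(A, k_star)
    return mean, np.diag(cov)

# Synthetic example: noisy observations of a sine function.
rng = np.random.default_rng(1)
x_train = np.linspace(0, 2 * np.pi, 20)
y_train = np.sin(x_train) + 0.3 * rng.standard_normal(20)
x_star = np.array([1.0, 3.0, 5.0])

mean, var = gp_predict(x_train, y_train, x_star)
print(mean)          # predictions at the test points
print(np.sqrt(var))  # predictive standard deviations (uncertainty)
```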
Refs
A lot has been written about GP regression, as you can imagine. Some useful references are