A Bayesian (see Bayesian statistics) approach to nonparametric regression; see also Bayesian nonparametrics.
We have training data $(x_1, y_1), \dots, (x_n, y_n)$ and want to model the regression function $f(x) = \mathbb{E}[Y \mid X = x]$. GP regression assumes that $f$ is a Gaussian process. That is, there exists a Mercer kernel $K$ such that for any finite collection $x_1, \dots, x_n$,
$$(f(x_1), \dots, f(x_n)) \sim N(\mu_n, K_n),$$
where $\mu_n = (\mu(x_1), \dots, \mu(x_n))$ for a mean function $\mu$ and $(K_n)_{ij} = K(x_i, x_j)$. Typically we suppose that $\mu \equiv 0$ since we can always subtract the mean from the data.
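To make the finite-dimensional statement concrete, here is a minimal NumPy sketch of drawing one realization of $(f(x_1), \dots, f(x_n))$ from the prior. The squared-exponential kernel, the input grid, and the names `sq_exp_kernel`, `K_n`, `mu_n` are illustrative choices, not anything specified above.

```python
import numpy as np

def sq_exp_kernel(a, b, length_scale=0.5):
    """Squared-exponential (RBF) kernel, one common choice of Mercer kernel."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 50)            # a finite collection of input points
K_n = sq_exp_kernel(x, x)            # (K_n)_{ij} = K(x_i, x_j)
mu_n = np.zeros(len(x))              # mean function taken to be zero

# One draw of (f(x_1), ..., f(x_n)) ~ N(mu_n, K_n); the tiny "jitter" added
# to the diagonal keeps the covariance numerically positive definite.
f_sample = rng.multivariate_normal(mu_n, K_n + 1e-10 * np.eye(len(x)))
```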
Note that a GP is a distribution over functions $f$. The shape and smoothness of these functions is determined by the kernel $K$. By choosing $K$, we are thus choosing a prior over functions $f$. Assuming $\mu \equiv 0$, the prior has density
$$p(f) \propto \exp\!\left(-\tfrac{1}{2}\|f\|_K^2\right),$$
where $\|\cdot\|_K$ is the norm of the reproducing kernel Hilbert space induced by $K$. Note that the prior favors those $f$ such that $\|f\|_K$ is small. If $\lambda_j$ is an eigenvalue corresponding to the eigenfunction $\phi_j$ of $K$, i.e., $\int K(x, y)\,\phi_j(y)\,dy = \lambda_j \phi_j(x)$, then writing $f = \sum_j \beta_j \phi_j$ gives $\|f\|_K^2 = \sum_j \beta_j^2 / \lambda_j$, meaning a small $\|f\|_K$ forces $f$ to concentrate on eigenfunctions with large eigenvalues, which correspond to smooth functions. GP regression is self-regularizing in this way.
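As a quick numerical illustration of this self-regularization, the following snippet continues the sketch above (it reuses `x`, `K_n`, `f_sample`, and `rng` from that block): at a finite set of points the prior penalty is the quadratic form $f^\top K_n^{-1} f$, and a smooth vector drawn from the GP scores far lower than an equally scaled rough one.

```python
# Continues the sketch above (reuses x, K_n, f_sample, and rng from that block).
# At the finite set of points x, the prior penalty is the quadratic form
# f^T K_n^{-1} f; a smooth vector (a GP draw) pays far less than a rough one.
K_inv = np.linalg.inv(K_n + 1e-10 * np.eye(len(x)))

smooth = f_sample                                    # drawn from N(0, K_n)
rough = rng.standard_normal(len(x)) * smooth.std()   # same scale, no smoothness

print(smooth @ K_inv @ smooth)   # small penalty: favored by the prior
print(rough @ K_inv @ rough)     # large penalty: heavily down-weighted
```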
Now, we’re given a new test point $x_*$. Doing some math and using some properties of the Gaussian pdf, one can show that, conditional on $x_*$, $X = (x_1, \dots, x_n)$, and $Y = (y_1, \dots, y_n)$, $f(x_*)$ is distributed as $N(\hat{\mu}_*, \hat{\sigma}_*^2)$, where
$$\hat{\mu}_* = k_*^\top (K_n + \sigma^2 I)^{-1} Y, \qquad \hat{\sigma}_*^2 = K(x_*, x_*) - k_*^\top (K_n + \sigma^2 I)^{-1} k_*,$$
where $k_* = (K(x_*, x_1), \dots, K(x_*, x_n))^\top$, $K_n$ is the covariance defined by the kernel $K$ on the training data, and $\sigma^2$ is the noise inherent to the model, i.e., we assume that $y_i = f(x_i) + \varepsilon_i$ where $\varepsilon_i$ has variance $\sigma^2$. Our prediction is taken to be $\hat{\mu}_*$, but writing down the predictive distribution lets us quantify uncertainty.
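Putting the formulas for $\hat{\mu}_*$ and $\hat{\sigma}_*^2$ together, here is a minimal, self-contained sketch of GP prediction. The squared-exponential kernel, the noise variance, the synthetic sine data, and the helper `gp_predict` are all illustrative assumptions, not anything prescribed above.

```python
import numpy as np

def sq_exp_kernel(a, b, length_scale=0.5):
    """Squared-exponential kernel; an illustrative choice of K."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def gp_predict(x_train, y_train, x_star, noise_var=0.1):
    """Posterior mean and variance of f(x_star) given noisy observations."""
    K_n = sq_exp_kernel(x_train, x_train)            # covariance on training data
    k_star = sq_exp_kernel(x_train, x_star)          # cross-covariances k_*
    A = K_n + noise_var * np.eye(len(x_train))       # K_n + sigma^2 I
    alpha = np.linalg.solve(A, y_train)
    mean = k_star.T @ alpha                          # k_*^T (K_n + sigma^2 I)^{-1} Y
    cov = sq_exp_kernel(x_star, x_star) - k_star.T @ np.linalg.solve(A, k_star)
    return mean, np.diag(cov)

# Synthetic example: noisy observations of a sine function.
rng = np.random.default_rng(1)
x_train = np.linspace(0, 2 * np.pi, 20)
y_train = np.sin(x_train) + 0.3 * rng.standard_normal(20)
x_star = np.array([1.0, 3.0, 5.0])

mean, var = gp_predict(x_train, y_train, x_star)
print(mean)          # predictions at the test points
print(np.sqrt(var))  # predictive standard deviations (uncertainty)
```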
Refs
A lot has been written about GP regression, as you can imagine. Some useful references are