Let $K$ be a 1-d smoothing kernel (distinct from a Mercer kernel), i.e.,

$$K(x) \ge 0, \qquad \int K(x)\,dx = 1, \qquad \int x\,K(x)\,dx = 0, \qquad 0 < \int x^2 K(x)\,dx < \infty.$$
So $K$ is essentially a well-behaved one-dimensional probability density. Kernel regression is characterized by the kernel function $K$ and a parameter $h > 0$ called the bandwidth. The estimated regression function is

$$\hat{r}_n(x) = \sum_{i=1}^n w_i(x)\, y_i,$$
where $w_i(x) = K\!\left(\frac{x - x_i}{h}\right) \Big/ \sum_{j=1}^n K\!\left(\frac{x - x_j}{h}\right)$. Kernel regression can therefore be considered a kind of “smoothed” kNN regression, in which our prediction is a weighted average of nearby points. Here, the definition of “nearby” is given by the kernel function $K$ and the bandwidth $h$: smaller values of $h$ concentrate the weight on the closest points, while larger values of $h$ give more influence to points further away and produce a smoother fit. Gaussian kernels are a common choice.
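
As an illustration, here is a minimal sketch of this estimator with a Gaussian kernel; the function names and the toy sine data are hypothetical, not from the source.

```python
import numpy as np

def gaussian_kernel(u):
    """Standard Gaussian smoothing kernel."""
    return np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)

def kernel_regression(x_train, y_train, x_query, h):
    """Nadaraya-Watson estimate at each query point, with bandwidth h."""
    x_query = np.atleast_1d(x_query)
    # Kernel weights K((x - x_i) / h), one row per query point.
    K = gaussian_kernel((x_query[:, None] - x_train[None, :]) / h)
    # Weighted average of the training responses.
    return (K @ y_train) / K.sum(axis=1)

# Toy example: noisy samples from a sine curve.
rng = np.random.default_rng(0)
x = rng.uniform(0, 2 * np.pi, 200)
y = np.sin(x) + 0.3 * rng.normal(size=200)
print(kernel_regression(x, y, [1.0, 3.0, 5.0], h=0.3))
```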

Kernel regression is a special case of local polynomial regression (in turn a special case of linear smoothers).
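
One way to see the first inclusion: kernel regression is local polynomial regression of degree zero, i.e. a locally constant least-squares fit with kernel weights, whose minimizer is exactly the weighted average above:

$$\hat{r}_n(x) = \operatorname*{arg\,min}_{c \in \mathbb{R}} \sum_{i=1}^n K\!\left(\frac{x - x_i}{h}\right)(y_i - c)^2 = \frac{\sum_{i=1}^n K\!\left(\frac{x - x_i}{h}\right) y_i}{\sum_{j=1}^n K\!\left(\frac{x - x_j}{h}\right)} = \sum_{i=1}^n w_i(x)\, y_i.$$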

Properties

Kernel estimators were one of the first nonparametric regression estimators shown to be universally consistent, meaning that

$$\mathbb{E}\!\int \left(\hat{r}_n(x) - r(x)\right)^2 dP(x) \;\to\; 0$$

as the sample size $n$ goes to infinity. The only assumption required on the true regression function $r(x) = \mathbb{E}(Y \mid X = x)$ is that $\mathbb{E}(Y^2) < \infty$ (together with the standard bandwidth conditions $h_n \to 0$ and $n h_n \to \infty$).
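
A quick Monte-Carlo illustration of this behaviour (an illustrative sketch, not from the source: the sine target, noise level, and bandwidth schedule $h_n = n^{-1/5}$ are all hypothetical choices): the held-out mean squared error of a Gaussian-kernel estimate should shrink as $n$ grows.

```python
import numpy as np

def nadaraya_watson(x_train, y_train, x_query, h):
    # Same Gaussian-kernel Nadaraya-Watson estimate as in the sketch above.
    K = np.exp(-0.5 * ((x_query[:, None] - x_train[None, :]) / h) ** 2)
    return (K @ y_train) / K.sum(axis=1)

rng = np.random.default_rng(1)
r = np.sin                              # true regression function
grid = np.linspace(0.5, 5.5, 100)       # held-out evaluation points

for n in [100, 1000, 10000]:
    x = rng.uniform(0, 2 * np.pi, n)
    y = r(x) + 0.3 * rng.normal(size=n)
    h = n ** (-1 / 5)                   # shrink the bandwidth with n
    mse = np.mean((nadaraya_watson(x, y, grid, h) - r(grid)) ** 2)
    print(f"n={n:6d}  h={h:.3f}  MSE={mse:.4f}")
```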

If we take the bandwidth to be $h \asymp n^{-1/5}$, then the risk of kernel regression is $O(n^{-4/5})$ (assuming the distribution generating the $x_i$ has a density).
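
This rate comes from the usual bias–variance trade-off. Heuristically (a sketch under standard smoothness assumptions, with constants $c_1, c_2$ that depend on the kernel, on $r''$, and on the design density, none of which appear above), the risk behaves like

$$R(h) \approx c_1 h^4 + \frac{c_2}{n h},$$

where the first term is the squared bias and the second the variance. Setting the derivative to zero,

$$4 c_1 h^3 - \frac{c_2}{n h^2} = 0 \quad\Longrightarrow\quad h^* = \left(\frac{c_2}{4 c_1}\right)^{1/5} n^{-1/5},$$

and plugging $h^* \asymp n^{-1/5}$ back in gives $R(h^*) \asymp n^{-4/5}$.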