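Kernel density estimation (KDE) is a specific approach to nonparametric density estimation, which can be viewed as a kind of “smoothed out” histogram. Let $K : \mathbb{R}^d \to \mathbb{R}$ be a smoothing kernel, i.e., $K(x) \ge 0$, $\int K(x)\,dx = 1$, and $K(-x) = K(x)$. That is, $K$ induces a distribution on the space $\mathbb{R}^d$ which is symmetric about the origin.
Given samples $X_1, \dots, X_n \in \mathbb{R}^d$ drawn from a density $p$, our estimate for $p$ is
$$\hat p_h(x) = \frac{1}{n} \sum_{i=1}^n \frac{1}{h^d} K\!\left(\frac{x - X_i}{h}\right).$$
Essentially, we’re placing small lumps of mass around the data points $X_1, \dots, X_n$. The size of the lumps is controlled by the kernel $K$ and by $h > 0$, called the bandwidth. As $h$ increases, each lump spreads out and $\hat p_h$ becomes flatter and more uniform; as $h$ shrinks, $\hat p_h$ concentrates into spikes at the data points.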
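As a rough illustration of the estimator, here is a minimal sketch in plain Python. It assumes the standard form $\hat p_h(x) = \frac{1}{n h^d} \sum_i K((x - X_i)/h)$ with a Gaussian product kernel; the names `gaussian_kernel` and `kde` and the toy sample are made up for this example and are not taken from any library.

```python
import math

def gaussian_kernel(u):
    """Standard Gaussian kernel on R^d: nonnegative, integrates to 1, symmetric."""
    d = len(u)
    return math.exp(-0.5 * sum(c * c for c in u)) / (2.0 * math.pi) ** (d / 2.0)

def kde(x, data, h, kernel=gaussian_kernel):
    """Evaluate (1 / (n h^d)) * sum_i K((x - X_i) / h) at a single point x."""
    if h <= 0:
        raise ValueError("bandwidth h must be positive")
    n, d = len(data), len(x)
    total = 0.0
    for xi in data:
        # Rescale the displacement from each data point by the bandwidth.
        u = [(a - b) / h for a, b in zip(x, xi)]
        total += kernel(u)
    return total / (n * h ** d)

# Toy one-dimensional sample (illustrative values only).
data = [(-1.0,), (0.0,), (0.5,), (2.0,)]
for h in (0.2, 1.0, 5.0):
    # Small h gives spikes at the data points; large h gives a flat estimate.
    print(h, [round(kde((t,), data, h), 4) for t in (-1.0, 0.0, 3.0)])
```

Evaluating at a point far from the data (here `3.0`) for several bandwidths shows the effect of $h$: the estimate there is essentially zero for small $h$ and grows as the lumps widen.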
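We can generalize the above to allow for positive definite bandwidth matrices $H \in \mathbb{R}^{d \times d}$ and write
$$\hat p_H(x) = \frac{1}{n} \sum_{i=1}^n K_H(x - X_i),$$
where $K_H(x) = |H|^{-1/2} K(H^{-1/2} x)$. Taking $H = h^2 I$ recovers the previous formula.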
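To check the reduction to the scalar case, assume the convention $K_H(x) = |H|^{-1/2} K(H^{-1/2} x)$ (other texts parametrize $H$ differently, e.g. without the square root) and substitute $H = h^2 I$ in $d$ dimensions:

$$
\begin{aligned}
|H|^{-1/2} &= \left(h^{2d}\right)^{-1/2} = h^{-d}, \qquad H^{-1/2} = h^{-1} I, \\
K_H(x - X_i) &= \frac{1}{h^d} K\!\left(\frac{x - X_i}{h}\right),
\end{aligned}
$$

so averaging over $i = 1, \dots, n$ gives exactly the single-bandwidth estimate.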
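For kernels of the form $K(x) = \prod_{j=1}^d k(x_j)$ where $k$ is a one-dimensional kernel, or of the form $K(x) \propto k(\|x\|)$, and for sufficiently smooth densities (e.g., with bounded second derivatives), the bias of KDE is $O(h^2)$ and the variance is $O\!\left(\frac{1}{n h^d}\right)$. Since the MSE is the squared bias plus the variance, taking $h \asymp n^{-1/(d+4)}$ gives an MSE of $O\!\left(n^{-4/(d+4)}\right)$, which is better than the $O\!\left(n^{-2/(d+2)}\right)$ rate of histograms and is minimax optimal over many classes of densities (see statistical decision theory). Of course, this is worse than the parametric rate $O(n^{-1})$ achieved by, e.g., the MLE in a correctly specified model.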
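The bandwidth choice comes from balancing squared bias against variance. As a sketch, suppose the bias is at most $\sqrt{C_1}\, h^2$ and the variance at most $C_2 / (n h^d)$, where $C_1, C_2 > 0$ are unspecified constants introduced here for illustration:

$$
\begin{aligned}
\mathrm{MSE}(h) &\le C_1 h^4 + \frac{C_2}{n h^d}, \\
\frac{d}{dh}\left( C_1 h^4 + \frac{C_2}{n h^d} \right) = 0
&\iff 4 C_1 h^3 = \frac{d\, C_2}{n h^{d+1}}
\iff h^{d+4} = \frac{d\, C_2}{4 C_1 n}, \\
h \asymp n^{-1/(d+4)} &\implies h^4 \asymp n^{-4/(d+4)}, \qquad \frac{1}{n h^d} \asymp n^{-1 + d/(d+4)} = n^{-4/(d+4)}.
\end{aligned}
$$

Both terms are of the same order at this bandwidth. For $d = 1$ this gives the rate $n^{-4/5}$, and the exponent $4/(d+4)$ shrinks toward zero as $d$ grows.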