Let . The goal of density estimation is to determine the density of , call it . Here we want to make as few assumptions about as possible (i.e., we don’t assume that comes from some parametric family; see parametric versus nonparametric statistics).
An obvious solution is to simply take the empirical distribution, in which case our estimator is
But this solution obviously overfits the given data and has very few nice properties (continuity, smoothness, etc).
In terms of evaluation, typically we’re interested in loss, i.e.,
Here is treated as a fixed function of the training data. The risk is then the expectation of the loss over the training data:
As usual, the risk can be decomposed into a bias term and variance term.
Common methods to nonparametric density estimation include:
A solution to nonparametric density estimation also provides a solution to nonparametric regression as follows. Suppose is an estimate of the distribution . Then, for , we can generate an estimate of with
We can estimate both of and with nonparametric density estimation. Then we can plug this into the empirical distribution of the to estimate the integral.