The histogram is possibly the simplest method for nonparametric density estimation after the empirical distribution.
We split the space $[0,1]$ into $m$ equal-sized bins $B_1, \dots, B_m$. Note that the true probability of bin $B_j$ under the density $f$ is
$$p_j = \int_{B_j} f(x) \, dx.$$
We estimate $p_j$ by $\hat{p}_j = \frac{1}{n} \sum_{i=1}^n \mathbb{1}\{x_i \in B_j\}$, i.e., by simply counting the fraction of training points which landed in bin $B_j$. We estimate $f(x)$ for $x \in B_j$ by placing a uniform distribution over the bin $B_j$, i.e., we assume $f$ is constant on each bin. Our estimator is therefore
$$\hat{f}_h(x) = \sum_{j=1}^m \frac{\hat{p}_j}{h} \, \mathbb{1}\{x \in B_j\},$$
where $h = 1/m$ is the bin width. Note that $m$ is the number of bins and $n$ is the number of training points.
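As a minimal sketch, here is the estimator in NumPy, assuming one-dimensional data supported on $[0,1]$ (the function name `histogram_density` is illustrative, not from any library):

```python
import numpy as np

def histogram_density(x_train, m):
    """Histogram density estimate on [0, 1] with m equal-width bins.

    Returns a function f_hat that evaluates the estimated density.
    """
    n = len(x_train)
    h = 1.0 / m                                  # bin width h = 1/m
    # hat{p}_j: fraction of training points landing in each bin.
    counts = np.histogram(x_train, bins=m, range=(0.0, 1.0))[0]
    p_hat = counts / n

    def f_hat(x):
        # Each point inherits the uniform density of its bin: hat{p}_j / h.
        j = np.clip((np.asarray(x) / h).astype(int), 0, m - 1)
        return p_hat[j] / h

    return f_hat

# Example: estimate the density of a Beta(2, 5) sample with 10 bins.
rng = np.random.default_rng(0)
f_hat = histogram_density(rng.beta(2, 5, size=1000), m=10)
print(f_hat([0.05, 0.25, 0.9]))
```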
If $x \in B_j$, then $\mathbb{E}[\hat{f}_h(x)] = p_j / h$ and $\mathrm{Var}(\hat{f}_h(x)) = \frac{p_j (1 - p_j)}{n h^2}$. Obviously, the choice of $h$ is important and affects performance. The worst-case risk (statistical decision theory) of the histogram over a 1-Hölder space is bounded by
$$R(\hat{f}_h, f) \le C_1 h^2 + \frac{C_2}{n h},$$
so minimizing over $h$ gives $h^* \asymp n^{-1/3}$ and a risk of $O(n^{-2/3})$. This rate of $n^{-2/3}$ is much slower than that of parametric models (parametric density estimation), which is usually on the order of $O(n^{-1})$. In a $\beta$-Hölder space the minimax rate is $n^{-2\beta/(2\beta+1)}$, which the histogram attains only for $\beta \le 1$; histograms are therefore not minimax optimal, unlike kernel density estimation.
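To see where the $n^{-1/3}$ bandwidth and the $n^{-2/3}$ rate come from, set the derivative of the bound above to zero:
$$\frac{d}{dh}\left(C_1 h^2 + \frac{C_2}{n h}\right) = 2 C_1 h - \frac{C_2}{n h^2} = 0 \quad\Longrightarrow\quad h^* = \left(\frac{C_2}{2 C_1 n}\right)^{1/3} \asymp n^{-1/3},$$
and substituting $h^*$ back in makes both terms, and hence the risk, of order $n^{-2/3}$.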
There are also high-probability guarantees. For example, if $f$ is $L$-Lipschitz, then with probability at least $1 - \delta$,
$$\sup_x \big|\hat{f}_h(x) - f(x)\big| \le L h + \frac{1}{h} \sqrt{\frac{\log(2m/\delta)}{2n}},$$
where $f$ is the true density and $\hat{f}_h$ is the histogram with bin size $h$; the first term is the bias of binning, and the second follows from Hoeffding's inequality together with a union bound over the $m$ bins.
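A quick simulation can sanity-check the estimation-error term of such a bound. This sketch assumes the Hoeffding-plus-union-bound threshold above and uses Uniform(0,1) data so the true bin probabilities $p_j = 1/m$ are known exactly:

```python
import numpy as np

# max_j |p_hat_j - p_j| should exceed sqrt(log(2m/delta) / (2n))
# in at most roughly a delta fraction of trials.
rng = np.random.default_rng(1)
n, m, delta, trials = 2000, 20, 0.05, 500
p_true = np.full(m, 1.0 / m)                     # Uniform(0,1): p_j = 1/m
eps = np.sqrt(np.log(2 * m / delta) / (2 * n))   # union-bound threshold

exceed = 0
for _ in range(trials):
    counts = np.histogram(rng.random(n), bins=m, range=(0.0, 1.0))[0]
    if np.max(np.abs(counts / n - p_true)) > eps:
        exceed += 1

print(f"threshold eps = {eps:.4f}, "
      f"empirical exceedance = {exceed / trials:.3f} (should be <= {delta})")
```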
There have been efforts to choose the bin size adaptively. These estimators are sometimes called density trees. See:
- Density estimation trees, by Ram and Gray.
- Density estimation via adaptive sequential partition, by Li, Yang, and Wong.
- Density estimation via adaptive partitioning, by Liu and Wong.