The histogram is possibly the simplest method for nonparametric density estimation after the empirical distribution.

We split the space $[0,1]$ into $m$ equal-sized bins $B_1, \dots, B_m$, each of width $h = 1/m$. Note that the true probability mass of bin $B_j$ is

$$p_j = \int_{B_j} p(x) \, dx.$$

We estimate $p_j$ by simply counting the fraction of training points which landed in bin $B_j$. We estimate $p(x)$ by placing a uniform distribution over the bin $B_j$, i.e., we assume $p(x) = p_j / h$ for all $x \in B_j$. Our estimator is therefore

$$\hat{p}(x) = \sum_{j=1}^{m} \frac{\hat{p}_j}{h} \, \mathbb{1}\{x \in B_j\},$$

where $\hat{p}_j = \frac{1}{n} \sum_{i=1}^{n} \mathbb{1}\{X_i \in B_j\}$. Note that $m$ is the number of bins and $n$ is the number of training points.
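The estimator can be sketched in a few lines of NumPy. This is a minimal illustration on $[0,1]$; the function and variable names are mine, not from the text:

```python
import numpy as np

def histogram_density(x_train, m):
    """Histogram density estimator on [0, 1] with m equal-width bins.

    Returns a function p_hat with p_hat(x) = hat_p_j / h for x in bin B_j.
    """
    n = len(x_train)
    h = 1.0 / m  # bin width
    # hat_p_j: fraction of training points landing in each bin
    counts, _ = np.histogram(x_train, bins=m, range=(0.0, 1.0))
    hat_p = counts / n

    def p_hat(x):
        # index of the bin containing x (clip so x = 1.0 falls in the last bin)
        j = np.minimum((np.asarray(x) / h).astype(int), m - 1)
        return hat_p[j] / h

    return p_hat

rng = np.random.default_rng(0)
x_train = rng.beta(2, 5, size=1000)  # samples from a density supported on [0, 1]
p_hat = histogram_density(x_train, m=10)
```

Because each bin contributes $\hat{p}_j / h$ over an interval of width $h$, the estimate integrates to one by construction.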

If $x \in B_j$, then $\mathbb{E}[\hat{p}(x)] = p_j / h$ and $\mathrm{Var}[\hat{p}(x)] = \frac{p_j (1 - p_j)}{n h^2}$. Obviously, the choice of $h$ is important and affects performance. The minimax risk (statistical decision theory) of the histogram in a 1-Hölder (Lipschitz) space is bounded by

$$R(\hat{p}, p) \le C \left( h^2 + \frac{1}{nh} \right),$$

so minimizing over $h$ we get $h \asymp n^{-1/3}$ and a risk of $O(n^{-2/3})$. This rate of $n^{-2/3}$ is much slower than parametric models (parametric density estimation), which are usually on the order of $n^{-1}$. In a $\beta$-Hölder space the minimax rate is $n^{-2\beta/(2\beta+1)}$; for $\beta > 1$, histograms are not minimax optimal, unlike kernel density estimation.
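The optimal bandwidth can be worked out explicitly. Using the standard bias–variance bound $C\left(h^2 + \frac{1}{nh}\right)$ for a Lipschitz density and setting the derivative in $h$ to zero:

$$\frac{d}{dh}\left( h^2 + \frac{1}{nh} \right) = 2h - \frac{1}{n h^2} = 0 \quad \Longrightarrow \quad h^* = (2n)^{-1/3} \asymp n^{-1/3},$$

and substituting back,

$$(h^*)^2 + \frac{1}{n h^*} \asymp n^{-2/3} + n^{-2/3} \asymp n^{-2/3},$$

which is where the $n^{-2/3}$ rate comes from.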

There are also high-probability guarantees. In particular, if $p$ is $L$-Lipschitz, then with probability at least $1 - \delta$,

$$\|\hat{p}_h - p\|_\infty \le L h + \frac{1}{h} \sqrt{\frac{\log(2m/\delta)}{2n}},$$

where $p$ is the true density and $\hat{p}_h$ is the histogram with bin size $h$. The second term follows from Hoeffding's inequality applied to each $\hat{p}_j$ together with a union bound over the $m$ bins.
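A quick simulation illustrates the concentration of the bin frequencies. Hoeffding's inequality plus a union bound over the $m$ bins gives $\max_j |\hat{p}_j - p_j| \le \sqrt{\log(2m/\delta)/(2n)}$ with probability at least $1 - \delta$; the sketch below (my own setup, using the uniform density so $p_j = h$ exactly) checks that violations are rare:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, delta = 2000, 20, 0.05
h = 1.0 / m
# True bin probabilities under the uniform density on [0, 1]: p_j = h for all j.
p = np.full(m, h)
# Hoeffding + union bound over m bins: with probability >= 1 - delta,
#   max_j |hat_p_j - p_j| <= sqrt(log(2m / delta) / (2n)).
eps = np.sqrt(np.log(2 * m / delta) / (2 * n))

trials = 200
violations = 0
for _ in range(trials):
    x = rng.random(n)  # n draws from Uniform[0, 1]
    counts, _ = np.histogram(x, bins=m, range=(0.0, 1.0))
    hat_p = counts / n
    if np.max(np.abs(hat_p - p)) > eps:
        violations += 1
# The fraction of trials violating the bound should be well below delta.
rate = violations / trials
```

The Hoeffding bound is quite loose here (it ignores that each $p_j$ is small), so in practice the empirical violation rate is far below $\delta$.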

There have been efforts to choose the bin size adaptively. These estimators are sometimes called density trees. See: