Kernel Density Estimation

Kernel density estimation (KDE) is an approach to nonparametric density estimation that can be viewed as a kind of “smoothed-out” histogram. Let $K$ be a smoothing kernel, i.e., a function satisfying $\int K(x)\,dx = 1$, $\int x K(x)\,dx = 0$ and $\int x^{2} K(x)\,dx > 0$. That is, a nonnegative $K$ induces a distribution on the space $X$ with mean zero and positive variance; in practice $K$ is usually chosen to be symmetric about the origin.
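The three kernel conditions can be checked numerically for common one-dimensional kernels. The following is a minimal sketch, assuming NumPy and a plain Riemann sum on a fixed grid; the helper name `kernel_moments` and the grid limits are illustrative choices, not part of the notes.

```python
import numpy as np

def gaussian(x):
    """Standard Gaussian kernel on the real line."""
    return np.exp(-0.5 * x**2) / np.sqrt(2 * np.pi)

def epanechnikov(x):
    """Epanechnikov kernel: 0.75 * (1 - x^2) on [-1, 1], zero elsewhere."""
    return np.where(np.abs(x) <= 1, 0.75 * (1 - x**2), 0.0)

def kernel_moments(K, lo=-10.0, hi=10.0, num=200_001):
    """Approximate the integrals of K, x*K and x^2*K with a Riemann sum.

    The grid [lo, hi] is an assumption: it must cover the bulk of K's mass.
    """
    x = np.linspace(lo, hi, num)
    dx = x[1] - x[0]
    w = K(x)
    return np.sum(w) * dx, np.sum(x * w) * dx, np.sum(x**2 * w) * dx
```

Calling `kernel_moments(gaussian)` should return values close to $(1, 0, 1)$ and `kernel_moments(epanechnikov)` values close to $(1, 0, 0.2)$, so both kernels integrate to one, have mean zero, and have a positive second moment.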

Our estimate for $p (x)$ is

$$
\hat{p}(x) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{h^{d}} K\left(\frac{x - X_{i}}{h}\right).
$$

Essentially, we are placing small lumps of mass around the data points $X_{1}, \dots, X_{n}$. The shape of each lump is determined by the kernel and its width by $h$, called the bandwidth. As $h$ increases, the estimate $\hat{p}$ becomes smoother and flatter; as $h$ shrinks, it concentrates into spikes at the data points.
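As an illustration of the estimator and the role of the bandwidth, here is a minimal sketch in Python. It assumes NumPy, a Gaussian kernel, and data stored as an `(n, d)` array; the function names `gaussian_kernel` and `kde` are hypothetical and chosen for this example only.

```python
import numpy as np

def gaussian_kernel(u):
    """Standard Gaussian density on R^d, evaluated along the last axis of u."""
    d = u.shape[-1]
    return np.exp(-0.5 * np.sum(u**2, axis=-1)) / (2 * np.pi) ** (d / 2)

def kde(x, data, h):
    """Evaluate the kernel density estimate at query points.

    x    : (m, d) array of query points
    data : (n, d) array of observations X_1, ..., X_n
    h    : positive scalar bandwidth
    """
    n, d = data.shape
    # Scaled differences (x - X_i) / h for every query/observation pair: (m, n, d).
    u = (x[:, None, :] - data[None, :, :]) / h
    # Average the kernel lumps and rescale by h^d so the estimate integrates to one.
    return gaussian_kernel(u).sum(axis=1) / (n * h**d)
```

For example, with `sample = np.random.default_rng(0).normal(size=(200, 1))` and `grid = np.linspace(-10, 10, 4001)[:, None]`, comparing `kde(grid, sample, 0.3)` with `kde(grid, sample, 3.0)` shows the larger bandwidth giving a visibly flatter curve with a lower peak.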

We can generalize the above to allow for positive definite bandwidth matrices $H$ and write

$$
\hat{p}(x) = \frac{1}{n} \sum_{i=1}^{n} K_{H}(x - X_{i}),
$$

where $K_{H}(z) = \lvert H \rvert^{-1/2} K(H^{-1/2} z)$. Taking $H = h^{2} I$ recovers the previous formula, since then $\lvert H \rvert^{-1/2} = h^{-d}$ and $H^{-1/2} z = z / h$.
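The bandwidth-matrix version can be sketched in the same way. The example below assumes NumPy, a Gaussian kernel, and a symmetric positive definite $H$; it computes $H^{-1/2}$ from an eigendecomposition, which is one possible choice among several. The name `kde_matrix` is hypothetical.

```python
import numpy as np

def gaussian_kernel(u):
    """Standard Gaussian density on R^d, evaluated along the last axis of u."""
    d = u.shape[-1]
    return np.exp(-0.5 * np.sum(u**2, axis=-1)) / (2 * np.pi) ** (d / 2)

def kde_matrix(x, data, H):
    """Kernel density estimate with a bandwidth matrix.

    x    : (m, d) array of query points
    data : (n, d) array of observations
    H    : (d, d) symmetric positive definite bandwidth matrix
    """
    n, d = data.shape
    # Symmetric inverse square root H^{-1/2} from the eigendecomposition of H.
    eigvals, eigvecs = np.linalg.eigh(H)
    H_inv_sqrt = eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T
    z = x[:, None, :] - data[None, :, :]      # differences x - X_i, shape (m, n, d)
    u = z @ H_inv_sqrt                         # H^{-1/2} z for each pair (H^{-1/2} is symmetric)
    det_sqrt = np.sqrt(np.prod(eigvals))       # |H|^{1/2}
    return gaussian_kernel(u).sum(axis=1) / (n * det_sqrt)
```

Passing `H = h**2 * np.eye(d)` should reproduce the scalar-bandwidth estimate, which gives a quick consistency check on an implementation like this one.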

For kernels of the form $K(x) = c \cdot k(\lVert x \rVert)$, where $k$ is a one-dimensional kernel, or of the form $K(x) = k(x_{1}) \cdots k(x_{d})$, and for sufficiently smooth densities, the bias of KDE is $O(h^{2})$, so the squared bias is $O(h^{4})$, and the variance is $O(1/(n h^{d}))$. If we take $h \asymp n^{-\frac{1}{4 + d}}$, this gives an MSE of $O(n^{-\frac{4}{4 + d}})$, which is better than the rate for histograms and is minimax optimal over many classes of densities (see statistical decision theory). Of course, this is still slower than the parametric $O(n^{-1})$ rate attained, for example, by the MLE in a correctly specified model.
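The choice of bandwidth comes from balancing the two error terms. As a sketch, assume the pointwise MSE behaves like a squared-bias term of order $h^{4}$ plus a variance term of order $1/(n h^{d})$, with unspecified constants $C_{1}, C_{2} > 0$ introduced here only for illustration (they depend on the density and the kernel):

$$
\begin{aligned}
\mathrm{MSE}(h) &\approx C_{1} h^{4} + \frac{C_{2}}{n h^{d}}, \\
\frac{\partial}{\partial h} \mathrm{MSE}(h) = 0 &\iff 4 C_{1} h^{3} = \frac{d\, C_{2}}{n h^{d + 1}} \iff h^{4 + d} = \frac{d\, C_{2}}{4 C_{1} n}, \\
h^{*} &\asymp n^{-\frac{1}{4 + d}}, \qquad (h^{*})^{4} \asymp \frac{1}{n (h^{*})^{d}} \asymp n^{-\frac{4}{4 + d}}.
\end{aligned}
$$

At this bandwidth both terms are of the same order. For $d = 1$ the rate is $n^{-4/5}$, and it degrades as $d$ grows.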
