Kernel regression, kNN methods, and partition- and tree-based regressors can all be generated by minimizing the following sum:

$$\sum_{i=1}^{n} w(x, x_i)\,(y_i - \theta)^2$$

as a function of $\theta$. This is a simple optimization problem: setting the derivative with respect to $\theta$ to zero yields the solution

$$\hat{\theta}(x) = \frac{\sum_{i=1}^{n} w(x, x_i)\, y_i}{\sum_{i=1}^{n} w(x, x_i)},$$

a weighted average of the responses.
Thus, we can see that kNN regression arises from taking $w(x, x_i) = \mathbf{1}\{x_i \in N_k(x)\}$, where $N_k(x)$ is the set of $k$-nearest neighbours of $x$. More generally, partition-based regressors follow from taking $w(x, x_i) = \mathbf{1}\{x_i \in R(x)\}$, where $R(x)$ is the region of the feature space to which $x$ belongs. Finally, kernel regression follows from taking $w(x, x_i) = K(x, x_i)$ for some kernel function $K$.

We note that the above equation is solved for $\hat{\theta}(x)$ at test time, i.e., for each new test point $x$. In that sense, these methods are “local”: the weight of each sample recruited to take part in the optimization depends on the locality of the test point $x$.
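
To make this concrete, here is a minimal NumPy sketch of the weighted-average predictor for one-dimensional inputs. The Gaussian kernel, bandwidth, choice of $k$, and toy sine data are illustrative assumptions, not details from the text above.

```python
import numpy as np

def gaussian_weights(x, X_train, bandwidth=0.3):
    """Kernel weights w(x, x_i) = K(x, x_i); Gaussian chosen as an example."""
    return np.exp(-0.5 * ((X_train - x) / bandwidth) ** 2)

def knn_weights(x, X_train, k=10):
    """Indicator weights: 1 for the k nearest neighbours of x, 0 otherwise."""
    w = np.zeros(len(X_train))
    w[np.argsort(np.abs(X_train - x))[:k]] = 1.0
    return w

def local_average_predict(x, X_train, y_train, weight_fn):
    """Solve the local problem at test point x:
    theta_hat(x) = sum_i w(x, x_i) y_i / sum_i w(x, x_i)."""
    w = weight_fn(x, X_train)
    return np.sum(w * y_train) / np.sum(w)

# Toy data: noisy sine curve (an assumption for illustration).
rng = np.random.default_rng(0)
X_train = rng.uniform(0, 2 * np.pi, 200)
y_train = np.sin(X_train) + 0.1 * rng.standard_normal(200)

x0 = np.pi / 2  # true value sin(x0) = 1
print(local_average_predict(x0, X_train, y_train, gaussian_weights))  # kernel regression
print(local_average_predict(x0, X_train, y_train, knn_weights))       # kNN regression
```

Both predictors share the same weighted-average formula; only the weight function changes, exactly as in the three cases above.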

Local polynomial regression generalizes the above by replacing the constant $\theta$ with a polynomial. The objective becomes, for dimension $d = 1$,

$$\sum_{i=1}^{n} w(x, x_i) \left( y_i - \sum_{j=0}^{p} \beta_j (x_i - x)^j \right)^2.$$

Note we’re minimizing over $\beta_0, \dots, \beta_p$, which will be functions of $x$. If $p = 1$, this is referred to as local linear regression. By writing it out in matrix form, it’s not hard to see that the solution is

$$\hat{\beta}(x) = \left(X^\top W X\right)^{-1} X^\top W y,$$

where $X$ is the $n \times (p+1)$ matrix with $i$-th row equal to $\bigl(1,\, (x_i - x),\, \dots,\, (x_i - x)^p\bigr)$ and $W$ is the diagonal matrix whose $i$-th diagonal entry is $w(x, x_i)$. The predictor is then $\hat{f}(x) = \hat{\beta}_0(x)$, since the polynomial is centred at the test point. The solution is very reminiscent of weighted least squares.
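
As a sketch of how this closed form can be used, the snippet below fits a local polynomial at a single test point. The Gaussian weight function, bandwidth, and toy data are again illustrative assumptions rather than choices made in the text.

```python
import numpy as np

def local_poly_predict(x, X_train, y_train, degree=1, bandwidth=0.3):
    """Local polynomial regression at test point x via weighted least squares."""
    # i-th row of the design matrix: (1, (x_i - x), ..., (x_i - x)^p).
    D = np.vander(X_train - x, N=degree + 1, increasing=True)
    # Diagonal weight matrix with entries w(x, x_i); Gaussian as an example.
    w = np.exp(-0.5 * ((X_train - x) / bandwidth) ** 2)
    W = np.diag(w)
    # beta_hat(x) = (X^T W X)^{-1} X^T W y, solved without explicit inversion.
    beta = np.linalg.solve(D.T @ W @ D, D.T @ W @ y_train)
    # The design is centred at x, so the prediction is the intercept beta_0.
    return beta[0]

rng = np.random.default_rng(0)
X_train = rng.uniform(0, 2 * np.pi, 200)
y_train = np.sin(X_train) + 0.1 * rng.standard_normal(200)

print(local_poly_predict(np.pi / 2, X_train, y_train, degree=1))  # ~ sin(pi/2) = 1
```

As with the local-averaging methods, the whole fit is redone for each new test point $x$, which is what makes the method local.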