Kernel regression, KNN methods, and partition- and tree-based regressors can all be generated by minimizing the following sum:

$$\sum_{i=1}^{n} w_i(x)\,(y_i - \theta)^2$$

as a function of $\theta$. This is a simple optimization problem (set the derivative with respect to $\theta$ to zero), which yields the solution

$$\hat{\theta}(x) = \frac{\sum_{i=1}^{n} w_i(x)\, y_i}{\sum_{i=1}^{n} w_i(x)}.$$
Thus, we can see that KNN regression arises from taking $w_i(x) = \mathbf{1}\{x_i \in N_k(x)\}$ (where $N_k(x)$ are the $k$-nearest neighbours of $x$). More generally, partition-based regressors follow from taking $w_i(x) = \mathbf{1}\{x_i \in R(x)\}$, where $R(x)$ is the region of the feature space to which $x$ belongs. Finally, kernel regression follows from taking $w_i(x) = K\!\left(\frac{x - x_i}{h}\right)$.
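To make the correspondence concrete, here is a minimal NumPy sketch of the local constant estimator under two of these weight choices. The function names, the Gaussian choice of kernel $K$, and the defaults for $h$ and $k$ are illustrative assumptions, not anything fixed by the text.

```python
import numpy as np

def local_constant_fit(x_train, y_train, x_test, weight_fn):
    """Local averaging: theta-hat(x) = sum_i w_i(x) y_i / sum_i w_i(x)."""
    preds = np.empty(len(x_test))
    for j, x in enumerate(x_test):
        w = weight_fn(x_train, x)  # w_i(x) for every training point
        preds[j] = (w @ y_train) / w.sum()
    return preds

def kernel_weights(x_train, x, h=0.5):
    """Kernel regression: w_i(x) = K((x - x_i)/h), with a Gaussian K (an assumption)."""
    return np.exp(-0.5 * ((x - x_train) / h) ** 2)

def knn_weights(x_train, x, k=5):
    """KNN regression: w_i(x) = 1 if x_i is among the k nearest neighbours of x."""
    w = np.zeros(len(x_train))
    w[np.argsort(np.abs(x_train - x))[:k]] = 1.0
    return w
```

With `knn_weights`, the weighted average is an unweighted mean over exactly the $k$ nearest neighbours, recovering KNN regression; a partition-based regressor would instead set $w_i(x) = 1$ whenever $x_i$ lies in the same region $R(x)$ as $x$.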
We note that the above problem is solved at test time, i.e., once for each new test point $x$. In that sense, these methods are "local": the set of samples recruited to take part in the optimization, and their weights, depend on the location of the test point $x$.
Local polynomial regression generalizes the above by replacing the constant $\theta$ with a polynomial. For dimension $d = 1$, the objective becomes

$$\min_{\beta_0, \dots, \beta_p}\; \sum_{i=1}^{n} w_i(x) \left( y_i - \sum_{j=0}^{p} \beta_j (x_i - x)^j \right)^2.$$

Note we're minimizing over $(\beta_0, \dots, \beta_p)$, which will be functions of $x$. If $p = 1$, this is referred to as local linear regression. By writing the objective out in matrix form, it's not hard to see that the solution is

$$\hat{\beta}(x) = (X^\top W X)^{-1} X^\top W y,$$
where $X$ is the $n \times (p+1)$ matrix with $i$-th row equal to $\big(1,\, (x_i - x),\, \dots,\, (x_i - x)^p\big)$ and $W$ is the diagonal matrix whose $i$-th diagonal entry is $w_i(x)$. The predictor is then $\hat{f}(x) = \hat{\beta}_0(x)$, since the fitted polynomial evaluated at the test point itself reduces to its intercept. The solution is very reminiscent of weighted least squares.
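The weighted-least-squares view translates directly into code. Below is a minimal sketch of local polynomial regression along the lines of the matrix solution above; the Gaussian kernel weights, the bandwidth `h`, and the default degree `p = 1` (local linear) are illustrative assumptions.

```python
import numpy as np

def local_poly_fit(x_train, y_train, x_test, h=0.5, p=1):
    """Local polynomial regression of degree p (p = 1: local linear)."""
    preds = np.empty(len(x_test))
    for j, x in enumerate(x_test):
        # Design matrix X: i-th row is (1, (x_i - x), ..., (x_i - x)^p).
        X = np.vander(x_train - x, N=p + 1, increasing=True)
        # Diagonal weight matrix W with entries w_i(x); Gaussian kernel assumed.
        W = np.diag(np.exp(-0.5 * ((x - x_train) / h) ** 2))
        # Weighted least-squares solve: beta(x) = (X^T W X)^{-1} X^T W y.
        beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y_train)
        # The fitted polynomial evaluated at the test point is just the intercept.
        preds[j] = beta[0]
    return preds
```

Note that the whole weighted least-squares system is re-solved for every test point, which is exactly the "local" behaviour described above: only nearby samples receive appreciable weight in $W$.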