Optimal choice of k for k-nearest neighbor regression
Abstract
The k-nearest neighbor algorithm (k-NN) is a widely used non-parametric method for classification and regression. We study the mean squared error of the k-NN estimator when k is chosen by leave-one-out cross-validation (LOOCV). It was previously known that this choice of k is asymptotically consistent, but not that it is optimal. We show that, with high probability, the mean squared error of this estimator is close to the minimum mean squared error attainable by a k-NN estimate, where the minimum is taken over all choices of k.
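To make the procedure in the abstract concrete, the following is a minimal sketch of selecting k for k-NN regression by LOOCV. It is an illustration under stated assumptions, not the paper's implementation: the function names `loocv_errors` and `choose_k` are hypothetical, neighbors are found by brute-force Euclidean distance, predictions are unweighted averages of the neighbors' responses, and the synthetic sine data is invented for the demonstration.

```python
import numpy as np


def loocv_errors(X, y, k_values):
    """Leave-one-out mean squared error of k-NN regression for each k.

    Assumptions (not taken from the paper): Euclidean distance,
    unweighted neighbor averaging, brute-force neighbor search.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if X.ndim == 1:
        X = X.reshape(-1, 1)
    n = len(y)
    k_values = list(k_values)
    if min(k_values) < 1 or max(k_values) > n - 1:
        raise ValueError("each k must satisfy 1 <= k <= n - 1")

    # Pairwise squared Euclidean distances, shape (n, n).
    diffs = X[:, None, :] - X[None, :, :]
    dist = np.sum(diffs ** 2, axis=-1)

    # Leave-one-out: a point may not be its own neighbor, so push it
    # to the end of its sorted neighbor list.
    np.fill_diagonal(dist, np.inf)
    order = np.argsort(dist, axis=1)

    # Column j holds the mean response of the j + 1 nearest neighbors.
    sorted_y = y[order]
    running_mean = np.cumsum(sorted_y, axis=1) / np.arange(1, n + 1)

    errors = {}
    for k in k_values:
        predictions = running_mean[:, k - 1]
        errors[k] = float(np.mean((predictions - y) ** 2))
    return errors


def choose_k(X, y, k_values):
    """Return the k with the smallest LOOCV error, plus all errors."""
    errors = loocv_errors(X, y, k_values)
    best_k = min(errors, key=errors.get)
    return best_k, errors


# Synthetic example (hypothetical data): a noisy sine curve.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(200, 1))
y = np.sin(2.0 * np.pi * X[:, 0]) + rng.normal(0.0, 0.3, size=200)

candidate_ks = range(1, 51)
best_k, errors = choose_k(X, y, candidate_ks)
print("k chosen by LOOCV:", best_k)
print("LOOCV mean squared error at that k:", errors[best_k])
```

Sorting each point's neighbors once and taking running means gives the LOOCV error for every candidate k from a single distance computation, which is convenient when the minimum is taken over many values of k. Note that the LOOCV error is only an estimate of the mean squared error that the abstract's result concerns.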