A Theory-Based Evaluation of Nearest Neighbor Models Put Into Practice

Abstract

In the k-nearest neighborhood model (k-NN), we are given a set of points P, and we shall answer queries q by returning the k nearest neighbors of q in P according to some metric. This concept is crucial in many areas of data analysis and data processing, e.g., computer vision, document retrieval and machine learning. Many k-NN algorithms have been published and implemented, but often the relation between parameters and accuracy of the computed k-NN is not explicit. We study property testing of k-NN graphs in theory and evaluate it empirically: given a point set P ⊂ Rδ and a directed graph G=(P,E), is G a k-NN graph, i.e., every point p ∈ P has outgoing edges to its k nearest neighbors, or is it ε-far from being a k-NN graph? Here, ε-far means that one has to change more than an ε-fraction of the edges in order to make G a k-NN graph. We develop a randomized algorithm with one-sided error that decides this question, i.e., a property tester for the k-NN property, with complexity O(n k2 / ε2) measured in terms of the number of vertices and edges it inspects, and we prove a lower bound of (n / ε k). We evaluate our tester empirically on the k-NN models computed by various algorithms and show that it can be used to detect k-NN models with bad accuracy in significantly less time than the building time of the k-NN model.

0

Turn this paper into a lesson

ArcXiv compiles a structured reading guide from this paper's metadata: plain-English importance, contributions, prerequisite concepts, which sections to read first, flashcards, and a quiz. Grounded in the abstract, never invented.

Discussion (0)

Sign in to join the discussion.

Loading comments…