Chromatic k-Nearest Neighbor Queries
Abstract
Let P be a set of n colored points. We develop efficient data structures that store P and can answer chromatic k-nearest neighbor (k-NN) queries. Such a query consists of a query point q and a number k, and asks for the color that appears most frequently among the k points in P closest to q. Answering such queries efficiently is the key to obtaining fast k-NN classifiers. Our main aim is to obtain query times that are independent of k while using near-linear space. We show that this is possible using a combination of two data structures. The first data structure allows us to compute a region containing exactly the k nearest neighbors of a query point q, and the second data structure can then report the most frequent color in such a region. This leads to linear-space data structures with query times of $O(n^{1/2} \log n)$ for points in $\mathbb{R}^1$, and with query times varying between $O(n^{2/3} \log^{2/3} n)$ and $O(n^{5/6} \operatorname{polylog} n)$, depending on the distance measure used, for points in $\mathbb{R}^2$. Since these query times are still fairly large, we also consider approximations. If we are allowed to report a color that appears at least $(1-\varepsilon) f^*$ times, where $f^*$ is the frequency of the most frequent color, we obtain a query time of $O(\log n + \log \log_{\frac{1}{1-\varepsilon}} n)$ in $\mathbb{R}^1$ and expected query times ranging between $O(n^{1/2} \varepsilon^{-3/2})$ and $O(n^{1/2} \varepsilon^{-5/2})$ in $\mathbb{R}^2$ using near-linear space (ignoring polylogarithmic factors).
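To make the query definition concrete, here is a minimal brute-force sketch in Python. It is not the data structure developed in the paper: it scans all n points on every query, so its running time depends on both n and k, which is exactly the cost the paper aims to avoid. The function names `chromatic_knn` and `acceptable_colors`, the point representation as `(coords, color)` pairs, and the use of Euclidean distance are all assumptions made for illustration.

```python
import heapq
import math
from collections import Counter


def _knn_color_counts(points, q, k):
    """Count colors among the k points closest to q (Euclidean distance assumed)."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    # Brute force: inspect every point and keep the k with smallest distance to q.
    nearest = heapq.nsmallest(k, points, key=lambda p: math.dist(p[0], q))
    return Counter(color for _, color in nearest)


def chromatic_knn(points, q, k):
    """Exact query: the most frequent color among the k nearest neighbors of q."""
    return _knn_color_counts(points, q, k).most_common(1)[0][0]


def acceptable_colors(points, q, k, eps):
    """Approximate query: every color appearing at least (1 - eps) * f_star times,
    where f_star is the frequency of the most frequent color among the k nearest."""
    counts = _knn_color_counts(points, q, k)
    f_star = max(counts.values())
    return {c for c, f in counts.items() if f >= (1 - eps) * f_star}


# Hypothetical example data: five colored points in R^1.
P = [((0.0,), "red"), ((1.0,), "red"), ((2.0,), "blue"),
     ((5.0,), "blue"), ((6.0,), "blue")]
```

For example, `chromatic_knn(P, (0.5,), 3)` looks at the points at 0, 1, and 2 and returns `"red"`, while with `k = 5` the answer becomes `"blue"`. An approximate query may return any member of `acceptable_colors(P, q, k, eps)`; larger values of `eps` admit more colors.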