New Instability Results for High Dimensional Nearest Neighbor Search

Abstract

Consider a dataset of $n(d)$ points drawn independently from $\mathbb{R}^d$ according to a common p.d.f. $f_d$ with $\mathrm{support}(f_d) = [0,1]^d$ and $\sup f_d([0,1]^d)$ growing sub-exponentially in $d$. We prove that: (i) if $n(d)$ grows sub-exponentially in $d$, then, for any query point $q_d \in [0,1]^d$ and any $\epsilon > 0$, the ratio of the distances from any two dataset points to $q_d$ is less than $1+\epsilon$ with probability tending to 1 as $d \to \infty$; (ii) if $n(d) > [4(1+\epsilon)]^d$ for large $d$, then for all $q_d \in [0,1]^d$ (except a small subset) and any $\epsilon > 0$, the distance ratio is less than $1+\epsilon$ with limiting probability strictly bounded away from one. Moreover, we provide preliminary results along the lines of (i) when $f_d = N(\mu_d, \Sigma_d)$.
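
The concentration described in (i) can be observed numerically. The following is a minimal sketch, not the paper's method or proof: it assumes the uniform density on the unit cube (one admissible choice of the p.d.f., since its supremum is constant in the dimension), a fixed sample size of 200 points (trivially sub-exponential in the dimension), and a randomly placed query point. The function name `max_min_distance_ratio`, the seed, and the chosen dimensions are illustrative assumptions. It reports the ratio of the farthest to the nearest dataset distance from the query, which bounds every pairwise distance ratio from above.

```python
import math
import random


def max_min_distance_ratio(n, d, rng):
    """Ratio of the farthest to the nearest Euclidean distance from a
    random query point to n points drawn uniformly from [0,1]^d.
    (Hypothetical helper: uniform data is an assumption of this sketch.)"""
    query = [rng.random() for _ in range(d)]
    dists = []
    for _ in range(n):
        point = [rng.random() for _ in range(d)]
        dists.append(math.dist(point, query))
    return max(dists) / min(dists)


rng = random.Random(0)  # fixed seed so repeated runs give the same numbers
ratios = {d: max_min_distance_ratio(200, d, rng) for d in (2, 20, 200, 2000)}
for d, r in ratios.items():
    print(f"d={d:5d}  max/min distance ratio = {r:.3f}")
```

Under these assumptions the printed ratio should shrink toward 1 as the dimension grows, which is the instability in regime (i): the nearest neighbor is barely closer than the farthest point. Regime (ii) requires exponentially many points and is not practical to simulate this way.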
