Optimal Data-Dependent Hashing for Approximate Near Neighbors

Ilya Razenshteyn

Abstract

We show an optimal data-dependent hashing scheme for the approximate near neighbor problem. For an $n$-point data set in a $d$-dimensional space, our data structure achieves query time $O(d n^{\rho+o(1)})$ and space $O(n^{1+\rho+o(1)} + dn)$, where $\rho = \frac{1}{2c^2-1}$ for the Euclidean space and approximation $c > 1$. For the Hamming space, we obtain an exponent of $\rho = \frac{1}{2c-1}$. Our result completes the direction set forth in [AINR14], which gave a proof-of-concept that data-dependent hashing can outperform classical Locality Sensitive Hashing (LSH). In contrast to [AINR14], the new bound is not only optimal, but in fact improves over the best (optimal) LSH data structures [IM98, AI06] for all approximation factors $c > 1$. From the technical perspective, we proceed by decomposing an arbitrary dataset into several subsets that are, in a certain sense, pseudo-random.
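
To make the exponents concrete, here is a minimal Python sketch that evaluates the data-dependent exponent, 1/(2c^2 - 1) in Euclidean space and 1/(2c - 1) in Hamming space, for a few approximation factors c. For comparison it also evaluates the classical LSH exponents 1/c^2 (Euclidean, [AI06]) and 1/c (Hamming, [IM98]); those two values are not stated in the abstract and are included here as assumed background. The function names are hypothetical, and the lower-order o(1) terms are ignored.

```python
def rho_data_dependent(c: float, space: str = "euclidean") -> float:
    """Exponent from the abstract (o(1) terms ignored)."""
    if c <= 1:
        raise ValueError("approximation factor c must exceed 1")
    if space == "euclidean":
        return 1.0 / (2.0 * c * c - 1.0)
    if space == "hamming":
        return 1.0 / (2.0 * c - 1.0)
    raise ValueError(f"unknown space: {space!r}")


def rho_classical_lsh(c: float, space: str = "euclidean") -> float:
    """Classical LSH exponent (assumed background, not from the abstract)."""
    if c <= 1:
        raise ValueError("approximation factor c must exceed 1")
    if space == "euclidean":
        return 1.0 / (c * c)
    if space == "hamming":
        return 1.0 / c
    raise ValueError(f"unknown space: {space!r}")


# Compare the two exponents for a few approximation factors.
for c in (1.5, 2.0, 3.0):
    for space in ("euclidean", "hamming"):
        new = rho_data_dependent(c, space)
        old = rho_classical_lsh(c, space)
        print(f"c={c:<4} {space:<10} data-dependent={new:.4f}  classical LSH={old:.4f}")
```

For c = 2, this gives 1/7 (about 0.143) against 1/4 in Euclidean space, and 1/3 against 1/2 in Hamming space. Since the query time scales as n raised to this exponent, a smaller value means a faster query.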
