Optimal Las Vegas Locality Sensitive Data Structures
Abstract
We show that approximate similarity (near neighbour) search can be solved in high dimensions with performance matching state of the art (data independent) Locality Sensitive Hashing, but with a guarantee of no false negatives. Specifically, we give two data structures for common problems. For $c$-approximate near neighbour in Hamming space we get query time $dn^{1/c+o(1)}$ and space $dn^{1+1/c+o(1)}$, matching that of \cite{indyk1998approximate} and answering a long-standing open question from~\cite{indyk2000dimensionality} and~\cite{pagh2016locality} in the affirmative. By means of a new deterministic reduction from $\ell_1$ to Hamming we also solve $\ell_1$ and $\ell_2$ with query time $d^2n^{1/c+o(1)}$ and space $d^2 n^{1+1/c+o(1)}$. For $(s_1,s_2)$-approximate Jaccard similarity we get query time $dn^{\rho+o(1)}$ and space $dn^{1+\rho+o(1)}$, $\rho=\log\frac{1+s_1}{2s_1}\big/\log\frac{1+s_2}{2s_2}$, when sets have equal size, matching the performance of~\cite{tobias2016}. The algorithms are based on space partitions, as with classic LSH, but we construct these using a combination of brute force, tensoring, perfect hashing and splitter functions à la~\cite{naor1995splitters}. We also show a new dimensionality reduction lemma with 1-sided error.
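To make the stated bounds concrete, the following is a minimal sketch that evaluates only the leading exponents of $n$ in the abstract, ignoring the $o(1)$ terms and the factors of $d$. It assumes the Jaccard exponent reads as $\rho=\log\frac{1+s_1}{2s_1}\big/\log\frac{1+s_2}{2s_2}$ with $s_1 > s_2$. The function names `hamming_exponents` and `jaccard_rho` are hypothetical and chosen for illustration; they do not come from the paper.

```python
import math


def hamming_exponents(c: float) -> tuple:
    """Leading exponents of n for c-approximate near neighbour in Hamming space.

    Returns (query_exponent, space_exponent) = (1/c, 1 + 1/c).
    The o(1) terms and the factor d are deliberately ignored.
    """
    if c <= 1:
        raise ValueError("approximation factor c must be greater than 1")
    return 1.0 / c, 1.0 + 1.0 / c


def jaccard_rho(s1: float, s2: float) -> float:
    """Exponent rho for (s1, s2)-approximate Jaccard similarity.

    Assumes the formula rho = log((1+s1)/(2*s1)) / log((1+s2)/(2*s2)),
    with similarity thresholds 0 < s2 < s1 <= 1 and sets of equal size.
    """
    if not (0 < s2 < s1 <= 1):
        raise ValueError("need 0 < s2 < s1 <= 1")
    return math.log((1 + s1) / (2 * s1)) / math.log((1 + s2) / (2 * s2))


if __name__ == "__main__":
    query_exp, space_exp = hamming_exponents(2.0)
    print("Hamming, c=2: query ~ n^%.3f, space ~ n^%.3f" % (query_exp, space_exp))

    rho = jaccard_rho(0.5, 0.1)
    print("Jaccard, s1=0.5, s2=0.1: query ~ n^%.3f, space ~ n^%.3f" % (rho, 1 + rho))
```

Under these assumptions, a larger gap between $s_1$ and $s_2$ (or a larger $c$) gives a smaller exponent, so both query time and space shrink as the approximation is relaxed.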