On fast bounded locality sensitive hashing

Abstract

In this paper, we examine hash functions expressed as scalar products, i.e., f(x)=<v,x>, for some bounded random vector v. Such hash functions have numerous applications, but there is often a need to optimize the choice of the distribution of v. In the present work, we focus on so-called anti-concentration bounds, i.e., upper bounds on P[|<v,x>| < α]. In many applications, v is a vector of independent random variables with the standard normal distribution. In such a case, the distribution of <v,x> is also normal and it is easy to approximate P[|<v,x>| < α]. Here, we consider two bounded distributions in the context of anti-concentration bounds. In particular, we analyze v being a random vector from the unit ball in l∞ and v being a random vector from the unit sphere in l2. We show anti-concentration bounds for the functions f(x)=<v,x> that are optimal up to a constant factor. As a consequence of our research, we obtain new best results for c-approximate nearest neighbors without false negatives for lp in high-dimensional space for all p∈[1,∞], for c=Ω(max{√d, d^{1/p}}). These results improve over those presented in [16]. Finally, our paper reports progress on answering the open problem posed by Pagh [17], who considered nearest neighbor search without false negatives for the Hamming distance.
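The quantity P[|<v,x>| < α] can be estimated numerically for the two bounded distributions named in the abstract. The following is a minimal sketch, not the paper's method: it assumes that "a random vector from the unit ball in l∞" means uniform on the cube [-1, 1]^d and that "a random vector from the unit sphere in l2" means uniform on the sphere (sampled here by normalizing a Gaussian vector). All function names, the test vector x, and the parameter values are hypothetical and chosen only for illustration.

```python
import math
import random


def sample_linf_ball(d, rng):
    """Assumed model: uniform point in the unit l_inf ball, i.e. the cube [-1, 1]^d."""
    return [rng.uniform(-1.0, 1.0) for _ in range(d)]


def sample_l2_sphere(d, rng):
    """Assumed model: uniform point on the unit l_2 sphere (normalized Gaussian vector)."""
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(t * t for t in g))
    return [t / norm for t in g]


def small_projection_rate(sampler, x, alpha, trials, rng):
    """Monte Carlo estimate of P[|<v, x>| < alpha] for v drawn by `sampler`."""
    hits = 0
    for _ in range(trials):
        v = sampler(len(x), rng)
        if abs(sum(vi * xi for vi, xi in zip(v, x))) < alpha:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs agree
    d = 32
    x = [1.0] * d  # hypothetical input vector
    samplers = [("l_inf ball", sample_linf_ball), ("l_2 sphere", sample_l2_sphere)]
    for name, sampler in samplers:
        p = small_projection_rate(sampler, x, alpha=1.0, trials=20000, rng=rng)
        print(name, p)
```

An estimate of this kind only indicates how often the projection falls inside (-α, α) for one fixed x; the bounds discussed in the abstract are analytic statements and are not derived from such simulation.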
