Hybrid LSH: Faster Near Neighbors Reporting in High-dimensional Space
Abstract
We study the r-near neighbors reporting problem (r-NN), i.e., reporting all points in a high-dimensional point set S that lie within a radius r of a given query point q. Our approach builds upon on the locality-sensitive hashing (LSH) framework due to its appealing asymptotic sublinear query time for near neighbor search problems in high-dimensional space. A bottleneck of the traditional LSH scheme for solving r-NN is that its performance is sensitive to data and query-dependent parameters. On datasets whose data distributions have diverse local density patterns, LSH with inappropriate tuning parameters can sometimes be outperformed by a simple linear search. In this paper, we introduce a hybrid search strategy between LSH-based search and linear search for r-NN in high-dimensional space. By integrating an auxiliary data structure into LSH hash tables, we can efficiently estimate the computational cost of LSH-based search for a given query regardless of the data distribution. This means that we are able to choose the appropriate search strategy between LSH-based search and linear search to achieve better performance. Moreover, the integrated data structure is time efficient and fits well with many recent state-of-the-art LSH-based approaches. Our experiments on real-world datasets show that the hybrid search approach outperforms (or is comparable to) both LSH-based search and linear search for a wide range of search radii and data distributions in high-dimensional space.
Turn this paper into a lesson
ArcXiv compiles a structured reading guide from this paper's metadata: plain-English importance, contributions, prerequisite concepts, which sections to read first, flashcards, and a quiz. Grounded in the abstract, never invented.