Multi-Resolution Hashing for Fast Pairwise Summations
Abstract
A basic computational primitive in the analysis of massive datasets is summing simple functions over a large number of objects. Modern applications pose an additional challenge in that such functions often depend on a parameter vector $y$ (query) that is unknown a priori. Given a set of points $X\subset \mathbb{R}^{d}$ and a pairwise function $w:\mathbb{R}^{d}\times \mathbb{R}^{d}\to [0,1]$, we study the problem of designing a data structure that enables sublinear-time approximation of the summation $Z_{w}(y)=\frac{1}{|X|}\sum_{x\in X}w(x,y)$ for any query $y\in \mathbb{R}^{d}$. By combining ideas from Harmonic Analysis (partitions of unity and approximation theory) with Hashing-Based-Estimators [Charikar, Siminelakis FOCS'17], we provide a general framework for designing such data structures through hashing that reaches far beyond what previous techniques allowed. A key design principle is a collection of $T\geq 1$ hashing schemes with collision probabilities $p_{1},\ldots,p_{T}$ such that $\sup_{t\in [T]}\{p_{t}(x,y)\} = \Theta(\sqrt{w(x,y)})$. This leads to a data structure that approximates $Z_{w}(y)$ using a sublinear number of samples from each hash family. Using this new framework along with Distance Sensitive Hashing [Aumuller, Christiani, Pagh, Silvestri PODS'18], we show that such a collection can be constructed and evaluated efficiently for any log-convex function $w(x,y)=e^{\phi(\langle x,y\rangle)}$ of the inner product on the unit sphere $x,y\in \mathcal{S}^{d-1}$. Our method leads to data structures with sublinear query time that significantly improve upon random sampling and can be used for Kernel Density or Partition Function Estimation. We provide extensions of our result from the sphere to $\mathbb{R}^{d}$ and from scalar functions to vector functions.
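To make the quantity being approximated concrete, the following minimal sketch computes the pairwise summation exactly and then estimates it by uniform random sampling, which is the baseline the abstract says the hashing-based method improves upon. It does not implement the paper's hashing framework. The kernel choice $\phi(r)=r-1$ (so that the weight lies in $[e^{-2},1]$ on the unit sphere), the helper names, and the dataset sizes are all assumptions made for illustration.

```python
import math
import random


def random_unit_vector(d, rng):
    """Draw a point uniformly from the unit sphere by normalizing a Gaussian vector."""
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def w(x, y):
    """Hypothetical log-convex kernel w(x, y) = exp(phi(<x, y>)) with phi(r) = r - 1.

    On the unit sphere <x, y> lies in [-1, 1], so w lies in [e^-2, 1].
    """
    inner = sum(a * b for a, b in zip(x, y))
    return math.exp(inner - 1.0)


def exact_sum(X, y):
    """Exact value of (1/|X|) * sum over x in X of w(x, y); costs time linear in |X|."""
    return sum(w(x, y) for x in X) / len(X)


def sampled_sum(X, y, m, rng):
    """Uniform random sampling baseline: average w over m points drawn with replacement."""
    return sum(w(rng.choice(X), y) for _ in range(m)) / m


rng = random.Random(0)          # fixed seed so the run is repeatable
d, n = 8, 2000                  # illustrative dimension and dataset size
X = [random_unit_vector(d, rng) for _ in range(n)]
y = random_unit_vector(d, rng)  # the query, unknown when X is stored

exact = exact_sum(X, y)
approx = sampled_sum(X, y, 500, rng)
print(f"exact = {exact:.4f}, sampled estimate = {approx:.4f}")
```

Uniform sampling works well here because every weight is at least $e^{-2}$. When most weights are tiny and a few points dominate the sum, far more samples are needed to hit those points, which is the regime where biasing the samples toward high-weight points through hashing is meant to help.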