Probabilistic Polynomials and Hamming Nearest Neighbors
Abstract
We show how to compute any symmetric Boolean function on $n$ variables over any field (as well as the integers) with a probabilistic polynomial of degree $O(\sqrt{n \log(1/\varepsilon)})$ and error at most $\varepsilon$. The degree dependence on $n$ and $\varepsilon$ is optimal, matching a lower bound of Razborov (1987) and Smolensky (1987) for the MAJORITY function. The proof is constructive: a low-degree polynomial can be efficiently sampled from the distribution. This polynomial construction is combined with other algebraic ideas to give the first subquadratic time algorithm for computing a (worst-case) batch of Hamming distances in superlogarithmic dimensions, exactly. To illustrate, let $c(n) : \mathbb{N} \to \mathbb{N}$. Suppose we are given a database $D$ of $n$ vectors in $\{0,1\}^{c(n) \log n}$ and a collection of $n$ query vectors $Q$ in the same dimension. For all $u \in Q$, we wish to compute a $v \in D$ with minimum Hamming distance from $u$. We solve this problem in $n^{2 - 1/O(c(n) \log^2 c(n))}$ randomized time. Hence, the problem is in "truly subquadratic" time for $O(\log n)$ dimensions, and in subquadratic time for $d = o((\log^2 n)/(\log \log n)^2)$. We apply the algorithm to computing pairs with maximum inner product, closest pair in $\ell_1$ for vectors with bounded integer entries, and pairs with maximum Jaccard coefficients.
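To make the batch problem concrete, the following is a minimal sketch of the naive quadratic baseline that compares every query against every database vector. It is not the algorithm of the paper, which relies on probabilistic polynomials; it only illustrates the input and output of the problem. The function names `hamming` and `batch_nearest`, and the choice to pack each 0/1 vector into a Python integer, are assumptions made for illustration.

```python
from typing import List, Tuple


def hamming(u: int, v: int) -> int:
    """Hamming distance between two bit vectors packed into Python ints."""
    # XOR leaves a 1 exactly where the two vectors differ.
    return bin(u ^ v).count("1")


def batch_nearest(database: List[int], queries: List[int]) -> List[Tuple[int, int]]:
    """For each query u, return (index of a closest database vector, distance).

    Naive baseline (hypothetical helper, not the paper's method): it performs
    len(queries) * len(database) distance computations, i.e. quadratic work
    when both collections have n vectors.
    """
    results = []
    for u in queries:
        # min() returns the first index attaining the minimum distance.
        best = min(range(len(database)), key=lambda i: hamming(u, database[i]))
        results.append((best, hamming(u, database[best])))
    return results


if __name__ == "__main__":
    D = [0b0000, 0b1111, 0b1010]
    Q = [0b0001, 0b1101, 0b1010]
    print(batch_nearest(D, Q))
```

With $n$ database vectors and $n$ queries in $d$ dimensions, this baseline does on the order of $n^2$ distance computations, which is the quadratic cost that the stated running time improves on.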