Hardness of Bichromatic Closest Pair with Jaccard Similarity
Abstract
Consider collections $A$ and $B$ of red and blue sets, respectively. Bichromatic Closest Pair is the problem of finding a pair from $A\times B$ that has similarity higher than a given threshold according to some similarity measure. Our focus here is the classic Jaccard similarity $|a\cap b|/|a\cup b|$ for $(a,b)\in A\times B$. We consider the approximate version of the problem where we are given thresholds $j_1>j_2$ and wish to return a pair from $A\times B$ that has Jaccard similarity higher than $j_2$ if there exists a pair in $A\times B$ with Jaccard similarity at least $j_1$. The classic locality sensitive hashing (LSH) algorithm of Indyk and Motwani (STOC '98), instantiated with the MinHash LSH function of Broder et al., solves this problem in $O(n^{2-\delta})$ time if $j_1\ge j_2^{1-\delta}$. In particular, for $\delta=\Omega(1)$, the approximation ratio $j_1/j_2=1/j_2^{\delta}$ increases polynomially in $1/j_2$. In this paper we give a corresponding hardness result. Assuming the Orthogonal Vectors Conjecture (OVC), we show that there cannot be a general solution that solves the Bichromatic Closest Pair problem in $O(n^{2-\Omega(1)})$ time for $j_1/j_2=1/j_2^{o(1)}$. Specifically, assuming OVC, we prove that for any $\delta>0$ there exists an $\varepsilon>0$ such that Bichromatic Closest Pair with Jaccard similarity requires time $\Omega(n^{2-\delta})$ for any choice of thresholds $j_2<j_1<1-\delta$, that satisfy $j_1\le j_2^{1-\varepsilon}$.
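To make the problem statement concrete, the following minimal Python sketch computes Jaccard similarity and solves the approximate problem by checking every pair. It is only the trivial quadratic baseline, not the LSH algorithm or any construction from the paper; the names `jaccard` and `brute_force_pair` and the convention that two empty sets have similarity 0 are assumptions made here for illustration.

```python
from itertools import product


def jaccard(a: frozenset, b: frozenset) -> float:
    """Jaccard similarity |a & b| / |a | b|.

    Assumption for this sketch: two empty sets get similarity 0.0.
    """
    union_size = len(a | b)
    return len(a & b) / union_size if union_size else 0.0


def brute_force_pair(A, B, j2):
    """Return some (a, b) from A x B with Jaccard similarity > j2, or None.

    Scans all |A| * |B| pairs, so the running time is quadratic in the
    number of sets. If a pair with similarity >= j1 > j2 exists, this
    scan is guaranteed to return a valid answer.
    """
    for a, b in product(A, B):
        if jaccard(a, b) > j2:
            return a, b
    return None


# Hypothetical toy instance: red sets A and blue sets B.
A = [frozenset({1, 2, 3}), frozenset({4, 5})]
B = [frozenset({2, 3, 4}), frozenset({6})]

print(brute_force_pair(A, B, j2=0.4))
print(brute_force_pair(A, B, j2=0.6))
```

On this toy instance the most similar pair is `{1, 2, 3}` and `{2, 3, 4}`, with similarity 2/4 = 0.5, so the first call returns that pair and the second returns `None`. The hardness result concerns how far below this quadratic scan any algorithm can go when $j_1$ and $j_2$ are close.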