Optimal Algorithms for Testing Closeness of Discrete Distributions
Abstract
We study the question of closeness testing for two discrete distributions. More precisely, given samples from two distributions p and q over an n-element set, we wish to distinguish whether p = q versus p is at least \(\varepsilon\)-far from q, in either \(\ell_1\) or \(\ell_2\) distance. Batu et al. gave the first sub-linear time algorithms for these problems, which matched the lower bounds of Valiant up to a logarithmic factor in n and a polynomial factor in \(\varepsilon\). In this work, we present simple (and new) testers for both the \(\ell_1\) and \(\ell_2\) settings, with sample complexity that is information-theoretically optimal, to constant factors, both in the dependence on n and in the dependence on \(\varepsilon\); for the \(\ell_1\) testing problem we establish that the sample complexity is \(\Theta\left(\max\left\{ n^{2/3}/\varepsilon^{4/3},\ n^{1/2}/\varepsilon^{2} \right\}\right)\).
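To make the two regimes of the \(\ell_1\) bound concrete, here is a small sketch that evaluates \(\max\{n^{2/3}/\varepsilon^{4/3}, n^{1/2}/\varepsilon^{2}\}\) and reports which term attains the maximum. Comparing the terms, their ratio is \(n^{1/6}\varepsilon^{2/3}\), so the first term dominates exactly when \(\varepsilon \ge n^{-1/4}\). The function names are hypothetical, and because \(\Theta(\cdot)\) hides an unspecified constant, the returned value is only an order of growth, not a concrete number of samples to draw.

```python
import math


def l1_closeness_bound(n: int, eps: float) -> float:
    """Order of growth of the l1 closeness-testing sample complexity.

    Evaluates max(n^(2/3) / eps^(4/3), n^(1/2) / eps^2). The Theta(...) in the
    bound hides an unspecified constant, so this is NOT a concrete sample count.
    """
    if n < 1 or eps <= 0:
        raise ValueError("need n >= 1 and eps > 0")
    sublinear_term = n ** (2 / 3) / eps ** (4 / 3)
    sqrt_term = math.sqrt(n) / eps ** 2
    return max(sublinear_term, sqrt_term)


def dominant_term(n: int, eps: float) -> str:
    """Name the term that attains the max.

    The ratio of the two terms is n^(1/6) * eps^(2/3), which is >= 1
    exactly when eps >= n^(-1/4).
    """
    return "n^(2/3)/eps^(4/3)" if eps >= n ** -0.25 else "n^(1/2)/eps^2"


if __name__ == "__main__":
    n = 10 ** 6
    for eps in (0.5, 0.1, 0.01):
        print(f"n={n}, eps={eps}: ~{l1_closeness_bound(n, eps):.3g} ({dominant_term(n, eps)})")
```

For \(n = 10^6\) the crossover sits at \(\varepsilon = n^{-1/4} \approx 0.032\): at \(\varepsilon = 0.1\) the \(n^{2/3}/\varepsilon^{4/3}\) term governs, while at \(\varepsilon = 0.01\) the \(n^{1/2}/\varepsilon^{2}\) term takes over.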