Optimal Identity Testing with High Probability

Abstract

We study the problem of testing identity against a given distribution with a focus on the high confidence regime. More precisely, given samples from an unknown distribution $p$ over $n$ elements, an explicitly given distribution $q$, and parameters $0 < \varepsilon, \delta < 1$, we wish to distinguish, with probability at least $1 - \delta$, whether the distributions are identical versus $\varepsilon$-far in total variation distance. Most prior work focused on the case that $\delta = \Omega(1)$, for which the sample complexity of identity testing is known to be $\Theta(\sqrt{n}/\varepsilon^2)$. Given such an algorithm, one can achieve arbitrarily small values of $\delta$ via black-box amplification, which multiplies the required number of samples by $\Theta(\log(1/\delta))$. We show that black-box amplification is suboptimal for any $\delta = o(1)$, and give a new identity tester that achieves the optimal sample complexity. Our new upper and lower bounds show that the optimal sample complexity of identity testing is
\[ \Theta\left( \frac{1}{\varepsilon^2} \left( \sqrt{n \log(1/\delta)} + \log(1/\delta) \right) \right) \]
for any $n$, $\varepsilon$, and $\delta$.

For the special case of uniformity testing, where the given distribution is the uniform distribution $U_n$ over the domain, our new tester is surprisingly simple: to test whether $p = U_n$ versus $d_{\mathrm{TV}}(p, U_n) \geq \varepsilon$, we simply threshold $d_{\mathrm{TV}}(\hat{p}, U_n)$, where $\hat{p}$ is the empirical probability distribution. The fact that this simple "plug-in" estimator is sample-optimal is surprising, even in the constant $\delta$ case. Indeed, it was believed that such a tester would not attain sublinear sample complexity even for constant values of $\varepsilon$ and $\delta$.
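The plug-in uniformity tester and the two sample-complexity expressions above can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not specify the threshold, so the hypothetical helper `calibrate_threshold` chooses it by Monte Carlo simulation: it estimates the $(1 - \delta)$ quantile of $d_{\mathrm{TV}}(\hat{p}, U_n)$ when the samples really are uniform. This is a swapped-in heuristic, not the paper's analytic threshold. The hypothetical helpers `optimal_order` and `amplified_order` evaluate the optimal bound and the black-box amplification bound with all constants dropped, so only their ratio is meaningful.

```python
import math
import random
from collections import Counter


def empirical_tv_to_uniform(samples, n):
    """d_TV(p_hat, U_n), where p_hat is the empirical distribution of
    `samples`; assumes every sample is an integer in range(n)."""
    m = len(samples)
    counts = Counter(samples)
    # Each of the n - len(counts) unseen elements contributes |0 - 1/n|.
    total = (n - len(counts)) / n
    for c in counts.values():
        total += abs(c / m - 1 / n)
    return total / 2


def calibrate_threshold(n, m, delta, trials=2000, seed=0):
    """Hypothetical helper (not from the paper): estimate the (1 - delta)
    quantile of d_TV(p_hat, U_n) when m samples are drawn from U_n."""
    rng = random.Random(seed)
    stats = sorted(
        empirical_tv_to_uniform([rng.randrange(n) for _ in range(m)], n)
        for _ in range(trials)
    )
    return stats[min(trials - 1, int((1 - delta) * trials))]


def plug_in_uniformity_test(samples, n, threshold):
    """Return True ("p = U_n") iff the plug-in statistic is at most threshold."""
    return empirical_tv_to_uniform(samples, n) <= threshold


def optimal_order(n, eps, delta):
    # (1/eps^2) * (sqrt(n log(1/delta)) + log(1/delta)), constants dropped.
    L = math.log(1 / delta)
    return (math.sqrt(n * L) + L) / eps**2


def amplified_order(n, eps, delta):
    # (sqrt(n)/eps^2) * log(1/delta): black-box amplification, constants dropped.
    return math.sqrt(n) / eps**2 * math.log(1 / delta)


if __name__ == "__main__":
    n, m, delta = 100, 200, 0.05
    t = calibrate_threshold(n, m, delta)
    # Empirical distribution is exactly U_n, so the statistic is 0.
    uniform_like = list(range(n)) * 2
    # p uniform on half the domain, so d_TV(p, U_n) = 0.5.
    rng = random.Random(1)
    far = [rng.randrange(n // 2) for _ in range(m)]
    print(plug_in_uniformity_test(uniform_like, n, t))
    print(plug_in_uniformity_test(far, n, t))
    # Ratio of amplified to optimal sample counts for small delta.
    print(amplified_order(10**6, 0.1, 1e-9) / optimal_order(10**6, 0.1, 1e-9))
```

In this run the exactly uniform sample is accepted and the half-support sample is rejected. For large $n$, the printed ratio approaches $\sqrt{\log(1/\delta)}$, which reflects the abstract's claim that black-box amplification is suboptimal when $\delta = o(1)$.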
