ℓ_p Testing and Learning of Discrete Distributions
Abstract
The classic problems of testing uniformity of and learning a discrete distribution, given access to independent samples from it, are examined under general ℓ_p metrics. The intuitions and results often contrast with the classic ℓ_1 case. For p > 1, we can learn and test with a number of samples that is independent of the support size of the distribution: with an ℓ_p tolerance ε, O(max{ √(1/ε^q), 1/ε^2 }) samples suffice for testing uniformity and O(max{ 1/ε^q, 1/ε^2 }) samples suffice for learning, where q = p/(p-1) is the conjugate of p. As this parallels the intuition that O(√n) and O(n) samples suffice for the ℓ_1 case, it seems that 1/ε^q acts as an upper bound on the "apparent" support size. For some ℓ_p metrics, uniformity testing becomes easier over larger supports: a 6-sided die requires fewer trials to test for fairness than a 2-sided coin, and a card-shuffler requires fewer trials than the die. In fact, this inverse dependence on support size holds if and only if p > 4/3. The uniformity testing algorithm simply thresholds the number of "collisions" or "coincidences" and has an optimal sample complexity up to constant factors for all 1 ≤ p ≤ 2. Another algorithm gives order-optimal sample complexity for ℓ_∞ uniformity testing. Meanwhile, the most natural learning algorithm is shown to have order-optimal sample complexity for all ℓ_p metrics. The author thanks Clément Canonne for discussions and contributions to this work.