The Minimax Risk in Testing Uniformity over Large Alphabets under Missing-Ball Alternatives

Abstract

We study the problem of testing the goodness of fit of categorical count data to a Poisson distribution uniform over the categories, against a class of alternatives defined by excluding an ℓ_p ball, p ≤ 2, of radius ε around the uniform rate sequence. We characterize the minimax risk for this problem as the expected number of samples n and the number of categories N go to infinity. Our result enables constant-factor comparisons among the many estimators previously proposed for this problem, rather than comparisons only at the level of convergence rates or scaling orders of sample complexity. The minimax test relies exclusively on collisions in the small-sample limit, but behaves like the chi-squared test otherwise. Empirical studies across a range of parameters show that the asymptotic risk estimate is accurate in finite samples, and that the minimax test outperforms both the chi-squared test and a test based on collisions under the least favorable alternative. Our analysis involves reducing to a structured subset of alternatives, establishing uniform asymptotic normality for a family of linear test statistics, and solving an optimization problem over N-dimensional sequences akin to classical results from signal detection in Gaussian white noise. Finally, we discuss the connection to the fixed-sample-size multinomial model, arguing that the Poisson minimax risk derived here also characterizes the minimax risk of the multinomial problem.
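To make the two reference statistics named in the abstract concrete, the following minimal sketch computes a Pearson chi-squared statistic and a collision count on Poisson count data. It is an illustration only and does not implement the paper's minimax test. The function names, the values of N and n, and the alternating perturbation of the rates are assumptions chosen for the example, not taken from the paper.

```python
import numpy as np


def chi_squared_stat(counts, n):
    """Pearson chi-squared statistic against the uniform rate n/N per category."""
    counts = np.asarray(counts, dtype=float)
    lam = n / counts.size  # expected count per category under the null
    return float(np.sum((counts - lam) ** 2 / lam))


def collision_stat(counts):
    """Number of colliding pairs: sum over categories of x_i * (x_i - 1) / 2."""
    counts = np.asarray(counts, dtype=np.int64)
    return int(np.sum(counts * (counts - 1) // 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    N, n = 1000, 500          # hypothetical alphabet size and expected sample size
    lam = n / N               # uniform Poisson rate per category

    # Null: every category has the same Poisson rate.
    null_counts = rng.poisson(lam, size=N)

    # Hypothetical alternative: rates perturbed up and down in alternation,
    # so the total expected sample size stays equal to n.
    signs = np.where(np.arange(N) % 2 == 0, 1.0, -1.0)
    alt_counts = rng.poisson(lam * (1.0 + 0.5 * signs))

    # Under the null, the expected number of colliding pairs is n**2 / (2 * N).
    print("expected null collisions:", n**2 / (2 * N))
    print("null:", chi_squared_stat(null_counts, n), collision_stat(null_counts))
    print("alt: ", chi_squared_stat(alt_counts, n), collision_stat(alt_counts))
```

Both statistics are sums over categories of a function of each count, which is why they fit naturally into a comparison across tests built from per-category terms.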
