Minimax Estimation of Discrete Distributions under ℓ1 Loss

Abstract

We analyze the problem of discrete distribution estimation under ℓ1 loss. We provide non-asymptotic upper and lower bounds on the maximum risk of the empirical distribution (the maximum likelihood estimator), and on the minimax risk, in regimes where the alphabet size S may grow with the number of observations n. We show that among distributions with bounded entropy H, the asymptotic maximum risk for the empirical distribution is 2H/ln n, while the asymptotic minimax risk is H/ln n. Moreover, we show that a hard-thresholding estimator that is oblivious to the unknown upper bound H is asymptotically minimax. However, if we constrain the estimates to lie in the simplex of probability distributions, then the asymptotic minimax risk is again 2H/ln n. We draw connections between our work and the literature on density estimation, entropy estimation, total variation distance (ℓ1 divergence) estimation, joint distribution estimation in stochastic processes, normal mean estimation, and adaptive estimation.
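As a rough illustration of the objects named in the abstract, the following Python sketch computes the empirical distribution from n samples, its ℓ1 loss against the true distribution, and a hard-thresholded variant. The function names, the uniform test distribution, and the threshold value 1/n are assumptions chosen for illustration only; the abstract does not specify the threshold the authors use.

```python
import random
from collections import Counter


def empirical_distribution(samples, alphabet_size):
    """Empirical distribution (the MLE) over symbols 0..alphabet_size-1."""
    counts = Counter(samples)
    n = len(samples)
    return [counts[i] / n for i in range(alphabet_size)]


def l1_loss(p, q):
    """l1 distance between two vectors of equal length."""
    return sum(abs(a - b) for a, b in zip(p, q))


def hard_threshold(p_hat, threshold):
    """Keep entries strictly above the threshold and zero out the rest.

    The result is not renormalized, so it may sum to less than 1
    and therefore need not lie in the probability simplex.
    """
    return [x if x > threshold else 0.0 for x in p_hat]


random.seed(0)
S = 50    # alphabet size (illustrative)
n = 200   # number of observations (illustrative)
p = [1.0 / S] * S  # true distribution: uniform, chosen only for the demo

samples = random.choices(range(S), weights=p, k=n)
p_hat = empirical_distribution(samples, S)

# Hypothetical threshold: drop symbols seen at most once.
p_thr = hard_threshold(p_hat, threshold=1.0 / n)

print("l1 loss, empirical:  ", l1_loss(p, p_hat))
print("l1 loss, thresholded:", l1_loss(p, p_thr))
print("mass kept after thresholding:", sum(p_thr))
```

Because the thresholded estimate is left unnormalized, it can fall outside the simplex, which is the distinction the abstract draws when it contrasts the unconstrained minimax risk with the simplex-constrained one.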
