Dualizing Le Cam's method for functional estimation, with applications to estimating the unseens
Abstract
Le Cam's method (or the two-point method) is a commonly used tool for obtaining statistical lower bounds and is especially popular for functional estimation problems. This work aims to explain, and give conditions for, the tightness of Le Cam's lower bound in functional estimation from the perspective of convex duality. Under a variety of settings it is shown that the maximization problem that searches for the best two-point lower bound, upon dualizing, becomes a minimization problem that optimizes the bias-variance tradeoff among a family of estimators. For estimating linear functionals of a distribution, our work strengthens prior results of Donoho-Liu [DL91] (for quadratic loss) by dropping the H\"olderian assumption on the modulus of continuity. For exponential families, our results extend those of Juditsky-Nemirovski [JN09] by characterizing the minimax risk for the quadratic loss under weaker assumptions on the exponential family. We also provide an extension to the high-dimensional setting for estimating separable functionals. Notably, coupled with tools from complex analysis, this method is particularly effective for characterizing the ``elbow effect'', that is, the phase transition from parametric to nonparametric rates. As the main application we derive sharp minimax rates in the Distinct elements problem (given a fraction $p$ of colored balls from an urn containing $d$ balls, the optimal error of estimating the number of distinct colors is $\tilde{\Theta}(d^{-\frac{1}{2}\min\{\frac{p}{1-p},1\}})$) and Fisher's species problem (given $n$ iid observations from an unknown distribution, the optimal prediction error of the number of unseen symbols in the next (unobserved) $r \cdot n$ observations is $\tilde{\Theta}(n^{-\min\{\frac{1}{r+1},\frac{1}{2}\}})$).