Chebyshev polynomials, moment matching, and optimal estimation of the unseen

Abstract

We consider the problem of estimating the support size of a discrete distribution whose minimum non-zero mass is at least $1/k$. Under the independent sampling model, we show that the sample complexity, i.e., the minimal sample size to achieve an additive error of $\epsilon k$ with probability at least 0.1, is within universal constant factors of $\frac{k}{\log k}\log^2\frac{1}{\epsilon}$, which improves the state-of-the-art result of $\frac{k}{\epsilon^2 \log k}$ in VV13. A similar characterization of the minimax risk is also obtained. Our procedure is a linear estimator based on the Chebyshev polynomial and its approximation-theoretic properties, which can be evaluated in $O(n + \log^2 k)$ time and attains the sample complexity within a factor of six asymptotically. The superiority of the proposed estimator in terms of accuracy, computational efficiency and scalability is demonstrated on a variety of synthetic and real datasets.
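To get a feel for the size of the improvement, the two sample-complexity rates can be compared numerically. The sketch below is illustrative only: it drops the universal constants (which the abstract does not specify), assumes natural logarithms, and uses the hypothetical function names `rate_chebyshev`, `rate_vv13`, and `improvement_ratio`. It evaluates $\frac{k}{\log k}\log^2\frac{1}{\epsilon}$ against $\frac{k}{\epsilon^2 \log k}$.

```python
import math


def rate_chebyshev(k: int, eps: float) -> float:
    """(k / log k) * log^2(1/eps), with the universal constant dropped (assumption)."""
    return k / math.log(k) * math.log(1.0 / eps) ** 2


def rate_vv13(k: int, eps: float) -> float:
    """k / (eps^2 * log k), with the universal constant dropped (assumption)."""
    return k / (eps ** 2 * math.log(k))


def improvement_ratio(k: int, eps: float) -> float:
    """Ratio of the earlier rate to the new rate.

    The k / log k factor cancels, leaving 1 / (eps^2 * log^2(1/eps)),
    so the ratio depends on eps only.
    """
    return rate_vv13(k, eps) / rate_chebyshev(k, eps)


if __name__ == "__main__":
    for eps in (0.1, 0.01, 0.001):
        print(f"eps={eps}: ratio ~ {improvement_ratio(10**6, eps):.1f}")
```

Because the common factor $k/\log k$ cancels, the gap between the two rates is governed entirely by the accuracy parameter: it is $1/(\epsilon^2 \log^2(1/\epsilon))$ up to constants, which grows as $\epsilon$ shrinks.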