Estimating the size of a set using cascading exclusion

Abstract

Let $S$ be a finite set, and let $X_1, \dots, X_n$ be an i.i.d. uniform sample from $S$. To estimate the size $|S|$ without further structure, one can wait for repeats and use the birthday problem; this requires a sample size of order $|S|^{1/2}$. On the other hand, if $S = \{1, 2, \dots, |S|\}$, the maximum of the sample blown up by $n/(n-1)$ gives an efficient estimator based on any growing sample size. This paper gives refinements that interpolate between these extremes. A general non-asymptotic theory is developed. Its applications include estimating the volume of a compact convex set, the unseen species problem, and a host of testing problems that follow from the question "Is this new observation a typical pick from a large prespecified population?" We also treat regression-style predictors. A general theorem gives nonparametric, finite-$n$ error bounds in all cases.
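The two extremes the abstract contrasts can be illustrated with a minimal sketch in Python. These are textbook baseline estimators, not the paper's cascading-exclusion method, and the function names `birthday_estimate` and `max_estimate` are hypothetical. The birthday estimator uses the fact that each of the $n(n-1)/2$ pairs of draws coincides with probability $1/|S|$, so the expected number of coincident pairs is $n(n-1)/(2|S|)$; inverting this gives a moment-matching estimate that is only informative once repeats appear, i.e. once $n$ is of order $|S|^{1/2}$. The maximum estimator applies the $n/(n-1)$ scaling stated in the abstract and assumes $S = \{1, \dots, N\}$.

```python
import random
from collections import Counter


def birthday_estimate(sample):
    """Estimate |S| from coincident pairs (illustrative baseline, not the paper's method).

    Each of the n(n-1)/2 pairs collides with probability 1/|S|, so
    E[collisions] = n(n-1) / (2|S|). Solve for |S| using the observed count.
    """
    n = len(sample)
    counts = Counter(sample)
    # A value seen c times contributes c(c-1)/2 colliding pairs.
    collisions = sum(c * (c - 1) // 2 for c in counts.values())
    if collisions == 0:
        # No repeats yet: the sample is too small to bound |S| from above.
        return float("inf")
    return n * (n - 1) / 2 / collisions


def max_estimate(sample):
    """Estimate N when S = {1, ..., N}: scale the sample maximum by n/(n-1)."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    return max(sample) * n / (n - 1)


if __name__ == "__main__":
    rng = random.Random(0)
    N = 10_000  # hypothetical true set size for the demo
    sample = [rng.randint(1, N) for _ in range(2_000)]
    print("birthday estimate:", round(birthday_estimate(sample)))
    print("maximum estimate: ", round(max_estimate(sample)))
```

With 2,000 draws from a set of 10,000, the maximum estimate lands very close to the true size, while the birthday estimate rests on roughly 200 expected collisions and is visibly noisier. This reflects the gap between structured and unstructured settings that the paper's refinements aim to bridge.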
