Data Structures for Density Estimation
Abstract
We study statistical/computational tradeoffs for the following density estimation problem: given k distributions v_1, …, v_k over a discrete domain of size n, and sampling access to a distribution p, identify a v_i that is "close" to p. Our main result is the first data structure that, given a sublinear (in n) number of samples from p, identifies v_i in time sublinear in k. We also give an improved version of the algorithm of Acharya et al. (2018) that reports v_i in time linear in k. The experimental evaluation of the latter algorithm shows that it significantly reduces the number of operations needed to reach a given accuracy compared to prior work.
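To make the problem statement concrete, the following is a minimal sketch of a naive baseline, not the data structure or the improved algorithm described above. It assumes total variation distance as the notion of "close" (the abstract does not specify one), and the names `total_variation`, `empirical_distribution`, and `nearest_candidate` are hypothetical. It scans all k candidates and touches all n domain elements per candidate, so its running time is linear in both k and n.

```python
from collections import Counter


def total_variation(p, q):
    """Total variation distance between two distributions given as lists."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def empirical_distribution(samples, n):
    """Empirical distribution over the domain {0, ..., n-1} from a sample list."""
    counts = Counter(samples)
    m = len(samples)
    return [counts[x] / m for x in range(n)]


def nearest_candidate(candidates, samples):
    """Return the index i of the candidate v_i closest to the empirical distribution.

    Assumption: closeness is measured in total variation distance.
    This is a brute-force scan over all k candidates.
    """
    n = len(candidates[0])
    p_hat = empirical_distribution(samples, n)
    return min(
        range(len(candidates)),
        key=lambda i: total_variation(candidates[i], p_hat),
    )


# Illustrative (made-up) input: k = 3 candidates over a domain of size n = 4.
candidates = [
    [0.7, 0.1, 0.1, 0.1],
    [0.25, 0.25, 0.25, 0.25],
    [0.1, 0.1, 0.1, 0.7],
]
samples = [3, 3, 0, 3, 2, 3, 3, 1, 3, 3]  # draws from the unknown p
best = nearest_candidate(candidates, samples)
print(best)
```

A baseline like this compares against the full empirical distribution, which in general is only reliable when the number of samples is large relative to n. The results summarized above target the regime where the sample count is sublinear in n and, for the main result, the query time is sublinear in k.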