Near-Optimal Density Estimation in Near-Linear Time Using Variable-Width Histograms
Abstract
Let p be an unknown and arbitrary probability distribution over [0,1). We consider the problem of density estimation, in which a learning algorithm is given i.i.d. draws from p and must (with high probability) output a hypothesis distribution that is close to p. The main contribution of this paper is a highly efficient density estimation algorithm for learning using a variable-width histogram, i.e., a hypothesis distribution with a piecewise constant probability density function. In more detail, for any k and ε, we give an algorithm that makes O(k/ε2) draws from p, runs in O(k/ε2) time, and outputs a hypothesis distribution h that is piecewise constant with O(k 2(1/ε)) pieces. With high probability the hypothesis h satisfies dTV(p,h) ≤ C · optk(p) + ε, where dTV denotes the total variation distance (statistical distance), C is a universal constant, and optk(p) is the smallest total variation distance between p and any k-piecewise constant distribution. The sample size and running time of our algorithm are optimal up to logarithmic factors. The "approximation factor" C in our result is inherent in the problem, as we prove that no algorithm with sample size bounded in terms of k and ε can achieve C<2 regardless of what kind of hypothesis distribution it uses.
Turn this paper into a lesson
ArcXiv compiles a structured reading guide from this paper's metadata: plain-English importance, contributions, prerequisite concepts, which sections to read first, flashcards, and a quiz. Grounded in the abstract, never invented.