Fast and Sample Near-Optimal Algorithms for Learning Multidimensional Histograms

Abstract

We study the problem of robustly learning multi-dimensional histograms. A d-dimensional function h: D → ℝ is called a k-histogram if there exists a partition of the domain D ⊆ ℝ^d into k axis-aligned rectangles such that h is constant within each such rectangle. Let f: D → ℝ be a d-dimensional probability density function and suppose that f is OPT-close, in L1-distance, to an unknown k-histogram (with unknown partition). Our goal is to output a hypothesis that is O(OPT) + ε close to f, in L1-distance. We give an algorithm for this learning problem that uses n = Õ_d(k/ε²) samples and runs in time Õ_d(n). For any fixed dimension, our algorithm has optimal sample complexity, up to logarithmic factors, and runs in near-linear time. Prior to our work, the time complexity of the d=1 case was well-understood, but significant gaps in our understanding remained even for d=2.
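
To make the definition concrete, the following is a minimal sketch of a k-histogram as a list of axis-aligned rectangles, each carrying a constant value, together with its total mass and an L1-distance computation. It is not the paper's learning algorithm. The names `Rect`, `evaluate`, `total_mass`, and `l1_distance_same_partition` are hypothetical, and the distance helper assumes both histograms share the same partition, which the learning problem itself does not assume.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Rect:
    """Axis-aligned rectangle [lo_1, hi_1) x ... x [lo_d, hi_d) with a constant value."""
    lo: Tuple[float, ...]
    hi: Tuple[float, ...]
    value: float

    def volume(self) -> float:
        # Product of the side lengths along each coordinate.
        v = 1.0
        for a, b in zip(self.lo, self.hi):
            v *= b - a
        return v

    def contains(self, x: Sequence[float]) -> bool:
        # Half-open intervals so that neighbouring rectangles do not overlap.
        return all(a <= xi < b for a, xi, b in zip(self.lo, x, self.hi))


def evaluate(hist: List[Rect], x: Sequence[float]) -> float:
    """Value of the histogram at point x (0 outside the domain)."""
    for r in hist:
        if r.contains(x):
            return r.value
    return 0.0


def total_mass(hist: List[Rect]) -> float:
    """Integral of the histogram over its domain; 1 for a density."""
    return sum(r.value * r.volume() for r in hist)


def l1_distance_same_partition(h1: List[Rect], h2: List[Rect]) -> float:
    """L1 distance, ASSUMING h1 and h2 use the same rectangles in the same order."""
    return sum(abs(a.value - b.value) * a.volume() for a, b in zip(h1, h2))


# A 3-histogram on [0, 1)^2 (d = 2, k = 3) that is a valid density.
h = [
    Rect((0.0, 0.0), (0.5, 1.0), 1.5),
    Rect((0.5, 0.0), (1.0, 0.5), 0.5),
    Rect((0.5, 0.5), (1.0, 1.0), 0.5),
]
# The uniform density written on the same partition.
g = [Rect(r.lo, r.hi, 1.0) for r in h]

mass = total_mass(h)
dist = l1_distance_same_partition(h, g)
```

Here `h` integrates to 1, and its L1 distance to the uniform density `g` is 0.5. In the setting of the abstract, neither the partition nor the values are known in advance; the learner only sees samples drawn from f.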
