Agnostic Sample Compression Schemes for Regression
Abstract
We obtain the first positive results for bounded sample compression in the agnostic regression setting with the ℓ_p loss, where p ∈ [1,∞]. We construct a generic approximate sample compression scheme for real-valued function classes whose size is exponential in the fat-shattering dimension but independent of the sample size. Notably, for linear regression, we construct an approximate compression scheme of size linear in the dimension. Moreover, for the ℓ_1 and ℓ_∞ losses, we exhibit an efficient exact sample compression scheme of size linear in the dimension. We further show that for every other ℓ_p loss, p ∈ (1,∞), there does not exist an exact agnostic compression scheme of bounded size. This refines and generalizes a negative result of David, Moran, and Yehudayoff for the ℓ_2 loss. We close by posing general open questions: for agnostic regression with the ℓ_1 loss, does every function class admit an exact compression scheme of size equal to its pseudo-dimension? For the ℓ_2 loss, does every function class admit an approximate compression scheme of size polynomial in the fat-shattering dimension? These questions generalize Warmuth's classic sample compression conjecture for realizable-case classification.
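To make the notion of an exact agnostic compression scheme concrete, here is a minimal sketch for the simplest case we could think of: the class of constant functions under the ℓ_1 loss. It is not the paper's construction, and the names `compress`, `reconstruct`, and `l1_loss` are hypothetical. It relies only on the standard fact that a median of the labels minimizes the empirical absolute error among constants, so keeping a single sample point whose label is a median suffices to rebuild an empirical risk minimizer.

```python
def compress(sample):
    """Keep one labeled point whose label is the lower median of the labels."""
    labels = sorted(y for _, y in sample)
    median = labels[(len(labels) - 1) // 2]  # lower median, always a sample label
    for x, y in sample:
        if y == median:
            return [(x, y)]


def reconstruct(kept):
    """Rebuild the constant predictor from the single kept point."""
    (_, y), = kept
    return lambda x: y


def l1_loss(h, sample):
    """Empirical l_1 (mean absolute) error of predictor h on the sample."""
    return sum(abs(h(x) - y) for x, y in sample) / len(sample)


# Illustrative data (made up for this sketch); labels need not fit any constant.
sample = [(0.0, 3.0), (1.0, -1.0), (2.0, 10.0), (3.0, 4.0), (4.0, 0.0)]
h = reconstruct(compress(sample))
print(compress(sample), l1_loss(h, sample))
```

The compressed set has size 1 regardless of the sample size, and the reconstructed predictor attains the best empirical ℓ_1 error among constants on the full sample. This toy case only illustrates the definition; it does not reflect how the linear-regression schemes mentioned above are built.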