Is Multi-Distribution Learning as Easy as PAC Learning: Sharp Rates with Bounded Label Noise
Abstract
Towards understanding the statistical complexity of learning from heterogeneous sources, we study the problem of multi-distribution learning. Given k data sources, the goal is to output a classifier for each source by exploiting shared structure to reduce sample complexity. We focus on the bounded label noise setting to determine whether the fast 1/ε rates achievable in single-task learning extend to this regime with minimal dependence on k. Surprisingly, we show that this is not the case. We demonstrate that learning across k distributions inherently incurs slow rates scaling with k/ε², even under constant noise levels, unless each distribution is learned separately. A key technical contribution is a structured hypothesis-testing framework that captures the statistical cost of certifying near-optimality under bounded noise, a cost we show is unavoidable in the multi-distribution setting. Finally, we prove that when competing with the stronger benchmark of each distribution's optimal Bayes error, the sample complexity incurs a multiplicative penalty in k. This establishes a statistical separation between random classification noise and Massart noise, highlighting a fundamental barrier unique to learning from multiple sources.