Fundamental Limits of Learning High-dimensional Simplices in Noisy Regimes
Abstract
In this paper, we establish sample complexity bounds for learning high-dimensional simplices in $\mathbb{R}^K$ from noisy data. Specifically, we consider $n$ i.i.d. samples uniformly drawn from an unknown simplex in $\mathbb{R}^K$, each corrupted by additive Gaussian noise of unknown variance. We prove that an algorithm exists that, with high probability, outputs a simplex within $\ell_2$ or total variation (TV) distance at most $\varepsilon$ from the true simplex, provided $n \ge (K^2/\varepsilon^2)\, e^{O(K/\mathrm{SNR}^2)}$, where $\mathrm{SNR}$ is the signal-to-noise ratio. Extending our prior work (Saberi et al., 2023), we derive new information-theoretic lower bounds, showing that simplex estimation within TV distance $\varepsilon$ requires at least $n \ge \Omega(K^3 \sigma^2/\varepsilon^2 + K/\varepsilon)$ samples, where $\sigma^2$ denotes the noise variance. In the noiseless scenario, our lower bound $n \ge \Omega(K/\varepsilon)$ matches known upper bounds up to constant factors. We resolve an open question by demonstrating that when $\mathrm{SNR} \ge \Omega(K^{1/2})$, the noisy-case complexity aligns with the noiseless case. Our analysis leverages sample compression techniques (Ashtiani et al., 2018) and introduces a novel Fourier-based method for recovering distributions from noisy observations, potentially applicable beyond simplex learning.
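To make the observation model concrete, the following is a minimal sketch of how such noisy samples could be generated. It illustrates only the data model, not the learning algorithm analysed in the paper. The function name `sample_noisy_simplex`, the example vertices, and the choice of independent, equal-variance noise in every coordinate are assumptions made here for illustration; the abstract states only that the noise is additive Gaussian with unknown variance.

```python
import random

def sample_noisy_simplex(vertices, n, sigma, seed=0):
    """Draw n points uniformly from the simplex spanned by `vertices`
    (K+1 points in R^K), then add Gaussian noise to each coordinate.

    Assumption: the noise is i.i.d. N(0, sigma^2) per coordinate.
    """
    rng = random.Random(seed)
    dim = len(vertices[0])
    samples = []
    for _ in range(n):
        # Normalised i.i.d. Exp(1) draws give Dirichlet(1, ..., 1) weights,
        # i.e. barycentric coordinates that are uniform over the simplex.
        draws = [rng.expovariate(1.0) for _ in vertices]
        total = sum(draws)
        weights = [d / total for d in draws]
        # Convex combination of the vertices gives the noiseless point.
        point = [sum(w * v[j] for w, v in zip(weights, vertices))
                 for j in range(dim)]
        # Corrupt every coordinate with additive Gaussian noise.
        samples.append([x + rng.gauss(0.0, sigma) for x in point])
    return samples

# Hypothetical example: a simplex in R^2 (K = 2) with three vertices.
vertices = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
samples = sample_noisy_simplex(vertices, n=1000, sigma=0.05)
```

With `sigma=0.0` every sample lies inside the simplex; as `sigma` grows relative to the size of the simplex, more samples fall outside it, which is the regime where the signal-to-noise ratio enters the bounds.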