Near-optimal-sample estimators for spherical Gaussian mixtures

Abstract

Statistical and machine-learning algorithms are frequently applied to high-dimensional data. In many of these applications data is scarce, and often much more costly than computation time. We provide the first sample-efficient polynomial-time estimator for high-dimensional spherical Gaussian mixtures. For mixtures of any $k$ $d$-dimensional spherical Gaussians, we derive an intuitive spectral-estimator that uses $\mathcal{O}_k\left(\frac{d \log^2 d}{\epsilon^4}\right)$ samples and runs in time $\mathcal{O}_{k,\epsilon}(d^3 \log^5 d)$, both significantly lower than previously known. The constant factor $\mathcal{O}_k$ is polynomial for sample complexity and is exponential for the time complexity, again much smaller than what was previously known. We also show that $\Omega_k\left(\frac{d}{\epsilon^2}\right)$ samples are needed for any algorithm. Hence the sample complexity is near-optimal in the number of dimensions. We also derive a simple estimator for one-dimensional mixtures that uses $\mathcal{O}\left(\frac{k \log \frac{k}{\epsilon}}{\epsilon^2}\right)$ samples and runs in time $\widetilde{\mathcal{O}}\left(\left(\frac{k}{\epsilon}\right)^{3k+1}\right)$. Our other technical contributions include a faster algorithm for choosing a density estimate from a set of distributions, that minimizes the $\ell_1$ distance to an unknown underlying distribution.
