Mixture Models, Robustness, and Sum of Squares Proofs

Abstract

We use the Sum of Squares method to develop new efficient algorithms for learning well-separated mixtures of Gaussians and robust mean estimation, both in high dimensions, that substantially improve upon the statistical guarantees achieved by previous efficient algorithms. Firstly, we study mixtures of $k$ distributions in $d$ dimensions, where the means of every pair of distributions are separated by at least $k^{\epsilon}$. In the special case of spherical Gaussian mixtures, we give a $(dk)^{O(1/\epsilon^2)}$-time algorithm that learns the means assuming separation at least $k^{\epsilon}$, for any $\epsilon > 0$. This is the first algorithm to improve on greedy ("single-linkage") and spectral clustering, breaking a long-standing barrier for efficient algorithms at separation $k^{1/4}$. We also study robust estimation. When an unknown $(1-\epsilon)$-fraction of $X_1,\ldots,X_n$ are chosen from a sub-Gaussian distribution with mean $\mu$ but the remaining points are chosen adversarially, we give an algorithm recovering $\mu$ to error $\epsilon^{1-1/t}$ in time $d^{O(t^2)}$, so long as sub-Gaussian-ness up to $O(t)$ moments can be certified by a Sum of Squares proof. This is the first polynomial-time algorithm with guarantees approaching the information-theoretic limit for non-Gaussian distributions. Previous algorithms could not achieve error better than $\epsilon^{1/2}$. Both of these results are based on a unified technique. Inspired by recent algorithms of Diakonikolas et al. in robust statistics, we devise an SDP based on the Sum of Squares method for the following setting: given $X_1,\ldots,X_n \in \mathbb{R}^d$ for large $d$ and $n = \mathrm{poly}(d)$ with the promise that a subset of $X_1,\ldots,X_n$ were sampled from a probability distribution with bounded moments, recover some information about that distribution.
