Statistically Optimal Robust Mean and Covariance Estimation for Anisotropic Gaussians
Abstract
Assume that $X_1, \ldots, X_N$ is an $\varepsilon$-contaminated sample of $N$ independent Gaussian vectors in $\mathbb{R}^d$ with mean $\mu$ and covariance $\Sigma$. In the strong $\varepsilon$-contamination model we assume that the adversary replaced an $\varepsilon$ fraction of vectors in the original Gaussian sample by any other vectors. We show that there is an estimator $\widehat{\mu}$ of the mean satisfying, with probability at least $1 - \delta$, a bound of the form \[ \|\widehat{\mu} - \mu\|_2 \le c\left(\sqrt{\frac{\operatorname{Tr}(\Sigma)}{N}} + \sqrt{\frac{\|\Sigma\|\log(1/\delta)}{N}} + \varepsilon\sqrt{\|\Sigma\|}\right), \] where $c > 0$ is an absolute constant and $\|\Sigma\|$ denotes the operator norm of $\Sigma$. In the same contaminated Gaussian setup, we construct an estimator $\widehat{\Sigma}$ of the covariance matrix $\Sigma$ that satisfies, with probability at least $1 - \delta$, \[ \|\widehat{\Sigma} - \Sigma\| \le c\left(\sqrt{\frac{\|\Sigma\|\operatorname{Tr}(\Sigma)}{N}} + \|\Sigma\|\sqrt{\frac{\log(1/\delta)}{N}} + \varepsilon\|\Sigma\|\right). \] Both results are optimal up to multiplicative constant factors. Despite the recent significant interest in robust statistics, achieving both dimension-free bounds in the canonical Gaussian case remained open. In fact, several previously known results were either dimension-dependent and required $\Sigma$ to be close to the identity, or had a sub-optimal dependence on the contamination level $\varepsilon$. As a part of the analysis, we derive sharp concentration inequalities for central order statistics of Gaussian, folded normal, and chi-squared distributions.
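To make the dimension-free nature of these rates concrete, the following minimal sketch evaluates the three terms of each bound (a statistical term driven by Tr(Σ), a confidence term driven by log(1/δ), and a contamination term driven by ε) for a given spectrum of Σ. It is an illustration only: the function names, the example spectra, and the parameter values are assumptions introduced here, and the absolute constant c, which the abstract does not specify, is omitted. The sketch does not implement the estimators themselves.

```python
import math

def mean_rate_terms(eigenvalues, n, delta, eps):
    """Terms of the mean bound, up to the unspecified absolute constant c.

    eigenvalues: spectrum of the covariance matrix (hypothetical input);
    n: sample size; delta: failure probability; eps: contamination level.
    """
    trace = sum(eigenvalues)        # Tr(Sigma)
    op_norm = max(eigenvalues)      # operator norm of a PSD matrix
    statistical = math.sqrt(trace / n)
    confidence = math.sqrt(op_norm * math.log(1.0 / delta) / n)
    contamination = eps * math.sqrt(op_norm)
    return statistical, confidence, contamination

def covariance_rate_terms(eigenvalues, n, delta, eps):
    """Terms of the covariance bound, again without the constant c."""
    trace = sum(eigenvalues)
    op_norm = max(eigenvalues)
    statistical = math.sqrt(op_norm * trace / n)
    confidence = op_norm * math.sqrt(math.log(1.0 / delta) / n)
    contamination = eps * op_norm
    return statistical, confidence, contamination

# Two hypothetical spectra in dimension d = 1000 with the same operator norm:
# an isotropic one and a quickly decaying (anisotropic) one.
d = 1000
isotropic = [1.0] * d
anisotropic = [1.0 / k**2 for k in range(1, d + 1)]

for name, spectrum in [("isotropic", isotropic), ("anisotropic", anisotropic)]:
    terms = mean_rate_terms(spectrum, n=10_000, delta=0.01, eps=0.05)
    print(name, [round(t, 4) for t in terms])
```

Both spectra have the same operator norm, so the confidence and contamination terms coincide. Only the statistical term changes, and it depends on the trace rather than on the ambient dimension d, which is why the anisotropic spectrum yields a much smaller value.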