Robust Mean Estimation in High Dimensions: An Outlier Fraction Agnostic and Efficient Algorithm

Abstract

The problem of robust mean estimation in high dimensions is studied, in which a certain fraction (less than half) of the datapoints can be arbitrarily corrupted. Motivated by compressive sensing, the robust mean estimation problem is formulated as the minimization of the 0-`norm' of an outlier indicator vector, under a second moment constraint on the datapoints. The 0-`norm' is then relaxed to the p-norm (0<p≤ 1) in the objective, and it is shown that the global minima for each of these objectives are order-optimal and have optimal breakdown point for the robust mean estimation problem. Furthermore, a computationally tractable iterative p-minimization and hard thresholding algorithm is proposed that outputs an order-optimal robust estimate of the population mean. The proposed algorithm (with breakdown point ≈ 0.3) does not require prior knowledge of the fraction of outliers, in contrast with most existing algorithms, and for p=1 it has near-linear time complexity. Both synthetic and real data experiments demonstrate that the proposed algorithm outperforms state-of-the-art robust mean estimation methods.

0

Discussion (0)

Sign in to join the discussion.

Loading comments…