A spectral least-squares-type method for heavy-tailed corrupted regression with unknown covariance & heterogeneous noise

Abstract

We revisit heavy-tailed corrupted least-squares linear regression, given a corrupted label-feature sample of size $n$ containing at most $\epsilon n$ arbitrary outliers. We wish to estimate a $p$-dimensional parameter $b^*$ from such a sample of a label-feature pair $(y,x)$ satisfying $y=\langle x,b^*\rangle+\xi$ with heavy-tailed $(x,\xi)$. We only assume that $x$ is $L^4$-$L^2$ hypercontractive with constant $L>0$ and has covariance matrix $\Sigma$ with minimum eigenvalue $1/\mu^2>0$ and bounded condition number $\kappa>0$. The noise $\xi$ can be arbitrarily dependent on $x$ and nonsymmetric, as long as $\xi x$ has a finite covariance matrix $\Xi$. We propose a near-optimal, computationally tractable estimator, based on the power method, assuming no knowledge of $(\Sigma,\Xi)$ nor of the operator norm of $\Xi$. With probability at least $1-\delta$, our proposed estimator attains the statistical rate $\mu^2\Vert\Xi\Vert^{1/2}\left(\frac{p}{n}+\frac{\log(1/\delta)}{n}+\epsilon\right)^{1/2}$ and breakdown point $\epsilon\lesssim\frac{1}{L^4\kappa^2}$, both optimal in the $\ell_2$-norm, assuming the near-optimal minimum sample size $L^4\kappa^2(p\log p+\log(1/\delta))\lesssim n$, up to a log factor. To the best of our knowledge, this is the first computationally tractable algorithm satisfying all the mentioned properties simultaneously. Our estimator is based on a two-stage Multiplicative Weight Update algorithm. The first stage estimates a descent direction $\hat v$ with respect to the (unknown) pre-conditioned inner product $\langle\Sigma(\cdot),\cdot\rangle$. The second stage estimates the descent direction $\Sigma\hat v$ with respect to the (known) inner product $\langle\cdot,\cdot\rangle$, without knowing or estimating $\Sigma$.
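
The abstract says the estimator is based on the power method but does not describe how it is used. As background only, here is a minimal sketch of textbook power iteration for the leading eigenpair of a symmetric positive semidefinite matrix. It is not the authors' estimator: the function name `power_iteration`, its parameters, and the small test matrix are all hypothetical choices made for illustration.

```python
import numpy as np


def power_iteration(A, num_iters=500, tol=1e-10, seed=0):
    """Approximate the leading eigenpair of a symmetric PSD matrix A.

    Generic textbook routine (an illustration, not the paper's algorithm).
    Returns (eigenvalue estimate, unit-norm eigenvector estimate).
    """
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(A.shape[0])
    v /= np.linalg.norm(v)          # start from a random unit vector
    eigval = 0.0
    for _ in range(num_iters):
        w = A @ v                   # apply the matrix
        norm_w = np.linalg.norm(w)
        if norm_w == 0.0:           # v lies in the null space of A
            break
        v_next = w / norm_w         # renormalize
        eigval_next = float(v_next @ A @ v_next)  # Rayleigh quotient
        converged = abs(eigval_next - eigval) < tol
        v, eigval = v_next, eigval_next
        if converged:
            break
    return eigval, v


# Hypothetical 2x2 example with eigenvalues (5 +/- sqrt(5)) / 2.
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
lam, v = power_iteration(A)
```

Each iteration costs one matrix-vector product, which is why power-type methods are attractive when a full eigendecomposition would be too expensive. The convergence speed depends on the gap between the two largest eigenvalues.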
