On efficient robust regression with subquadratic samples

Abstract

We revisit the problem of robust linear regression under Gaussian covariates with an unknown covariance matrix of condition number κ. For this fundamental problem, significant gaps remain in our understanding of the trade-offs among sample complexity, condition number, runtime, and prediction error for efficient algorithms. Our first result is a near-linear-time algorithm that uses O(d/ε^4) samples, where d is the dimension and ε is the corruption rate, and achieves prediction error O(εκ) under the condition εκ ≪ 1, improving over all prior works. We complement this result with a Statistical Query (SQ) lower bound showing that efficient SQ algorithms achieving error o(εκ) when εκ ≪ 1 require queries that take Ω(d^2) samples to simulate. Finally, we prove a low-degree polynomial lower bound that gives fine-grained evidence that, without assumptions such as εκ ≪ 1, efficient algorithms may require Ω(min{dε^2κ^2, ε^2d^2}) samples to significantly outperform the trivial estimator that always guesses 0.
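
To make the setting concrete, the following is a minimal sketch of the problem setup only, not the algorithm from the paper. It rests on several assumptions that the abstract does not state: the covariance is diagonal with eigenvalues between 1 and κ, the prediction error of an estimate b is measured as ||Σ^{1/2}(b − β)||_2, the adversary is a toy one that regenerates an ε fraction of responses from a hypothetical regressor `beta_bad`, and ordinary least squares stands in as a non-robust baseline. All function and variable names are illustrative.

```python
import math
import random


def make_data(n, lams, beta, eps, beta_bad, rng):
    """Draw n samples y = <x, beta> + N(0, 1) noise with x ~ N(0, diag(lams)).

    A toy adversary (assumption) regenerates the responses of the first
    eps * n samples from a different regressor, beta_bad.
    """
    xs, ys = [], []
    for i in range(n):
        x = [math.sqrt(lam) * rng.gauss(0.0, 1.0) for lam in lams]
        coef = beta_bad if i < int(eps * n) else beta
        y = sum(a * b for a, b in zip(x, coef)) + rng.gauss(0.0, 1.0)
        xs.append(x)
        ys.append(y)
    return xs, ys


def ols(xs, ys):
    """Ordinary least squares via the normal equations (Gauss-Jordan)."""
    d = len(xs[0])
    # Augmented matrix [X^T X | X^T y].
    a = [[sum(x[i] * x[j] for x in xs) for j in range(d)]
         + [sum(x[i] * y for x, y in zip(xs, ys))] for i in range(d)]
    for col in range(d):
        # Partial pivoting, then eliminate this column from every other row.
        piv = max(range(col, d), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(d):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [u - f * v for u, v in zip(a[r], a[col])]
    return [a[i][d] / a[i][i] for i in range(d)]


def prediction_error(beta_hat, beta, lams):
    """Assumed error measure: ||Sigma^{1/2}(beta_hat - beta)||_2, Sigma = diag(lams)."""
    return math.sqrt(sum(lam * (bh - b) ** 2
                         for lam, bh, b in zip(lams, beta_hat, beta)))


rng = random.Random(0)
kappa, eps, n = 4.0, 0.1, 2000
lams = [1.0, 2.0, kappa]          # eigenvalues of Sigma; condition number = kappa
beta = [1.0, 1.0, 1.0]            # true regressor (illustrative)
beta_bad = [-5.0, -5.0, -5.0]     # adversary's regressor (illustrative)

clean = make_data(n, lams, beta, 0.0, beta_bad, rng)
dirty = make_data(n, lams, beta, eps, beta_bad, rng)

err_zero = prediction_error([0.0] * len(beta), beta, lams)   # trivial estimator
err_clean = prediction_error(ols(*clean), beta, lams)
err_dirty = prediction_error(ols(*dirty), beta, lams)
print(err_zero, err_clean, err_dirty)
```

In this toy run, least squares is accurate on clean data but is pulled toward `beta_bad` once an ε fraction of responses is corrupted. The estimator that always guesses 0 incurs error ||Σ^{1/2}β||_2 regardless of corruption, which is the baseline a robust algorithm has to beat.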

