ERM and RERM are optimal estimators for regression problems when malicious outliers corrupt the labels

Abstract

We study Empirical Risk Minimizers (ERM) and Regularized Empirical Risk Minimizers (RERM) for regression problems with convex and L-Lipschitz loss functions. We consider a setting where |O| malicious outliers contaminate the labels. In that case, under a local Bernstein condition, we show that the L_2-error rate is bounded by r_N + AL|O|/N, where N is the total number of observations, r_N is the L_2-error rate in the non-contaminated setting, and A is a parameter coming from the local Bernstein condition. When r_N is minimax-rate-optimal in the non-contaminated setting, the rate r_N + AL|O|/N is also minimax-rate-optimal when |O| outliers contaminate the labels. The main results of the paper apply to many non-regularized and regularized procedures under weak assumptions on the noise. We present results for Huber's M-estimators (without penalization or regularized by the ℓ1-norm) and for general regularized learning problems in reproducing kernel Hilbert spaces when the noise can be heavy-tailed.
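To make the setting concrete, the following is a minimal sketch of an unpenalized Huber M-estimator fitted by empirical risk minimization on data whose labels are partly corrupted. It is an illustration under stated assumptions, not the paper's experiment: the one-dimensional model y = w*x + noise, the corruption rule, the plain gradient-descent solver, and all function names (`make_data`, `fit_huber_erm`, `fit_least_squares`) are hypothetical choices made here. The Huber loss is convex and Lipschitz (its derivative is bounded by `delta`), so each corrupted label can move the gradient only by a bounded amount, which is the intuition behind an additive term proportional to |O|/N.

```python
import random


def huber_grad(r, delta):
    """Derivative of the Huber loss at residual r; bounded by delta in absolute value."""
    if abs(r) <= delta:
        return r
    return delta if r > 0 else -delta


def fit_huber_erm(xs, ys, delta=1.0, lr=0.5, steps=1000):
    """Minimize the empirical Huber risk of y ~ w * x by gradient descent (illustrative solver)."""
    w = 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradient of the mean Huber loss with respect to w.
        g = sum(huber_grad(w * x - y, delta) * x for x, y in zip(xs, ys)) / n
        w -= lr * g
    return w


def fit_least_squares(xs, ys):
    """Closed-form least-squares slope through the origin, for comparison."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def make_data(n, n_outliers, w_true=2.0, seed=0):
    """Draw n points from y = w_true * x + noise, then corrupt the first n_outliers labels."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [w_true * x + rng.gauss(0.0, 0.5) for x in xs]
    # Assumed corruption rule: only labels change, the inputs stay untouched.
    for i in range(n_outliers):
        ys[i] = -50.0 * xs[i]
    return xs, ys


if __name__ == "__main__":
    w_true = 2.0
    xs, ys = make_data(n=200, n_outliers=20, w_true=w_true)
    print("Huber ERM error:    ", abs(fit_huber_erm(xs, ys) - w_true))
    print("Least-squares error:", abs(fit_least_squares(xs, ys) - w_true))
```

In this toy setup the squared loss is not Lipschitz, so the corrupted labels can pull the least-squares slope arbitrarily far, while the Huber fit is expected to stay within an error on the order of the outlier fraction. Varying `n_outliers` gives a rough feel for the dependence on |O|/N, though it does not verify the constants in the bound.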
