Robust Learning of a Group DRO Neuron
Abstract
We study the problem of learning a single neuron under standard squared loss in the presence of arbitrary label noise and group-level distributional shifts, for a broad family of covariate distributions. Our goal is to identify a "best-fit" neuron parameterized by $\mathbf{w}_*$ that performs well under the most challenging reweighting of the groups. Specifically, we address a Group Distributionally Robust Optimization problem: given sample access to $K$ distinct distributions $p_{[1]}, \dots, p_{[K]}$, we seek to approximate $\mathbf{w}_*$ that minimizes the worst-case objective over convex combinations of group distributions $\boldsymbol{\lambda} \in \Delta_K$, where the objective is $\sum_{i \in [K]} \lambda_{[i]}\, \mathbb{E}_{(\mathbf{x},y) \sim p_{[i]}} (\sigma(\mathbf{w} \cdot \mathbf{x}) - y)^2 - \nu d_f(\boldsymbol{\lambda}, \frac{1}{K}\mathbf{1})$ and $d_f$ is an $f$-divergence that imposes an (optional) penalty on deviations from uniform group weights, scaled by a parameter $\nu \geq 0$. We develop a computationally efficient primal-dual algorithm that outputs a vector $\mathbf{w}$ that is constant-factor competitive with $\mathbf{w}_*$ under the worst-case group weighting. Our analytical framework directly confronts the inherent nonconvexity of the loss function, providing robust learning guarantees in the face of arbitrary label corruptions and group-specific distributional shifts. The implementation of the dual extrapolation update motivated by our algorithmic framework shows promise on LLM pre-training benchmarks.
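To make the objective concrete, the following minimal sketch evaluates the penalized group-weighted squared loss for a fixed weight vector and computes the worst-case group weights. It is an illustration only, not the paper's primal-dual algorithm. Two choices are assumptions not fixed by the abstract: the activation is taken to be ReLU, and the f-divergence is taken to be the KL divergence, for which the maximizing weights have the closed form of a softmax of the group losses divided by ν. The function names and the synthetic data are hypothetical.

```python
import numpy as np

def group_losses(w, groups, sigma=lambda z: np.maximum(z, 0.0)):
    """Empirical squared loss of the neuron sigma(w . x) on each group.

    `groups` is a list of (X, y) pairs; ReLU is an assumed default activation.
    """
    return np.array([np.mean((sigma(X @ w) - y) ** 2) for X, y in groups])

def worst_case_weights(losses, nu):
    """Weights maximizing sum_i lam_i * L_i - nu * KL(lam || uniform).

    Assumes the f-divergence is KL. For nu = 0 the maximum sits on the
    vertex of the simplex with the largest group loss.
    """
    losses = np.asarray(losses, dtype=float)
    if nu == 0:
        lam = np.zeros(len(losses))
        lam[np.argmax(losses)] = 1.0
        return lam
    z = (losses - losses.max()) / nu  # shift by the max for numerical stability
    lam = np.exp(z)
    return lam / lam.sum()

def dro_objective(losses, lam, nu):
    """Value of sum_i lam_i * L_i - nu * KL(lam || uniform)."""
    losses = np.asarray(losses, dtype=float)
    lam = np.asarray(lam, dtype=float)
    K = len(losses)
    nz = lam > 0  # convention: 0 * log 0 = 0
    kl = np.sum(lam[nz] * np.log(K * lam[nz]))
    return float(lam @ losses - nu * kl)

# Synthetic example: two groups sharing one target neuron, second group noisier.
rng = np.random.default_rng(0)
w_true = np.array([1.0, -0.5])
groups = []
for noise in (0.1, 1.0):
    X = rng.normal(size=(200, 2))
    y = np.maximum(X @ w_true, 0.0) + noise * rng.normal(size=200)
    groups.append((X, y))

L = group_losses(w_true, groups)
for nu in (0.0, 0.5):
    lam = worst_case_weights(L, nu)
    print(nu, lam, dro_objective(L, lam, nu))
```

With ν = 0 all weight goes to the hardest group, so the objective reduces to the maximum group loss. A larger ν pulls the weights back toward uniform, which is the role of the divergence penalty described above.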