Learning a Single Neuron with Adversarial Label Noise via Gradient Descent

Abstract

We study the fundamental problem of learning a single neuron, i.e., a function of the form $\mathbf{x} \mapsto \sigma(\mathbf{w}\cdot\mathbf{x})$ for monotone activations $\sigma:\mathbb{R}\to\mathbb{R}$, with respect to the $L_2^2$-loss in the presence of adversarial label noise. Specifically, we are given labeled examples from a distribution $D$ on $(\mathbf{x}, y)\in\mathbb{R}^d \times \mathbb{R}$ such that there exists $\mathbf{w}^\ast\in\mathbb{R}^d$ achieving $F(\mathbf{w}^\ast)=\epsilon$, where $F(\mathbf{w})=\mathbf{E}_{(\mathbf{x},y)\sim D}\big[(\sigma(\mathbf{w}\cdot\mathbf{x})-y)^2\big]$. The goal of the learner is to output a hypothesis vector $\mathbf{w}$ such that $F(\mathbf{w}) \le C\,\epsilon$ with high probability, where $C>1$ is a universal constant. As our main contribution, we give efficient constant-factor approximate learners for a broad class of distributions (including log-concave distributions) and activation functions. Concretely, for the class of isotropic log-concave distributions, we obtain the following important corollaries. For the logistic activation, we obtain the first polynomial-time constant-factor approximation (even under the Gaussian distribution); our algorithm has sample complexity $\widetilde{O}(d/\epsilon)$, which is tight within polylogarithmic factors. For the ReLU activation, we give an efficient algorithm with sample complexity $\widetilde{O}(d\,\mathrm{polylog}(1/\epsilon))$; prior to our work, the best known constant-factor approximate learner had sample complexity $\widetilde{\Omega}(d/\epsilon)$. In both of these settings, our algorithms are simple, performing gradient descent on the (regularized) $L_2^2$-loss. The correctness of our algorithms relies on novel structural results that we establish, showing that (essentially all) stationary points of the underlying non-convex loss are approximately optimal.
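To make the setup concrete, here is a minimal sketch of the kind of procedure the abstract describes: plain gradient descent on an empirical, regularized $L_2^2$-loss for a logistic neuron, using $\nabla F(\mathbf{w}) = \mathbf{E}\big[2(\sigma(\mathbf{w}\cdot\mathbf{x})-y)\,\sigma'(\mathbf{w}\cdot\mathbf{x})\,\mathbf{x}\big]$. Everything specific here is an assumption, not taken from the paper: standard Gaussian inputs (one isotropic log-concave distribution), a ridge term $(\lambda/2)\|\mathbf{w}\|^2$ as the regularizer, the step size, iteration count, zero initialization, and random label flips ($y \mapsto 1-y$ on 5% of samples) as a stand-in for adversarial noise, which is a much weaker corruption model. The helper names (`gd_single_neuron`, `make_noisy_data`, `empirical_loss`) are hypothetical. The sketch does not reproduce the authors' algorithm or its guarantees.

```python
import math
import random

# Hypothetical helpers illustrating GD on the squared loss of a single neuron.
# Not the paper's algorithm: regularizer, step size and noise model are assumptions.


def logistic(t: float) -> float:
    """Numerically stable logistic activation sigma(t) = 1 / (1 + e^{-t})."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def logistic_prime(t: float) -> float:
    """Derivative sigma'(t) = sigma(t) * (1 - sigma(t))."""
    s = logistic(t)
    return s * (1.0 - s)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def empirical_loss(w, samples, sigma=logistic):
    """Sample average of (sigma(w.x) - y)^2, an estimate of F(w)."""
    return sum((sigma(dot(w, x)) - y) ** 2 for x, y in samples) / len(samples)


def gd_single_neuron(samples, d, step=2.0, lam=1e-3, iters=300,
                     sigma=logistic, sigma_prime=logistic_prime):
    """Gradient descent on empirical_loss(w) + (lam / 2) * ||w||^2 from w = 0."""
    w = [0.0] * d
    n = len(samples)
    for _ in range(iters):
        grad = [lam * wj for wj in w]  # gradient of the assumed ridge term
        for x, y in samples:
            t = dot(w, x)
            # d/dw (sigma(w.x) - y)^2 = 2 (sigma(w.x) - y) sigma'(w.x) x
            c = 2.0 * (sigma(t) - y) * sigma_prime(t) / n
            for j in range(d):
                grad[j] += c * x[j]
        w = [wj - step * gj for wj, gj in zip(w, grad)]
    return w


def make_noisy_data(w_star, n, flip_frac, rng):
    """Gaussian x, labels sigma(w*.x); a flip_frac fraction replaced by 1 - y.

    Random flips are a stand-in, much weaker than adversarial label noise.
    """
    d = len(w_star)
    samples = []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(d)]
        samples.append((x, logistic(dot(w_star, x))))
    for i in rng.sample(range(n), int(flip_frac * n)):
        x, y = samples[i]
        samples[i] = (x, 1.0 - y)
    return samples


if __name__ == "__main__":
    rng = random.Random(0)
    w_star = [2.0, -1.0, 0.5]
    data = make_noisy_data(w_star, n=1000, flip_frac=0.05, rng=rng)
    w_hat = gd_single_neuron(data, d=len(w_star))
    print("empirical loss at w*   :", empirical_loss(w_star, data))
    print("empirical loss at w_hat:", empirical_loss(w_hat, data))
```

Comparing the empirical loss at `w_hat` with the loss at `w_star` gives a rough, single-run feel for the target $F(\mathbf{w}) \le C\,\epsilon$; it is an illustration only, not evidence for the paper's theorems, and it says nothing about the adversarial setting or about ReLU (whose gradient vanishes at the zero initialization used here).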
