Smoothing the Landscape Boosts the Signal for SGD: Optimal Sample Complexity for Learning Single Index Models

Jason D. Lee

Smoothing the Landscape Boosts the Signal for SGD: Optimal Sample Complexity for Learning Single Index Models

Abstract

We focus on the task of learning a single index model σ(w · x) with respect to the isotropic Gaussian distribution in d dimensions. Prior work has shown that the sample complexity of learning w is governed by the information exponent k of the link function σ, which is defined as the index of the first nonzero Hermite coefficient of σ. Ben Arous et al. (2021) showed that n dk-1 samples suffice for learning w and that this is tight for online SGD. However, the CSQ lower bound for gradient based methods only shows that n dk/2 samples are necessary. In this work, we close the gap between the upper and lower bounds by showing that online SGD on a smoothed loss learns w with n dk/2 samples. We also draw connections to statistical analyses of tensor PCA and to the implicit regularization effects of minibatch SGD on empirical losses.

0

Turn this paper into a lesson

ArcXiv compiles a structured reading guide from this paper's metadata: plain-English importance, contributions, prerequisite concepts, which sections to read first, flashcards, and a quiz. Grounded in the abstract, never invented.

Or compile a full topic from this idea

Discussion (0)

Sign in to join the discussion.

Loading comments…