Neural network learns low-dimensional polynomials with SGD near the information-theoretic limit

Abstract

We study the problem of gradient descent learning of a single-index target function $f_*(x) = \sigma_*(\langle x, \theta \rangle)$ under isotropic Gaussian data in $\mathbb{R}^d$, where the unknown link function $\sigma_*: \mathbb{R} \to \mathbb{R}$ has information exponent $p$ (defined as the lowest degree in the Hermite expansion). Prior works showed that gradient-based training of neural networks can learn this target with $n \gtrsim d^{\Theta(p)}$ samples, and such complexity is predicted to be necessary by the correlational statistical query lower bound. Surprisingly, we prove that a two-layer neural network optimized by an SGD-based algorithm (on the squared loss) learns $f_*$ with a complexity that is not governed by the information exponent. Specifically, for arbitrary polynomial single-index models, we establish a sample and runtime complexity of $n \simeq T = \Theta(d \cdot \mathrm{polylog}\, d)$, where $\Theta(\cdot)$ hides a constant only depending on the degree of $\sigma_*$; this dimension dependence matches the information-theoretic limit up to polylogarithmic factors. More generally, we show that $n \gtrsim d^{(p_*-1) \vee 1}$ samples are sufficient to achieve low generalization error, where $p_* \le p$ is the generative exponent of the link function. Core to our analysis is the reuse of minibatches in the gradient computation, which gives rise to higher-order information beyond correlational queries.
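To make the definition of the information exponent concrete, the sketch below computes it for a polynomial link function. It assumes the standard convention of probabilists' Hermite polynomials $He_k$ (orthogonal under the standard Gaussian) and takes $p$ as the lowest degree $k \ge 1$ with a nonzero coefficient, ignoring the constant term. The function name `information_exponent` and the tolerance `tol` are hypothetical choices for illustration; only `numpy.polynomial.hermite_e.poly2herme` is a real NumPy routine. This does not compute the generative exponent $p_*$, which involves more than reading off Hermite coefficients.

```python
import numpy as np
from numpy.polynomial import hermite_e as H


def information_exponent(power_coeffs, tol=1e-10):
    """Hypothetical helper: lowest degree k >= 1 with a nonzero coefficient
    in the probabilists' Hermite (He_k) expansion of a polynomial link.

    power_coeffs: coefficients in increasing power order,
                  i.e. c0 + c1*t + c2*t**2 + ...
    tol: assumed threshold below which a coefficient counts as zero.
    """
    # Convert from the monomial basis to the He_k basis.
    herm = np.atleast_1d(H.poly2herme(power_coeffs))
    # Skip the constant He_0 term; it carries no information about theta.
    for k in range(1, len(herm)):
        if abs(herm[k]) > tol:
            return k
    raise ValueError("link function is constant; information exponent undefined")


if __name__ == "__main__":
    print(information_exponent([0, 0, 0, 1]))      # t^3 = He_3 + 3 He_1
    print(information_exponent([0, -3, 0, 1]))     # He_3(t) = t^3 - 3t
    print(information_exponent([3, 0, -6, 0, 1]))  # He_4(t) = t^4 - 6t^2 + 3
```

For example, $\sigma_*(t) = t^3$ has $p = 1$ because its expansion contains a $He_1$ term, while $\sigma_*(t) = He_3(t)$ has $p = 3$. Prior correlational analyses would then suggest $n \gtrsim d^{\Theta(3)}$ samples for the latter, whereas the abstract's result for polynomial links gives a dimension dependence of $d \cdot \mathrm{polylog}\, d$.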
