A faster and simpler algorithm for learning shallow networks

Abstract

We revisit the well-studied problem of learning a linear combination of $k$ ReLU activations given labeled examples drawn from the standard $d$-dimensional Gaussian measure. Chen et al. [CDG+23] recently gave the first algorithm for this problem to run in $\mathrm{poly}(d, 1/\varepsilon)$ time when $k = O(1)$, where $\varepsilon$ is the target error. More precisely, their algorithm runs in time $(d/\varepsilon)^{\mathrm{quasipoly}(k)}$ and learns over multiple stages. Here we show that a much simpler one-stage version of their algorithm suffices, and moreover its runtime is only $(d/\varepsilon)^{O(k^2)}$.
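To make the learning problem concrete, here is a minimal Python sketch of the data model only. It is not the algorithm of [CDG+23] or of this paper. It assumes the target has the form $f(x) = \sum_{i=1}^{k} a_i \,\mathrm{ReLU}(\langle w_i, x\rangle)$ with unit-norm $w_i$, no bias terms, and coefficients $a_i \in \{-1, +1\}$, and it measures error as empirical squared loss; all helper names (make_target, draw_examples, empirical_sq_error) are hypothetical.

```python
import math
import random
from typing import Callable, List, Tuple

Vector = List[float]

def relu(z: float) -> float:
    # ReLU activation: max(0, z).
    return max(0.0, z)

def sample_unit_vector(d: int, rng: random.Random) -> Vector:
    # Normalizing a standard Gaussian vector gives a uniformly random unit vector.
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(t * t for t in v))
    return [t / norm for t in v]

def make_target(d: int, k: int, rng: random.Random) -> Callable[[Vector], float]:
    # Assumed target: a linear combination of k ReLUs with unit-norm weights
    # and +/-1 coefficients (an illustrative choice, not from the paper).
    weights = [sample_unit_vector(d, rng) for _ in range(k)]
    coeffs = [rng.choice([-1.0, 1.0]) for _ in range(k)]

    def f(x: Vector) -> float:
        return sum(
            a * relu(sum(wj * xj for wj, xj in zip(w, x)))
            for a, w in zip(coeffs, weights)
        )

    return f

def draw_examples(
    f: Callable[[Vector], float], d: int, n: int, rng: random.Random
) -> List[Tuple[Vector, float]]:
    # Labeled examples (x, f(x)) with x drawn from the standard Gaussian N(0, I_d).
    xs = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    return [(x, f(x)) for x in xs]

def empirical_sq_error(
    h: Callable[[Vector], float], examples: List[Tuple[Vector, float]]
) -> float:
    # Empirical estimate of E_x[(h(x) - f(x))^2] over the labeled sample.
    return sum((h(x) - y) ** 2 for x, y in examples) / len(examples)

if __name__ == "__main__":
    rng = random.Random(0)
    d, k = 10, 3
    f = make_target(d, k, rng)
    data = draw_examples(f, d, 1000, rng)
    print("error of true target:", empirical_sq_error(f, data))
    print("error of zero hypothesis:", empirical_sq_error(lambda x: 0.0, data))
```

With the true target as the hypothesis the error is exactly zero, while the constant-zero hypothesis incurs positive error; a learner's task in this setting is to output a hypothesis whose error is below the target error $\varepsilon$.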
