Ridgeless Interpolation with Shallow ReLU Networks in 1D is Nearest Neighbor Curvature Extrapolation and Provably Generalizes on Lipschitz Functions
Abstract
We prove a precise geometric description of all one-layer ReLU networks z(x;θ) with a single linear unit and input/output dimensions equal to one that interpolate a given dataset D = {(x_i, f(x_i))} and, among all such interpolants, minimize the 2-norm of the neuron weights. Such networks can intuitively be thought of as those that minimize the mean-squared error over D plus an infinitesimal weight decay penalty. We therefore refer to them as ridgeless ReLU interpolants. Our description proves that, to extrapolate values z(x;θ) for inputs x ∈ (x_i, x_{i+1}) lying between two consecutive datapoints, a ridgeless ReLU interpolant simply compares the signs of the discrete estimates for the curvature of f at x_i and x_{i+1} derived from the dataset D. If the curvature estimates at x_i and x_{i+1} have different signs, then z(x;θ) must be linear on (x_i, x_{i+1}). If, in contrast, the curvature estimates at x_i and x_{i+1} are both positive (resp. negative), then z(x;θ) is convex (resp. concave) on (x_i, x_{i+1}). Our results show that ridgeless ReLU interpolants achieve the best possible generalization for learning 1d Lipschitz functions, up to universal constants.
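The sign-comparison rule described above can be illustrated with a minimal sketch. The abstract does not say how the discrete curvature estimate is computed, so this sketch assumes it is the sign of the difference between consecutive secant slopes of the data. It also assumes that a zero curvature estimate at either endpoint gives the linear case. The function name `interval_shapes` is hypothetical, and only interior intervals (those whose two endpoints both have a curvature estimate) are classified.

```python
import numpy as np

def interval_shapes(x, y):
    """Classify each interior interval (x_i, x_{i+1}) as convex, concave, or linear.

    Assumption (not stated in the abstract): the discrete curvature estimate at
    x_i is sign(s_i - s_{i-1}), where s_i is the secant slope on (x_i, x_{i+1}).
    Assumption: a zero curvature estimate is treated like a sign mismatch.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    order = np.argsort(x)              # sort the datapoints by input value
    x, y = x[order], y[order]

    slopes = np.diff(y) / np.diff(x)   # s_i on (x_i, x_{i+1}), length n-1
    curv = np.sign(np.diff(slopes))    # estimates at x_1 .. x_{n-2}, length n-2

    shapes = []
    # Interval (x_i, x_{i+1}) needs estimates at both ends, so i runs over 1 .. n-3.
    for i in range(1, len(x) - 2):
        left, right = curv[i - 1], curv[i]
        if left > 0 and right > 0:
            shape = "convex"
        elif left < 0 and right < 0:
            shape = "concave"
        else:
            shape = "linear"           # differing signs (or a zero estimate)
        shapes.append(((float(x[i]), float(x[i + 1])), shape))
    return shapes

if __name__ == "__main__":
    xs = [0, 1, 2, 3, 4, 5]
    print(interval_shapes(xs, [v ** 2 for v in xs]))   # samples of a convex function
    print(interval_shapes(xs, [0, 1, 0, 1, 0, 1]))     # zigzag data
```

The sketch only reports the shape class of the interpolant on each interval. It does not construct the interpolant z(x;θ) itself, and it does not cover the two outermost intervals or inputs outside the data range.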