Surprises in High-Dimensional Ridgeless Least Squares Interpolation

Abstract

Interpolators, that is, estimators that achieve zero training error, have attracted growing attention in machine learning, mainly because state-of-the-art neural networks appear to be models of this type. In this paper, we study minimum $\ell_2$ norm ("ridgeless") interpolation in high-dimensional least squares regression. We consider two different models for the feature distribution. The first is a linear model, where the feature vectors $x_i \in \mathbb{R}^p$ are obtained by applying a linear transform to a vector of i.i.d. entries, $x_i = \Sigma^{1/2} z_i$ (with $z_i \in \mathbb{R}^p$). The second is a nonlinear model, where the feature vectors are obtained by passing the input through a random one-layer neural network, $x_i = \varphi(W z_i)$ (with $z_i \in \mathbb{R}^d$, $W \in \mathbb{R}^{p \times d}$ a matrix of i.i.d. entries, and $\varphi$ an activation function acting componentwise on $W z_i$). We recover, in a precise quantitative way, several phenomena that have been observed in large-scale neural networks and kernel machines, including the "double descent" behavior of the prediction risk and the potential benefits of overparametrization.
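The ridgeless estimator is the minimum-$\ell_2$-norm least squares solution $\hat\beta = X^{+} y$ ($X^{+}$ the Moore-Penrose pseudoinverse), which is also the $\lambda \to 0^+$ limit of ridge regression. The NumPy sketch below illustrates this under stated assumptions and is not the paper's code. It builds features under both models, checks that the fit interpolates when $p > n$, compares it with ridge at a small $\lambda$, and estimates the risk for a few ratios $p/n$ in the isotropic case $\Sigma = I$. The Gaussian $z_i$, the diagonal $\Sigma$, the ReLU activation, the targets, the noise levels, and all function names (`ridgeless_fit`, `ridge_fit`, `linear_features`, `random_features`, `isotropic_risk`) are hypothetical choices made for illustration.

```python
import numpy as np


def ridgeless_fit(X, y):
    # Minimum l2-norm least squares solution beta_hat = X^+ y (Moore-Penrose
    # pseudoinverse). If p > n and X has full row rank, this is the interpolator
    # of smallest norm; if p < n and X has full column rank, it equals OLS.
    return np.linalg.pinv(X) @ y


def ridge_fit(X, y, lam):
    # Ridge estimator (X^T X / n + lam I)^{-1} X^T y / n. As lam -> 0+ it
    # approaches ridgeless_fit(X, y), hence the name "ridgeless".
    n, p = X.shape
    return np.linalg.solve(X.T @ X / n + lam * np.eye(p), X.T @ y / n)


def linear_features(Z, sigma_half):
    # Linear model: x_i = Sigma^{1/2} z_i, applied to each row z_i of Z (n x p).
    return Z @ sigma_half.T


def relu(t):
    # Hypothetical choice of activation phi.
    return np.maximum(t, 0.0)


def random_features(Z, W, phi=relu):
    # Nonlinear model: x_i = phi(W z_i), with phi acting componentwise.
    # Z is n x d and W is p x d; the same W must be reused for test data.
    return phi(Z @ W.T)


def isotropic_risk(n, p, sigma=0.5, trials=5, seed=0):
    # Average of ||beta_hat - beta||^2 for the ridgeless fit under the linear
    # model with Sigma = I, where it equals the excess prediction risk.
    # Gaussian z_i and a unit-norm random beta are illustrative assumptions.
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(trials):
        X = rng.standard_normal((n, p))
        beta = rng.standard_normal(p)
        beta /= np.linalg.norm(beta)
        y = X @ beta + sigma * rng.standard_normal(n)
        total += np.sum((ridgeless_fit(X, y) - beta) ** 2)
    return total / trials


rng = np.random.default_rng(0)
n, d, p = 50, 20, 200  # overparametrized regime: p > n

# Linear model with a hypothetical diagonal covariance Sigma.
sigma_half = np.diag(np.sqrt(np.linspace(0.5, 2.0, p)))
X_lin = linear_features(rng.standard_normal((n, p)), sigma_half)
y_lin = X_lin @ (rng.standard_normal(p) / np.sqrt(p)) + 0.1 * rng.standard_normal(n)
beta_lin = ridgeless_fit(X_lin, y_lin)

# Random-features model with a hypothetical target y = z_1 + noise.
Z = rng.standard_normal((n, d))
W = rng.standard_normal((p, d)) / np.sqrt(d)
X_rf = random_features(Z, W)
y_rf = Z[:, 0] + 0.1 * rng.standard_normal(n)
beta_rf = ridgeless_fit(X_rf, y_rf)

print("train residual (linear):", np.max(np.abs(X_lin @ beta_lin - y_lin)))
print("train residual (random features):", np.max(np.abs(X_rf @ beta_rf - y_rf)))
print("gap to ridge, lam=1e-6:", np.linalg.norm(ridge_fit(X_lin, y_lin, 1e-6) - beta_lin))
for ratio in (0.5, 0.9, 1.0, 1.1, 2.0, 4.0):
    print(f"p/n = {ratio}: risk = {isotropic_risk(n, int(ratio * n)):.3f}")
```

Both training residuals should be at the level of floating-point round-off, since with $p > n$ the fit passes through every training point. In the sweep, the estimated risk should spike near $p/n = 1$ and come down on either side, which is the "double descent" shape described above; the exact numbers depend on the seed and on the assumptions listed in the lead-in.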
