The loss landscape of overparameterized neural networks
Abstract
We explore some mathematical features of the loss landscape of overparameterized neural networks. A priori, one might imagine that the loss function looks like a typical function from R^n to R: in particular, nonconvex, with discrete global minima. In this paper, we prove that in at least one important way, the loss function of an overparameterized neural network does not look like a typical function. If a neural net has n parameters and is trained on d data points, with n > d, we show that the locus M of global minima of the loss function L is usually not discrete, but rather an (n - d)-dimensional submanifold of R^n. In practice, neural nets commonly have orders of magnitude more parameters than data points, so this observation implies that M is typically a very high-dimensional subset of R^n.
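The dimension count n - d can be checked numerically on a toy case. The sketch below is an illustration only, not the paper's proof or method. Everything in it is an assumption made for the example: the one-hidden-layer tanh architecture, the sizes, the inputs, the squared-error loss, and the trick of generating the labels from the network itself so that a zero-loss point w0 is known exactly. At a zero-loss point of a squared-error loss, the Hessian equals 2 J^T J, where J is the d-by-n Jacobian of the network outputs with respect to the parameters. If J has rank d, the Hessian has n - d zero eigenvalues, which matches an (n - d)-dimensional set of minima passing through w0.

```python
import numpy as np

rng = np.random.default_rng(0)

h, d = 4, 3                 # hidden units and data points (toy sizes, assumed)
n = 3 * h                   # parameters: a_j, b_j, c_j for each hidden unit

def model(w, x):
    """One-hidden-layer net f(x) = sum_j a_j * tanh(b_j * x + c_j)."""
    a, b, c = w[:h], w[h:2 * h], w[2 * h:]
    return np.tanh(np.outer(x, b) + c) @ a

def jacobian(w, x):
    """Analytic d-by-n Jacobian of the model outputs with respect to w."""
    a, b, c = w[:h], w[h:2 * h], w[2 * h:]
    t = np.tanh(np.outer(x, b) + c)            # shape (d, h)
    s = (1.0 - t ** 2) * a                     # tanh derivative times a_j
    return np.hstack([t, s * x[:, None], s])   # columns: d/da, d/db, d/dc

x = np.array([-1.0, 0.5, 2.0])                 # d distinct inputs (assumed)
w0 = rng.normal(size=n)
y = model(w0, x)            # labels produced by the net itself, so L(w0) = 0

def loss(w):
    """Squared-error loss over the d data points."""
    return float(np.sum((model(w, x) - y) ** 2))

J = jacobian(w0, x)
rank = int(np.linalg.matrix_rank(J))

# Hessian of the loss at a zero-loss point is 2 * J^T J (residual terms vanish).
hessian_eigs = np.linalg.eigvalsh(2.0 * J.T @ J)
flat_directions = int(np.sum(hessian_eigs < 1e-10 * hessian_eigs.max()))

print("n =", n, "d =", d, "rank J =", rank, "flat directions =", flat_directions)
```

When the Jacobian has full rank d, the number of flat directions reported should equal n - d. The full-rank condition is what one expects for generic parameters and distinct inputs, which is consistent with the abstract's "usually".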