The loss landscape of overparameterized neural networks

Abstract

We explore some mathematical features of the loss landscape of overparameterized neural networks. A priori, one might imagine that the loss function looks like a typical function from R^n to R: in particular, nonconvex, with discrete global minima. In this paper, we prove that in at least one important way, the loss function of an overparameterized neural network does not look like a typical function. If a neural net has n parameters and is trained on d data points, with n > d, we show that the locus M of global minima of the loss function L is usually not discrete, but rather an (n-d)-dimensional submanifold of R^n. In practice, neural nets commonly have orders of magnitude more parameters than data points, so this observation implies that M is typically a very high-dimensional subset of R^n.
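The dimension count in the abstract can be checked numerically on a toy case. The sketch below is an illustration only, not the paper's proof: it assumes a hypothetical 4-parameter model `f(w; x) = w[2] * tanh(w[0] * x + w[1]) + w[3]`, two made-up data points, and labels generated from a chosen parameter vector `w_star`, so that `w_star` is a global minimum with zero loss. It then relies on a standard fact (the implicit function theorem): if the Jacobian of the residual map at a zero-loss point has full rank d, the zero-loss set near that point is a manifold of dimension n-d. All names here (`model`, `jacobian_row`, `matrix_rank`, `w_star`) are assumptions introduced for this example.

```python
import math

# Hypothetical toy model with n = 4 parameters (not taken from the paper).
def model(w, x):
    return w[2] * math.tanh(w[0] * x + w[1]) + w[3]

# Analytic gradient of model(w, x) with respect to w: one Jacobian row.
def jacobian_row(w, x):
    t = math.tanh(w[0] * x + w[1])
    s = 1.0 - t * t  # derivative of tanh
    return [w[2] * s * x, w[2] * s, t, 1.0]

# Rank by Gaussian elimination with partial pivoting.
def matrix_rank(rows, tol=1e-9):
    m = [list(r) for r in rows]
    n_cols = len(m[0])
    rank = 0
    for col in range(n_cols):
        pivot = max(range(rank, len(m)), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < tol:
            continue  # no usable pivot in this column
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            for c in range(col, n_cols):
                m[r][c] -= factor * m[rank][c]
        rank += 1
        if rank == len(m):
            break
    return rank

# Made-up data: d = 2 inputs, labels generated by w_star so the loss is zero there.
w_star = [1.0, 0.3, 2.0, -0.5]
xs = [0.5, -1.0]
ys = [model(w_star, x) for x in xs]

def loss(w):
    return sum((model(w, x) - y) ** 2 for x, y in zip(xs, ys))

n_params = len(w_star)
n_data = len(xs)
J = [jacobian_row(w_star, x) for x in xs]  # d x n Jacobian of the residual map
rank_J = matrix_rank(J)

print("loss at w_star:", loss(w_star))
print("rank of Jacobian:", rank_J)
print("local dimension of zero-loss set (n - rank):", n_params - rank_J)
```

Here the Jacobian has rank 2 = d, so the zero-loss set near `w_star` is 2-dimensional, matching n-d = 4-2. A rank below d at some point would signal one of the non-generic cases that the abstract's "usually" leaves room for.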
