Geometry of the Loss Landscape in Overparameterized Neural Networks: Symmetries and Invariances

Abstract

We study how permutation symmetries in overparameterized multi-layer neural networks generate "symmetry-induced" critical points. Assuming a network with $L$ layers of minimal widths $r_1^*, \ldots, r_{L-1}^*$ reaches a zero-loss minimum at $r_1^*! \cdots r_{L-1}^*!$ isolated points that are permutations of one another, we show that adding one extra neuron to each layer is sufficient to connect all these previously discrete minima into a single manifold. For a two-layer overparameterized network of width $r^* + h =: m$, we explicitly describe the manifold of global minima: it consists of $T(r^*, m)$ affine subspaces of dimension at least $h$ that are connected to one another. For a network of width $m$, we identify the number $G(r, m)$ of affine subspaces containing only symmetry-induced critical points that are related to the critical points of a smaller network of width $r < r^*$. Via a combinatorial analysis, we derive closed-form formulas for $T$ and $G$ and show that the number of symmetry-induced critical subspaces dominates the number of affine subspaces forming the global minima manifold in the mildly overparameterized regime (small $h$), and vice versa in the vastly overparameterized regime ($h \gg r^*$). Our results provide new insights into the minimization of the non-convex loss function of overparameterized neural networks.
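The two symmetries underlying these counts can be checked numerically on a toy example. The following is a minimal sketch, not the construction used in the paper: it assumes a two-layer network $f(x) = \sum_j a_j \tanh(w_j \cdot x)$ with a tanh activation chosen purely for illustration, and the helper names `forward`, `permute`, and `split_neuron` are hypothetical. It verifies that all $r!$ relabelings of the hidden neurons compute the same function, and that duplicating one neuron while sharing its outgoing weight as $(\mu a_j, (1-\mu) a_j)$ leaves the function unchanged for every $\mu$, which gives a one-parameter line of equivalent parameters in the wider network.

```python
import itertools
import math
import random


def forward(x, W, a):
    """Two-layer net f(x) = sum_j a[j] * tanh(W[j] . x); W is a list of rows."""
    return sum(
        a_j * math.tanh(sum(w * xi for w, xi in zip(row, x)))
        for row, a_j in zip(W, a)
    )


def permute(W, a, perm):
    """Relabel hidden neurons: neuron perm[j] moves to position j."""
    return [W[p] for p in perm], [a[p] for p in perm]


def split_neuron(W, a, j, mu):
    """Widen the net by one neuron: copy neuron j's incoming weights and
    share its outgoing weight as (mu * a[j], (1 - mu) * a[j])."""
    W_new = W + [list(W[j])]
    a_new = list(a)
    a_new[j] = mu * a[j]
    a_new.append((1.0 - mu) * a[j])
    return W_new, a_new


rng = random.Random(0)
r, d = 3, 2  # hidden width and input dimension (illustrative values only)
W = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(r)]
a = [rng.gauss(0, 1) for _ in range(r)]
xs = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(5)]
base = [forward(x, W, a) for x in xs]

# Every one of the r! relabelings computes the same function on the test inputs.
perms = list(itertools.permutations(range(r)))
perm_ok = all(
    math.isclose(forward(x, *permute(W, a, p)), y, abs_tol=1e-12)
    for p in perms
    for x, y in zip(xs, base)
)

# Splitting neuron 0 gives a line (parameterized by mu) of equivalent networks.
split_ok = all(
    math.isclose(forward(x, *split_neuron(W, a, 0, mu)), y, abs_tol=1e-12)
    for mu in (-1.0, 0.0, 0.3, 1.0, 2.5)
    for x, y in zip(xs, base)
)

print(len(perms), perm_ok, split_ok)
```

The check is deliberately limited to function equality on a handful of inputs; it does not compute $T$ or $G$, whose closed forms are derived in the paper itself.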
