The Role of Symmetry in Optimizing Overparameterized Networks
Abstract
Overparameterization is central to the success of deep learning, yet the mechanisms by which it improves optimization remain incompletely understood. We analyze weight-space symmetries in neural networks and show that overparameterization introduces additional symmetries that benefit optimization in two distinct ways. First, we prove that these symmetries act as a form of diagonal preconditioning on the Hessian, so that better-conditioned minima exist within each equivalence class of functionally identical solutions. Second, we show that overparameterization increases the probability mass of global minima near typical initializations, making these favourable solutions more reachable. These results offer a potential link between loss landscape geometry and simplicity bias. Empirically, we observe that wider networks have lower top Hessian eigenvalues, smaller condition numbers, and faster convergence, consistent with our analysis. Our analysis provides a unified framework for understanding overparameterization and width growth as a geometric transformation of the loss landscape.
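To make the first claim concrete, the following is a minimal toy sketch, not the paper's construction. It assumes a two-parameter model f(x) = a·w·x fitted to a single target with squared loss, where the rescaling (w, a) → (c·w, a/c) leaves the function unchanged. All names (`loss`, `hessian`, `rescale`, `top_eigenvalue`, `sharpness_along_orbit`) are hypothetical and chosen for illustration. Because the loss is invariant under the rescaling, the Hessian at the rescaled point equals D⁻¹ H D⁻¹ with D = diag(c, 1/c), a diagonal congruence, so the top eigenvalue differs between functionally identical solutions.

```python
import math

def loss(w, a, target=1.0):
    # Squared loss of the toy model f(x) = a * w * x on the single input x = 1.
    return 0.5 * (a * w - target) ** 2

def hessian(w, a, target=1.0):
    # Analytic 2x2 Hessian with respect to (w, a), returned as nested tuples.
    r = a * w - target          # residual
    off = a * w + r             # mixed second derivative d^2 L / (dw da)
    return ((a * a, off), (off, w * w))

def rescale(w, a, c):
    # Symmetry of the toy model: (w, a) -> (c * w, a / c) keeps a * w fixed.
    return c * w, a / c

def top_eigenvalue(h):
    # Largest eigenvalue of a symmetric 2x2 matrix via trace and determinant.
    (p, q), (_, s) = h
    tr, det = p + s, p * s - q * q
    return 0.5 * (tr + math.sqrt(max(tr * tr - 4.0 * det, 0.0)))

def sharpness_along_orbit(w, a, scales):
    # Top Hessian eigenvalue at each rescaled copy of the same solution.
    return [top_eigenvalue(hessian(*rescale(w, a, c))) for c in scales]

scales = (1.0, 2.0, 4.0)
for c, lam in zip(scales, sharpness_along_orbit(1.0, 1.0, scales)):
    w_c, a_c = rescale(1.0, 1.0, c)
    print(f"c={c}: loss={loss(w_c, a_c):.3f}, top eigenvalue={lam:.4f}")
```

In this toy case the balanced solution (c = 1) has the smallest top eigenvalue on its orbit, while the loss is identical at every point. The Hessian here is rank one at a minimum, so the sketch tracks the top eigenvalue rather than a condition number.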