Towards Quantifying the Hessian Structure of Neural Networks

Ruoyu Sun

Towards Quantifying the Hessian Structure of Neural Networks

Abstract

Empirical studies reported that the Hessian matrix of neural networks (NNs) exhibits a near-block-diagonal structure, yet its theoretical foundation remains unclear. In this work, we reveal that the reported Hessian structure comes from a mixture of two forces: a ``static force'' rooted in the architecture design, and a ''dynamic force'' arisen from training. We then provide a rigorous theoretical analysis of ''static force'' at random initialization. We study linear models and 1-hidden-layer networks for classification tasks with C classes. By leveraging random matrix theory, we compare the limit distributions of the diagonal and off-diagonal Hessian blocks and find that the block-diagonal structure arises as C becomes large. Our findings reveal that C is one primary driver of the near-block-diagonal structure. These results may shed new light on the Hessian structure of large language models (LLMs), which typically operate with a large C exceeding 104.

0

Turn this paper into a full lesson

ArcXiv compiles a staged curriculum from this paper: 8-12 lessons across beginner → advanced, synthesised section guides, visuals, flashcards, a quiz, exercises, and on-demand deep dives per section. Grounded in the abstract, never invented.

Or compile a full topic from this idea

Discussion (0)

Sign in to join the discussion.

Loading comments…