Fast Approximation of the Gauss-Newton Hessian Matrix for the Multilayer Perceptron

Abstract

We introduce a fast algorithm for entry-wise evaluation of the Gauss-Newton Hessian (GNH) matrix for the fully-connected feed-forward neural network. The algorithm has a precomputation step and a sampling step. While it generally requires O(Nn) work to compute an entry (and the entire column) in the GNH matrix for a neural network with N parameters and n data points, our fast sampling algorithm reduces the cost to O(n+d/ε2) work, where d is the output dimension of the network and ε is a prescribed accuracy (independent of N). One application of our algorithm is constructing the hierarchical-matrix (H-matrix) approximation of the GNH matrix for solving linear systems and eigenvalue problems. It generally requires O(N2) memory and O(N3) work to store and factorize the GNH matrix, respectively. The H-matrix approximation requires only O(N ro) memory footprint and O(N ro2) work to be factorized, where ro N is the maximum rank of off-diagonal blocks in the GNH matrix. We demonstrate the performance of our fast algorithm and the H-matrix approximation on classification and autoencoder neural networks.

0

Turn this paper into a full lesson

ArcXiv compiles a staged curriculum from this paper: 8-12 lessons across beginner → advanced, synthesised section guides, visuals, flashcards, a quiz, exercises, and on-demand deep dives per section. Grounded in the abstract, never invented.

Discussion (0)

Sign in to join the discussion.

Loading comments…