A Random Matrix Approach to Neural Networks

Abstract

This article studies the Gram random matrix model $G=\frac{1}{T}\Sigma^{\mathsf T}\Sigma$, $\Sigma=\sigma(WX)$, classically found in the analysis of random feature maps and random neural networks, where $X=[x_1,\ldots,x_T]\in\mathbb{R}^{p\times T}$ is a (data) matrix of bounded norm, $W\in\mathbb{R}^{n\times p}$ is a matrix of independent zero-mean unit-variance entries, and $\sigma:\mathbb{R}\to\mathbb{R}$ is a Lipschitz continuous (activation) function, with $\sigma(WX)$ understood entry-wise. By means of a key concentration of measure lemma arising from non-asymptotic random matrix arguments, we prove that, as $n,p,T$ grow large at the same rate, the resolvent $Q=(G+\gamma I_T)^{-1}$, for $\gamma>0$, behaves similarly to the resolvents met in sample covariance matrix models, involving notably the moment $\Phi=\frac{T}{n}\mathbb{E}[G]$, which provides in passing a deterministic equivalent for the empirical spectral measure of $G$. Application-wise, this result enables the estimation of the asymptotic performance of single-layer random neural networks. This in turn provides practical insights into the underlying mechanisms at play in random neural networks, entailing several unexpected consequences, as well as a fast practical means to tune the network hyperparameters.
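
To make the objects in the abstract concrete, the following is a minimal numerical sketch of the Gram matrix and its resolvent for one random draw. It is illustrative only and is not the authors' code: the Gaussian choice for W, the tanh activation (which is 1-Lipschitz), the dimensions, the value of γ, and the function name `gram_and_resolvent` are all assumptions made for this example.

```python
import numpy as np

def gram_and_resolvent(X, n, gamma, sigma=np.tanh, seed=0):
    """Build G = (1/T) Sigma^T Sigma with Sigma = sigma(W X), and Q = (G + gamma I_T)^{-1}.

    Hypothetical helper: W is drawn Gaussian here, one admissible choice of
    independent zero-mean unit-variance entries.
    """
    rng = np.random.default_rng(seed)
    p, T = X.shape
    W = rng.standard_normal((n, p))           # n x p random weights
    Sigma = sigma(W @ X)                      # n x T, activation applied entry-wise
    G = Sigma.T @ Sigma / T                   # T x T Gram matrix
    Q = np.linalg.inv(G + gamma * np.eye(T))  # resolvent, well defined for gamma > 0
    return G, Q

# Assumed toy dimensions; columns of X are scaled to have norm of order one.
p, T, n = 40, 60, 80
gamma = 0.1
X = np.random.default_rng(1).standard_normal((p, T)) / np.sqrt(p)

G, Q = gram_and_resolvent(X, n, gamma)

# The normalized trace of Q is the average of 1 / (lambda_i + gamma) over the
# eigenvalues lambda_i of G, which ties the resolvent to the spectrum of G.
eigenvalues = np.linalg.eigvalsh(G)
print(np.trace(Q) / T, np.mean(1.0 / (eigenvalues + gamma)))
```

The two printed numbers agree up to floating-point error. This identity is what lets a deterministic approximation of Q yield information about the empirical spectral measure of G.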
