MLPs at the EOC: Spectrum of the NTK

Abstract

We study the properties of the Neural Tangent Kernel (NTK) $\overset{\scriptscriptstyle\infty}{K} : \mathbb{R}^{m_0} \times \mathbb{R}^{m_0} \to \mathbb{R}^{m_l \times m_l}$ corresponding to infinitely wide $l$-layer Multilayer Perceptrons (MLPs) taking inputs from $\mathbb{R}^{m_0}$ to outputs in $\mathbb{R}^{m_l}$, equipped with activation functions $\phi(s) = a s + b |s|$ for some $a, b \in \mathbb{R}$ and initialized at the Edge Of Chaos (EOC). We find that the entries $\overset{\scriptscriptstyle\infty}{K}(x_1, x_2)$ can be approximated by the inverses of the cosine distances of the activations corresponding to $x_1$ and $x_2$, and that this approximation improves as the depth $l$ increases. By quantifying these inverse cosine distances and the spectrum of the matrix containing them, we obtain tight spectral bounds for the NTK matrix $\overset{\scriptscriptstyle\infty}{K} = \left[\frac{1}{n} \overset{\scriptscriptstyle\infty}{K}(x_{i_1}, x_{i_2}) : i_1, i_2 \in [1:n]\right]$ over a dataset $\{x_1, \cdots, x_n\} \subset \mathbb{R}^{m_0}$, transferred from the inverse cosine distance matrix via our approximation result. Our results show that $\Delta_\phi = \frac{b^2}{a^2 + b^2}$ determines the rate at which the condition number of the NTK matrix converges to its limit as depth increases, implying in particular that the absolute value ($\Delta_\phi = 1$) is better than the ReLU ($\Delta_\phi = \frac{1}{2}$) in this regard.
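To make the two quoted values concrete, the following minimal sketch evaluates the ratio $b^2/(a^2+b^2)$ for the ReLU and the absolute value, writing each as $a s + b |s|$. The function names `activation` and `rate_parameter` are illustrative assumptions, not identifiers from the paper, and the sketch covers only this ratio, not the NTK or its spectrum.

```python
def activation(s: float, a: float, b: float) -> float:
    """Evaluate phi(s) = a*s + b*|s| (the activation family from the abstract)."""
    return a * s + b * abs(s)


def rate_parameter(a: float, b: float) -> float:
    """Return b^2 / (a^2 + b^2); undefined when a = b = 0 (phi is identically zero)."""
    if a == 0 and b == 0:
        raise ValueError("a and b must not both be zero")
    return b**2 / (a**2 + b**2)


# ReLU: max(0, s) = 0.5*s + 0.5*|s|, so a = b = 0.5.
RELU = (0.5, 0.5)
# Absolute value: |s| = 0*s + 1*|s|, so a = 0, b = 1.
ABS_VALUE = (0.0, 1.0)

if __name__ == "__main__":
    for name, (a, b) in [("ReLU", RELU), ("absolute value", ABS_VALUE)]:
        print(f"{name}: a={a}, b={b}, ratio={rate_parameter(a, b)}")
```

The ratio is unchanged when $a$ and $b$ are scaled by a common nonzero factor, so it depends only on the shape of the activation and not on its overall scale.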
