On approximating ∇ f with neural networks
Abstract
Consider a feedforward neural network ψ: R^d → R^d such that ψ ≈ ∇f, where f: R^d → R is a smooth function; ψ must therefore satisfy ∂_j ψ_i = ∂_i ψ_j pointwise. We prove that such a network with more than one hidden layer can represent only one feature in its first hidden layer; this is a dramatic departure from the well-known results for one hidden layer. The proof of the theorem is straightforward, with two backward paths and a weight-tying matrix playing the key roles. We then present the alternative, the implicit parametrization, where the neural network is φ: R^d → R and ∇φ ≈ ∇f; in addition, a "soft analysis" of ∇φ gives a dual perspective on the theorem. Throughout, we come back to recent probabilistic models that are formulated as ∇φ ≈ ∇f, and conclude with a critique of denoising autoencoders.
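The symmetry condition in the abstract can be checked numerically. The sketch below is illustrative only and not taken from the paper: it assumes a one-hidden-layer tanh network with small random weights, and every name in it (`psi_explicit`, `grad_phi`, `jacobian`, `asymmetry`) is hypothetical. It compares an explicit vector-valued network psi(x) = A tanh(Wx + b) with untied weights against the gradient of a scalar network phi(x) = sum_k a_k tanh(w_k · x + b_k), and measures how far each Jacobian is from being symmetric, that is, how badly ∂_j ψ_i = ∂_i ψ_j is violated at one point.

```python
import math
import random

def make_params(d, h, seed=0):
    # Hypothetical random weights; nothing here comes from the paper.
    rng = random.Random(seed)
    W = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(h)]  # h x d
    b = [rng.gauss(0, 1) for _ in range(h)]                      # h
    A = [[rng.gauss(0, 1) for _ in range(h)] for _ in range(d)]  # d x h
    a = [rng.gauss(0, 1) for _ in range(h)]                      # h
    return W, b, A, a

def hidden(W, b, x):
    # tanh(W x + b), one value per hidden unit
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + bk)
            for row, bk in zip(W, b)]

def psi_explicit(x, W, b, A):
    # Explicit parametrization: a vector field R^d -> R^d with untied A and W.
    t = hidden(W, b, x)
    return [sum(Aik * tk for Aik, tk in zip(row, t)) for row in A]

def grad_phi(x, W, b, a):
    # Implicit parametrization: analytic gradient of the scalar network
    # phi(x) = sum_k a_k tanh(w_k . x + b_k).
    t = hidden(W, b, x)
    return [sum(a[k] * (1.0 - t[k] ** 2) * W[k][i] for k in range(len(t)))
            for i in range(len(x))]

def jacobian(field, x, eps=1e-5):
    # Central finite differences: J[i][j] approximates d field_i / d x_j.
    d = len(x)
    J = [[0.0] * d for _ in range(d)]
    for j in range(d):
        xp, xm = list(x), list(x)
        xp[j] += eps
        xm[j] -= eps
        fp, fm = field(xp), field(xm)
        for i in range(d):
            J[i][j] = (fp[i] - fm[i]) / (2 * eps)
    return J

def asymmetry(J):
    # Largest violation of J[i][j] == J[j][i].
    d = len(J)
    return max(abs(J[i][j] - J[j][i]) for i in range(d) for j in range(d))

W, b, A, a = make_params(d=3, h=5)
x0 = [0.3, -0.7, 1.1]
asym_explicit = asymmetry(jacobian(lambda x: psi_explicit(x, W, b, A), x0))
asym_implicit = asymmetry(jacobian(lambda x: grad_phi(x, W, b, a), x0))
print("explicit psi asymmetry:", asym_explicit)
print("implicit grad(phi) asymmetry:", asym_implicit)
```

Under these assumptions the gradient of the scalar network has a symmetric Jacobian (its Hessian) up to finite-difference error, while the generic untied network does not. This only illustrates the constraint itself; it says nothing about the multi-layer theorem, whose proof is in the paper.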