On approximating $\nabla f$ with neural networks

Saeed Saremi

Abstract

Consider a feedforward neural network $\psi: \mathbb{R}^d \rightarrow \mathbb{R}^d$ such that $\psi \approx \nabla f$, where $f: \mathbb{R}^d \rightarrow \mathbb{R}$ is a smooth function; therefore $\psi$ must satisfy $\partial_j \psi_i = \partial_i \psi_j$ pointwise. We prove a theorem that a $\psi$ network with more than one hidden layer can represent only one feature in its first hidden layer; this is a dramatic departure from the well-known results for one hidden layer. The proof of the theorem is straightforward: two backward paths and a weight-tying matrix play the key roles. We then present the alternative, the implicit parametrization, where the neural network is $\varphi: \mathbb{R}^d \rightarrow \mathbb{R}$ and $\nabla \varphi \approx \nabla f$; in addition, a "soft analysis" of $\nabla \varphi$ gives a dual perspective on the theorem. Throughout, we return to recent probabilistic models that are formulated as $\nabla \varphi \approx \nabla f$, and conclude with a critique of denoising autoencoders.
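To make the symmetry constraint $\partial_j \psi_i = \partial_i \psi_j$ and the implicit parametrization concrete, here is a minimal NumPy sketch that is not taken from the paper. It assumes a single hidden layer with softplus units; the names `phi`, `grad_phi`, `psi_untied`, `W`, `V`, and `b` are hypothetical. For $\varphi(x) = \sum_k \mathrm{softplus}(w_k \cdot x + b_k)$, the gradient is $\nabla \varphi(x) = W^\top \sigma(Wx + b)$, i.e. a weight-tied one-hidden-layer $\psi$ whose Jacobian $W^\top \mathrm{diag}(\sigma'(Wx+b))\,W$ is symmetric, whereas an untied $\psi(x) = V \sigma(Wx+b)$ generally violates the constraint. The sketch only covers the one-hidden-layer case and does not reproduce the paper's multi-layer theorem or its proof.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softplus(z):
    # Numerically stable log(1 + exp(z)).
    return np.logaddexp(0.0, z)

def phi(x, W, b):
    """Hypothetical scalar network phi: R^d -> R with one hidden layer."""
    return np.sum(softplus(W @ x + b))

def grad_phi(x, W, b):
    """Analytic gradient of phi: W^T sigmoid(Wx + b), a weight-tied psi network."""
    return W.T @ sigmoid(W @ x + b)

def psi_untied(x, V, W, b):
    """Hypothetical vector-valued network psi(x) = V sigmoid(Wx + b), with V != W^T."""
    return V @ sigmoid(W @ x + b)

def jacobian_fd(F, x, eps=1e-6):
    """Central finite-difference Jacobian with J[i, j] = dF_i / dx_j."""
    cols = [(F(x + e) - F(x - e)) / (2 * eps) for e in eps * np.eye(x.size)]
    return np.stack(cols, axis=1)

def asymmetry(J):
    """Largest violation of d_j psi_i = d_i psi_j at the evaluation point."""
    return np.max(np.abs(J - J.T))

rng = np.random.default_rng(0)
d, h = 3, 5                      # input dimension and hidden width (arbitrary)
W = rng.normal(size=(h, d))
V = rng.normal(size=(d, h))
b = rng.normal(size=h)
x = rng.normal(size=d)

# The analytic gradient of phi should match finite differences of phi itself.
eps = 1e-6
g_fd = np.array([(phi(x + e, W, b) - phi(x - e, W, b)) / (2 * eps)
                 for e in eps * np.eye(d)])

J_tied = jacobian_fd(lambda z: grad_phi(z, W, b), x)
J_untied = jacobian_fd(lambda z: psi_untied(z, V, W, b), x)

# Closed form for the tied case: W^T diag(sigmoid'(Wx + b)) W.
s = sigmoid(W @ x + b)
J_analytic = W.T @ np.diag(s * (1 - s)) @ W

print(asymmetry(J_tied))    # only finite-difference round-off
print(asymmetry(J_untied))  # typically far from zero
```

The tied case is symmetric by construction, since its Jacobian has the form $W^\top D W$ with $D$ diagonal; the untied Jacobian $V\,\mathrm{diag}(\sigma'(Wx+b))\,W$ has no such structure.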
