Floating-Point Networks with Automatic Differentiation Can Represent Almost All Floating-Point Functions and Their Gradients

Geonho Hwang

Abstract

Theoretical studies show that for any differentiable function on a compact domain, there exists a neural network that approximates both the function values and gradients. However, such a result cannot be used in practice since it assumes real parameters and exact internal operations. In contrast, real implementations only use a finite subset of reals and machine operations with round-off errors. In this work, we investigate whether a similar result holds for neural networks under floating-point arithmetic, when the gradient with respect to the input is computed by the automatic differentiation algorithm DAD. We first show that given a floating-point function ϕ (e.g., a loss function), arbitrary function values and gradients can be represented by a floating-point network f and DAD(ϕ ∘ f), respectively. We further extend this result: given ϕ_1, …, ϕ_n, DAD(ϕ_i ∘ f) can simultaneously represent arbitrary gradients while f represents the target values, under mild conditions. Our results hold for practical activation functions, e.g., ReLU, ELU, GeLU, Swish, Sigmoid, and tanh.
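To make the setting concrete, the following is a minimal sketch, not the paper's construction, of a one-hidden-unit ReLU network f evaluated in 32-bit floating point, together with the input gradient of a loss ϕ composed with f, obtained by a hand-written reverse pass. Everything named here is an assumption for illustration: the helper `f32`, the function `network_and_grad`, the squared-error choice of ϕ, the convention that the ReLU derivative at 0 is 0, and rounding to binary32 after every operation.

```python
import struct


def f32(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def network_and_grad(x, w1, b1, w2, b2, target):
    """Evaluate f(x) = w2 * relu(w1 * x + b1) + b2 and the loss
    phi(y) = (y - target)^2, rounding every intermediate result to
    binary32, then return the gradient of phi(f(x)) with respect to x
    computed by a reverse pass that is also rounded at each step.
    """
    # Forward pass (each operation is followed by a rounding step).
    z = f32(f32(w1 * x) + b1)
    h = z if z > 0.0 else 0.0          # ReLU
    y = f32(f32(w2 * h) + b2)
    diff = f32(y - target)
    loss = f32(diff * diff)

    # Reverse pass: chain rule applied from the loss back to the input.
    dloss_dy = f32(2.0 * diff)
    dloss_dh = f32(dloss_dy * w2)
    relu_grad = 1.0 if z > 0.0 else 0.0  # assumed convention at z == 0
    dloss_dz = f32(dloss_dh * relu_grad)
    dloss_dx = f32(dloss_dz * w1)
    return y, loss, dloss_dx


# Inputs chosen to be exactly representable in binary32.
y, loss, grad = network_and_grad(1.5, 2.0, -1.0, 0.5, 0.25, 1.0)
print(y, loss, grad)
```

With these inputs every intermediate value is exactly representable, so the rounded gradient coincides with the real-valued chain rule. For inputs such as 0.1, which binary32 cannot represent exactly, the outputs and gradients are restricted to floating-point values, which is the regime the abstract addresses.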
