Minimum width for universal approximation using ReLU networks on compact domain
Abstract
It has been shown that deep neural networks of sufficiently large width are universal approximators, but they are not if the width is too small. There have been several attempts to characterize the minimum width $w_{\min}$ enabling the universal approximation property; however, only a few of them found the exact values. In this work, we show that the minimum width for $L^p$ approximation of $L^p$ functions from $[0,1]^{d_x}$ to $\mathbb{R}^{d_y}$ is exactly $\max\{d_x, d_y, 2\}$ if the activation function is ReLU-Like (e.g., ReLU, GELU, Softplus). Compared to the known result for ReLU networks, $w_{\min} = \max\{d_x + 1, d_y\}$ when the domain is $\mathbb{R}^{d_x}$, our result is the first to show that approximation on a compact domain requires a smaller width than on $\mathbb{R}^{d_x}$. We next prove a lower bound on $w_{\min}$ for uniform approximation using general activation functions, including ReLU: $w_{\min} \ge d_y + 1$ if $d_x < d_y \le 2d_x$. Together with our first result, this shows a dichotomy between $L^p$ and uniform approximation for general activation functions and input/output dimensions.
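The three width formulas stated above can be compared directly for a few input/output dimensions. The sketch below only restates those formulas. It does not build or train any network. The function names are hypothetical and are used here for illustration only. The uniform-approximation result is a lower bound that holds only when $d_x < d_y \le 2d_x$, so the sketch returns `None` outside that range.

```python
from typing import Optional


def min_width_lp_compact(dx: int, dy: int) -> int:
    """Exact minimum width for L^p approximation on [0,1]^dx (ReLU-Like activations)."""
    return max(dx, dy, 2)


def min_width_lp_whole_space(dx: int, dy: int) -> int:
    """Known minimum width for L^p approximation on R^dx with ReLU networks."""
    return max(dx + 1, dy)


def uniform_lower_bound(dx: int, dy: int) -> Optional[int]:
    """Lower bound on the minimum width for uniform approximation.

    The bound d_y + 1 is only claimed when dx < dy <= 2*dx; outside that
    range this sketch returns None (no bound stated in the abstract).
    """
    if dx < dy <= 2 * dx:
        return dy + 1
    return None


if __name__ == "__main__":
    for dx, dy in [(1, 1), (2, 3), (3, 3), (4, 8)]:
        print(
            f"dx={dx}, dy={dy}: "
            f"L^p on [0,1]^dx -> {min_width_lp_compact(dx, dy)}, "
            f"L^p on R^dx -> {min_width_lp_whole_space(dx, dy)}, "
            f"uniform lower bound -> {uniform_lower_bound(dx, dy)}"
        )
```

For $(d_x, d_y) = (3, 3)$ the compact-domain width is 3, while the whole-space width is 4, which illustrates the first claim. For $(d_x, d_y) = (2, 3)$ the $L^p$ width on $[0,1]^{d_x}$ is 3, but uniform approximation needs width at least 4. This is the dichotomy between the two notions of approximation.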