Optimal Neural Network Approximation for High-Dimensional Continuous Functions
Abstract
Recently, the authors of SYZ22 developed a neural network with width 36d(2d + 1) and depth 11, which utilizes a special activation function called the elementary universal activation function, to achieve the super approximation property for functions in C([a,b]^d). That is, the constructed network only requires a fixed number of neurons (and thus parameters) to approximate a d-variate continuous function on a d-dimensional hypercube with arbitrary accuracy. More specifically, only O(d^2) neurons or parameters are used. One natural question is whether we can reduce the number of these neurons or parameters in such a network. By leveraging a variant of the Kolmogorov Superposition Theorem, we show that there is a composition of networks generated by the elementary universal activation function with at most 10889d + 10887 nonzero parameters such that this super approximation property is attained. The composed network consists of repeated evaluations of two neural networks: one with width 36(2d+1) and the other with width 36, both having 5 layers. Furthermore, we present a family of continuous functions that requires at least width d, and thus at least d neurons or parameters, to achieve arbitrary accuracy in its approximation. This suggests that the number of nonzero parameters is optimal in the sense that it grows linearly with the input dimension d, unlike some approximation methods where parameters may grow exponentially with d.
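To make the quadratic versus linear growth concrete, the sizes quoted in the abstract can be tabulated for a few input dimensions. The sketch below only evaluates the formulas stated above; the function names are hypothetical and not taken from the paper, and a layer width is not the same quantity as a parameter count, so the comparison illustrates growth rates in d rather than a like-for-like cost.

```python
from typing import Tuple


def syz22_width(d: int) -> int:
    """Width 36d(2d + 1) of the SYZ22 network, as quoted in the abstract."""
    return 36 * d * (2 * d + 1)


def composed_param_bound(d: int) -> int:
    """Upper bound 10889d + 10887 on nonzero parameters, as quoted in the abstract."""
    return 10889 * d + 10887


def sub_network_widths(d: int) -> Tuple[int, int]:
    """Widths 36(2d + 1) and 36 of the two 5-layer sub-networks in the composition."""
    return 36 * (2 * d + 1), 36


def growth_on_doubling(f, d: int) -> float:
    """Ratio f(2d) / f(d): close to 4 for quadratic growth, close to 2 for linear."""
    return f(2 * d) / f(d)


if __name__ == "__main__":
    print("d, SYZ22 width, composed parameter bound, sub-network widths")
    for d in (1, 10, 100, 1000):
        print(d, syz22_width(d), composed_param_bound(d), sub_network_widths(d))
    print("doubling d = 1000:",
          growth_on_doubling(syz22_width, 1000),
          growth_on_doubling(composed_param_bound, 1000))
```

Doubling d roughly quadruples the SYZ22 width but only doubles the parameter bound of the composed network, which is the sense in which the abstract describes the new count as linear in d.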