Optimal Approximation Rate of ReLU Networks in terms of Width and Depth

Abstract

This paper concentrates on the approximation power of deep feed-forward neural networks in terms of width and depth. It is proved by construction that ReLU networks with width $\mathcal{O}\big(\max\{d\lfloor N^{1/d}\rfloor,\, N+2\}\big)$ and depth $\mathcal{O}(L)$ can approximate a Hölder continuous function on $[0,1]^d$ with an approximation rate $\mathcal{O}\big(\lambda\sqrt{d}\,(N^2L^2\ln N)^{-\alpha/d}\big)$, where $\alpha\in(0,1]$ and $\lambda>0$ are Hölder order and constant, respectively. Such a rate is optimal up to a constant in terms of width and depth separately, while existing results are only nearly optimal without the logarithmic factor in the approximation rate. More generally, for an arbitrary continuous function $f$ on $[0,1]^d$, the approximation rate becomes $\mathcal{O}\big(\sqrt{d}\,\omega_f\big((N^2L^2\ln N)^{-1/d}\big)\big)$, where $\omega_f(\cdot)$ is the modulus of continuity. We also extend our analysis to any continuous function $f$ on a bounded set. Particularly, if ReLU networks with depth $31$ and width $\mathcal{O}(N)$ are used to approximate one-dimensional Lipschitz continuous functions on $[0,1]$ with a Lipschitz constant $\lambda>0$, the approximation rate in terms of the total number of parameters, $W=\mathcal{O}(N^2)$, becomes $\mathcal{O}\big(\tfrac{\lambda}{W\ln W}\big)$, which has not been discovered in the literature for fixed-depth ReLU networks.
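
The fixed-depth claim at the end of the abstract can be checked against the general rate with a short calculation. The following is only a sketch, not the paper's own argument, and it rests on two assumptions that the abstract does not state explicitly: that the depth being fixed lets $L$ be treated as a constant, and that $N$ is chosen so that the parameter count is comparable to $N^2$ (written $W \asymp N^2$; the abstract gives only $W=\mathcal{O}(N^2)$). In the one-dimensional Lipschitz case, $d=1$ and $\alpha=1$, so the general rate reduces to

\[
\lambda\,(N^2L^2\ln N)^{-\alpha/d}\Big|_{d=1,\ \alpha=1}
= \frac{\lambda}{N^2L^2\ln N}
= \mathcal{O}\!\Big(\frac{\lambda}{N^2\ln N}\Big).
\]

Under the assumption $W \asymp N^2$ we have $\ln N = \tfrac{1}{2}\ln W + \mathcal{O}(1)$, hence $N^2\ln N \asymp W\ln W$ and

\[
\frac{\lambda}{N^2\ln N} = \mathcal{O}\!\Big(\frac{\lambda}{W\ln W}\Big).
\]

A rate lacking the logarithmic factor would give only $\mathcal{O}(\lambda/W)$ under the same substitution, which is the gap the abstract refers to.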
