Interplay between depth and width for interpolation in neural ODEs
Abstract
Neural ordinary differential equations (neural ODEs) have emerged as a natural tool for supervised learning from a control perspective, yet a complete understanding of their optimal architecture remains elusive. In this work, we examine the interplay between their width $p$ and number of layer transitions $L$ (effectively the depth $L+1$). Specifically, we assess the model expressivity in terms of its capacity to interpolate either a finite dataset $D$ comprising $N$ pairs of points or two probability measures in $\mathbb{R}^d$ within a Wasserstein error margin $\varepsilon>0$. Our findings reveal a balancing trade-off between $p$ and $L$, with $L$ scaling as $O(1+N/p)$ for dataset interpolation, and $L=O\left(1+(p\varepsilon^d)^{-1}\right)$ for measure interpolation. In the autonomous case, where $L=0$, a separate study is required, which we undertake focusing on dataset interpolation. We address the relaxed problem of $\varepsilon$-approximate controllability and establish an error decay of $\varepsilon\sim O(\log(p)\,p^{-1/d})$. This decay rate is a consequence of applying a universal approximation theorem to a custom-built Lipschitz vector field that interpolates $D$. In the high-dimensional setting, we further demonstrate that $p=O(N)$ neurons are likely sufficient to achieve exact control.
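To make the roles of the width $p$ and the number of layer transitions $L$ concrete, the following is a minimal sketch of one possible neural ODE with piecewise-constant parameters. The abstract does not state the model, so everything here is an assumption made for illustration: the vector field $\dot{x} = W\,\mathrm{relu}(Ax + b)$ with $W \in \mathbb{R}^{d\times p}$, $A \in \mathbb{R}^{p\times d}$, $b \in \mathbb{R}^p$, the ReLU activation, the equal-length time pieces, and the forward-Euler integrator. The names `neural_ode_flow`, `layer_transitions` and `parameter_count` are hypothetical helpers, not part of any library.

```python
import numpy as np

def neural_ode_flow(x0, params, T=1.0, steps_per_piece=50):
    """Forward-Euler flow of x' = W_k relu(A_k x + b_k) on the k-th time piece.

    params: list of L+1 tuples (W, A, b), one per time piece (assumed form),
            with W of shape (d, p), A of shape (p, d), b of shape (p,).
    """
    x = np.array(x0, dtype=float)
    dt = T / (len(params) * steps_per_piece)
    for W, A, b in params:               # parameters switch between pieces
        for _ in range(steps_per_piece):
            x = x + dt * (W @ np.maximum(A @ x + b, 0.0))
    return x

def layer_transitions(params):
    """Number of parameter switches L for L+1 constant pieces."""
    return len(params) - 1

def parameter_count(d, p, L):
    """Scalar parameters under the assumed form: (L+1) pieces of (W, A, b)."""
    return (L + 1) * (2 * p * d + p)

# Illustrative sizes only; these values are not taken from the paper.
d, p, L = 2, 3, 4
rng = np.random.default_rng(0)
params = [
    (rng.normal(size=(d, p)), rng.normal(size=(p, d)), rng.normal(size=p))
    for _ in range(L + 1)
]
x_T = neural_ode_flow(np.zeros(d), params)
```

Under this assumed form, the autonomous case $L=0$ corresponds to a single tuple in `params`, so the same vector field drives the flow over the whole time horizon.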