Interplay between depth and width for interpolation in neural ODEs

Enrique Zuazua

Abstract

Neural ordinary differential equations (neural ODEs) have emerged as a natural tool for supervised learning from a control perspective, yet a complete understanding of their optimal architecture remains elusive. In this work, we examine the interplay between their width $p$ and number of layer transitions $L$ (effectively the depth $L+1$). Specifically, we assess the model expressivity in terms of its capacity to interpolate either a finite dataset $D$ comprising $N$ pairs of points or two probability measures in $\mathbb{R}^d$ within a Wasserstein error margin $\varepsilon>0$. Our findings reveal a balancing trade-off between $p$ and $L$, with $L$ scaling as $O(1+N/p)$ for dataset interpolation, and $L=O(1+(p\varepsilon^d)^{-1})$ for measure interpolation. In the autonomous case, where $L=0$, a separate study is required, which we undertake focusing on dataset interpolation. We address the relaxed problem of $\varepsilon$-approximate controllability and establish an error decay of $O(\log(p)\,p^{-1/d})$. This decay rate is a consequence of applying a universal approximation theorem to a custom-built Lipschitz vector field that interpolates $D$. In the high-dimensional setting, we further demonstrate that $p=O(N)$ neurons are likely sufficient to achieve exact control.
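To make the roles of the width $p$ and the number of layer transitions $L$ concrete, here is a minimal sketch of one possible neural ODE forward pass. The abstract does not state the model, so everything here is an assumption for illustration: the dynamics $\dot{x} = W(t)\,\sigma(A(t)x + b(t))$ with ReLU activation, parameters that are piecewise constant in time with $L$ switches (hence $L+1$ pieces), and a plain forward Euler integrator. The function name `neural_ode_flow` and all parameter names are hypothetical.

```python
import numpy as np

def neural_ode_flow(x0, W, A, b, T=1.0, steps_per_piece=50):
    """Forward Euler for x' = W_k relu(A_k x + b_k) on L+1 constant-parameter pieces.

    Assumed shapes (illustrative, not taken from the paper):
      W: (L+1, d, p), A: (L+1, p, d), b: (L+1, p)
    where d is the data dimension, p the width, L the number of switches.
    """
    x = np.array(x0, dtype=float)
    pieces = W.shape[0]                      # L + 1
    dt = T / (pieces * steps_per_piece)      # uniform step over [0, T]
    for k in range(pieces):                  # one "layer" per constant piece
        for _ in range(steps_per_piece):
            hidden = np.maximum(A[k] @ x + b[k], 0.0)   # p ReLU neurons
            x = x + dt * (W[k] @ hidden)
    return x

rng = np.random.default_rng(0)
d, p, L = 2, 3, 4
W = rng.normal(size=(L + 1, d, p))
A = rng.normal(size=(L + 1, p, d))
b = rng.normal(size=(L + 1, p))
x_T = neural_ode_flow(np.array([1.0, -0.5]), W, A, b)
print(x_T.shape)  # (2,)
```

In this sketch, setting `L = 0` leaves a single parameter piece, which corresponds to the autonomous case mentioned in the abstract; increasing `p` widens each piece, while increasing `L` adds more pieces in time.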
