Mixtures of Neural Operators Reduce Active Complexity in Operator Learning
Abstract
Operator-learning systems are not governed solely by total parameter count; for one query, the relevant bottleneck can be the model that must be loaded and evaluated. We study this distinction for classical neural operators on compact Sobolev subsets through a constructive comparison between routed mixtures of neural operators (MoNOs) and a fixed single-neural-operator construction. The comparison concerns expert-active complexity relative to that baseline, with total stored size and routing search accounted separately. A MoNO routes each input function through a tree to one expert. Our main theorem shows that every scalar uniformly continuous nonlinear operator with bounded output Sobolev radius on the approximation set admits a MoNO approximation whose active expert has smaller depth, width, and rank scaling than the analyzed single-neural-operator construction; for Lipschitz targets these expert quantities are bounded by O(-1). The theorem turns localization into an operator-level accounting of active expert size, routing depth, and number of experts. We also prove a quantitative universal approximation theorem for the underlying neural-operator architecture, with explicit dependence on compact-set diameter and modulus of continuity.
Turn this paper into a lesson
ArcXiv compiles a structured reading guide from this paper's metadata: plain-English importance, contributions, prerequisite concepts, which sections to read first, flashcards, and a quiz. Grounded in the abstract, never invented.