Dual Natural Gradient Descent for Scalable Training of Physics-Informed Neural Networks
Abstract
Natural-gradient methods markedly accelerate the training of Physics-Informed Neural Networks (PINNs), yet their Gauss-Newton update must be solved in the parameter space, incurring a prohibitive $O(n^3)$ time complexity, where $n$ is the number of trainable network weights. We show that exactly the same step can instead be formulated in a generally smaller residual space of size $m = \sum_{\gamma} N_{\gamma} d_{\gamma}$, where each residual class $\gamma$ (e.g. PDE interior, boundary, initial data) contributes $N_{\gamma}$ collocation points of output dimension $d_{\gamma}$. Building on this insight, we introduce Dual Natural Gradient Descent (D-NGD). D-NGD computes the Gauss-Newton step in residual space, augments it with a geodesic-acceleration correction at negligible extra cost, and provides both a dense direct solver for modest $m$ and a Nyström-preconditioned conjugate-gradient solver for larger $m$. Experimentally, D-NGD scales second-order PINN optimization to networks with up to 12.8 million parameters, achieves final $L^2$ errors one to three orders of magnitude lower than first-order methods (Adam, SGD) and quasi-Newton methods, and, crucially, enables natural-gradient training of PINNs at this scale on a single GPU.
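The claim that the same step can be computed in residual space can be illustrated with a small numerical check. The sketch below is not the paper's implementation: it assumes a damped Gauss-Newton step (the damping term, the function names `primal_step` and `dual_step`, and the random Jacobian `J` and residual vector `r` are all illustrative assumptions) and relies on the standard identity $(J^\top J + \lambda I)^{-1} J^\top r = J^\top (J J^\top + \lambda I)^{-1} r$.

```python
import numpy as np


def primal_step(J, r, damping):
    """Damped Gauss-Newton step via an n x n solve in parameter space."""
    n = J.shape[1]
    return np.linalg.solve(J.T @ J + damping * np.eye(n), J.T @ r)


def dual_step(J, r, damping):
    """Same step via an m x m solve in residual space, mapped back by J^T."""
    m = J.shape[0]
    y = np.linalg.solve(J @ J.T + damping * np.eye(m), r)
    return J.T @ y


# Hypothetical sizes: m residuals, n parameters, with m much smaller than n.
rng = np.random.default_rng(0)
m, n = 20, 500
J = rng.standard_normal((m, n))  # stand-in for the residual Jacobian
r = rng.standard_normal(m)       # stand-in for the stacked residuals
damping = 1e-2                   # assumed regularisation, not from the paper

step_primal = primal_step(J, r, damping)
step_dual = dual_step(J, r, damping)
max_diff = np.max(np.abs(step_primal - step_dual))
print(f"max abs difference between primal and dual steps: {max_diff:.2e}")
```

In this toy setting the primal route factorises a 500 x 500 matrix while the dual route factorises only a 20 x 20 one, which is the source of the savings when $m \ll n$. The geodesic-acceleration correction and the Nyström-preconditioned conjugate-gradient solver mentioned in the abstract are not covered by this sketch.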