Taming Preconditioner Drift: Unlocking the Potential of Second-Order Optimizers for Federated Learning on Non-IID Data
Abstract
Second-order optimizers can significantly accelerate large-scale training, yet their naive federated variants are often unstable or even diverge on non-IID data. We show that a key culprit is preconditioner drift: client-side second-order training induces heterogeneous curvature-defined geometries (i.e., preconditioner coordinate systems), and server-side model averaging then mixes updates computed under incompatible metrics, corrupting the global descent direction. To address this geometric mismatch, we propose FedPAC, a preconditioner alignment and correction framework for reliable federated second-order optimization. FedPAC explicitly decouples parameter aggregation from geometry synchronization through two mechanisms: (i) Alignment, which aggregates local preconditioners into a global reference and warm-starts clients from the global preconditioner; and (ii) Correction, which steers local preconditioned updates with a global preconditioned direction to suppress long-term drift. We provide drift-coupled non-convex convergence guarantees with linear speedup under partial participation. Empirically, FedPAC consistently improves stability and accuracy across vision and language tasks, achieving an absolute accuracy gain of up to 5.8% on CIFAR-100 with ViTs. Code is available at https://anonymous.4open.science/r/FedPAC-8B24.
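To make the two mechanisms concrete, the toy simulation below sketches how Alignment and Correction could fit into one federated round. It is not FedPAC's actual algorithm. It assumes a diagonal RMSProp-style preconditioner built from a second-moment vector `v`. Alignment is modeled as server-side averaging of the clients' `v` followed by a warm start. Correction is modeled as a convex mix, with weight `alpha`, of each client's local preconditioned direction and the previous round's global one. All function names, the two quadratic client losses (mirrored curvature and conflicting optima, as a stand-in for non-IID data), full participation, and the hyperparameters are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vec = List[float]


def client_loss(x: Sequence[float], a: Sequence[float], b: Sequence[float]) -> float:
    # Hypothetical toy client objective: 0.5 * sum_j a_j * (x_j - b_j)^2
    return 0.5 * sum(aj * (xj - bj) ** 2 for xj, aj, bj in zip(x, a, b))


def client_grad(x: Sequence[float], a: Sequence[float], b: Sequence[float]) -> Vec:
    return [aj * (xj - bj) for xj, aj, bj in zip(x, a, b)]


def local_train(x, v_global, d_global, a, b, steps, lr, beta, alpha, eps):
    # Alignment (assumed form): warm-start from the global second-moment state
    x, v = list(x), list(v_global)
    dir_sum = [0.0] * len(x)
    for _ in range(steps):
        g = client_grad(x, a, b)
        # RMSProp-style diagonal preconditioner (an assumption, not taken from the paper)
        v = [beta * vj + (1.0 - beta) * gj * gj for vj, gj in zip(v, g)]
        d_local = [gj / (math.sqrt(vj) + eps) for gj, vj in zip(g, v)]
        # Correction (assumed form): pull the local direction toward the global one
        d = [(1.0 - alpha) * dl + alpha * dg for dl, dg in zip(d_local, d_global)]
        x = [xj - lr * dj for xj, dj in zip(x, d)]
        dir_sum = [s + dl for s, dl in zip(dir_sum, d_local)]
    # Return the local model, local preconditioner state, and mean local direction
    return x, v, [s / steps for s in dir_sum]


def average(vectors: List[Vec]) -> Vec:
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def server_round(x, v_global, d_global, clients, **hp):
    results = [local_train(x, v_global, d_global, a, b, **hp) for a, b in clients]
    new_x = average([r[0] for r in results])  # parameter aggregation
    new_v = average([r[1] for r in results])  # geometry synchronization (Alignment)
    new_d = average([r[2] for r in results])  # global preconditioned direction (Correction)
    return new_x, new_v, new_d


def run_demo(rounds: int = 100) -> Tuple[float, float, Vec]:
    # Two hypothetical non-IID clients: mirrored curvature, conflicting optima
    clients = [([10.0, 0.1], [1.0, -1.0]), ([0.1, 10.0], [-1.0, 1.0])]
    hp = dict(steps=5, lr=0.02, beta=0.9, alpha=0.5, eps=1e-8)

    def global_loss(x: Sequence[float]) -> float:
        return sum(client_loss(x, a, b) for a, b in clients) / len(clients)

    x, v, d = [0.0, 0.0], [1.0, 1.0], [0.0, 0.0]
    initial = global_loss(x)
    for _ in range(rounds):
        x, v, d = server_round(x, v, d, clients, **hp)
    return initial, global_loss(x), x


if __name__ == "__main__":
    init, final, x = run_demo()
    print(f"global loss {init:.3f} -> {final:.3f}, x = {x}")
```

Setting `alpha=0` and letting each client start from its own fresh `v` roughly corresponds to a naive federated RMSProp baseline, which gives one simple way to compare the toy dynamics with and without the two mechanisms.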