When Does Dynamic Preconditioning Preserve the Polyak-Ruppert CLT? A Stabilization Threshold
Abstract
Polyak-Ruppert averaging yields an asymptotically normal estimator with sandwich covariance $H^{-1} S H^{-1}$, the foundation of online inference. When the gradient step is preconditioned by a data-driven matrix $P_t$, we ask how fast $P_t$ must stabilize for the central limit theorem (CLT) to remain valid. We resolve this via an exact preconditioner-isolating decomposition of the averaged error that confines $P_t$ to a dynamic remainder $R_n$, leaving the martingale and Taylor terms preconditioner-free. Let $M_t = (P_t H)^{-1}$ denote the effective inverse drift matrix, with $\|M_t - M_{t-1}\|_{\mathrm{op}} \lesssim t^{-\beta}$ and step-size exponent $\alpha \in (1/2, 1)$. We identify a stabilization-rate threshold $\beta > (\alpha+1)/2$ and prove that, within the class of polynomial rate hypotheses used in our upper bound, it cannot be weakened: the dynamic remainder $\sqrt{n}\,R_n$ vanishes in $L^2$ whenever $\beta > (\alpha+1)/2$, and we exhibit sequences satisfying those hypotheses for which it does not vanish when $\beta \le (\alpha+1)/2$. A single stabilization argument certifies three SA variants (SA-AdaGrad, SA-RMSProp, and SA-ONS) with gain $c/t$, each delivering one-step $L^2(\mathrm{op})$ stabilization of order $t^{-1}$, yielding the CLT $\sqrt{n}\,(\bar{x}_n - x^*) \Rightarrow N(0, H^{-1} S H^{-1})$; under bounded inputs the pathwise rate $\beta = 1$ further preserves the $n^{-1/6}$ Wasserstein rate at $\alpha^* = 2/3$. Under standard regularity conditions, Wald-type online inference remains valid for dynamically preconditioned averaged SGD whose stabilization rate exceeds the threshold.
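The threshold condition stated in the abstract is simple arithmetic, and a small check may make it concrete. The sketch below is illustrative only: the function names `stabilization_threshold` and `exceeds_threshold` are hypothetical and not taken from the paper, and the code only evaluates the inequality $\beta > (\alpha+1)/2$ for a step-size exponent $\alpha \in (1/2, 1)$. It does not simulate any of the algorithms.

```python
from fractions import Fraction


def stabilization_threshold(alpha):
    """Return (alpha + 1) / 2, the threshold quoted in the abstract.

    `alpha` is the step-size exponent, assumed to lie in (1/2, 1).
    Exact rational arithmetic avoids floating-point edge cases.
    """
    alpha = Fraction(alpha)
    if not (Fraction(1, 2) < alpha < 1):
        raise ValueError("alpha must lie in the open interval (1/2, 1)")
    return (alpha + 1) / 2


def exceeds_threshold(beta, alpha):
    """True when the stabilization rate beta strictly exceeds the threshold."""
    return Fraction(beta) > stabilization_threshold(alpha)


# At the exponent alpha* = 2/3 mentioned in the abstract,
# the threshold evaluates to (2/3 + 1) / 2 = 5/6.
alpha_star = Fraction(2, 3)
threshold_at_alpha_star = stabilization_threshold(alpha_star)

# A rate of beta = 1 (order t^{-1} stabilization) clears the threshold,
# while a rate sitting exactly on the boundary does not.
rate_one_ok = exceeds_threshold(1, alpha_star)
boundary_ok = exceeds_threshold(threshold_at_alpha_star, alpha_star)
```

Because $(\alpha+1)/2 < 1$ for every $\alpha < 1$, a stabilization rate of $\beta = 1$ clears the threshold across the whole admissible range of $\alpha$, which is consistent with the abstract's claim that order $t^{-1}$ stabilization suffices for the three variants.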