Online Covariance Estimation in Averaged SGD: Improved Batch-Mean Rates and Minimax Optimality via Trajectory Regression

Abstract

We study online covariance matrix estimation for Polyak--Ruppert averaged stochastic gradient descent (SGD). The online batch-means estimator of Zhu, Chen and Wu (2023) achieves an operator-norm convergence rate of $O(n^{-(1-\alpha)/4})$, which yields $O(n^{-1/8})$ at the optimal learning-rate exponent $\alpha \to 1/2^+$. A rigorous per-block bias analysis reveals that re-tuning the block-growth parameter improves the batch-means rate to $O(n^{-(1-\alpha)/3})$, which yields $O(n^{-1/6})$ in the same limit. The modified estimator requires no Hessian access and preserves $O(d^2)$ memory. We provide a complete error decomposition into variance, stationarity-bias, and nonlinearity-bias components. A weighted-averaging variant that avoids hard truncation is also discussed. We establish the minimax rate $\Theta(n^{-(1-\alpha)/2})$ for Hessian-free covariance estimation from the SGD trajectory: a Le Cam lower bound gives $\Omega(n^{-(1-\alpha)/2})$, and a trajectory-regression estimator, which estimates the Hessian by regressing SGD increments on iterates, achieves $O(n^{-(1-\alpha)/2})$, matching the lower bound. The construction reveals that the bottleneck is the sublinear accumulation of information about the Hessian from the SGD drift.
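To make the phrase "regressing SGD increments on iterates" concrete, the following is a minimal sketch, not the paper's estimator, whose exact form the abstract does not give. It assumes a quadratic objective with Hessian A and minimizer at the origin, additive standard Gaussian gradient noise, and step sizes eta0 * k^(-alpha). The class name `TrajectoryRegression`, the helper `run_demo`, and all parameter values are hypothetical choices made for illustration. The rescaled increment (x_k - x_{k+1}) / eta_k equals A x_k plus noise, so ordinary least squares of that quantity on x_k recovers A using only two d-by-d running sums.

```python
import numpy as np


class TrajectoryRegression:
    """Hypothetical online least-squares Hessian estimate from SGD increments."""

    def __init__(self, d):
        # Two d x d running sums, so memory stays O(d^2).
        self.S_yx = np.zeros((d, d))  # sum of y_k x_k^T
        self.S_xx = np.zeros((d, d))  # sum of x_k x_k^T

    def update(self, x_prev, x_next, step):
        # The rescaled increment is the noisy gradient used by SGD at x_prev.
        y = (x_prev - x_next) / step
        self.S_yx += np.outer(y, x_prev)
        self.S_xx += np.outer(x_prev, x_prev)

    def hessian(self):
        # Least squares: A_hat = S_yx S_xx^{-1}; S_xx is symmetric, so solve
        # S_xx A_hat^T = S_yx^T instead of forming an explicit inverse.
        return np.linalg.solve(self.S_xx, self.S_yx.T).T


def run_demo(n=100_000, alpha=0.55, eta0=0.5, seed=0):
    """Run SGD on an assumed quadratic and return (true A, estimated A)."""
    rng = np.random.default_rng(seed)
    A = np.array([[2.0, 0.5], [0.5, 1.0]])  # assumed true Hessian
    x = np.array([1.0, -1.0])               # arbitrary starting point
    est = TrajectoryRegression(d=2)
    for k in range(1, n + 1):
        step = eta0 * k ** (-alpha)
        grad = A @ x + rng.standard_normal(2)  # noisy gradient (assumed noise model)
        x_next = x - step * grad
        est.update(x, x_next, step)
        x = x_next
    return A, est.hessian()
```

Calling `A, A_hat = run_demo()` and comparing `np.linalg.norm(A_hat - A)` across several values of `n` gives a rough feel for how slowly the estimate improves. In this toy setting the iterates shrink toward the minimizer as the step size decays, so each new regression point carries less signal about A, which is one way to read the abstract's remark about sublinear accumulation of information.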
