Last Iterate Convergence of AdaGrad-Norm for Convex Non-Smooth Optimization

Abstract

We study the convergence of the last iterate (i.e., the (N+1)-th iterate) of the AdaGrad method. Although AdaGrad -- an adaptive subgradient method -- underpins a wide class of algorithms, most existing convergence analyses focus on averaged (or best) iterates. We derive worst-case upper bounds on the suboptimality of the final point and show that, with an optimally tuned stepsize parameter, the last iterate converges at the rate $O(1/N^{1/4})$. We complement this guarantee with matching lower-bound constructions, proving that this rate is tight and that AdaGrad's last-iterate rate is strictly worse than the classical $O(1/N^{1/2})$ rate for its averaged iterate. Technically, our analysis introduces an exponent parameter that captures the growth of the cumulative squared subgradients; combined with the last-iterate inequality of Zamani and Glineur (2025), this reduces the problem to bounding a particular series.
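To make the distinction between the last and the averaged iterate concrete, here is a minimal sketch of a norm-based AdaGrad subgradient loop. It assumes the commonly used AdaGrad-Norm update, in which the stepsize is `eta` divided by the square root of the cumulative squared subgradient norms; the exact variant analyzed in the paper may differ. The function names, the small constant `b0`, and the l1-norm test objective are illustrative choices, not taken from the paper, and the run is not meant to reproduce the stated rates.

```python
import math

def adagrad_norm(subgrad, x0, eta, num_steps, b0=1e-8):
    """Run a norm-based AdaGrad loop (assumed standard form).

    Returns (last_iterate, average_of_visited_points).
    """
    x = list(x0)
    acc = b0 ** 2  # cumulative squared subgradient norms (b0 avoids division by zero)
    running_sum = [0.0] * len(x)
    for _ in range(num_steps):
        # Record the current point so the average covers the points where
        # subgradients were evaluated.
        running_sum = [s + xi for s, xi in zip(running_sum, x)]
        g = subgrad(x)
        acc += sum(gi * gi for gi in g)
        step = eta / math.sqrt(acc)
        x = [xi - step * gi for xi, gi in zip(x, g)]
    avg = [s / num_steps for s in running_sum]
    return x, avg

def l1(x):
    """Illustrative convex non-smooth objective: the l1 norm."""
    return sum(abs(xi) for xi in x)

def l1_subgrad(x):
    """A subgradient of the l1 norm: the sign of each coordinate (0 at 0)."""
    return [float((xi > 0) - (xi < 0)) for xi in x]

x_start = [1.0, -2.0]
last, avg = adagrad_norm(l1_subgrad, x_start, eta=1.0, num_steps=1000)
print("f(last) =", l1(last), " f(avg) =", l1(avg))
```

The loop returns both points so that their suboptimality can be compared directly; the abstract's claim concerns worst-case instances, so a single benign objective like this one says nothing about which of the two is better in general.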
