Restart and Adaptive Acceleration in Stochastic Gradient Methods
Abstract
We study restart schemes for stochastic optimization of non-smooth, weakly convex objectives that satisfy a Kurdyka-Łojasiewicz (KŁ) inequality. We show that restarts allow us to leverage the KŁ inequality to achieve improved rates of convergence, with the acceleration depending explicitly on the KŁ exponent. Furthermore, optimal restart schedules lead to learning rates akin to Polyak steps for SGD. While regularity constants such as the KŁ exponent are typically unknown in practice, we prove that restart schemes are robust to significant misspecification of these constants, and hence nearly adaptive. We detail numerical experiments both on toy problems, where the KŁ exponent is controlled, and on the training of Large Language Models (LLMs).
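To make the idea of a restart scheme concrete, here is a minimal sketch of a restarted stochastic subgradient method on a toy non-smooth objective. It is an illustration under stated assumptions, not the schedule analyzed in the paper: the function name `restarted_sgd`, the fixed stage length, the halving of the step size at each restart, the Gaussian noise model, and the test objective f(x) = ||x||_1 are all hypothetical choices made for this example.

```python
import numpy as np


def restarted_sgd(subgrad, x0, step0, stage_len, n_stages,
                  decay=0.5, noise=0.1, seed=0):
    """Run SGD in stages; restart each stage from the previous stage's average.

    Assumed schedule (not taken from the paper): constant step size within
    a stage, multiplied by `decay` at every restart.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    step = step0
    for _ in range(n_stages):
        running_sum = np.zeros_like(x)
        for _ in range(stage_len):
            # Noisy subgradient: exact subgradient plus Gaussian noise.
            g = subgrad(x) + noise * rng.standard_normal(x.shape)
            x = x - step * g
            running_sum += x
        # Restart: the next stage starts from the averaged iterate.
        x = running_sum / stage_len
        step *= decay
    return x


def f(x):
    """Toy non-smooth convex objective with sharp growth around its minimizer 0."""
    return np.abs(x).sum()


def subgrad_f(x):
    """A subgradient of the l1 norm (sign, with 0 chosen at the kink)."""
    return np.sign(x)


x0 = np.full(5, 10.0)
x_hat = restarted_sgd(subgrad_f, x0, step0=0.5, stage_len=200, n_stages=10)
print("initial objective:", f(x0))
print("final objective:  ", f(x_hat))
```

In this sketch the step size shrinks geometrically across stages, so each restart begins closer to the minimizer with a smaller step. How the stage length and step size should depend on the KŁ exponent is the subject of the paper and is not reproduced here.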