Near-Optimal Convergence of Accelerated Gradient Methods under Generalized and (L0, L1)-Smoothness
Abstract
We study first-order methods for convex optimization problems with functions $f$ satisfying the recently proposed $\ell$-smoothness condition $\|\nabla^2 f(x)\| \le \ell(\|\nabla f(x)\|)$, which generalizes $L$-smoothness and $(L_0, L_1)$-smoothness. While accelerated gradient descent (AGD) is known to reach the optimal complexity $O(\sqrt{L} R / \sqrt{\varepsilon})$ under $L$-smoothness, where $\varepsilon$ is an error tolerance and $R$ is the distance between a starting and an optimal point, existing extensions to $\ell$-smoothness either incur extra dependence on the initial gradient, suffer exponential factors in $L_1 R$, or require costly auxiliary sub-routines, leaving open whether an AGD-type $O(\sqrt{\ell(0)} R / \sqrt{\varepsilon})$ rate is possible for small $\varepsilon$, even in the $(L_0, L_1)$-smoothness case. We resolve this open question. Leveraging a new Lyapunov function and designing new algorithms, we achieve $O(\sqrt{\ell(0)} R / \sqrt{\varepsilon})$ oracle complexity for small $\varepsilon$ and virtually any $\ell$. For instance, for $(L_0, L_1)$-smoothness, our bound $O(\sqrt{L_0} R / \sqrt{\varepsilon})$ is provably optimal in the small-$\varepsilon$ regime and removes all non-constant multiplicative factors present in prior accelerated algorithms.
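As a brief illustration of how the general condition specializes, the two named cases can be written as particular choices of the function $\ell$. This is a sketch using the standard definitions of $L$-smoothness and $(L_0, L_1)$-smoothness for twice-differentiable $f$; the variable $s$ is a placeholder for the gradient norm and is not notation taken from the abstract.

```latex
\begin{align*}
\text{$\ell$-smoothness:}\quad & \|\nabla^2 f(x)\| \le \ell(\|\nabla f(x)\|), \\
\text{$L$-smoothness:}\quad & \ell(s) = L \;\Rightarrow\; \ell(0) = L, \\
\text{$(L_0, L_1)$-smoothness:}\quad & \ell(s) = L_0 + L_1 s \;\Rightarrow\; \ell(0) = L_0.
\end{align*}
```

Substituting these values of $\ell(0)$ into $O(\sqrt{\ell(0)} R / \sqrt{\varepsilon})$ gives the classical AGD rate in the first case and the $L_0$-only rate quoted for the $(L_0, L_1)$ case in the second.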