Accelerated Gradient Descent via Long Steps
Abstract
Recently, Grimmer [1] showed that, for smooth convex optimization, periodically taking longer steps improves gradient descent's textbook $LD^2/(2T)$ convergence guarantee by constant factors, and conjectured that an accelerated rate strictly faster than $O(1/T)$ might be possible. Here we prove such a big-O gain, establishing the first accelerated convergence rate for gradient descent in this setting. Namely, we prove an $O(1/T^{1.0564})$ rate for smooth convex minimization by using a nonconstant, nonperiodic sequence of increasingly large stepsizes. It remains open whether one can achieve the $O(1/T^{1.178})$ rate conjectured by Das Gupta et al. [2] or the optimal gradient method rate of $O(1/T^2)$. Our theory also yields big-O accelerations from long steps for strongly convex optimization, similar to but somewhat weaker than those developed concurrently by Altschuler and Parrilo [3].
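For orientation, here is a minimal Python sketch of the baseline the abstract starts from: gradient descent $x_{k+1} = x_k - (h_k/L)\nabla f(x_k)$ driven by a schedule of normalized stepsizes $h_k$, checked against the textbook $LD^2/(2T)$ guarantee with the constant choice $h_k = 1$. The quadratic test function and the helper names (`quad_f`, `quad_grad`, `gradient_descent`) are illustrative assumptions. The paper's nonconstant, nonperiodic long-step sequence is not reproduced here.

```python
# Minimal sketch (illustrative assumptions, not the paper's method):
# gradient descent with a normalized stepsize schedule h_k, i.e.
#   x_{k+1} = x_k - (h_k / L) * grad f(x_k).

def quad_f(a, x):
    """f(x) = 0.5 * sum_i a_i * x_i^2: convex and L-smooth with L = max(a)."""
    return 0.5 * sum(ai * xi * xi for ai, xi in zip(a, x))

def quad_grad(a, x):
    """Gradient of quad_f, component-wise a_i * x_i."""
    return [ai * xi for ai, xi in zip(a, x)]

def gradient_descent(grad, x0, L, schedule):
    """Take one gradient step per entry h in `schedule`, using stepsize h / L."""
    x = list(x0)
    for h in schedule:
        g = grad(x)
        x = [xi - (h / L) * gi for xi, gi in zip(x, g)]
    return x

a = [0.01, 0.1, 1.0]               # curvatures, so L = max(a) = 1.0
L = max(a)
x0 = [1.0, -2.0, 3.0]
D2 = sum(xi * xi for xi in x0)     # D^2 = ||x0 - x*||^2, minimizer x* = 0 with f* = 0
T = 50

# Constant normalized step h_k = 1 (stepsize 1/L), the textbook setting.
x_T = gradient_descent(lambda x: quad_grad(a, x), x0, L, [1.0] * T)
gap = quad_f(a, x_T)               # f(x_T) - f*
bound = L * D2 / (2 * T)           # textbook L D^2 / (2T) guarantee
print(f"gap={gap:.3e}  bound={bound:.3e}")
```

Passing a different list as `schedule` plugs in a long-step pattern. Note that with $h_k > 2$ a single step can increase $f$ on this quadratic (the coordinate with $a_i = L$ is multiplied by $1 - h_k$), so any benefit from long steps has to come from the sequence of steps as a whole rather than from each step individually.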