Acceleration by Stepsize Hedging I: Multi-Step Descent and the Silver Stepsize Schedule
Abstract
Can we accelerate convergence of gradient descent without changing the algorithm -- just by carefully choosing stepsizes? Surprisingly, we show that the answer is yes. Our proposed Silver Stepsize Schedule optimizes strongly convex functions in $\kappa^{\log_\rho 2} \approx \kappa^{0.7864}$ iterations, where $\rho = 1+\sqrt{2}$ is the silver ratio and $\kappa$ is the condition number. This is intermediate between the textbook unaccelerated rate $\kappa$ and the accelerated rate $\sqrt{\kappa}$ due to Nesterov in 1983. The non-strongly convex setting is conceptually identical, and standard black-box reductions imply an analogous accelerated rate $\varepsilon^{-\log_\rho 2} \approx \varepsilon^{-0.7864}$. We conjecture and provide partial evidence that these rates are optimal among all possible stepsize schedules. The Silver Stepsize Schedule is constructed recursively in a fully explicit way. It is non-monotonic, fractal-like, and approximately periodic of period $\kappa^{\log_\rho 2}$. This leads to a phase transition in the convergence rate: initially super-exponential (acceleration regime), then exponential (saturation regime).
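To make the stated exponents concrete, the following is a minimal numerical sketch, not taken from the paper. It computes the exponent of the silver rate from the silver ratio $1+\sqrt{2}$ and compares the three iteration scalings named in the abstract. Constants and logarithmic factors are ignored, and the function name `iteration_scalings` and the sample condition numbers are assumptions chosen purely for illustration.

```python
import math

# Silver ratio rho = 1 + sqrt(2), as defined in the abstract.
RHO = 1.0 + math.sqrt(2.0)

# Exponent log_rho(2) that appears in the stated rate (about 0.7864).
SILVER_EXPONENT = math.log(2.0) / math.log(RHO)


def iteration_scalings(kappa: float) -> dict:
    """Return the three scalings from the abstract for condition number kappa.

    Hypothetical helper for illustration only: constants and log factors
    are dropped, so these are orders of magnitude, not iteration counts.
    """
    return {
        "unaccelerated": kappa,               # textbook gradient descent
        "silver": kappa ** SILVER_EXPONENT,   # Silver Stepsize Schedule
        "nesterov": math.sqrt(kappa),         # Nesterov acceleration
    }


if __name__ == "__main__":
    print(f"log_rho(2) = {SILVER_EXPONENT:.4f}")
    for kappa in (1e2, 1e4, 1e6):  # illustrative condition numbers (assumed)
        s = iteration_scalings(kappa)
        print(
            f"kappa={kappa:.0e}: unaccelerated={s['unaccelerated']:.0f}, "
            f"silver={s['silver']:.0f}, nesterov={s['nesterov']:.0f}"
        )
```

For any condition number above 1, the silver scaling lies strictly between the other two, which is the sense in which the abstract calls the rate intermediate.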