Non-ergodic linear convergence property of the delayed gradient descent under the strongly convexity and the Polyak-ojasiewicz condition
Abstract
In this work, we establish the linear convergence estimate for the gradient descent involving the delay τ∈N when the cost function is μ-strongly convex and L-smooth. This result improves upon the well-known estimates in Arjevani et al. ASS and Stich-Karmireddy SK in the sense that it is non-ergodic and is still established in spite of weaker constraint of cost function. Also, the range of learning rate η can be extended from η≤ 1/(10Lτ) to η≤ 1/(4Lτ) for τ =1 and η≤ 3/(10Lτ) for τ ≥ 2, where L >0 is the Lipschitz continuity constant of the gradient of cost function. In a further research, we show the linear convergence of cost function under the Polyak-ojasiewicz\,(PL) condition, for which the available choice of learning rate is further improved as η≤ 9/(10Lτ) for the large delay τ. The framework of the proof for this result is also extended to the stochastic gradient descent with time-varying delay under the PL condition. Finally, some numerical experiments are provided in order to confirm the reliability of the analyzed results.
Turn this paper into a lesson
ArcXiv compiles a structured reading guide from this paper's metadata: plain-English importance, contributions, prerequisite concepts, which sections to read first, flashcards, and a quiz. Grounded in the abstract, never invented.