Convergence Rate for the Last Iterate of Stochastic Gradient Descent Schemes

Marcel Hudiani

Abstract

We study the convergence rate for the last iterate of stochastic gradient descent (SGD) and stochastic heavy ball (SHB) in the parametric setting when the objective function $F$ is globally convex or non-convex and its gradient is $\gamma$-Hölder. Using only the discrete Gronwall inequality, without the Robbins-Siegmund theorem, we recover results for both SGD and SHB: $\min_{s \le t} \|\nabla F(w_s)\|^2 = o(t^{p-1})$ for non-convex objectives, and $F(w_{\tau \wedge t}) - F^* = o(t^{2\gamma/(1+\gamma) \cdot \max(p-1,\,-2p+1) - \epsilon})$ for $\beta \in (0, 1)$, where $\tau := \inf\{t > 0 : F(w_t) = F^*\}$, and $\min_{s \le t} F(w_s) - F^* = o(t^{p-1})$ for convex objectives $F$ whose minimum is $F^*$. In addition, we prove that SHB with constant momentum parameter $\beta \in (0, 1)$ attains a convergence rate of $F(w_t) - F^* = O(t^{\max(p-1,\,-2p+1)} \log^2 \frac{t}{\delta})$ with probability at least $1-\delta$ when $F$ is convex, $\gamma = 1$, and the step size is $\alpha_t = \Theta(t^{-p})$ with $p \in (\frac{1}{2}, 1)$.
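
To make the two schemes concrete, here is a minimal Python sketch of SGD and SHB with a step size decaying like $t^{-p}$. It is an illustration under stated assumptions, not the paper's setup: the quadratic objective $F(w) = \frac{1}{2}\|w\|^2$, the Gaussian gradient noise, the constants `c` and `sigma`, and the specific heavy-ball update $w_{t+1} = w_t - \alpha_t g_t + \beta (w_t - w_{t-1})$ are all hypothetical choices. The abstract does not specify the exact form of the SHB recursion.

```python
import random


def step_size(t, p=0.75, c=0.5):
    """Polynomial step size alpha_t = c * t**(-p); p is taken in (1/2, 1).

    The constant c is an assumption made for this sketch.
    """
    return c * t ** (-p)


def noisy_grad(w, rng, sigma=0.1):
    """Stochastic gradient of the toy objective F(w) = 0.5 * ||w||^2.

    The true gradient is w; zero-mean Gaussian noise is added per coordinate.
    """
    return [wi + rng.gauss(0.0, sigma) for wi in w]


def objective(w):
    """Toy convex objective with minimum value F^* = 0 at w = 0."""
    return 0.5 * sum(x * x for x in w)


def run_shb(w0, steps, beta=0.0, p=0.75, seed=0):
    """Run stochastic heavy ball; beta = 0 reduces to plain SGD.

    Returns the list of objective values F(w_t) along the trajectory,
    so the last entry is the value at the last iterate.
    """
    rng = random.Random(seed)
    w_prev = list(w0)
    w = list(w0)
    values = [objective(w)]
    for t in range(1, steps + 1):
        g = noisy_grad(w, rng)
        a = step_size(t, p)
        # Gradient step plus momentum term beta * (w_t - w_{t-1}).
        w_next = [wi - a * gi + beta * (wi - wpi)
                  for wi, gi, wpi in zip(w, g, w_prev)]
        w_prev, w = w, w_next
        values.append(objective(w))
    return values


sgd_values = run_shb([1.0, -2.0], steps=5000, beta=0.0)
shb_values = run_shb([1.0, -2.0], steps=5000, beta=0.5)
print(sgd_values[-1], shb_values[-1])
```

Plotting `sgd_values` and `shb_values` against `t` on a log-log scale gives a rough visual check of how the last-iterate gap decays. A single run on a toy quadratic says nothing about the rates stated above, which hold under the paper's own assumptions.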
