Unified Optimal Analysis of the (Stochastic) Gradient Method

Abstract

In this note we give a simple proof of the convergence of stochastic gradient descent (SGD) on μ-convex functions under a (milder than standard) L-smoothness assumption. We show that, for carefully chosen stepsizes, SGD converges after T iterations as $\mathcal{O}\big(LR^2 \exp\big[-\frac{\mu}{4L}T\big] + \frac{\sigma^2}{\mu T}\big)$, where $R = \|x_0 - x^\star\|$ is the initial distance to the optimum and $\sigma^2$ measures the variance of the stochastic noise. For deterministic gradient descent (GD), and for SGD in the interpolation setting, we have $\sigma^2 = 0$ and recover the exponential convergence rate. The bound matches the best known iteration complexity of GD and SGD, up to constants.
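The two terms of the rate can be illustrated on a toy problem. The sketch below is a minimal illustration built on assumptions that are not taken from the note. It uses a diagonal quadratic f(x) = ½ Σ λ_i x_i², so μ = min λ_i, L = max λ_i and x* = 0, and it takes R = ||x_0 - x*||. Gradient noise is additive Gaussian with total variance σ². The hypothetical helper `rate_bound` sets the constants hidden in the O(·) to 1. With σ² = 0 and the constant stepsize 1/L (plain GD), the suboptimality should fall below LR² exp[-μT/(4L)], consistent with the exponential rate. The decaying schedule used for the noisy run is also an assumption and not necessarily the stepsize rule analyzed in the note.

```python
import math
import random


def quadratic_sgd(eigs, x0, steps, stepsize, sigma=0.0, seed=0):
    """Run (stochastic) gradient steps on f(x) = 0.5 * sum(eigs[i] * x[i]**2).

    The minimizer is x* = 0 with f* = 0. If sigma > 0, every gradient
    coordinate is perturbed by independent Gaussian noise with standard
    deviation sigma / sqrt(d), so the noise vector has total variance sigma**2.
    Returns the final suboptimality f(x_T) - f*.
    """
    rng = random.Random(seed)
    d = len(x0)
    x = list(x0)
    for t in range(steps):
        gamma = stepsize(t)
        for i in range(d):
            noise = rng.gauss(0.0, sigma / math.sqrt(d)) if sigma > 0 else 0.0
            x[i] -= gamma * (eigs[i] * x[i] + noise)
    return 0.5 * sum(lam * xi * xi for lam, xi in zip(eigs, x))


def rate_bound(L, mu, R, sigma, T):
    """Hypothetical helper: L R^2 exp(-mu T / (4L)) + sigma^2 / (mu T),
    with the constants hidden in the O(.) set to 1 (an assumption)."""
    return L * R**2 * math.exp(-mu * T / (4 * L)) + sigma**2 / (mu * T)


if __name__ == "__main__":
    eigs = [0.1, 1.0, 10.0]                 # hypothetical test problem
    mu, L = min(eigs), max(eigs)            # strong convexity / smoothness
    x0 = [1.0, 1.0, 1.0]
    R = math.sqrt(sum(v * v for v in x0))   # ||x0 - x*|| since x* = 0
    T = 2000

    # Deterministic GD (sigma = 0) with constant stepsize 1/L.
    gap = quadratic_sgd(eigs, x0, T, stepsize=lambda t: 1.0 / L)
    print(gap <= rate_bound(L, mu, R, 0.0, T))

    # Noisy SGD with an assumed decaying schedule min(1/L, 2/(mu (t+1))).
    noisy_gap = quadratic_sgd(
        eigs, x0, T, sigma=1.0,
        stepsize=lambda t: min(1.0 / L, 2.0 / (mu * (t + 1))),
    )
    print(noisy_gap, rate_bound(L, mu, R, 1.0, T))
```

In the noisy run the exponential term is negligible after a few thousand steps, so the σ²/(μT) term dominates the comparison. This matches the abstract's point that the noise term alone governs the rate once the initial distance has been forgotten.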
