Stochastic gradient descent algorithms for strongly convex functions at O(1/T) convergence rates

Abstract

With a weighting scheme proportional to t, a traditional stochastic gradient descent (SGD) algorithm achieves a high-probability convergence rate of O(1/T) for strongly convex functions, instead of O(ln(T)/T). We also prove that an accelerated SGD algorithm achieves a rate of O(1/T).
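To make the weighting scheme concrete, the following is a minimal sketch of SGD in which the returned point is an average of the iterates with weights proportional to t. It is an illustration under stated assumptions, not the paper's exact algorithm: the step size 2/(mu(t+1)), the toy quadratic objective, the Gaussian gradient noise, and the names weighted_sgd and make_noisy_quadratic_grad are all hypothetical choices made for this example. Here mu denotes the strong convexity constant.

```python
import random


def make_noisy_quadratic_grad(x_star, mu, sigma):
    """Noisy gradient oracle for the toy objective f(x) = (mu/2) * ||x - x_star||^2.

    The objective and the Gaussian noise model are assumptions for illustration.
    """
    def grad(x, rng):
        return [mu * (xi - si) + rng.gauss(0.0, sigma) for xi, si in zip(x, x_star)]
    return grad


def weighted_sgd(grad, x0, mu, T, seed=0):
    """Run T steps of SGD and return the average of iterates weighted by t.

    Iterate x_t receives weight t, so the output is
        sum_t t * x_t / sum_t t,
    maintained as a running average to avoid storing all iterates.
    """
    rng = random.Random(seed)
    x = list(x0)
    x_bar = list(x0)
    for t in range(1, T + 1):
        g = grad(x, rng)
        # Assumed step size for a mu-strongly convex objective.
        eta = 2.0 / (mu * (t + 1))
        x = [xi - eta * gi for xi, gi in zip(x, g)]
        # Weights proportional to t: t / (1 + 2 + ... + t) = 2 / (t + 1).
        rho = 2.0 / (t + 1)
        x_bar = [(1.0 - rho) * bi + rho * xi for bi, xi in zip(x_bar, x)]
    return x_bar


if __name__ == "__main__":
    x_star = [1.0, -2.0]
    oracle = make_noisy_quadratic_grad(x_star, mu=1.0, sigma=1.0)
    x_bar = weighted_sgd(oracle, [0.0, 0.0], mu=1.0, T=5000)
    sq_err = sum((b - s) ** 2 for b, s in zip(x_bar, x_star))
    print("weighted average:", x_bar)
    print("squared distance to minimizer:", sq_err)
```

The running update uses the identity t / (1 + 2 + ... + t) = 2/(t+1), so later iterates, which tend to be closer to the minimizer, count for more than early ones. A uniform average would instead give every iterate the same weight 1/T.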
