Stochastic Nested Variance Reduction for Nonconvex Optimization

Quanquan Gu


Abstract

We study finite-sum nonconvex optimization problems, where the objective function is an average of $n$ nonconvex functions. We propose a new stochastic gradient descent algorithm based on nested variance reduction. The conventional stochastic variance reduced gradient (SVRG) algorithm uses two reference points to construct a semi-stochastic gradient with diminishing variance in each iteration. Our algorithm instead uses $K+1$ nested reference points to build a semi-stochastic gradient whose variance is reduced further in each iteration. For smooth nonconvex functions, the proposed algorithm converges to an $\epsilon$-approximate first-order stationary point (i.e., $\|\nabla F(x)\|_2 \le \epsilon$) within $O(n \wedge \epsilon^{-2} + \epsilon^{-3} \wedge n^{1/2}\epsilon^{-2})$ stochastic gradient evaluations, where $a \wedge b = \min\{a, b\}$. This improves the best known gradient complexity of SVRG, $O(n + n^{2/3}\epsilon^{-2})$, and that of SCSG, $O(n \wedge \epsilon^{-2} + \epsilon^{-10/3} \wedge n^{2/3}\epsilon^{-2})$. For gradient dominated functions, our algorithm also achieves better gradient complexity than the state-of-the-art algorithms. Thorough experiments on different nonconvex optimization problems back up our theory.
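The nested construction can be illustrated with a small sketch. This is not the paper's algorithm. It assumes the semi-stochastic gradient takes the telescoping form $v = \nabla f_{I_0}(x^{(0)}) + \sum_{l=1}^{K} [\nabla f_{I_l}(x^{(l)}) - \nabla f_{I_l}(x^{(l-1)})]$, where $x^{(K)}$ is the current iterate. It also uses a hypothetical least-squares finite sum $f_i(x) = \tfrac12 (a_i^\top x - b_i)^2$ and omits batch-size choices, step sizes, and how often each reference point is refreshed. The names `grad_i`, `batch_grad`, and `nested_estimator` are illustrative only.

```python
import random
from typing import List, Sequence

Vector = List[float]
Data = Sequence[Sequence[float]]


def grad_i(a: Sequence[float], b: float, x: Sequence[float]) -> Vector:
    """Gradient of one hypothetical component f_i(x) = 0.5 * (a . x - b)^2."""
    residual = sum(aj * xj for aj, xj in zip(a, x)) - b
    return [aj * residual for aj in a]


def batch_grad(A: Data, b: Sequence[float], x: Sequence[float],
               idx: Sequence[int]) -> Vector:
    """Average of the component gradients over the index set idx."""
    g = [0.0] * len(x)
    for i in idx:
        for j, gij in enumerate(grad_i(A[i], b[i], x)):
            g[j] += gij
    return [gj / len(idx) for gj in g]


def nested_estimator(A: Data, b: Sequence[float],
                     refs: Sequence[Sequence[float]],
                     batches: Sequence[Sequence[int]]) -> Vector:
    """Semi-stochastic gradient built from K+1 reference points (assumed form).

    refs    = [x^(0), ..., x^(K)], where x^(K) is the current iterate.
    batches = [I_0, ..., I_K], one index set per reference point.
    Returns grad_{I_0}(x^(0)) + sum_{l=1..K} [grad_{I_l}(x^(l)) - grad_{I_l}(x^(l-1))].
    """
    if not refs or len(refs) != len(batches):
        raise ValueError("need one batch per reference point")
    v = batch_grad(A, b, refs[0], batches[0])
    for l in range(1, len(refs)):
        # Correction term: change in gradient between consecutive reference
        # points, evaluated on the same batch so the two terms are correlated.
        new = batch_grad(A, b, refs[l], batches[l])
        old = batch_grad(A, b, refs[l - 1], batches[l])
        v = [vj + nj - oj for vj, nj, oj in zip(v, new, old)]
    return v


if __name__ == "__main__":
    random.seed(0)
    n, d = 50, 3
    A = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
    b = [random.gauss(0, 1) for _ in range(n)]
    full = list(range(n))
    # K = 2: x0 is the stalest reference point, x2 the current iterate.
    x0, x1, x2 = [0.0] * d, [0.1] * d, [0.12] * d
    # Batch sizes here are arbitrary placeholders, not the paper's choices.
    batches = [full, random.sample(full, 10), random.sample(full, 2)]
    print("estimate:     ", nested_estimator(A, b, [x0, x1, x2], batches))
    print("full gradient:", batch_grad(A, b, x2, full))
```

With $K = 1$, $I_0 = \{1, \dots, n\}$ and $I_1 = \{i\}$, the estimator reduces to the familiar SVRG form $\nabla F(\tilde x) + \nabla f_i(x) - \nabla f_i(\tilde x)$. When every batch is the full index set, the correction terms telescope and the estimate equals $\nabla F(x^{(K)})$ exactly.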
