SPIDER: Near-Optimal Non-Convex Optimization via Stochastic Path Integrated Differential Estimator
Abstract
In this paper, we propose a new technique named Stochastic Path-Integrated Differential EstimatoR (SPIDER), which can be used to track many deterministic quantities of interest with significantly reduced computational cost. We apply SPIDER to two tasks, namely the stochastic first-order and zeroth-order methods. For the stochastic first-order method, combining SPIDER with normalized gradient descent, we propose two new algorithms, namely SPIDER-SFO and SPIDER-SFO+, that solve non-convex stochastic optimization problems using stochastic gradients only. We provide sharp error-bound results on their convergence rates. In particular, we prove that the SPIDER-SFO and SPIDER-SFO+ algorithms achieve a record-breaking gradient computation cost of $\mathcal{O}(\min(n^{1/2}\epsilon^{-2}, \epsilon^{-3}))$ for finding an $\epsilon$-approximate first-order stationary point and $\mathcal{O}(\min(n^{1/2}\epsilon^{-2} + \epsilon^{-2.5}, \epsilon^{-3}))$ for finding an $(\epsilon, \mathcal{O}(\epsilon^{0.5}))$-approximate second-order stationary point, respectively. In addition, we prove that SPIDER-SFO nearly matches the algorithmic lower bound for finding approximate first-order stationary points under the gradient Lipschitz assumption in the finite-sum setting. For the stochastic zeroth-order method, we prove a cost of $\mathcal{O}(d \min(n^{1/2}\epsilon^{-2}, \epsilon^{-3}))$, which outperforms all existing results.
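To make the idea of pairing a path-integrated estimator with normalized gradient descent more concrete, here is a minimal Python sketch for a finite sum of n components. The abstract does not spell out the algorithm, so everything specific here is an assumption made for illustration: the function name `spider_normalized_gd`, the callback `grad_i`, the periodic full-gradient refresh every `epoch_len` steps, and the fixed `step` and `batch` parameters. It should not be read as the authors' exact SPIDER-SFO procedure or its parameter choices.

```python
import numpy as np


def spider_normalized_gd(grad_i, n, x0, step, epoch_len, batch, iters, seed=0):
    """Illustrative sketch: path-integrated gradient tracking + normalized steps.

    grad_i(x, idx) is a hypothetical callback returning the average gradient
    of the components indexed by idx, evaluated at x.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    x_prev = x.copy()
    v = np.zeros_like(x)
    all_idx = np.arange(n)

    for k in range(iters):
        if k % epoch_len == 0:
            # Assumed refresh rule: recompute the full gradient periodically.
            v = grad_i(x, all_idx)
        else:
            # Track the gradient by accumulating stochastic differences
            # along the path, using the same minibatch at both points.
            idx = rng.integers(0, n, size=batch)
            v = v + grad_i(x, idx) - grad_i(x_prev, idx)

        norm = np.linalg.norm(v)
        if norm == 0.0:
            break  # estimate vanished; nothing to normalize

        # Normalized step: every update moves exactly `step` in distance.
        x_prev = x.copy()
        x = x - step * v / norm

    return x
```

Because each update has a fixed length, consecutive iterates stay close together, which keeps the accumulated difference terms small between refreshes. That closeness is the intuition behind combining the estimator with a normalized step in this sketch.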