Natasha 2: Faster Non-Convex Optimization Than SGD
Abstract
We design a stochastic algorithm to train any smooth neural network to ε-approximate local minima, using O(ε^-3.25) backpropagations. The best result was essentially O(ε^-4) by SGD. More broadly, it finds ε-approximate local minima of any smooth nonconvex function in rate O(ε^-3.25), with only oracle access to stochastic gradients.
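To get a feel for the gap between the two rates, the ratio ε^-4 / ε^-3.25 = ε^-0.75 can be evaluated for a few target accuracies. The sketch below is illustrative arithmetic only: it assumes the constants and any logarithmic factors hidden by the O(·) notation are ignored, and the function name `rate_ratio` is hypothetical, not something taken from the paper.

```python
def rate_ratio(eps: float) -> float:
    """Ratio of the two bounds, eps^-4 / eps^-3.25 = eps^-0.75.

    Assumption: hidden constants and log factors are dropped, so this is
    only an order-of-magnitude comparison, not a measured speedup.
    """
    return eps ** -4 / eps ** -3.25


if __name__ == "__main__":
    for eps in (1e-1, 1e-2, 1e-3):
        print(f"eps = {eps:g}: eps^-4 / eps^-3.25 is about {rate_ratio(eps):.1f}")
```

Under these assumptions the ratio grows as ε shrinks, so the advantage of the O(ε^-3.25) rate is larger when higher accuracy is requested.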