Natasha 2: Faster Non-Convex Optimization Than SGD

Abstract

We design a stochastic algorithm to train any smooth neural network to $\varepsilon$-approximate local minima, using $O(\varepsilon^{-3.25})$ backpropagations. The best result was essentially $O(\varepsilon^{-4})$ by SGD. More broadly, it finds $\varepsilon$-approximate local minima of any smooth nonconvex function in rate $O(\varepsilon^{-3.25})$, with only oracle access to stochastic gradients.
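
As a rough illustration of the gap between the two rates, the ratio of the bounds can be worked out directly. This is only a sketch: it ignores all hidden constants and any dependence on problem parameters other than the accuracy $\varepsilon$, and the sample value $\varepsilon = 10^{-2}$ is chosen here for illustration and does not come from the abstract.

```latex
\[
\frac{\varepsilon^{-4}}{\varepsilon^{-3.25}} = \varepsilon^{-0.75},
\qquad
\varepsilon = 10^{-2}
\;\Longrightarrow\;
\varepsilon^{-0.75} = 10^{1.5} \approx 31.6 .
\]
```

Under these simplifications, the stated bound is about 30 times smaller than the SGD bound at $\varepsilon = 10^{-2}$, and the advantage grows as $\varepsilon$ shrinks.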
