"Convex Until Proven Guilty": Dimension-Free Acceleration of Gradient Descent on Non-Convex Functions
Abstract
We develop and analyze a variant of Nesterov's accelerated gradient descent (AGD) for minimization of smooth non-convex functions. We prove that one of two cases occurs: either our AGD variant converges quickly, as if the function was convex, or we produce a certificate that the function is "guilty" of being non-convex. This non-convexity certificate allows us to exploit negative curvature and obtain deterministic, dimension-free acceleration of convergence for non-convex functions. For a function f with Lipschitz continuous gradient and Hessian, we compute a point x with \|∇ f(x)\| ε in O(ε-7/4 (1/ ε) ) gradient and function evaluations. Assuming additionally that the third derivative is Lipschitz, we require only O(ε-5/3 (1/ ε) ) evaluations.
Turn this paper into a lesson
ArcXiv compiles a structured reading guide from this paper's metadata: plain-English importance, contributions, prerequisite concepts, which sections to read first, flashcards, and a quiz. Grounded in the abstract, never invented.