Gradient descent avoids strict saddles with a simple line-search method too

Abstract

It is known that gradient descent (GD) on a C2 cost function generically avoids strict saddle points when using a small, constant step size. However, no such guarantee existed for GD with a line-search method. We provide one for a modified version of the standard Armijo backtracking method with generic, arbitrarily large initial step size. The proof underlines the double role of the Luzin N-1 property for the iteration maps, and allows to forgo the habitual Lipschitz gradient assumption. We extend this to the Riemannian setting (RGD), assuming the retraction is real analytic (though the cost function still only needs to be C2). In closing, we also improve guarantees for RGD with a constant step size in some scenarios.

0

Turn this paper into a full lesson

ArcXiv compiles a staged curriculum from this paper: 8-12 lessons across beginner → advanced, synthesised section guides, visuals, flashcards, a quiz, exercises, and on-demand deep dives per section. Grounded in the abstract, never invented.

Discussion (0)

Sign in to join the discussion.

Loading comments…