Convergence to minima for the continuous version of Backtracking Gradient Descent
Abstract
The main result of this paper is: Theorem. Let $f:\mathbb{R}^k\to \mathbb{R}$ be a $C^1$ function such that $\nabla f$ is locally Lipschitz continuous. Assume moreover that $f$ is $C^2$ near its generalised saddle points. Fix real numbers $\delta_0>0$ and $0<\alpha<1$. Then there is a smooth function $h:\mathbb{R}^k\to (0,\delta_0]$ such that the map $H:\mathbb{R}^k\to \mathbb{R}^k$ defined by $H(x)=x-h(x)\nabla f(x)$ has the following properties: (i) For all $x\in \mathbb{R}^k$, we have $f(H(x))-f(x)\leq -\alpha h(x)\|\nabla f(x)\|^2$. (ii) For every $x_0\in \mathbb{R}^k$, the sequence $x_{n+1}=H(x_n)$ satisfies either $\lim_{n\to\infty}\|x_{n+1}-x_n\|=0$ or $\lim_{n\to\infty}\|x_n\|=\infty$. Each cluster point of $\{x_n\}$ is a critical point of $f$. If moreover $f$ has at most countably many critical points, then $\{x_n\}$ either converges to a critical point of $f$ or $\lim_{n\to\infty}\|x_n\|=\infty$. (iii) There is a set $\mathcal{E}_1\subset \mathbb{R}^k$ of Lebesgue measure 0 such that for all $x_0\in \mathbb{R}^k\setminus \mathcal{E}_1$, the sequence $x_{n+1}=H(x_n)$, if it converges, cannot converge to a generalised saddle point. (iv) There is a set $\mathcal{E}_2\subset \mathbb{R}^k$ of Lebesgue measure 0 such that for all $x_0\in \mathbb{R}^k\setminus \mathcal{E}_2$, no cluster point of the sequence $x_{n+1}=H(x_n)$ is a saddle point, and more generally no cluster point is an isolated generalised saddle point. Some other results are proven.
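To make condition (i) concrete, the following is a minimal sketch of the classical discrete backtracking rule, not the smooth step-size function $h$ whose existence the theorem asserts (the abstract does not describe its construction). The halving factor `beta`, the cap `max_halvings`, the test function `f`, and the starting point are all illustrative assumptions. The sketch starts from $\delta_0$ and shrinks the step until the inequality in (i) holds, then applies $H(x)=x-h\nabla f(x)$.

```python
import numpy as np

def backtracking_step(f, grad_f, x, delta0=1.0, alpha=0.5, beta=0.5, max_halvings=60):
    """One step x -> x - h * grad f(x), with h in (0, delta0] chosen by halving.

    h is shrunk until f(x - h g) - f(x) <= -alpha * h * ||g||^2 (condition (i)).
    beta and max_halvings are illustrative choices, not taken from the paper.
    """
    g = grad_f(x)
    g2 = float(np.dot(g, g))
    h = delta0
    for _ in range(max_halvings):
        if f(x - h * g) - f(x) <= -alpha * h * g2:
            break
        h *= beta
    return x - h * g, h

# Hypothetical test function: saddle point at (0, 0), local minima at (0, 1) and (0, -1).
def f(x):
    return 0.5 * (x[0] ** 2 - x[1] ** 2) + 0.25 * x[1] ** 4

def grad_f(x):
    return np.array([x[0], -x[1] + x[1] ** 3])

x = np.array([1.0, 0.1])      # arbitrary starting point, not on the line x[1] = 0
values = [f(x)]
for _ in range(200):
    x, h = backtracking_step(f, grad_f, x)
    values.append(f(x))

print("final point:", x, "last step size:", h)
```

With this starting point the iterates move away from the saddle at the origin and settle at the minimum $(0,1)$, and the recorded values of $f$ never increase, which is what the inequality in (i) enforces step by step. Note that the step size chosen this way is piecewise constant in $x$, whereas the theorem concerns a smooth $h$.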