Convergence to minima for the continuous version of Backtracking Gradient Descent

Tuyen Trung Truong

Abstract

The main result of this paper is the following.

Theorem. Let $f:\mathbb{R}^k\to\mathbb{R}$ be a $C^1$ function such that $\nabla f$ is locally Lipschitz continuous. Assume moreover that $f$ is $C^2$ near its generalised saddle points. Fix real numbers $\delta_0>0$ and $0<\alpha<1$. Then there is a smooth function $h:\mathbb{R}^k\to(0,\delta_0]$ so that the map $H:\mathbb{R}^k\to\mathbb{R}^k$ defined by $H(x)=x-h(x)\nabla f(x)$ has the following properties:

(i) For all $x\in\mathbb{R}^k$, we have $f(H(x))-f(x)\le-\alpha h(x)\|\nabla f(x)\|^2$.

(ii) For every $x_0\in\mathbb{R}^k$, the sequence $x_{n+1}=H(x_n)$ satisfies either $\lim_{n\to\infty}\|x_{n+1}-x_n\|=0$ or $\lim_{n\to\infty}\|x_n\|=\infty$. Each cluster point of $\{x_n\}$ is a critical point of $f$. If moreover $f$ has at most countably many critical points, then either $\{x_n\}$ converges to a critical point of $f$ or $\lim_{n\to\infty}\|x_n\|=\infty$.

(iii) There is a set $E_1\subset\mathbb{R}^k$ of Lebesgue measure 0 so that for all $x_0\in\mathbb{R}^k\setminus E_1$, the sequence $x_{n+1}=H(x_n)$, if it converges, cannot converge to a generalised saddle point.

(iv) There is a set $E_2\subset\mathbb{R}^k$ of Lebesgue measure 0 so that for all $x_0\in\mathbb{R}^k\setminus E_2$, any cluster point of the sequence $x_{n+1}=H(x_n)$ is not a saddle point, and more generally cannot be an isolated generalised saddle point.

Some other results are proven.
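
To make the inequality in (i) concrete, here is a minimal Python sketch of the standard discrete Backtracking Gradient Descent, which shrinks a trial step size until Armijo's condition $f(x-\delta\nabla f(x))-f(x)\le-\alpha\delta\|\nabla f(x)\|^2$ holds. This is not the smooth step-size function $h$ constructed in the paper; it only illustrates the descent condition that $h$ is required to satisfy. The function names, the shrink factor `beta`, the cap `max_shrinks`, and the example function are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def backtracking_step_size(
    f: Callable[[Sequence[float]], float],
    grad_f: Callable[[Sequence[float]], Vector],
    x: Sequence[float],
    delta0: float = 1.0,   # upper bound on the step size (delta_0 in the theorem)
    alpha: float = 0.5,    # Armijo constant, 0 < alpha < 1
    beta: float = 0.5,     # shrink factor (an assumption, not from the abstract)
    max_shrinks: int = 60, # safety cap on the number of shrinks (assumption)
) -> float:
    """Return the first delta in {delta0 * beta**j} that satisfies
    f(x - delta*g) - f(x) <= -alpha * delta * ||g||^2, where g = grad_f(x)."""
    g = grad_f(x)
    g_norm_sq = sum(gi * gi for gi in g)
    if g_norm_sq == 0.0:
        return delta0  # x is a critical point; any step leaves x unchanged
    fx = f(x)
    delta = delta0
    for _ in range(max_shrinks):
        trial = [xi - delta * gi for xi, gi in zip(x, g)]
        if f(trial) - fx <= -alpha * delta * g_norm_sq:
            return delta
        delta *= beta
    return delta  # cap reached; delta is now negligibly small


def descent_map(f, grad_f, x, delta0=1.0, alpha=0.5, beta=0.5) -> Vector:
    """Discrete analogue of H(x) = x - h(x) * grad f(x)."""
    delta = backtracking_step_size(f, grad_f, x, delta0, alpha, beta)
    g = grad_f(x)
    return [xi - delta * gi for xi, gi in zip(x, g)]


def iterate(f, grad_f, x0, n_steps=200, delta0=1.0, alpha=0.5, beta=0.5):
    """Return the list [x_0, x_1, ..., x_n] with x_{n+1} = descent_map(x_n)."""
    xs = [list(x0)]
    for _ in range(n_steps):
        xs.append(descent_map(f, grad_f, xs[-1], delta0, alpha, beta))
    return xs


# Hypothetical test function: saddle point at (0, 0), minima at (0, 1) and (0, -1).
def f_example(p):
    x, y = p
    return x * x + y ** 4 / 4.0 - y * y / 2.0


def grad_example(p):
    x, y = p
    return [2.0 * x, y ** 3 - y]


if __name__ == "__main__":
    trajectory = iterate(f_example, grad_example, [1.0, 0.1])
    print("final point:", trajectory[-1])
```

Starting from a point off the line $y=0$, the iterates of this example move away from the saddle point at the origin and settle at one of the two minima, while the value of the function decreases at every step.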
