One-dimensional System Arising in Stochastic Gradient Descent
Abstract
We consider SDEs of the form $dX_t = \frac{|f(X_t)|}{t^{\gamma}}\,dt + \frac{1}{t^{\gamma}}\,dB_t$, where $f(x)$ behaves comparably to $|x|^k$ in a neighborhood of the origin, for $k \in [1,\infty)$. We show that there exists a threshold value $\tilde{\gamma}$ for $\gamma$, depending on $k$, such that $\mathbb{P}(X_t \to 0) = 0$ when $\gamma \in (1/2, \tilde{\gamma})$, and $\mathbb{P}(X_t \to 0) > 0$ for the remaining permissible values of $\gamma$. These results extend to discrete processes that satisfy $X_{n+1} - X_n = f(X_n)/n^{\gamma} + Y_n/n^{\gamma}$, where the $Y_{n+1}$ are martingale differences that are a.s. bounded. This shows that for a function $F$ whose second derivative at degenerate saddle points is of polynomial order, it is always possible to escape saddle points via the iteration $X_{n+1} - X_n = F'(X_n)/n^{\gamma} + Y_n/n^{\gamma}$ for a suitable choice of $\gamma$.
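As an illustration only, the discrete iteration stated in the abstract can be simulated in a few lines of Python. This is a minimal sketch under stated assumptions, not the paper's method: the function name `simulate`, the choice of fair ±1 coin flips for the bounded martingale differences, and the truncated drift `min(|x|, 1)**k` (which agrees with `|x|**k` near the origin) are all hypothetical choices made here for the example.

```python
import random


def simulate(f, gamma, x0, n_steps, seed=0):
    """Iterate X_{n+1} - X_n = f(X_n)/n^gamma + Y_n/n^gamma and return the path.

    Assumption: Y_n are i.i.d. fair +/-1 coin flips, one simple example of
    a.s. bounded martingale differences. The abstract allows any such sequence.
    """
    rng = random.Random(seed)
    x = x0
    path = [x]
    for n in range(1, n_steps + 1):
        y = rng.choice((-1.0, 1.0))      # bounded, mean-zero noise (assumed)
        x += (f(x) + y) / n ** gamma     # step size 1/n^gamma on drift and noise
        path.append(x)
    return path


def make_drift(k):
    """Hypothetical drift: equals |x|^k near the origin, capped at 1 elsewhere."""
    return lambda x: min(abs(x), 1.0) ** k


if __name__ == "__main__":
    path = simulate(make_drift(2), gamma=0.6, x0=0.0, n_steps=10_000)
    print("final value:", path[-1])
```

A single simulated path cannot confirm or refute the probability statements in the abstract. It only shows how the step size $1/n^{\gamma}$ scales both the drift and the noise.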