Convergence of stochastic gradient descent under a local Lojasiewicz condition for deep neural networks

Jianfeng Lu

Abstract

We study the convergence of stochastic gradient descent (SGD) for non-convex objective functions. We establish local convergence with positive probability under the local Łojasiewicz condition introduced by Chatterjee [chatterjee2022convergence] and an additional local structural assumption on the loss function landscape. A key component of our proof is to ensure that the whole trajectory of SGD stays inside the local region with positive probability. We also provide examples of finite-width neural networks for which our assumptions hold.
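As a rough illustration of the "trajectory stays inside the local region" idea, the sketch below runs SGD on a toy two-dimensional loss and counts how many runs never leave a ball around the starting point. Everything in it is an assumption made for illustration and is not taken from the paper: the loss f(x) = (|x|^2 - 1)^2 / 4, the multiplicative noise model, the radius, the step size, and the helper names `run_sgd` and `fraction_staying`. For this toy loss one can check that |∇f(x)|^2 = 4 |x|^2 f(x), so a Łojasiewicz-type inequality |∇f(x)|^2 ≥ α f(x) holds with α = 1.96 on the ball of radius 0.5 around (1.2, 0); the exact form of the condition used in the paper may differ.

```python
import math
import random


def loss(x):
    """Toy non-convex loss f(x) = (|x|^2 - 1)^2 / 4; its minimizers form the unit circle."""
    s = x[0] * x[0] + x[1] * x[1]
    return 0.25 * (s - 1.0) ** 2


def grad(x):
    """Exact gradient of the toy loss: (|x|^2 - 1) * x."""
    s = x[0] * x[0] + x[1] * x[1]
    return ((s - 1.0) * x[0], (s - 1.0) * x[1])


def run_sgd(x0, radius, step, noise, n_steps, rng):
    """Run SGD from x0 and return (stayed_inside_ball, final_loss)."""
    x = x0
    stayed = True
    for _ in range(n_steps):
        g = grad(x)
        # Hypothetical noise model (an assumption, not the paper's):
        # each gradient coordinate is scaled by (1 + noise * N(0, 1)),
        # so the noise vanishes wherever the exact gradient vanishes.
        gx = g[0] * (1.0 + noise * rng.gauss(0.0, 1.0))
        gy = g[1] * (1.0 + noise * rng.gauss(0.0, 1.0))
        x = (x[0] - step * gx, x[1] - step * gy)
        # Record whether the iterate ever leaves the ball B(x0, radius).
        if math.dist(x, x0) > radius:
            stayed = False
    return stayed, loss(x)


def fraction_staying(n_runs=200, seed=0):
    """Empirical fraction of SGD runs whose whole trajectory stays in the ball."""
    rng = random.Random(seed)
    x0 = (1.2, 0.0)
    results = [
        run_sgd(x0, radius=0.5, step=0.05, noise=0.5, n_steps=500, rng=rng)
        for _ in range(n_runs)
    ]
    return sum(1 for stayed, _ in results if stayed) / n_runs


if __name__ == "__main__":
    print("fraction of runs staying inside the ball:", fraction_staying())
```

The empirical fraction is only a numerical proxy for the "positive probability" in the statement above; it says nothing about the constants or assumptions in the actual theorem.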
