Sharp Analysis of Stochastic Optimization under Global Kurdyka-Łojasiewicz Inequality
Abstract
We study the complexity of finding the global solution to stochastic nonconvex optimization when the objective function satisfies a global Kurdyka-Łojasiewicz (KL) inequality and the queries from stochastic gradient oracles satisfy a mild expected smoothness assumption. We first introduce a general framework to analyze Stochastic Gradient Descent (SGD) and its associated nonlinear dynamics in this setting. As a byproduct of our analysis, we obtain a sample complexity of O(ε^{-(4-α)/α}) for SGD when the objective satisfies the so-called α-PL condition, where α is the degree of gradient domination. Furthermore, we show that a modified SGD with variance reduction and restarting (PAGER) achieves an improved sample complexity of O(ε^{-2/α}) when the objective satisfies the average smoothness assumption. This leads to the first optimal algorithm for the important case of α=1, which appears in applications such as policy optimization in reinforcement learning.
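To make the two rates concrete, the short sketch below evaluates the exponents (4-α)/α for SGD and 2/α for PAGER at a few values of α. The function names are hypothetical. Apart from α=1, which the abstract singles out, the values of α are chosen only for illustration, since the abstract does not state the admissible range of α.

```python
from fractions import Fraction


def sgd_exponent(alpha):
    """Exponent p in the O(eps^-p) rate quoted for SGD: p = (4 - alpha) / alpha."""
    alpha = Fraction(alpha)
    return (4 - alpha) / alpha


def pager_exponent(alpha):
    """Exponent p in the O(eps^-p) rate quoted for PAGER: p = 2 / alpha."""
    return 2 / Fraction(alpha)


# alpha = 1 is the case highlighted in the abstract; the other values are
# illustrative assumptions, not taken from the source.
for alpha in (1, Fraction(3, 2), 2):
    print(f"alpha={alpha}: SGD eps^-({sgd_exponent(alpha)}), "
          f"PAGER eps^-({pager_exponent(alpha)})")
```

At α=1 the quoted rates are O(ε^{-3}) for SGD and O(ε^{-2}) for PAGER, so the gap between the two is a full factor of 1/ε. The two formulas coincide when α=2, where both exponents equal 1.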