On Tackling High-Dimensional Nonconvex Stochastic Optimization via Stochastic First-Order Methods with Non-smooth Proximal Terms and Variance Reduction
Abstract
When a nonconvex problem is complicated by stochasticity, the sample complexity of stochastic first-order methods may depend linearly on the problem dimension, which is undesirable for large-scale problems. To alleviate this linear dependence, we adopt non-Euclidean settings and propose two choices of non-smooth proximal terms for the stochastic gradient steps. This approach leads to a stronger convergence metric, incremental computational overhead, and potentially dimension-insensitive sample complexity. We also consider further acceleration through variance reduction, which achieves near-optimal sample complexity and, to the best of our knowledge, is the first such result in the ℓ1/ℓ∞ setting. Because the use of non-smooth proximal terms is unconventional, the convergence analysis deviates substantially from that of algorithms in Euclidean settings or those employing Bregman divergence, and it provides tools for analyzing other non-Euclidean choices of distance functions. Efficient resolution of the subproblems in various scenarios is also discussed and simulated. We illustrate the dimension-insensitive property of the proposed methods via preliminary numerical experiments.
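For intuition only, the sketch below shows what a gradient step with a non-smooth proximal term can look like in an ℓ1/ℓ∞ geometry. It assumes the proximal term is the squared ℓ∞ norm, which the abstract does not confirm as either of the paper's two choices; the function name `linf_prox_step` and its parameters are hypothetical. Under that assumption the subproblem min_y ⟨g, y − x⟩ + (1/(2η))‖y − x‖∞² has the closed form y = x − η‖g‖₁ sign(g), so every coordinate with a nonzero gradient entry moves by the same magnitude.

```python
def linf_prox_step(x, grad, step_size):
    """Illustrative step (assumed proximal term: squared l-infinity norm).

    Solves  min_y  <grad, y - x> + (1 / (2 * step_size)) * max_i |y_i - x_i|**2
    in closed form:  y = x - step_size * ||grad||_1 * sign(grad).
    """
    grad_l1 = sum(abs(g) for g in grad)  # dual (l1) norm of the gradient estimate

    def sign(v):
        # Returns 1, -1, or 0; a zero gradient entry leaves that coordinate unchanged,
        # which is one of the minimizers of the subproblem.
        return (v > 0) - (v < 0)

    return [xi - step_size * grad_l1 * sign(gi) for xi, gi in zip(x, grad)]


# Hypothetical usage with a hand-picked stochastic gradient estimate.
x_next = linf_prox_step([0.0, 0.0, 0.0], [1.0, -2.0, 0.0], step_size=0.5)
```

In this example the gradient has ℓ1 norm 3, so each coordinate with a nonzero gradient entry moves by 0.5 × 3 = 1.5 in the descent direction. The update costs one pass over the gradient, which is in the spirit of the "incremental computational overhead" mentioned above, though the paper's actual subproblems may differ.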