Stochastic Bilevel Optimization with Heavy-Tailed Noise

Abstract

This paper considers smooth bilevel optimization in which the lower-level problem is strongly convex and the upper-level problem is possibly nonconvex. We focus on the stochastic setting where the algorithm can access unbiased stochastic gradient evaluations with heavy-tailed noise, which is prevalent in many machine learning applications, such as training large language models and reinforcement learning. We propose a nested-loop normalized stochastic bilevel approximation (N2SBA) method for finding an ε-stationary point with a stochastic first-order oracle (SFO) complexity of O(κ^{(7p-3)/(p-1)} σ^{p/(p-1)} ε^{-(4p-2)/(p-1)}), where κ is the condition number, p∈(1,2] is the order of the central moment of the noise, and σ is the noise level. Furthermore, we specialize our idea to the nonconvex-strongly-concave minimax optimization problem, achieving an ε-stationary point with an SFO complexity of O(κ^{(2p-1)/(p-1)} σ^{p/(p-1)} ε^{-(3p-2)/(p-1)}). Both upper bounds match the best-known results in the special case of bounded variance, i.e., p=2. We also conduct numerical experiments to show the empirical superiority of the proposed methods.
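
As a quick sanity check on the two complexity bounds, the short Python sketch below evaluates their exponents for a given moment order p. It assumes the bounds read κ^{(7p-3)/(p-1)} σ^{p/(p-1)} ε^{-(4p-2)/(p-1)} for the bilevel problem and κ^{(2p-1)/(p-1)} σ^{p/(p-1)} ε^{-(3p-2)/(p-1)} for the minimax problem, with κ denoting the condition number. The function names are hypothetical and chosen for illustration only; they are not part of the paper.

```python
from fractions import Fraction


def _check_order(p):
    """Convert p to an exact fraction and check that it lies in (1, 2]."""
    p = Fraction(p)
    if not (1 < p <= 2):
        raise ValueError("p must lie in the interval (1, 2]")
    return p


def bilevel_exponents(p):
    """Exponents of (kappa, sigma, 1/epsilon) in the assumed bilevel bound."""
    p = _check_order(p)
    d = p - 1
    return (7 * p - 3) / d, p / d, (4 * p - 2) / d


def minimax_exponents(p):
    """Exponents of (kappa, sigma, 1/epsilon) in the assumed minimax bound."""
    p = _check_order(p)
    d = p - 1
    return (2 * p - 1) / d, p / d, (3 * p - 2) / d


if __name__ == "__main__":
    # Bounded-variance case p = 2, then a heavier-tailed case p = 3/2.
    for p in (Fraction(2), Fraction(3, 2)):
        print("p =", p)
        print("  bilevel:", [str(e) for e in bilevel_exponents(p)])
        print("  minimax:", [str(e) for e in minimax_exponents(p)])
```

Under this reading, p=2 gives exponents (11, 2, 6) for the bilevel bound and (3, 2, 4) for the minimax bound, i.e., κ^11 σ^2 ε^-6 and κ^3 σ^2 ε^-4. All three exponents grow without bound as p approaches 1, which reflects the cost of heavier tails.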
