Faster Gradient Methods for Highly-Smooth Stochastic Bilevel Optimization

Abstract

This paper studies the complexity of finding an $\epsilon$-stationary point for stochastic bilevel optimization when the upper-level problem is nonconvex and the lower-level problem is strongly convex. Recent work proposed the first-order method F2SA, which achieves an $\mathcal{O}(\epsilon^{-6})$ upper complexity bound for first-order smooth problems. This is slower than the optimal $\Omega(\epsilon^{-4})$ complexity lower bound of the single-level counterpart. In this work, we show that faster rates are achievable for higher-order smooth problems. We first reformulate F2SA as approximating the hyper-gradient with a forward difference. Based on this observation, we propose a class of methods, F2SA-$p$, that uses a $p$th-order finite difference for hyper-gradient approximation and improves the upper bound to $\mathcal{O}(p\,\epsilon^{-4-2/p})$ for $p$th-order smooth problems. Finally, we demonstrate that the $\Omega(\epsilon^{-4})$ lower bound also holds for stochastic bilevel problems when the high-order smoothness holds for the lower-level variable, indicating that the upper bound of F2SA-$p$ is nearly optimal in the highly smooth regime $p = \Omega(\log \epsilon^{-1} / \log\log \epsilon^{-1})$.
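
The finite-difference view of the hyper-gradient can be illustrated on a toy problem. The sketch below is not the paper's algorithm: it assumes the standard penalty reformulation $\ell_\nu(x) = \min_y [\nu f(x,y) + g(x,y)] - \min_y g(x,y)$, whose gradient in $x$, differentiated in $\nu$ at $\nu = 0$, gives the hyper-gradient. The quadratic functions `f` and `g`, the helper names, and the use of a central difference as a stand-in for a higher-order scheme are all hypothetical choices made so that everything has a closed form and no noise is involved.

```python
# Toy bilevel problem (hypothetical, chosen so everything has a closed form):
#   upper level  f(x, y) = 0.5 * (y - 1)**2
#   lower level  g(x, y) = 0.5 * (y - x)**2, so y*(x) = x
# Hyper-objective phi(x) = f(x, y*(x)) = 0.5 * (x - 1)**2, hyper-gradient x - 1.


def lower_solution(x, nu):
    """Minimizer over y of nu * f(x, y) + g(x, y); requires nu > -1."""
    return (x + nu) / (1.0 + nu)


def penalty_grad(x, nu):
    """Gradient in x of l_nu(x) = min_y [nu*f + g](x, y) - min_y g(x, y).

    By the envelope theorem only partial derivatives in x are needed.
    Here f does not depend on x, and d/dx g(x, y) = x - y.
    """
    y_nu = lower_solution(x, nu)
    y_0 = lower_solution(x, 0.0)
    return (x - y_nu) - (x - y_0)


def forward_estimate(x, nu):
    """First-order (forward) difference in nu around nu = 0."""
    return (penalty_grad(x, nu) - penalty_grad(x, 0.0)) / nu


def central_estimate(x, nu):
    """Second-order (central) difference in nu around nu = 0."""
    return (penalty_grad(x, nu) - penalty_grad(x, -nu)) / (2.0 * nu)


def errors(x=3.0, nus=(0.1, 0.05, 0.025)):
    """Absolute hyper-gradient errors of both estimates for each nu."""
    true_grad = x - 1.0
    return [
        (nu,
         abs(forward_estimate(x, nu) - true_grad),
         abs(central_estimate(x, nu) - true_grad))
        for nu in nus
    ]


if __name__ == "__main__":
    for nu, err_fwd, err_cen in errors():
        print(f"nu={nu:<6} forward error={err_fwd:.5f} central error={err_cen:.5f}")
```

On this toy problem, halving `nu` roughly halves the forward-difference error and roughly quarters the central-difference error. A higher-order difference therefore tolerates a larger `nu` for the same bias, which is the intuition behind using higher-order differences on smoother problems.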
