Distributionally Robust Learning with Weakly Convex Losses: Convergence Rates and Finite-Sample Guarantees
Abstract
We consider a distributionally robust stochastic optimization problem and formulate it as a stochastic two-level composition optimization problem using the mean-semideviation risk measure. In this setting, we consider a single time-scale algorithm with two versions of inner function value tracking: linearized tracking for a continuously differentiable loss function, and SPIDER tracking for a weakly convex loss function. We adopt the norm of the gradient of the Moreau envelope as our measure of stationarity and show that a sample complexity of O(ε^{-3}) is achievable in both cases, with only a larger constant in the second case. Finally, we demonstrate the performance of our algorithm on a robust learning example and on a weakly convex, non-smooth regression example.
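To make the two ingredients named above concrete, here is a minimal sketch in Python. It assumes the first-order mean-semideviation ρ(Z) = E[Z] + κ·E[max(0, Z − E[Z])] with κ ∈ [0, 1]. The abstract names the risk measure but not its order or parameter, so this form is an assumption. The nested structure shows why the problem is a two-level composition: the inner expectation u = E[Z] sits inside the outer expectation, and this inner value is what the algorithm must track from samples. The second part illustrates the stationarity measure, the gradient of the Moreau envelope, ∇e_λ f(x) = (x − prox_{λf}(x)) / λ, using f(x) = |x|. This f is a hypothetical stand-in: it is convex, hence trivially weakly convex, and it has a closed-form prox. All function names are hypothetical. The sketch does not implement the authors' linearized or SPIDER tracking.

```python
from typing import Sequence


def mean_semideviation(losses: Sequence[float], kappa: float) -> float:
    """Empirical first-order mean-semideviation (assumed form):
    rho(Z) = E[Z] + kappa * E[max(0, Z - E[Z])], with 0 <= kappa <= 1.
    """
    if not 0.0 <= kappa <= 1.0:
        raise ValueError("kappa must lie in [0, 1]")
    if len(losses) == 0:
        raise ValueError("need at least one loss sample")
    n = len(losses)
    # Inner level: the mean loss u = E[Z]. In the stochastic setting this
    # value is unknown and must be tracked from samples.
    u = sum(losses) / n
    # Outer level: average of Z + kappa * (Z - u)_+ given the inner value u.
    return sum(z + kappa * max(0.0, z - u) for z in losses) / n


def soft_threshold(x: float, t: float) -> float:
    """Proximal operator of t * |x| (soft thresholding), t > 0."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def moreau_envelope_grad_abs(x: float, lam: float) -> float:
    """Gradient of the Moreau envelope of the illustrative f(x) = |x|:
    grad e_lam f(x) = (x - prox_{lam f}(x)) / lam.
    """
    return (x - soft_threshold(x, lam)) / lam


if __name__ == "__main__":
    losses = [1.0, 2.0, 3.0, 6.0]
    print(mean_semideviation(losses, kappa=0.5))  # mean 3.0 plus 0.5 * 0.75
    print(moreau_envelope_grad_abs(2.0, lam=0.5))
```

With κ = 0 the risk reduces to the plain mean loss. Larger κ penalizes losses above the mean more heavily, which is the source of robustness. A small value of |∇e_λ f(x)| certifies that x is close to a point that is nearly stationary for f, which is why this quantity serves as a stationarity measure for non-smooth, weakly convex losses.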