On the Limits of Biased Derivative Information for Nonconvex Stochastic Optimization

Abstract

We consider the problem of finding $\delta$-stationary points for $\delta > 0$, i.e., $x \in \mathbb{R}^d$ such that $\|\nabla F(x)\| \le \delta$, for smooth, non-convex objectives, where the derivative oracles are not only stochastic but also biased. In the first-order setting, we provide tight lower bounds for finding an $O((\epsilon + B^2)^{1/2})$-stationary point, for $\epsilon > 0$ and where $B$ is a bound on the gradient bias, matching the upper bounds of Ajalloeian and Stich (2020). We then establish bias-dependent lower bounds for algorithms that use higher-order derivative information for finding $O(\epsilon + B)$-stationary points, where $B$ is a bound on the maximum bias for all derivatives. To complement these lower bounds, we develop trust-region-based methods that, for certain ranges of bias, provide guarantees that match the corresponding lower bounds. We further improve upon the oracle complexity in high-bias settings through a higher-order variance reduction scheme, in particular demonstrating the benefits, in some cases, of using higher-order derivative information, whereas such improvements are known to be unattainable for stochastic unbiased settings.
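
As a rough illustration of why the attainable stationarity level depends on the bias bound B, the following minimal sketch runs plain gradient descent with a biased oracle on a toy objective. Everything here is an assumption made for illustration and is not taken from the paper: the objective F(x) = 0.5 ||x||^2 (convex, chosen only because its gradient is simply x), the hypothetical helpers `biased_gradient` and `run_sgd`, the constant bias vector, the step size, and the iteration count. It is not the paper's lower-bound construction or its trust-region method.

```python
import numpy as np


def biased_gradient(x, bias, noise_std, rng):
    """Hypothetical oracle for the toy objective F(x) = 0.5 * ||x||^2.

    Returns the true gradient (which is x) plus a fixed bias vector
    plus zero-mean Gaussian noise with standard deviation noise_std.
    """
    return x + bias + noise_std * rng.standard_normal(x.shape)


def run_sgd(x0, bias, noise_std=0.0, step=0.1, iters=500, seed=0):
    """Plain (stochastic) gradient descent driven by the biased oracle."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for _ in range(iters):
        x = x - step * biased_gradient(x, bias, noise_std, rng)
    return x


B = 0.3                          # assumed bound on the gradient bias
bias = np.array([B, 0.0])        # constant bias with norm exactly B
x_final = run_sgd([2.0, -1.0], bias)

# For this toy objective the true gradient at x is x itself.
true_grad_norm = np.linalg.norm(x_final)
print(f"true gradient norm at the final iterate: {true_grad_norm:.6f}")
```

With the noise switched off, the iterates settle where the biased oracle returns zero, i.e., at x = -bias, so the true gradient norm stalls near B rather than shrinking toward zero. This is consistent in spirit with a stationarity target of order $(\epsilon + B^2)^{1/2}$ in the first-order setting, though it demonstrates nothing about the bounds themselves.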
