Stochastic Zeroth-Order Optimization Under Heavy-Tailed Noise

Abstract

We study stochastic zeroth-order (ZO) optimization of smooth nonconvex objectives under heavy-tailed sample-gradient noise. This regime is motivated by empirical evidence that gradient noise in modern machine learning can violate the bounded-variance assumptions used in classical ZO theory. While first-order methods have optimal rates under bounded p-th moment noise for p ∈ (1,2], analogous high-probability guarantees for nonconvex ZO methods are much less understood. The ZO setting is not a direct corollary of first-order theory. First-order methods observe stochastic gradients, whereas derivative-free methods only query noisy function values and build finite-difference estimates. Thus, weak-L_p control of ∇F(x;ξ) - ∇f(x) must first be transferred to scalar directional estimates. We propose the Robust Scalar-Clipped Zeroth-Order method (RSC-ZO), a two-point method that clips each scalar directional derivative before aggregation. Under sample-wise smoothness and a weak-L_p tail condition on the sample-gradient noise, RSC-ZO finds an ε-stationary point with high probability using O(d^{p/(2(p-1))} ε^{-(3p-2)/(p-1)}) noisy function evaluations. This matches the optimal first-order ε-dependence. At p = 2, the bound becomes O(d ε^{-4}), matching the classical stochastic ZO dimension-accuracy dependence, but with a high-probability guarantee and under a weaker weak-L_2 condition that can allow infinite variance. We also analyze a momentum variant and quantify its batch-size/stepsize tradeoff.
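The abstract describes RSC-ZO only at a high level, so the following is a minimal sketch of the general idea (a two-point finite-difference estimate whose scalar directional derivative is clipped before averaging), not the authors' algorithm. The function names, the sphere-uniform direction sampling, the factor d, the shared sample ξ for both queries, and all parameters (`mu`, `tau`, `eta`, `batch`) are assumptions made for illustration. The paper's actual choices of clipping level, smoothing radius, and stepsize are not given in the abstract.

```python
import math
import random


def clipped_two_point_estimate(F, sample_xi, x, mu, tau, batch, rng):
    """Average `batch` two-point estimates, clipping each scalar first.

    F(x, xi) is a noisy function-value oracle and sample_xi(rng) draws xi.
    All names and parameters are hypothetical, not taken from the paper.
    """
    d = len(x)
    g = [0.0] * d
    for _ in range(batch):
        # Direction uniform on the unit sphere: a normalized Gaussian vector.
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(zi * zi for zi in z))
        u = [zi / norm for zi in z]
        # Assumption: both function values are queried with the same sample xi.
        xi = sample_xi(rng)
        x_plus = [xj + mu * uj for xj, uj in zip(x, u)]
        x_minus = [xj - mu * uj for xj, uj in zip(x, u)]
        s = (F(x_plus, xi) - F(x_minus, xi)) / (2.0 * mu)
        # Clip the scalar directional derivative to [-tau, tau].
        s = max(-tau, min(tau, s))
        # Aggregate: each term has norm at most d * tau.
        for j in range(d):
            g[j] += d * s * u[j] / batch
    return g


def zo_descent(F, sample_xi, x0, steps, eta, mu, tau, batch, seed=0):
    """Plain descent loop driven by the clipped estimator (no momentum)."""
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(steps):
        g = clipped_two_point_estimate(F, sample_xi, x, mu, tau, batch, rng)
        x = [xj - eta * gj for xj, gj in zip(x, g)]
    return x
```

Because the clipping acts on a scalar rather than on a d-dimensional vector, every per-sample term is bounded by d·tau in norm regardless of how heavy the noise tail is. In this sketch that bound is what keeps a single extreme function-value difference from dominating the averaged estimate.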
