The Dual Averaging Power-Prox Method with Application to Heavy-Tail Incremental Gradient
Abstract
We study finite-sum composite optimization under two departures from classical stochastic gradient descent theory that are central in practice: incremental gradient access and heavy-tailed gradient noise. Specifically, we consider fixed cyclic passes over component gradients and assume that, at the optimum, component gradients have a bounded q-th central moment for some q ∈ (1, 2]. This setting is much closer to modern ML training practice than the assumptions used in classical SGD theory, yet its theoretical understanding remains limited. We propose a Dual Averaging Power-Prox method for incremental gradients and establish, to the best of our knowledge, the first convergence analysis in this regime. We further show that our method achieves a better asymptotic convergence rate than the corresponding SGD method with i.i.d. (with-replacement) sampling.