RanSOM: Second-Order Momentum with Randomized Scaling for Constrained and Unconstrained Optimization
Abstract
Momentum methods, such as Polyak's Heavy Ball, are the standard for training deep networks but suffer from curvature-induced bias in stochastic settings, limiting convergence to suboptimal O(ε^{-4}) rates. Existing corrections typically require expensive auxiliary sampling or restrictive smoothness assumptions. We propose RanSOM, a unified framework that eliminates this bias by replacing deterministic step sizes with randomized steps drawn from distributions with mean η_t. This modification allows us to leverage Stein-type identities to compute an exact, unbiased estimate of the momentum bias using a single Hessian-vector product computed jointly with the gradient, avoiding auxiliary queries. We instantiate this framework in two algorithms: RanSOM-E for unconstrained optimization (using exponentially distributed steps) and RanSOM-B for constrained optimization (using beta-distributed steps to strictly preserve feasibility). Theoretical analysis confirms that RanSOM recovers the optimal O(ε^{-3}) convergence rate under standard bounded noise, and achieves optimal rates for heavy-tailed noise settings (p ∈ (1, 2]).
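The role of the exponentially distributed step can be illustrated with a standard integration-by-parts identity: for s drawn from an exponential distribution with mean η and a smooth function g, E[g(s)] = g(0) + η E[g'(s)]. Taking g(s) = ∇f(x + s d) gives E[∇f(x + s d)] − ∇f(x) = η E[∇²f(x + s d) d], so a Hessian-vector product at the randomized new point is an unbiased estimate of the gradient change along the step. The sketch below checks this numerically on a toy objective. It is a minimal illustration and not the paper's algorithm: the abstract does not state which identity RanSOM-E uses, and the objective and the names `grad`, `hvp`, and `stein_gap` are assumptions made here.

```python
import numpy as np

def grad(x):
    # Gradient of the toy objective f(x) = sum(log(cosh(x))) (hypothetical choice).
    return np.tanh(x)

def hvp(x, d):
    # Hessian-vector product for the same f; the Hessian is diag(1 - tanh(x)^2).
    return (1.0 - np.tanh(x) ** 2) * d

def stein_gap(x, d, eta, n_samples=400_000, seed=0):
    # Monte Carlo estimate of both sides of
    #   E[grad(x + s d)] - grad(x) = eta * E[hvp(x + s d, d)],  s ~ Exp(mean eta).
    rng = np.random.default_rng(seed)
    s = rng.exponential(scale=eta, size=(n_samples, 1))  # random step lengths
    x_new = x + s * d                                    # shape (n_samples, dim)
    lhs = (grad(x_new) - grad(x)).mean(axis=0)
    rhs = eta * hvp(x_new, d).mean(axis=0)
    return lhs, rhs

x = np.array([0.3, -1.2, 0.8])   # current iterate (arbitrary)
d = np.array([1.0, 0.5, -0.7])   # step direction (arbitrary)
lhs, rhs = stein_gap(x, d, eta=0.1)
print("max abs difference:", np.max(np.abs(lhs - rhs)))
```

The two sides agree up to Monte Carlo error. With a deterministic step of length η the same relation holds only to first order in η, which is one way to read the curvature-induced bias the abstract describes.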