Streaming Krylov-Accelerated Stochastic Gradient Descent

Abstract

We present SKA-SGD (Streaming Krylov-Accelerated Stochastic Gradient Descent), a novel optimization approach that accelerates convergence for ill-conditioned problems by projecting stochastic gradients onto a low-dimensional Krylov subspace. Directly inspired by recent advances in s-step Conjugate Gradient methods with streaming Gauss-Seidel Gram solvers dambra2025sstep, our method extends these techniques to the stochastic optimization domain. Our approach combines three key innovations: (1) projection coefficients computed via a single streaming Gauss-Seidel iteration, which is mathematically equivalent to Modified Gram-Schmidt orthogonalization; (2) a Chebyshev polynomial basis for constructing the Krylov subspace, providing superior numerical stability; and (3) efficient implementation for AMD GPUs using HIP. We prove that our streaming approach achieves a backward error near machine precision with O(s2) complexity rather than O(s3), where s is the Krylov subspace dimension. Experimental results demonstrate that SKA-SGD significantly outperforms standard SGD and Adam in convergence rate and final error, particularly for problems with condition numbers exceeding 103. GPU performance analysis reveals a crossover point where communication-avoiding benefits outweigh computational overhead, typically occurring at moderate scale (p ≈ 64 processors) for problem sizes n ≥ 106.

0

Turn this paper into a full lesson

ArcXiv compiles a staged curriculum from this paper: 8-12 lessons across beginner → advanced, synthesised section guides, visuals, flashcards, a quiz, exercises, and on-demand deep dives per section. Grounded in the abstract, never invented.

Discussion (0)

Sign in to join the discussion.

Loading comments…