Markovian Compression: Looking to the Past Helps Accelerate the Future
Abstract
This paper deals with distributed optimization problems that use compressed communication to improve efficiency and mitigate the communication bottleneck. We propose a family of compression schemes in which operators transform their input vectors according to a Markov chain, i.e., the stochasticity of the compressors depends on previous iterations. We incorporate these compressors into the vanilla Quantized Stochastic Gradient Descent algorithm (QSGD) and, to further improve efficiency and convergence rate, into momentum-accelerated QSGD. We provide convergence results for our algorithms with Markovian compressors; the analysis covers the non-convex, Polyak-Łojasiewicz, and strongly convex cases. To demonstrate the applicability of our approach to distributed data-parallel optimization problems, we conduct experiments on the CIFAR-10 and GLUE datasets with the ResNet-18 and DeBERTaV3 models. The experimental results show that methods using our compressor design outperform existing schemes.
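To make "stochasticity that depends on previous iterations" concrete, here is a minimal sketch of one hypothetical Markovian compressor. It is an illustration only and is not the construction proposed in the paper. It assumes a Rand-k style sparsifier whose state, a shuffled coordinate order plus a cursor, persists between calls. The name `CyclicRandK` and the requirement that `k` divides `dim` are assumptions made for this sketch. At any single step, marginalizing over the random permutation, the kept block is a uniformly random k-subset, so rescaling by d/k makes the output unbiased in that marginal sense. Conditioned on the history, however, the choice is not independent: coordinates already sent in the current cycle cannot be chosen again.

```python
import random
from typing import List, Optional


class CyclicRandK:
    """Hypothetical Markovian Rand-k sparsifier (illustration only).

    The internal state (a shuffled coordinate order and a cursor) carries
    over between calls, so the coordinates kept at step t depend on those
    kept at earlier steps. Within one cycle of dim // k calls, every
    coordinate is transmitted exactly once.
    """

    def __init__(self, dim: int, k: int, seed: Optional[int] = None) -> None:
        if k <= 0 or dim % k != 0:
            raise ValueError("this sketch assumes 0 < k and k divides dim")
        self.dim = dim
        self.k = k
        self.rng = random.Random(seed)
        self.order: List[int] = []
        self.cursor = 0

    def _next_block(self) -> List[int]:
        # At the start of each cycle, draw a fresh random permutation.
        if self.cursor == 0:
            self.order = list(range(self.dim))
            self.rng.shuffle(self.order)
        block = self.order[self.cursor:self.cursor + self.k]
        # Advance the cursor and wrap to 0 when the cycle is exhausted.
        self.cursor = (self.cursor + self.k) % self.dim
        return block

    def __call__(self, x: List[float]) -> List[float]:
        if len(x) != self.dim:
            raise ValueError("input has wrong dimension")
        scale = self.dim / self.k  # d/k rescaling, as in standard Rand-k
        out = [0.0] * self.dim
        for i in self._next_block():
            out[i] = scale * x[i]
        return out


if __name__ == "__main__":
    comp = CyclicRandK(dim=6, k=2, seed=0)
    x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    total = [0.0] * 6
    for _ in range(3):  # one full cycle: dim // k = 3 calls
        total = [a + b for a, b in zip(total, comp(x))]
    print(total)  # each coordinate was sent once, scaled by 3
```

The cyclic design was chosen only because it makes the dependence on past iterations easy to see: over one full cycle, the summed outputs equal (d/k)·x exactly, whereas independent Rand-k draws could repeat or skip coordinates. In a QSGD-style loop, each worker would apply such a compressor to its stochastic gradient before communication.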