Efficient Sign-Based Optimization: Accelerating Convergence via Variance Reduction
Abstract
Sign stochastic gradient descent (signSGD) is a communication-efficient method that transmits only the sign of stochastic gradients for parameter updating. Existing literature has demonstrated that signSGD can achieve a convergence rate of $O(d^{1/2}T^{-1/4})$, where $d$ represents the dimension and $T$ is the iteration number. In this paper, we improve this convergence rate to $O(d^{1/2}T^{-1/3})$ by introducing the Sign-based Stochastic Variance Reduction (SSVR) method, which employs variance reduction estimators to track gradients and leverages their signs to update. For finite-sum problems, our method can be further enhanced to achieve a convergence rate of $O(m^{1/4}d^{1/2}T^{-1/2})$, where $m$ denotes the number of component functions. Furthermore, we investigate the heterogeneous majority vote in distributed settings and introduce two novel algorithms that attain improved convergence rates of $O(d^{1/2}T^{-1/2} + dn^{-1/2})$ and $O(d^{1/4}T^{-1/4})$ respectively, outperforming the previous results of $O(dT^{-1/4} + dn^{-1/2})$ and $O(d^{3/8}T^{-1/8})$, where $n$ represents the number of nodes. Numerical experiments across different tasks validate the effectiveness of our proposed methods.
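The idea of tracking the gradient with a variance-reduced estimator and then stepping along its sign can be illustrated with a minimal sketch. The abstract does not specify the estimator, so the STORM-style recursion below, the toy objective $f(x) = \frac{1}{2}\|x\|^2$, and all names and parameter values (`vr_estimate`, `beta`, `lr`) are assumptions made for illustration, not the paper's SSVR algorithm.

```python
import numpy as np


def sign_sgd_step(x, grad_estimate, lr):
    """Move each coordinate by lr against the sign of the gradient estimate."""
    return x - lr * np.sign(grad_estimate)


def vr_estimate(v_prev, grad_new, grad_old, beta):
    """STORM-style recursive estimator (assumed here, not taken from the paper):
    v_t = g(x_t; xi_t) + (1 - beta) * (v_{t-1} - g(x_{t-1}; xi_t)).
    Both gradients must be evaluated on the same sample xi_t."""
    return grad_new + (1.0 - beta) * (v_prev - grad_old)


def noisy_grad(x, noise):
    """Stochastic gradient of the toy objective f(x) = 0.5 * ||x||^2."""
    return x + noise


def run(T=200, d=5, lr=0.05, beta=0.1, seed=0):
    """Run T sign-based updates driven by the variance-reduced estimator."""
    rng = np.random.default_rng(seed)
    x_prev = np.ones(d)
    v = noisy_grad(x_prev, rng.normal(size=d))  # initial estimate: one raw sample
    x = sign_sgd_step(x_prev, v, lr)
    for _ in range(T):
        noise = rng.normal(size=d)  # one sample, reused at x and x_prev
        v = vr_estimate(v, noisy_grad(x, noise), noisy_grad(x_prev, noise), beta)
        x_prev, x = x, sign_sgd_step(x, v, lr)
    return x


if __name__ == "__main__":
    print(run())
```

Setting `beta=1.0` discards the correction term, so the estimator reduces to the raw stochastic gradient and the loop becomes plain signSGD, which makes the two variants easy to compare on the same toy problem.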