Distributed Non-Convex Optimization with One-Bit Compressors on Heterogeneous Data: Efficient and Resilient Algorithms

Abstract

Federated Learning (FL) is a nascent decentralized learning framework under which a massive collection of heterogeneous clients collaboratively train a model without revealing their local data. Scarce communication, privacy leakage, and Byzantine attacks are the key bottlenecks of system scalability. In this paper, we focus on communication-efficient distributed (stochastic) gradient descent for non-convex optimization, a driving force of FL. We propose two algorithms, named Adaptive Stochastic Sign SGD (Ada-StoSign) and β-Stochastic Sign SGD (β-StoSign), each of which compresses the local gradients into bit vectors. To handle unbounded gradients, Ada-StoSign uses a novel norm tracking function that adaptively adjusts a coarse estimation on the ∞ of the local gradients - a key parameter used in gradient compression. We show that Ada-StoSign converges in expectation with a rate O( T/T + 1/M), where M is the number of clients. To the best of our knowledge, when M is sufficiently large, Ada-StoSign outperforms the state-of-the-art sign-based method whose convergence rate is O(T-1/4). Under bounded gradient assumption, β-StoSign achieves quantifiable Byzantine resilience and privacy assurances, and works with partial client participation and mini-batch gradients which could be unbounded. We corroborate and complement our theories by experiments on MNIST and CIFAR-10 datasets.

0

Turn this paper into a full lesson

ArcXiv compiles a staged curriculum from this paper: 8-12 lessons across beginner → advanced, synthesised section guides, visuals, flashcards, a quiz, exercises, and on-demand deep dives per section. Grounded in the abstract, never invented.

Discussion (0)

Sign in to join the discussion.

Loading comments…