Proving the Limited Scalability of Centralized Distributed Optimization via a New Lower Bound Construction

Abstract

We consider centralized distributed optimization in the classical federated learning setup, where $n$ workers jointly find an $\varepsilon$-stationary point of an $L$-smooth, $d$-dimensional nonconvex function $f$, having access only to unbiased stochastic gradients with variance $\sigma^2$. Each worker requires at most $h$ seconds to compute a stochastic gradient, and the communication times from the server to the workers and from the workers to the server are $\tau_s$ and $\tau_w$ seconds per coordinate, respectively. One of the main motivations for distributed optimization is to achieve scalability with respect to $n$. For instance, it is well known that the distributed version of SGD has a variance-dependent runtime term $\frac{h \sigma^2 L \Delta}{n \varepsilon^2}$, which improves with the number of workers $n$, where $\Delta = f(x^0) - f^*$ and $x^0 \in \mathbb{R}^d$ is the starting point. Similarly, using unbiased sparsification compressors, it is possible to reduce both the variance-dependent runtime term and the communication runtime term. However, once we account for the communication from the server to the workers $\tau_s$, we prove that it becomes infeasible to design a method using unbiased random sparsification compressors that scales both the server-side communication runtime term $\tau_s d \frac{L \Delta}{\varepsilon}$ and the variance-dependent runtime term $\frac{h \sigma^2 L \Delta}{\varepsilon^2}$ better than poly-logarithmically in $n$, even in the homogeneous (i.i.d.) case, where all workers access the same distribution. To establish this result, we construct a new "worst-case" function and develop a new lower bound framework that reduces the analysis to the concentration of a random sum, for which we prove a concentration bound. These results reveal fundamental limitations in scaling distributed optimization, even under the homogeneous assumption.
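As a rough illustration of two objects the abstract mentions, the Python sketch below shows one common unbiased random sparsification compressor (Rand-K, which keeps k uniformly chosen coordinates and rescales them by d/k) and evaluates the variance-dependent runtime term of distributed SGD. The abstract does not name a specific compressor, so the choice of Rand-K is an assumption here. The function names `rand_k` and `sgd_variance_term` and all numeric values are hypothetical and are not taken from the paper.

```python
import random


def rand_k(x, k, rng):
    """Rand-K sparsifier (assumed example of an unbiased random sparsifier).

    Keeps k coordinates chosen uniformly at random and scales them by d/k,
    so that the expectation of the output equals x.
    """
    d = len(x)
    kept = set(rng.sample(range(d), k))  # k distinct coordinate indices
    scale = d / k
    return [scale * v if i in kept else 0.0 for i, v in enumerate(x)]


def sgd_variance_term(h, sigma2, L, delta, n, eps):
    """Variance-dependent runtime term h * sigma^2 * L * Delta / (n * eps^2)."""
    return h * sigma2 * L * delta / (n * eps ** 2)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed, purely for reproducibility
    x = [1.0, 2.0, 3.0, 4.0]

    # Empirical check of unbiasedness: the average of many compressed
    # vectors should be close to x.
    trials = 20000
    mean = [0.0] * len(x)
    for _ in range(trials):
        for i, v in enumerate(rand_k(x, 2, rng)):
            mean[i] += v / trials
    print("empirical mean of Rand-K output:", mean)

    # Illustrative (made-up) constants: the term shrinks as 1/n.
    for n in (1, 10, 100):
        print(n, sgd_variance_term(h=1.0, sigma2=4.0, L=1.0, delta=1.0, n=n, eps=0.1))
```

The sketch only illustrates the 1/n improvement of the variance-dependent term and the unbiasedness property. It does not model server-to-worker communication, which is the setting where the abstract's lower bound applies.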
