How a Small Amount of Data Sharing Benefits Distributed Optimization and Learning: The Upside of Data Heterogeneity
Abstract
Distributed optimization algorithms are widely used in machine learning. This paper investigates how a small amount of data sharing can improve their performance. Focusing on general linear models, we analyze the effects of data sharing on both primal and primal-dual optimization methods. Our contributions are threefold. First, from a theoretical perspective, we show that minimal data sharing improves algorithmic performance by shifting data from less favorable to more favorable structures. Contrary to the common belief that data heterogeneity is always harmful, we prove that while heterogeneity generally slows convergence in primal methods such as FedAvg and distributed PCG, it can accelerate convergence in primal-dual consensus algorithms such as distributed ADMM, Fed-ADMM, and EXTRA by enriching their dual dynamics. This reveals a form of duality in how heterogeneity affects the two algorithm families. Second, building on this insight, we design a meta-algorithm for minimal data sharing that applies to both primal and primal-dual methods. We show that sharing as little as 1 percent of the data can significantly accelerate convergence across the machine learning tasks we consider. Finally, we argue from a broader perspective that even limited collaboration can yield large synergies, an idea that extends beyond the optimization context. Our findings provide both theoretical and practical guidance for improving distributed learning through minimal cooperation and motivate further exploration of cross-agent collaboration in solving complex global learning problems.
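The sketch below illustrates only the mechanics of a minimal data-sharing step in front of a primal method (FedAvg) on a linear least-squares model. It is not the paper's meta-algorithm: the helper names `share_data` and `fedavg_least_squares`, the pooling rule (each agent contributes a random `frac` of its samples to a pool that every agent then appends), the step size, and the toy setup with per-agent feature scaling as the source of heterogeneity are all hypothetical choices made for illustration. It assumes NumPy is available and makes no claim about how much sharing speeds up convergence.

```python
import numpy as np

def share_data(datasets, frac, rng):
    """Hypothetical sharing step: each agent contributes a random fraction
    `frac` of its samples to a common pool; every agent appends the pool."""
    pool_X, pool_y = [], []
    for X, y in datasets:
        m = max(1, int(round(frac * len(y))))  # at least one sample per agent
        idx = rng.choice(len(y), size=m, replace=False)
        pool_X.append(X[idx])
        pool_y.append(y[idx])
    PX, Py = np.vstack(pool_X), np.concatenate(pool_y)
    return [(np.vstack([X, PX]), np.concatenate([y, Py])) for X, y in datasets]

def fedavg_least_squares(datasets, rounds=200, local_steps=5, lr=0.02):
    """Plain FedAvg on local losses (1/(2 n_k)) ||X_k w - y_k||^2: each agent
    runs `local_steps` gradient steps from the global model, then the server
    averages the local models."""
    d = datasets[0][0].shape[1]
    w = np.zeros(d)
    for _ in range(rounds):
        local_models = []
        for X, y in datasets:
            w_k = w.copy()
            for _ in range(local_steps):
                w_k -= lr * X.T @ (X @ w_k - y) / len(y)  # local gradient step
            local_models.append(w_k)
        w = np.mean(local_models, axis=0)  # server averaging
    return w

# Toy heterogeneous setup (assumption): 4 agents share one true model, but
# agent k's features are scaled by (k + 1), so local Hessians differ.
rng = np.random.default_rng(0)
d, n, K = 5, 200, 4
w_true = rng.standard_normal(d)
datasets = []
for k in range(K):
    X = (k + 1) * rng.standard_normal((n, d))
    datasets.append((X, X @ w_true))  # noiseless linear responses

shared = share_data(datasets, frac=0.01, rng=rng)  # 1 percent from each agent
for name, data in [("no sharing", datasets), ("1% sharing", shared)]:
    w = fedavg_least_squares(data)
    print(name, "error:", np.linalg.norm(w - w_true))
```

Because the toy data are noiseless and every agent shares the same true model, both runs have the same fixed point; varying `frac`, `rounds`, or the heterogeneity pattern is one way to explore how the shared pool changes the local problems each agent solves.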