Asynchronous Distributed Optimization with Stochastic Delays

Abstract

We study asynchronous finite-sum minimization in a distributed-data setting with a central parameter server. While asynchrony is well understood in parallel settings where the data is accessible by all machines (e.g., modifications of variance-reduced gradient algorithms like SAGA work well), little is known for the distributed-data setting. We develop an algorithm, ADSAGA, based on SAGA for the distributed-data setting, in which the data is partitioned between many machines. We show that with $m$ machines, under a natural stochastic delay model with a mean delay of $m$, ADSAGA converges in $\tilde{O}((n + \sqrt{m}\kappa)\log(1/\epsilon))$ iterations, where $n$ is the number of component functions and $\kappa$ is a condition number. This complexity sits squarely between the complexity $\tilde{O}((n + \kappa)\log(1/\epsilon))$ of SAGA without delays and the complexity $\tilde{O}((n + m\kappa)\log(1/\epsilon))$ of parallel asynchronous algorithms where the delays are arbitrary (but bounded by $O(m)$) and the data is accessible by all. Existing asynchronous algorithms for the distributed-data setting with arbitrary delays have only been shown to converge in $\tilde{O}(n^2\kappa\log(1/\epsilon))$ iterations. On least-squares problems, we empirically compare the iteration complexity and wall-clock performance of ADSAGA to existing parallel and distributed algorithms, including synchronous minibatch algorithms. Our results demonstrate the wall-clock advantage of variance-reduced asynchronous approaches over SGD or synchronous approaches.
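For orientation, the sketch below shows plain, serial, delay-free SAGA on a small least-squares problem, the baseline that the abstract compares against. It is not ADSAGA: there is no data partitioning, no parameter server, and no delayed gradients. The function name `saga_least_squares`, its parameters, and the toy data in the usage note are illustrative assumptions, not taken from the paper.

```python
import random


def saga_least_squares(A, b, step, iters, seed=0):
    """Serial SAGA for f(x) = (1/n) * sum_i 0.5 * (a_i . x - b_i)^2.

    Hypothetical helper for illustration only; this is the delay-free
    baseline, not the paper's ADSAGA.
    """
    rng = random.Random(seed)
    n, d = len(A), len(A[0])
    x = [0.0] * d

    # The gradient of f_i is residual_i * a_i, so storing one scalar
    # residual per component is enough to recover its stored gradient.
    table = [sum(A[i][k] * x[k] for k in range(d)) - b[i] for i in range(n)]
    # Running average of all stored component gradients.
    avg = [sum(table[i] * A[i][k] for i in range(n)) / n for k in range(d)]

    for _ in range(iters):
        i = rng.randrange(n)
        # Fresh residual for the sampled component at the current iterate.
        r_new = sum(A[i][k] * x[k] for k in range(d)) - b[i]
        diff = r_new - table[i]
        for k in range(d):
            # SAGA direction: fresh gradient - stored gradient + table average.
            x[k] -= step * (diff * A[i][k] + avg[k])
        for k in range(d):
            # Keep the average consistent with the updated table entry.
            avg[k] += diff * A[i][k] / n
        table[i] = r_new
    return x
```

As a usage example, calling `saga_least_squares([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [1.0, 2.0, 3.0], step=1/6, iters=2000)` on this consistent toy system should return a point close to `[1.0, 2.0]`. The step size `1/6` follows the common `1/(3L)` choice, with `L = 2` the largest squared row norm here.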
