Distributed Optimization Based on Gradient-tracking Revisited: Enhancing Convergence Rate via Surrogation
Abstract
We study distributed multiagent optimization over (directed, time-varying) graphs. We consider the minimization of F+G subject to convex constraints, where F is the smooth, strongly convex sum of the agents' losses and G is a nonsmooth convex function. We build on the SONATA algorithm, which uses surrogate objective functions in the agents' subproblems (thus going beyond linearization, as in proximal-gradient methods) coupled with a perturbed (push-sum) consensus mechanism that aims to locally track the gradient of F. SONATA achieves precision ε>0 on the objective value in O(κ_g log(1/ε)) gradient computations at each node and O(κ_g (1-ρ)^(-1/2) log(1/ε)) communication steps, where κ_g is the condition number of F and ρ characterizes the connectivity of the network. This is the first linear rate result for distributed composite optimization; it also improves on existing (non-accelerated) schemes that minimize F alone, whose rates depend on much larger quantities than κ_g (e.g., the worst-case condition number among the agents). In particular, for empirical risk minimization problems with statistically similar data across the agents, SONATA with high-order surrogates achieves precision ε>0 in O((β/μ) log(1/ε)) iterations and O((β/μ) (1-ρ)^(-1/2) log(1/ε)) communication steps, where β measures the degree of similarity of the agents' losses and μ is the strong convexity constant of F. Therefore, when β/μ < κ_g, the use of high-order surrogates yields provably faster rates than those achievable by first-order models, without exchanging any Hessian matrix over the network.
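To make the gradient-tracking idea in the abstract concrete, the following is a minimal Python sketch of a heavily simplified special case, not the authors' implementation. The assumptions are ours: G = 0 and no constraints, a static undirected ring with doubly stochastic weights (instead of push-sum on directed, time-varying graphs), the plain linearized surrogate (so each local step is a gradient step), a constant step size, and synthetic least-squares losses f_i(x) = 0.5 * ||A_i x - b_i||^2. The names `ring_weights` and `gradient_tracking` are hypothetical helpers introduced only for illustration.

```python
import numpy as np


def ring_weights(m):
    """Doubly stochastic weights for a ring: each agent averages itself and its two neighbors."""
    W = np.zeros((m, m))
    for i in range(m):
        for j in (i - 1, i, i + 1):
            W[i, j % m] = 1.0 / 3.0
    return W


def gradient_tracking(A, b, W, alpha, iters):
    """Gradient tracking with a linearized surrogate (illustrative simplification).

    A has shape (m, n, d) and b has shape (m, n); agent i holds
    f_i(x) = 0.5 * ||A[i] @ x - b[i]||^2. Row i of X is agent i's iterate,
    and row i of Y is its running estimate of the average gradient.
    """
    m, _, d = A.shape

    def grad(X):
        # Residuals A_i x_i - b_i for every agent, then A_i^T times the residual.
        residual = np.einsum("ink,ik->in", A, X) - b
        return np.einsum("ink,in->ik", A, residual)

    X = np.zeros((m, d))
    G = grad(X)
    Y = G.copy()  # trackers start at the local gradients
    for _ in range(iters):
        X_new = W @ (X - alpha * Y)   # local step along the tracked direction, then consensus
        G_new = grad(X_new)
        Y = W @ (Y + G_new - G)       # consensus perturbed by the local gradient change
        X, G = X_new, G_new
    return X


# Synthetic data (assumption: 5 agents, 10 samples each, dimension 3).
rng = np.random.default_rng(0)
m, n, d = 5, 10, 3
A = rng.standard_normal((m, n, d))
b = rng.standard_normal((m, n))
W = ring_weights(m)

# Conservative constant step size based on the largest local smoothness constant.
L_max = max(np.linalg.eigvalsh(A[i].T @ A[i]).max() for i in range(m))
X = gradient_tracking(A, b, W, alpha=0.25 / L_max, iters=3000)

# Centralized reference: minimizer of the sum of all agents' losses.
x_star = np.linalg.lstsq(A.reshape(m * n, d), b.reshape(m * n), rcond=None)[0]
print(np.max(np.abs(X - x_star)))
```

Every row of `X` should agree with the centralized least-squares solution `x_star`, which illustrates that agents exchanging only iterates and gradient trackers can reach the minimizer of the global sum. Replacing the gradient step with the minimizer of a richer local surrogate is the modification the abstract credits for the improved β/μ rate; that variant is not shown here.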