On the Convergence of Decentralized Gradient Descent
Abstract
Consider the consensus problem of minimizing $f(x)=\sum_{i=1}^n f_i(x)$, where each $f_i$ is known only to one individual agent $i$ in a connected network of $n$ agents. The agents must collaboratively solve this problem and obtain the solution, with data exchange restricted to neighboring agents. Such algorithms avoid the need for a fusion center, offer better network load balance, and improve data privacy. We study the decentralized gradient descent method, in which each agent $i$ updates its variable $x_{(i)}$, a local approximation of the unknown variable $x$, by combining the weighted average of its neighbors' variables with the negative gradient step $-\alpha \nabla f_i(x_{(i)})$. The iteration is $x_{(i)}(k+1) \gets \sum_{\text{neighbor } j \text{ of } i} w_{ij} x_{(j)}(k) - \alpha \nabla f_i(x_{(i)}(k))$ for each agent $i$, where the averaging coefficients form a symmetric doubly stochastic matrix $W=[w_{ij}] \in \mathbb{R}^{n \times n}$. We analyze the convergence of this iteration and derive its convergence rate, assuming that each $f_i$ is proper, closed, convex, and lower bounded, that $\nabla f_i$ is Lipschitz continuous with constant $L_{f_i}$, and that the stepsize $\alpha$ is fixed. Provided that $\alpha < O(1/L_h)$, where $L_h=\max_i\{L_{f_i}\}$, the objective error at the averaged solution, $f(\frac{1}{n}\sum_i x_{(i)}(k))-f^*$, decreases at a rate of $O(1/k)$ until it reaches $O(\alpha)$. If the $f_i$ are further (restricted) strongly convex, then both $\frac{1}{n}\sum_i x_{(i)}(k)$ and each $x_{(i)}(k)$ converge to the global minimizer $x^*$ at a linear rate until reaching an $O(\alpha)$-neighborhood of $x^*$. We also develop an iteration for decentralized basis pursuit and establish its linear convergence to an $O(\alpha)$-neighborhood of the true unknown sparse signal.
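The iteration in the abstract can be illustrated with a minimal NumPy sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the ring topology with weights 1/3, the quadratic local objectives $f_i(x)=\frac{1}{2}\|x-b_i\|^2$ (so $L_{f_i}=1$ and the global minimizer is the mean of the $b_i$), the stepsizes, and the helper names `ring_mixing_matrix` and `dgd`.

```python
import numpy as np

def ring_mixing_matrix(n):
    """Hypothetical helper: symmetric doubly stochastic W for a ring of n >= 3
    agents, with weight 1/3 on each agent itself and on its two neighbors."""
    W = np.zeros((n, n))
    for i in range(n):
        for j in (i - 1, i, i + 1):
            W[i, j % n] = 1.0 / 3.0
    return W

def dgd(W, grad, X0, alpha, iters):
    """Run X(k+1) = W X(k) - alpha * grad(X(k)) with a fixed stepsize.
    Row i of X is agent i's local copy x_(i); row i of grad(X) is the
    gradient of f_i evaluated at x_(i)."""
    X = X0.copy()
    for _ in range(iters):
        X = W @ X - alpha * grad(X)
    return X

n, dim = 6, 2
rng = np.random.default_rng(0)
b = rng.normal(size=(n, dim))       # assumed private data of each agent

def grad(X):
    # Gradient of the assumed f_i(x) = 0.5 * ||x - b_i||^2, applied row-wise.
    return X - b

W = ring_mixing_matrix(n)
x_star = b.mean(axis=0)             # minimizer of sum_i f_i for this choice of f_i
X0 = np.zeros((n, dim))

X = dgd(W, grad, X0, alpha=0.05, iters=5000)
avg_err = np.linalg.norm(X.mean(axis=0) - x_star)
agent_err = np.max(np.linalg.norm(X - x_star, axis=1))

# A smaller stepsize should shrink the neighborhood the agents settle in.
X_small = dgd(W, grad, X0, alpha=0.005, iters=5000)
agent_err_small = np.max(np.linalg.norm(X_small - x_star, axis=1))

print("error of the averaged iterate:      ", avg_err)
print("max agent error, alpha = 0.05:      ", agent_err)
print("max agent error, alpha = 0.005:     ", agent_err_small)
```

In this sketch the individual agents stop at a stepsize-dependent distance from $x^*$, and reducing $\alpha$ reduces that distance, which matches the $O(\alpha)$-neighborhood behavior described above. The averaged iterate reaches $x^*$ essentially exactly here only because the assumed quadratics have linear gradients with identical curvature; the abstract's general guarantee for the average is also an $O(\alpha)$-neighborhood.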