Decentralized Stochastic Nonconvex Optimization under the (L0,L1)-Smoothness

Abstract

This paper studies the decentralized stochastic optimization problem $f(x)=\frac{1}{m}\sum_{i=1}^{m} f_i(x)$ over a connected network of $m$ agents, where each local function has the form $f_i(x)=\mathbb{E}[F(x;\xi_i)]$, satisfies the $(L_0,L_1)$-smoothness condition, and is possibly nonconvex, and each random variable $\xi_i$ follows the distribution $\mathcal{D}_i$. We propose a novel algorithm called decentralized normalized stochastic gradient descent (DNSGD), which can achieve an $\epsilon$-stationary point at each local agent. We present a new framework for analyzing decentralized first-order methods in the $(L_0,L_1)$-smooth setting, based on a Lyapunov function related to the product of the gradient norm and the consensus error. We show that the proposed algorithm attains a sample complexity upper bound of $\mathcal{O}\big(m^{-1}(L_f\sigma^2\Delta_f\epsilon^{-4}+\sigma^2\epsilon^{-2}+L_f^{-2}L_1^{3}\sigma^2\Delta_f\epsilon^{-1}+L_f^{-2}L_1^{2}\sigma^2)\big)$ per agent and a communication complexity upper bound of $\mathcal{O}\big((L_f\epsilon^{-2}+L_1\epsilon^{-1})\gamma^{-1/2}\Delta_f\big)$, where $L_f=L_0+L_1\zeta$, $\sigma^2$ is the variance of the stochastic gradient, $\Delta_f$ is the initial function value gap, $\gamma$ is the spectral gap of the network, and $\zeta$ is the degree of gradient dissimilarity. In the special case $L_1=0$, these results (nearly) match the lower bounds for decentralized stochastic nonconvex optimization under standard smoothness. We also conduct numerical experiments to show the empirical superiority of our method.
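To make the setting concrete, the following is a minimal sketch of one round of a generic decentralized normalized gradient update. It is not the paper's DNSGD, whose exact update rule is not given in the abstract. The ring topology, the mixing weights, and the names `ring_mixing_matrix` and `normalized_gossip_step` are all assumptions chosen for illustration.

```python
import numpy as np


def ring_mixing_matrix(m):
    """Doubly stochastic mixing matrix for a ring of m agents.

    Assumption: ring topology with weight 1/2 on self and 1/4 on each neighbor.
    """
    W = np.zeros((m, m))
    for i in range(m):
        W[i, i] += 0.5
        W[i, (i - 1) % m] += 0.25
        W[i, (i + 1) % m] += 0.25
    return W


def normalized_gossip_step(X, G, W, lr):
    """One illustrative round (hypothetical, not the paper's update rule).

    X:  (m, d) array, row i is agent i's current iterate.
    G:  (m, d) array, row i is agent i's stochastic gradient estimate.
    W:  (m, m) mixing matrix encoding the communication network.
    lr: step size.
    """
    # Normalize each agent's gradient so the step length does not depend on
    # the gradient magnitude; the floor avoids division by zero.
    norms = np.linalg.norm(G, axis=1, keepdims=True)
    directions = G / np.maximum(norms, 1e-12)
    # Local normalized step, followed by one round of neighbor averaging.
    return W @ (X - lr * directions)
```

Normalizing the direction caps each agent's movement at `lr` per round, which is the usual motivation for normalized methods when the local smoothness can grow with the gradient norm, as it can under (L0,L1)-smoothness. The multiplication by `W` is a single gossip averaging round that pulls the agents' iterates toward consensus.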
