Decentralized Relaxed Smooth Optimization with Gradient Descent Methods
Abstract
L0-smoothness, which has been pivotal to advancing decentralized optimization theory, is often too restrictive for modern tasks such as deep learning. The recently introduced relaxed (L0,L1)-smoothness condition enables improved convergence rates for gradient methods. Despite these advances in the centralized setting, its decentralized extension remains unexplored and challenging. In this work, we propose the first general framework for decentralized gradient descent (DGD) under (L0,L1)-smoothness by introducing novel analysis techniques. In the deterministic setting, our method with adaptive clipping achieves the best-known convergence rates for convex and nonconvex functions without requiring prior knowledge of L0 and L1 or a bounded-gradient assumption. In the stochastic setting, we derive complexity bounds and identify conditions under which the complexity bound for convex optimization improves. Empirical validation on real datasets demonstrates gradient-norm-dependent smoothness, bridging theory and practice for (L0,L1)-decentralized optimization algorithms.
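To make the setting concrete, here is a minimal sketch of decentralized gradient descent with a clipped step size. It is an illustration under stated assumptions, not the algorithm analyzed in the paper. In the centralized literature, (L0,L1)-smoothness of a twice-differentiable f is usually stated as ||∇²f(x)|| ≤ L0 + L1·||∇f(x)||, and the sketch assumes that form. The step rule min(eta, gamma/||g||) is the standard clipped-gradient step, which may differ from the paper's adaptive rule. The function names, the ring mixing matrix, the quartic local objectives, and the values of eta and gamma are all hypothetical choices made for the example.

```python
import numpy as np


def ring_mixing_matrix(n):
    """Doubly stochastic mixing matrix for a ring of n >= 3 agents.

    Each agent averages itself and its two neighbours with weight 1/3
    (an assumed, illustrative choice of weights).
    """
    W = np.zeros((n, n))
    for i in range(n):
        W[i, i] = 1.0 / 3.0
        W[i, (i - 1) % n] = 1.0 / 3.0
        W[i, (i + 1) % n] = 1.0 / 3.0
    return W


def clipped_dgd(grads, W, x0, eta=0.05, gamma=1.0, steps=2000):
    """Decentralized gradient descent with a per-agent clipped step.

    grads : list of n callables, grads[i](x_i) returns agent i's local gradient
    W     : (n, n) mixing matrix
    x0    : (n, d) array, one row of parameters per agent
    """
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        # Each agent evaluates its own local gradient at its own iterate.
        g = np.stack([grad(xi) for grad, xi in zip(grads, x)])
        norms = np.linalg.norm(g, axis=1, keepdims=True)
        # Clipped step: eta when the gradient is small, gamma/||g|| when it is large.
        step = np.minimum(eta, gamma / np.maximum(norms, 1e-12))
        # Mix with neighbours, then take the local clipped gradient step.
        x = W @ x - step * g
    return x


# Hypothetical local objectives f_i(x) = (x - b_i)^4, whose curvature grows
# with the gradient norm; their sum is minimized at x = 0.
targets = (-1.0, 0.0, 1.0)
grads = [lambda x, b=b: 4.0 * (x - b) ** 3 for b in targets]
W = ring_mixing_matrix(len(targets))
x_final = clipped_dgd(grads, W, x0=np.full((3, 1), 5.0))
print("per-agent iterates:", x_final.ravel())
print("network average:", x_final.mean())
```

With a constant step size, plain DGD generally drives the agents to a neighbourhood of the common minimizer rather than to exact consensus, so the per-agent iterates in this toy run stay slightly apart while their average sits near the minimizer at 0.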