Fast decentralized non-convex finite-sum optimization with recursive variance reduction

Abstract

This paper considers decentralized minimization of $N := nm$ smooth non-convex cost functions equally divided over a directed network of $n$ nodes. Specifically, we describe a stochastic first-order gradient method, called GT-SARAH, that employs a SARAH-type variance reduction technique and gradient tracking (GT) to address the stochastic and decentralized nature of the problem. We show that GT-SARAH, with appropriate algorithmic parameters, finds an $\epsilon$-accurate first-order stationary point with $O\big(\max\{N^{1/2},\, n(1-\lambda)^{-2},\, n^{2/3}m^{1/3}(1-\lambda)^{-1}\}\, L\epsilon^{-2}\big)$ gradient complexity, where $(1-\lambda) \in (0,1]$ is the spectral gap of the network weight matrix and $L$ is the smoothness parameter of the cost functions. This gradient complexity outperforms that of the existing decentralized stochastic gradient methods. In particular, in a big-data regime such that $n = O(N^{1/2}(1-\lambda)^{3})$, this gradient complexity further reduces to $O(N^{1/2}L\epsilon^{-2})$, independent of the network topology, and matches that of the centralized near-optimal variance-reduced methods. Moreover, in this regime GT-SARAH achieves a non-asymptotic linear speedup, in that the total number of gradient computations at each node is reduced by a factor of $1/n$ compared to the centralized near-optimal algorithms that perform all gradient computations at a single node. To the best of our knowledge, GT-SARAH is the first algorithm that achieves this property. In addition, we show that appropriate choices of local minibatch size balance the trade-offs between the gradient and communication complexity of GT-SARAH. Over an infinite time horizon, we establish that all nodes in GT-SARAH asymptotically achieve consensus and converge to a first-order stationary point in the almost sure and mean-squared sense.
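
As a quick check of the big-data regime claim, the following sketch shows why the maximum in the complexity bound collapses to $N^{1/2}$. It is an illustration rather than the paper's proof, and it assumes the constant hidden in $n = O(N^{1/2}(1-\lambda)^{3})$ is 1, i.e. $n \le N^{1/2}(1-\lambda)^{3}$; it uses only $m = N/n$ and $(1-\lambda) \in (0,1]$ from the abstract.

```latex
% Assumption (for illustration): n \le N^{1/2}(1-\lambda)^{3}, with m = N/n.
\begin{align*}
n(1-\lambda)^{-2}
  &\le N^{1/2}(1-\lambda)^{3}(1-\lambda)^{-2}
   = N^{1/2}(1-\lambda)
   \le N^{1/2}, \\
n^{2/3}m^{1/3}(1-\lambda)^{-1}
  &= n^{1/3}N^{1/3}(1-\lambda)^{-1}
   \le \big(N^{1/2}(1-\lambda)^{3}\big)^{1/3} N^{1/3}(1-\lambda)^{-1}
   = N^{1/6}N^{1/3}
   = N^{1/2}.
\end{align*}
% Hence \max\{N^{1/2},\, n(1-\lambda)^{-2},\, n^{2/3}m^{1/3}(1-\lambda)^{-1}\} = N^{1/2},
% and the bound becomes O(N^{1/2} L \epsilon^{-2}), with no dependence on \lambda.
```

Both network-dependent terms are dominated by $N^{1/2}$ under this assumption, which is why the resulting bound no longer depends on the spectral gap.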
