Byzantine Stochastic Gradient Descent

Jerry Li

Byzantine Stochastic Gradient Descent

Abstract

This paper studies the problem of distributed stochastic optimization in an adversarial setting where, out of the m machines which allegedly compute stochastic gradients every iteration, an α-fraction are Byzantine, and can behave arbitrarily and adversarially. Our main result is a variant of stochastic gradient descent (SGD) which finds -approximate minimizers of convex functions in T = O( 12 m + α22 ) iterations. In contrast, traditional mini-batch SGD needs T = O( 12 m ) iterations, but cannot tolerate Byzantine failures. Further, we provide a lower bound showing that, up to logarithmic factors, our algorithm is information-theoretically optimal both in terms of sampling complexity and time complexity.

0

Turn this paper into a lesson

ArcXiv compiles a structured reading guide from this paper's metadata: plain-English importance, contributions, prerequisite concepts, which sections to read first, flashcards, and a quiz. Grounded in the abstract, never invented.

Or compile a full topic from this idea

Discussion (0)

Sign in to join the discussion.

Loading comments…