Optimal Rates for O(1)-Smooth DP-SCO with a Single Epoch and Large Batches

Abstract

In this paper we revisit the DP stochastic convex optimization (SCO) problem. For convex smooth losses, it is well-known that the canonical DP-SGD (stochastic gradient descent) achieves the optimal rate of O(LRn + LR p (1/δ)ε n) under (ε, δ)-DP, and also well-known that variants of DP-SGD can achieve the optimal rate in a single epoch. However, the batch gradient complexity (i.e., number of adaptive optimization steps), which is important in applications like federated learning, is less well-understood. In particular, all prior work on DP-SCO requires (n) batch gradient steps, multiple epochs, or convexity for privacy. We propose an algorithm, Accelerated-DP-SRGD (stochastic recursive gradient descent), which bypasses the limitations of past work: it achieves the optimal rate for DP-SCO (up to polylog factors), in a single epoch using n batch gradient steps with batch size n, and can be made private for arbitrary (non-convex) losses via clipping. If the global minimizer is in the constraint set, we can further improve this to n1/4 batch gradient steps with batch size n3/4. To achieve this, our algorithm combines three key ingredients, a variant of stochastic recursive gradients (SRG), accelerated gradient descent, and correlated noise generation from DP continual counting.

0

Turn this paper into a full lesson

ArcXiv compiles a staged curriculum from this paper: 8-12 lessons across beginner → advanced, synthesised section guides, visuals, flashcards, a quiz, exercises, and on-demand deep dives per section. Grounded in the abstract, never invented.

Discussion (0)

Sign in to join the discussion.

Loading comments…