Limits on Gradient Compression for Stochastic Optimization
Abstract
We consider stochastic optimization over p spaces using access to a first-order oracle. We ask: What is the minimum precision required for oracle outputs to retain the unrestricted convergence rates? We characterize this precision for every p≥ 1 by deriving information theoretic lower bounds and by providing quantizers that (almost) achieve these lower bounds. Our quantizers are new and easy to implement. In particular, our results are exact for p=2 and p=∞, showing the minimum precision needed in these settings are (d) and ( d), respectively. The latter result is surprising since recovering the gradient vector will require (d) bits.
0