Asynchronous Variance-reduced Block Schemes for Composite Nonconvex Stochastic Optimization: Block-specific Steplengths and Adapted Batch-sizes
Abstract
We consider the minimization of the sum of an expectation-valued, coordinate-wise $L_i$-smooth nonconvex function and a nonsmooth, block-separable convex regularizer. We propose an asynchronous variance-reduced algorithm in which, at each iteration, a single block is chosen at random and updates its estimate by a proximal variable sample-size stochastic gradient scheme, while the remaining blocks are kept unchanged. Notably, each block employs a steplength determined by its block-specific Lipschitz constant, while the block-specific batch sizes are random variables that grow at either a geometric or a polynomial rate in the (random) number of times that block has been selected. We show that, for almost every sample path, every limit point is a stationary point, and we establish the ergodic non-asymptotic rate $O(1/K)$. The iteration and oracle complexities to obtain an $\epsilon$-stationary point are shown to be $O(1/\epsilon)$ and $O(1/\epsilon^2)$, respectively. Furthermore, under a $\mu$-proximal Polyak-Łojasiewicz (PL) condition with the batch size increasing at a geometric rate, we prove that the suboptimality diminishes at a geometric rate, which is the optimal deterministic rate, while the iteration and oracle complexities to obtain an $\epsilon$-optimal solution are $O((L_{\max}/\mu)\ln(1/\epsilon))$ and $O((L_{\mathrm{ave}}/\mu)(1/\epsilon)^{1+c})$ with $c \geq 0$, respectively. In pursuit of less aggressive sampling rates, when the batch sizes increase at a polynomial rate of degree $v \geq 1$, the suboptimality decays at a corresponding polynomial rate, while the iteration and oracle complexities to obtain an $\epsilon$-optimal solution are $O(v(1/\epsilon)^{1/v})$ and $O(e^v v^{2v+1}(1/\epsilon)^{1+1/v})$, respectively.
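To make the update concrete, here is a minimal single-process sketch of a randomized block proximal variable sample-size step. It is an illustration under stated assumptions, not the authors' algorithm. Asynchrony is modeled only by drawing one block uniformly at random per iteration. Each block is a scalar. The smooth part is a hypothetical nonconvex toy $f(x) = \sum_i w_i \ln(1 + (x_i - b_i)^2)$, whose $i$-th partial derivative is $L_i$-Lipschitz with $L_i = 2w_i$. The regularizer is assumed to be $\lambda\|x\|_1$, whose proximal map is soft-thresholding. Stochastic gradients are exact partial gradients plus Gaussian noise. The steplength $\gamma_i = 1/(2L_i)$ and the batch-size constants ($\rho = 1.1$, degree 2, $N_0 = 1$) are illustrative choices. All function names are hypothetical.

```python
import math
import random


def soft_threshold(v, tau):
    """Proximal map of tau * |.|, i.e. the l1 regularizer restricted to one scalar block."""
    return math.copysign(max(abs(v) - tau, 0.0), v)


def batch_size(k, schedule, rho=1.1, degree=2, n0=1):
    """Batch size for a block that has already been selected k times.

    Geometric: ceil(n0 * rho**k); polynomial: ceil(n0 * (k + 1)**degree).
    rho, degree and n0 are illustrative values (assumptions, not the paper's).
    """
    if schedule == "geometric":
        return math.ceil(n0 * rho ** k)
    return math.ceil(n0 * (k + 1) ** degree)


def block_grad(xi, bi, wi):
    """Exact partial gradient of the hypothetical toy term wi * ln(1 + (xi - bi)**2)."""
    t = xi - bi
    return 2.0 * wi * t / (1.0 + t * t)


def prox_grad_residual(x, b, w, lam):
    """Norm of the block-wise proximal-gradient mapping; it is zero exactly at stationary points."""
    total = 0.0
    for xi, bi, wi in zip(x, b, w):
        gamma = 1.0 / (4.0 * wi)  # same steplength 1/(2 L_i) as the solver, with L_i = 2 wi
        step = soft_threshold(xi - gamma * block_grad(xi, bi, wi), gamma * lam)
        total += ((xi - step) / gamma) ** 2
    return math.sqrt(total)


def random_block_prox_vss(b, w, lam, sigma, iters, schedule="geometric", seed=0):
    """Hypothetical sketch: one random block per iteration, mini-batch gradient, prox step."""
    rng = random.Random(seed)
    n = len(b)
    x = [0.0] * n
    counts = [0] * n  # how many times each block has been selected so far
    oracle_calls = 0
    for _ in range(iters):
        i = rng.randrange(n)                 # a single block chosen uniformly at random
        N = batch_size(counts[i], schedule)  # batch size driven by block i's own counter
        counts[i] += 1
        exact = block_grad(x[i], b[i], w[i])
        # Average N noisy samples of the partial gradient (Gaussian noise is an assumption).
        g = sum(exact + rng.gauss(0.0, sigma) for _ in range(N)) / N
        oracle_calls += N
        L_i = 2.0 * w[i]                     # block Lipschitz constant of the toy f
        gamma = 1.0 / (2.0 * L_i)            # block-specific steplength (assumed factor 1/2)
        x[i] = soft_threshold(x[i] - gamma * g, gamma * lam)  # all other blocks stay unchanged
    return x, counts, oracle_calls


if __name__ == "__main__":
    b, w = [2.0, -1.0], [1.0, 3.0]
    for schedule in ("geometric", "polynomial"):
        x, counts, calls = random_block_prox_vss(b, w, lam=0.1, sigma=0.5, iters=120,
                                                 schedule=schedule)
        print(schedule, x, counts, calls, prox_grad_residual(x, b, w, 0.1))
```

Here `prox_grad_residual` evaluates the norm of the proximal-gradient mapping, a standard stationarity measure for composite problems. Comparing `oracle_calls` across the two schedules shows the trade-off the abstract describes: the geometric schedule suppresses gradient noise fastest but consumes samples more aggressively than the polynomial one.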