Decomposable Non-Smooth Convex Optimization with Nearly-Linear Gradient Oracle Complexity

Abstract

Many fundamental problems in machine learning can be formulated by the convex program \[ \min_{\theta \in \mathbb{R}^d} \sum_{i=1}^n f_i(\theta), \] where each \(f_i\) is a convex, Lipschitz function supported on a subset of \(d_i\) coordinates of \(\theta\). One common approach to this problem, exemplified by stochastic gradient descent, involves sampling one \(f_i\) term at every iteration to make progress. This approach crucially relies on a notion of uniformity across the \(f_i\)'s, formally captured by their condition number. In this work, we give an algorithm that minimizes the above convex formulation to \(\epsilon\)-accuracy in \(\widetilde{O}(\sum_{i=1}^n d_i \log(1/\epsilon))\) gradient computations, with no assumptions on the condition number. The previous best algorithm independent of the condition number is the standard cutting plane method, which requires \(O(nd \log(1/\epsilon))\) gradient computations. As a corollary, we improve upon the evaluation oracle complexity for decomposable submodular minimization by Axiotis et al. (ICML 2021). Our main technical contribution is an adaptive procedure to select an \(f_i\) term at every iteration via a novel combination of cutting-plane and interior-point methods.
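To make the decomposable structure concrete, the following is a minimal sketch of a toy instance, not the paper's algorithm. It assumes hypothetical terms of the form f_i(θ) = |⟨a_i, θ restricted to S_i⟩ − b_i|, which are convex, Lipschitz, and read only the d_i coordinates in S_i. All names (`term_value`, `term_subgradient`, `terms`) are illustrative. The sketch shows that one subgradient of f_i touches d_i coordinates, and compares the quantities Σ d_i and n·d that appear in the two complexity bounds, ignoring logarithmic factors.

```python
# Hypothetical toy instance (an assumption, not taken from the paper):
# f_i(theta) = |<a_i, theta[S_i]> - b_i|, convex and Lipschitz,
# depending only on the coordinates listed in S_i.

def term_value(theta, support, a, b):
    """Value of one term f_i, reading only the coordinates in `support`."""
    return abs(sum(a_j * theta[j] for a_j, j in zip(a, support)) - b)

def term_subgradient(theta, support, a, b):
    """A subgradient of f_i as a sparse dict {coordinate: value} with d_i entries."""
    residual = sum(a_j * theta[j] for a_j, j in zip(a, support)) - b
    sign = 1.0 if residual > 0 else (-1.0 if residual < 0 else 0.0)
    return {j: sign * a_j for a_j, j in zip(a, support)}

def objective(theta, terms):
    """Full objective: the sum of all f_i."""
    return sum(term_value(theta, *t) for t in terms)

d = 6  # ambient dimension
# Each term is (support S_i, coefficients a_i, offset b_i).
terms = [
    ((0, 1), (1.0, -1.0), 0.5),
    ((1, 2, 3), (2.0, 1.0, 1.0), 1.0),
    ((4, 5), (1.0, 1.0), -2.0),
]
theta = [0.0] * d

total_support = sum(len(support) for support, _, _ in terms)  # sum_i d_i
dense_cost = len(terms) * d                                   # n * d

value_at_zero = objective(theta, terms)
first_subgradient = term_subgradient(theta, *terms[0])
```

In this toy instance Σ d_i is 7 while n·d is 18; the gap between the two grows when each term depends on only a few of the d coordinates.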
