Parameter-free version of Adaptive Gradient Methods for Strongly-Convex Functions

Abstract

The optimal learning rate for adaptive gradient methods applied to λ-strongly convex functions depends on the strong-convexity parameter λ and the learning-rate parameter η. In this paper, we adapt a universal algorithm along the lines of MetaGrad to remove this dependence on λ and η. The main idea is to run multiple experts concurrently and combine their predictions in a master algorithm. This master enjoys an O(d log T) regret bound.
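
The expert-and-master idea can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes a one-dimensional toy loss f_t(x) = 0.5 (x - c_t)^2, experts that each run projected online gradient descent with step size 1/(λ_i t) for a guessed λ_i, and a master that combines expert iterates by exponential weights on linearized losses. The function name `master_with_experts`, the grid of guesses, and the parameters `lr_master` and `radius` are all hypothetical choices made for illustration.

```python
import math


def master_with_experts(targets, lam_grid, lr_master=1.0, radius=1.0):
    """Toy master/expert scheme (illustrative assumption, not the paper's method).

    Each expert guesses a strong-convexity value lam and runs projected
    online gradient descent; the master predicts the exponentially
    weighted average of the expert iterates.
    """
    n = len(lam_grid)
    xs = [0.0] * n        # current iterate of each expert
    log_w = [0.0] * n     # log-weights of the experts (uniform prior)
    total_loss = 0.0
    x = 0.0
    for t, c in enumerate(targets, start=1):
        # Master prediction: weighted average of the expert iterates.
        m = max(log_w)
        w = [math.exp(lw - m) for lw in log_w]
        z = sum(w)
        x = sum(wi * xi for wi, xi in zip(w, xs)) / z

        # Toy loss f_t(x) = 0.5 * (x - c)^2 and its gradient at the master point.
        total_loss += 0.5 * (x - c) ** 2
        g = x - c

        for i, lam in enumerate(lam_grid):
            # Exponential-weights update on the linearized loss g * (x_i - x).
            log_w[i] -= lr_master * g * (xs[i] - x)
            # Expert update: gradient step of size 1 / (lam * t), then
            # projection onto the interval [-radius, radius].
            step = 1.0 / (lam * t)
            xs[i] = min(radius, max(-radius, xs[i] - step * g))
    return x, total_loss


if __name__ == "__main__":
    final_x, loss = master_with_experts([0.5] * 200, [0.1, 1.0, 10.0])
    print(final_x, loss)
```

The caller never supplies the true curvature or a tuned step size; it only supplies a grid of guesses, which is the sense in which such a scheme avoids hand-tuning λ and η. The sketch makes no claim about regret guarantees.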
