Near-Optimal No-Regret Learning in General Games

Noah Golowich

Near-Optimal No-Regret Learning in General Games

Abstract

We show that Optimistic Hedge -- a common variant of multiplicative-weights-updates with recency bias -- attains poly( T) regret in multi-player general-sum games. In particular, when every player of the game uses Optimistic Hedge to iteratively update her strategy in response to the history of play so far, then after T rounds of interaction, each player experiences total regret that is poly( T). Our bound improves, exponentially, the O(T1/2) regret attainable by standard no-regret learners in games, the O(T1/4) regret attainable by no-regret learners with recency bias (Syrgkanis et al., 2015), and the O(T1/6) bound that was recently shown for Optimistic Hedge in the special case of two-player games (Chen & Pen, 2020). A corollary of our bound is that Optimistic Hedge converges to coarse correlated equilibrium in general games at a rate of O( 1T).

0

Turn this paper into a lesson

ArcXiv compiles a structured reading guide from this paper's metadata: plain-English importance, contributions, prerequisite concepts, which sections to read first, flashcards, and a quiz. Grounded in the abstract, never invented.

Or compile a full topic from this idea

Discussion (0)

Sign in to join the discussion.

Loading comments…