Minimax-Optimal Multi-Agent RL in Markov Games With a Generative Model
Abstract
This paper studies multi-agent reinforcement learning in Markov games, with the goal of learning Nash equilibria or coarse correlated equilibria (CCE) sample-optimally. All prior results suffer from at least one of two obstacles, the curse of multiple agents and the barrier of long horizon, regardless of the sampling protocol in use. We take a step towards settling this problem, assuming access to a flexible sampling mechanism: the generative model. Focusing on non-stationary finite-horizon Markov games, we develop a fast learning algorithm and an adaptive sampling scheme that leverage the optimism principle in online adversarial learning (particularly the Follow-the-Regularized-Leader (FTRL) method). Our algorithm learns an $\varepsilon$-approximate CCE in a general-sum Markov game using $\widetilde{O}\big(H^4 S \sum_{i=1}^m A_i / \varepsilon^2\big)$ samples, where $m$ is the number of players, $S$ is the number of states, $H$ is the horizon, and $A_i$ denotes the number of actions for the $i$-th player. This is minimax-optimal (up to log factors) when the number of players is fixed. When applied to two-player zero-sum Markov games, our algorithm provably finds an $\varepsilon$-approximate Nash equilibrium with a minimal number of samples. Along the way, we derive a refined regret bound for FTRL that makes explicit the role of variance-type quantities, which might be of independent interest.
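For readers unfamiliar with FTRL, the following is a minimal sketch of the generic method over the probability simplex with a negative-entropy regularizer, which reduces to exponential weights. It is not the algorithm developed in this paper: the function names `ftrl_entropy` and `regret`, the fixed learning rate `eta`, and the toy loss sequence are all assumptions chosen for illustration.

```python
import math


def ftrl_entropy(loss_vectors, eta):
    """Generic FTRL on the simplex with a negative-entropy regularizer.

    With this regularizer the FTRL iterate has a closed form: the policy
    at each round is proportional to exp(-eta * cumulative loss so far).
    Returns the list of policies played, one per round.
    (Illustrative helper, not taken from the paper.)
    """
    n = len(loss_vectors[0])
    cumulative = [0.0] * n
    policies = []
    for loss in loss_vectors:
        # Shift by the minimum cumulative loss for numerical stability;
        # the shift cancels after normalization.
        shift = min(cumulative)
        weights = [math.exp(-eta * (c - shift)) for c in cumulative]
        total = sum(weights)
        policies.append([w / total for w in weights])
        # The loss is revealed only after the policy is committed.
        cumulative = [c + l for c, l in zip(cumulative, loss)]
    return policies


def regret(loss_vectors, policies):
    """Expected cumulative loss minus that of the best fixed action."""
    n = len(loss_vectors[0])
    expected = sum(
        sum(p * l for p, l in zip(policy, loss))
        for policy, loss in zip(policies, loss_vectors)
    )
    best_fixed = min(sum(loss[a] for loss in loss_vectors) for a in range(n))
    return expected - best_fixed


# Toy sequence (assumed): action 0 always has loss 0, action 1 always has loss 1.
T = 100
losses = [[0.0, 1.0] for _ in range(T)]
eta = math.sqrt(math.log(2) / T)
played = ftrl_entropy(losses, eta)
print(regret(losses, played))
```

The first policy is uniform because no losses have been observed yet, and the weight on the worse action decays as its cumulative loss grows. For losses in $[0, 1]$, the regret of this scheme is known to be at most $\ln(n)/\eta + \eta T/2$, which is the kind of worst-case bound that a variance-dependent analysis aims to refine.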