Online Learning for Uninformed Markov Games: Empirical Nash-Value Regret and Non-Stationarity Adaptation

Lillian J. Ratliff

Abstract

We study online learning in two-player uninformed Markov games, where the opponent's actions and policies are unobserved. In this setting, Tian et al. (2021) show that achieving no-external-regret is impossible without incurring an exponential dependence on the episode length $H$. They then turn to the weaker notion of Nash-value regret and propose a V-learning algorithm with regret $O(K^{2/3})$ after $K$ episodes. However, their algorithm and guarantee do not adapt to the difficulty of the problem: even when the opponent follows a fixed policy, so that $O(\sqrt{K})$ external regret is well known to be achievable, their result is still the worse rate $O(K^{2/3})$ on a weaker metric. In this work, we fully address both limitations. First, we introduce empirical Nash-value regret, a new regret notion that is strictly stronger than Nash-value regret and naturally reduces to external regret when the opponent follows a fixed policy. Moreover, under this new metric, we propose a parameter-free algorithm that achieves an $O(\min\{\sqrt{K} + (CK)^{1/3}, \sqrt{LK}\})$ regret bound, where $C$ quantifies the variance of the opponent's policies and $L$ denotes the number of policy switches (both at most $O(K)$). Therefore, our results not only recover the two extremes ($O(\sqrt{K})$ external regret when the opponent is fixed and $O(K^{2/3})$ Nash-value regret in the worst case) but also smoothly interpolate between them by automatically adapting to the opponent's non-stationarity. We achieve this by first providing a new analysis of the epoch-based V-learning algorithm of Mao et al. (2022), establishing an $O(\eta C + \sqrt{K/\eta})$ regret bound, where $\eta$ is the epoch incremental factor. Next, we show how to adaptively restart this algorithm with an appropriate $\eta$ in response to the potential non-stationarity of the opponent, which yields our final results.
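To make the interpolation concrete, the following is a minimal numerical sketch of the rates quoted above, with all constants and logarithmic factors dropped. The function names `regret_rate` and `balanced_eta` are hypothetical and are not taken from the paper. Treating a fixed opponent as C = 0 and L = 1 is an assumption about how the two quantities are normalized. The choice of η shown here is only a back-of-the-envelope balancing of the two terms in the $\eta C + \sqrt{K/\eta}$ bound, not the adaptive restart procedure the abstract describes.

```python
import math


def regret_rate(K, C, L):
    """Order of min{sqrt(K) + (C*K)^(1/3), sqrt(L*K)}; constants and logs dropped."""
    variance_term = math.sqrt(K) + (C * K) ** (1.0 / 3.0)
    switching_term = math.sqrt(L * K)
    return min(variance_term, switching_term)


def balanced_eta(K, C):
    """Eta that equalizes eta*C and sqrt(K/eta): eta = K^(1/3) / C^(2/3).

    Illustrative only: requires C > 0 and ignores any constraint the
    algorithm may place on eta.
    """
    return K ** (1.0 / 3.0) / C ** (2.0 / 3.0)


K = 10**6

# Fixed opponent (assumed C = 0, L = 1): the rate is sqrt(K).
fixed = regret_rate(K, C=0, L=1)

# Worst case (C and L both of order K): the (C*K)^(1/3) = K^(2/3) term dominates.
worst = regret_rate(K, C=K, L=K)

# Balancing the two terms of eta*C + sqrt(K/eta) gives each the value (C*K)^(1/3).
C = 1000
eta = balanced_eta(K, C)
```

With this choice of η both terms equal $(CK)^{1/3}$, which matches the middle term of the final bound and is consistent with the two extremes stated in the abstract.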
