Approximating the Uniform Value in Hidden Stochastic Games with Doeblin Condition
Abstract
We study zero-sum two-player hidden stochastic games, where players receive partial observations of the state. We focus on a central solution concept for analyzing long-duration stochastic games: the uniform value, a limiting average payoff that both players can guarantee for sufficiently long durations. In the general case, prior work provides examples of games that do not have a uniform value. Moreover, even for the subclass of games that do have a uniform value, no algorithm exists that approximates it. Therefore, we generalize the Doeblin condition for Markov chains (which guarantees the existence of a unique invariant measure) to hidden stochastic games. Informally, the Doeblin condition for hidden stochastic games requires that, for every way to play the game, there exists a fixed belief such that, no matter the initial belief over the state of the game, after sufficiently many stages, the posterior belief is probably close to this fixed belief. Under the Doeblin condition, we prove the existence of the uniform value, provide an algorithm to approximate it, and prove that no algorithm can compute it exactly. Then, we identify structural conditions on the transition function that ensure the Doeblin condition holds both in the blind setting, where observations are uninformative, and in the hidden setting, where observations are partially informative. When considering games with only one player, namely partially observable Markov decision processes, our results provide a novel subclass in which the uniform value exists and can be approximated, but cannot be computed exactly.
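As a rough illustration of the classical Doeblin condition for Markov chains that the abstract refers to, the sketch below uses a one-step minorization form on a finite chain. It is not the paper's generalization to hidden stochastic games. The matrix `P`, the helper names, and the choice of this particular minorization form are all assumptions made for illustration. The sketch shows two different initial distributions being driven together, which loosely mirrors the abstract's requirement that the belief forgets its starting point.

```python
# Illustrative sketch only: the classical one-step Doeblin (minorization)
# condition for a finite Markov chain. This is NOT the paper's condition for
# hidden stochastic games; the matrix and helper names are hypothetical.

def step(belief, P):
    """Advance a distribution by one stage: returns the row vector belief @ P."""
    n = len(P)
    return [sum(belief[i] * P[i][j] for i in range(n)) for j in range(n)]

def doeblin_constant(P):
    """Sum over columns of the column-wise minimum of P.

    A positive value means all rows of P dominate a common sub-probability
    measure, which is one standard form of the Doeblin condition.
    """
    n = len(P)
    return sum(min(P[i][j] for i in range(n)) for j in range(n))

def total_variation(p, q):
    """Total variation distance between two distributions on the same states."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

# Hypothetical 3-state transition matrix (each row sums to 1).
P = [
    [0.5, 0.3, 0.2],
    [0.2, 0.6, 0.2],
    [0.3, 0.3, 0.4],
]

eps = doeblin_constant(P)  # column minima are 0.2, 0.3, 0.2

# Two different initial distributions (point masses on states 0 and 2).
b1 = [1.0, 0.0, 0.0]
b2 = [0.0, 0.0, 1.0]

# Record the distance between the two distributions before each stage.
distances = []
for _ in range(10):
    distances.append(total_variation(b1, b2))
    b1, b2 = step(b1, P), step(b2, P)

for stage, d in enumerate(distances):
    print(f"stage {stage}: TV distance = {d:.6f}")
```

Under this minorization, each stage shrinks the total variation distance between any two distributions by a factor of at most `1 - eps`, so both converge geometrically to the same invariant measure. In a hidden stochastic game the belief dynamics also depend on the players' actions and on the observations received, so this chain should be read only as the simplest analogue of the condition described above.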