Pathwise uniform value in gambling houses and Partially Observable Markov Decision Processes

Abstract

In several standard models of dynamic programming (gambling houses, MDPs, POMDPs), we prove the existence of a very robust notion of value for the infinitely repeated problem, namely the pathwise uniform value. This solves two open problems. First, it shows that for any epsilon>0, the decision-maker has a pure strategy sigma which is epsilon-optimal in every n-stage game, provided that n is large enough (this result was previously known only for behavior strategies, that is, strategies which use randomization). Second, the strategy sigma can be chosen such that under the long-run average payoff criterion (the expectation of the liminf of the average payoffs), the decision-maker obtains more than lim v(n)-epsilon.
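
The two statements in the abstract can be written out in symbols. The following is only a sketch: the notation g_m for the stage-m payoff, gamma_n(sigma) for the expected n-stage average payoff, and the expectation E_sigma are assumptions introduced here for illustration and are not taken from the abstract, and the dependence on the initial state is suppressed.

```latex
% Assumed notation: g_m is the payoff at stage m, and E_sigma is the
% expectation under the strategy sigma (initial state suppressed).
\[
  \gamma_n(\sigma) = \mathbb{E}_\sigma\!\left[ \frac{1}{n} \sum_{m=1}^{n} g_m \right],
  \qquad
  v(n) = \sup_{\sigma} \gamma_n(\sigma).
\]
% First statement: a single pure strategy is epsilon-optimal in every
% sufficiently long n-stage problem.
\[
  \forall \varepsilon > 0,\ \exists\, \sigma \text{ pure},\ \exists\, N,\ \forall n \ge N:
  \quad \gamma_n(\sigma) \ge v(n) - \varepsilon .
\]
% Second statement: the same strategy does well under the long-run
% average payoff criterion (expectation of the liminf of the averages).
\[
  \mathbb{E}_\sigma\!\left[ \liminf_{n \to \infty} \frac{1}{n} \sum_{m=1}^{n} g_m \right]
  > \lim_{n \to \infty} v(n) - \varepsilon .
\]
```

In the second statement the liminf is taken inside the expectation, so the guarantee concerns the realized average payoff along each path rather than only the expected average. This reading is consistent with the name "pathwise uniform value".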
