Strategies in POMDPs with Stage Duration

Abstract

Partially observable Markov decision processes (POMDPs) with stage duration provide a framework for approximating continuous-time behavior by scaling transition probabilities with a stage duration parameter h ∈ (0,1]. While previous literature has primarily focused on the limit of the discounted value as the stage duration h vanishes, this paper investigates the global behavior of the asymptotic value, V(h), across varying stage durations. Our main result demonstrates that any strategy in a POMDP with stage duration h can be mimicked in the base POMDP (h=1). Specifically, we provide an explicit construction showing that for any strategy in the POMDP with stage duration h, there exists a strategy in the base POMDP that secures the same asymptotic payoff. As a consequence of this theorem, we establish that the value function V(h) is nondecreasing with respect to h, and that the continuous-time limit lim_{h→0} V(h) exists.
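
To make "scaling transition probabilities with a stage duration parameter h" concrete, the following is a minimal sketch only. The abstract does not state the scaling formula; the code assumes the convention common in the stage-duration literature, in which the kernel for duration h is (1 − h)·I + h·P, so the state stays put with probability 1 − h and otherwise moves according to the base kernel P. The function name `scale_transitions` and the two-state matrix are hypothetical, and signals, actions, and payoffs are left out.

```python
def scale_transitions(P, h):
    """Return (1 - h) * I + h * P for a row-stochastic matrix P.

    Assumed convention, not taken from the abstract: with probability
    1 - h the state does not move, and with probability h it moves
    according to the base kernel P.
    """
    if not 0.0 < h <= 1.0:
        raise ValueError("h must lie in (0, 1]")
    n = len(P)
    return [
        [(1.0 - h) * (1.0 if i == j else 0.0) + h * P[i][j] for j in range(n)]
        for i in range(n)
    ]


# Hypothetical two-state kernel for one fixed action.
P = [[0.2, 0.8],
     [0.5, 0.5]]

P_half = scale_transitions(P, 0.5)  # kernel for stage duration h = 0.5
P_base = scale_transitions(P, 1.0)  # h = 1 gives back the base kernel
```

Under this assumed convention, h = 1 recovers the base POMDP named in the abstract, and smaller h moves the state more slowly per stage, which is the sense in which small h approximates continuous time.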
