Fixed-point equations solving Risk-sensitive MDP with constraint
Abstract
There are no computationally feasible algorithms that solve the finite-horizon Risk-sensitive Constrained Markov Decision Process (Risk-CMDP) problem, even for problems with moderate horizons. With the aim of designing one, we derive a fixed-point equation whose solutions include the optimal policy of the Risk-CMDP. We further provide two optimization problems equivalent to the Risk-CMDP. These formulations are instrumental in designing a global algorithm that converges to the optimal policy. The proposed algorithm combines random restarts with a local improvement step: the local improvement step uses the solution of the derived fixed-point equation, while the random restarts ensure global optimization. We also provide numerical examples that illustrate the feasibility of our algorithm on an inventory control problem with a risk-sensitive cost and constraint. The complexity of the algorithm grows only linearly with the time horizon.
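The abstract does not state the fixed-point equation, so the sketch below does not reproduce the authors' method. Under explicit assumptions, it illustrates only the surrounding ingredients. It uses an entropic (exponential-utility) risk measure, (1/θ) log E[exp(θ · total cost)], for both the objective and the constraint. It also uses a toy inventory model with made-up parameters and a plain coordinate search, swapped in for the paper's fixed-point-based local improvement step. All names (`risk_value`, `local_improve`, `random_restarts`, `BUDGET`, and so on) are hypothetical. Evaluating one fixed Markov policy by backward recursion costs O(T · |S| · |demand support|), which is linear in the horizon T.

```python
import math
import random

# Toy inventory instance; every number here is an illustrative assumption.
M = 4                                # maximum stock level; states are 0..M
T = 4                                # horizon
DEMAND = {0: 0.3, 1: 0.4, 2: 0.3}    # demand distribution per period
THETA = 0.5                          # risk-sensitivity parameter (> 0: risk-averse)
HOLD, SHORT, ORDER = 1.0, 4.0, 2.0   # per-unit holding, shortage, ordering costs
BUDGET = 6.0                         # bound on the risk-sensitive ordering cost
S0 = 0                               # initial stock

def actions(s):
    return range(M - s + 1)          # order quantities that respect capacity M

def step(s, a, d):
    """Next stock, objective cost (holding/shortage), constraint cost (ordering)."""
    level = s + a - d
    cost = HOLD * max(level, 0) + SHORT * max(-level, 0)
    return max(level, 0), cost, ORDER * a

def risk_value(policy, which):
    """(1/theta) log E[exp(theta * total cost)] by backward recursion (linear in T)."""
    W = [1.0] * (M + 1)              # W_T(s) = 1: no terminal cost
    for t in reversed(range(T)):
        new_W = []
        for s in range(M + 1):
            a = policy[t][s]
            total = 0.0
            for d, p in DEMAND.items():
                s_next, c, g = step(s, a, d)
                cost = c if which == "obj" else g
                total += p * math.exp(THETA * cost) * W[s_next]
            new_W.append(total)
        W = new_W
    return math.log(W[S0]) / THETA

def feasible(policy):
    return risk_value(policy, "con") <= BUDGET

def local_improve(policy):
    """Stand-in local step: change one (t, s) action at a time, keeping the change
    only if the objective strictly decreases and the constraint still holds."""
    best = risk_value(policy, "obj")
    improved = True
    while improved:
        improved = False
        for t in range(T):
            for s in range(M + 1):
                for a in actions(s):
                    old = policy[t][s]
                    if a == old:
                        continue
                    policy[t][s] = a
                    val = risk_value(policy, "obj")
                    if val < best - 1e-12 and feasible(policy):
                        best, improved = val, True
                    else:
                        policy[t][s] = old   # revert the rejected change
    return policy, best

def random_restarts(n_restarts, seed=0):
    """Run the local step from random feasible starts and keep the best result."""
    rng = random.Random(seed)
    # Never ordering has zero constraint cost, so it is always a feasible start.
    best_policy, best_val = local_improve([[0] * (M + 1) for _ in range(T)])
    for _ in range(n_restarts):
        start = [[rng.choice(list(actions(s))) for s in range(M + 1)] for _ in range(T)]
        if not feasible(start):
            continue                 # skip infeasible random starts
        policy, val = local_improve(start)
        if val < best_val:
            best_policy, best_val = [row[:] for row in policy], val
    return best_policy, best_val
```

For example, `random_restarts(5, seed=1)` returns a feasible time-dependent ordering policy and its risk-sensitive cost. This naive search re-evaluates the whole policy after every single change. The sketch therefore does not achieve the linear-in-horizon complexity that the abstract reports for the authors' algorithm.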