Stochastic Approximation for Risk-aware Markov Decision Processes
Abstract
We develop a stochastic approximation-type algorithm to solve finite state/action, infinite-horizon, risk-aware Markov decision processes. Our algorithm has two loops. The inner loop computes the risk by solving a stochastic saddle-point problem. The outer loop performs Q-learning to compute an optimal risk-aware policy. Several widely investigated risk measures (e.g. conditional value-at-risk, optimized certainty equivalent, and absolute semi-deviation) are covered by our algorithm. Almost sure convergence and the convergence rate of the algorithm are established. For an error tolerance ε>0 for the optimal Q-value estimation gap and learning rate k∈(1/2,\,1], the overall convergence rate of our algorithm is (((1/δε)/ε2)1/k+((1/ε))1/(1-k)) with probability at least 1-δ.
Turn this paper into a lesson
ArcXiv compiles a structured reading guide from this paper's metadata: plain-English importance, contributions, prerequisite concepts, which sections to read first, flashcards, and a quiz. Grounded in the abstract, never invented.