Lipschitz Regularity in Wasserstein Robust Stochastic Optimal Control
Abstract
Robust Markov decision processes provide a principled framework for protecting sequential decision-making against transition-law misspecification and have attracted substantial recent research interest. As in non-robust stochastic optimal control, an important question is whether the robust value function is sufficiently regular for approximation and learning. This paper studies Lipschitz regularity of optimal value functions for Wasserstein robust stochastic optimal control on possibly unbounded Polish state spaces under an infinite-horizon discounted reward criterion. We consider two robustness formulations: a kernel-robust model, in which the adversary perturbs the next-state distribution within a Wasserstein ball around a nominal transition kernel, and a noise-robust model, in which the adversary perturbs the driving noise of the state transition dynamics. In the non-robust setting, Lipschitz rewards and Lipschitz transition dynamics do not, in general, imply a Lipschitz value function. In contrast, we show that in these Wasserstein robust formulations, Lipschitz assumptions on the model primitives yield Lipschitz robust value functions. Thus, Wasserstein robustness not only protects against misspecification but also regularizes the Bellman fixed point, providing stability relevant to discretization, value-function approximation, estimation, and learning.
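The claim that Lipschitz primitives need not give a Lipschitz value function in the non-robust setting can be checked numerically on a toy model. The sketch below is an illustration only and is not taken from the paper: the state space [0, 1], the single-action dynamics f(x) = min(2x, 1), the reward r(x) = x, the discount factor 0.9, and the function name `value` are all assumptions chosen so that the discount factor times the Lipschitz constant of the dynamics exceeds 1. Both primitives are Lipschitz, yet the difference quotient V(x)/x of the value function at 0 grows without bound as x shrinks.

```python
def value(x, gamma=0.9, horizon=400):
    """Truncated discounted return from state x in a hypothetical toy model.

    Assumed dynamics: x_{t+1} = min(2 * x_t, 1), which is 2-Lipschitz.
    Assumed reward:   r(x) = x, which is 1-Lipschitz.
    There is a single action, so the optimal value is just this return.
    """
    total, state, discount = 0.0, x, 1.0
    for _ in range(horizon):
        total += discount * state          # collect the discounted reward r(state)
        state = min(2.0 * state, 1.0)      # apply the 2-Lipschitz dynamics
        discount *= gamma
    return total


# Difference quotients (V(x) - V(0)) / x at x = 2^-n; note V(0) = 0.
ratios = [value(2.0 ** -n) / 2.0 ** -n for n in (1, 5, 10, 20)]

for n, ratio in zip((1, 5, 10, 20), ratios):
    print(f"x = 2^-{n}: V(x)/x = {ratio:.1f}")
```

The quotients grow roughly like 1.8^n, so no finite Lipschitz constant bounds V near 0 in this toy example. This only illustrates the non-robust phenomenon mentioned in the abstract; it does not implement either of the Wasserstein robust formulations.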