Quantum Algorithms for Finite-horizon Markov Decision Processes
Abstract
In this work, we design quantum algorithms that are more efficient than classical algorithms to solve time-dependent and finite-horizon Markov Decision Processes (MDPs) in two distinct settings: (1) In the exact dynamics setting, where the agent has full knowledge of the environment's dynamics (i.e., transition probabilities), we prove that our Quantum Value Iteration (QVI) algorithm QVI-1 achieves a quadratic speedup in the size of the action space (A) compared with the classical value iteration algorithm for computing the optimal policy (π*) and the optimal V-value function (V0*). Furthermore, our algorithm QVI-2 provides an additional speedup in the size of the state space (S) when obtaining near-optimal policies and V-value functions. Both QVI-1 and QVI-2 achieve quantum query complexities that provably improve upon classical lower bounds, particularly in their dependences on S and A. (2) In the generative model setting, where samples from the environment are accessible in quantum superposition, we prove that our algorithms QVI-3 and QVI-4 achieve improvements in sample complexity over the state-of-the-art (SOTA) classical algorithm in terms of A, estimation error (ε), and time horizon (H). More importantly, we prove quantum lower bounds to show that QVI-3 and QVI-4 are asymptotically optimal, up to logarithmic factors, assuming a constant time horizon.
Turn this paper into a full lesson
ArcXiv compiles a staged curriculum from this paper: 8-12 lessons across beginner → advanced, synthesised section guides, visuals, flashcards, a quiz, exercises, and on-demand deep dives per section. Grounded in the abstract, never invented.