Coherent Rollout Oracles for Finite-Horizon Sequential Decision Problems
Abstract
Coherent quantum rollout for sequential decision problems requires a unitary simulator: randomness must live in explicit quantum registers, and basis-state selectors must be mapped to actions reversibly. With branch-dependent valid actions, this mapping is totalized coherent rank-select over an entangled N-bit validity mask: return the position of the r-th valid bit, or a sentinel if r is out of range. We give the first reversible-circuit complexity analysis of this primitive. For selector width w = 2(N+1) , rank-select admits an O(Nw)-gate low-ancilla bounded-span scan, proved gate-optimal in its model, and an O(N w)-gate low-ancilla blocked construction when long-range gates are available; across all bounded-fan-in layouts, the unconditional gate lower bound is (N). Composing rank-select with reversible transition and predicate-evaluation circuits gives an explicit polynomial-size coherent rollout oracle for finite-horizon planning problems satisfying these primitive assumptions. The resulting oracle satisfies the access model of the best-arm pipeline of Wang et al., yielding O(k/) coherent oracle calls against the standard classical (k/2) arm-pull lower bound. We give a bounded-influence lifting theorem that extends this lower-bound construction from a base configuration to an exponential family of configurations. We instantiate the construction on SIR epidemic intervention, with a stochastic placement-game sanity check, and machine-check the main results in Lean 4. Code and proofs: https://github.com/BinRoot/b01t/tree/main/demos/rollout.
Turn this paper into a lesson
ArcXiv compiles a structured reading guide from this paper's metadata: plain-English importance, contributions, prerequisite concepts, which sections to read first, flashcards, and a quiz. Grounded in the abstract, never invented.