Harnessing Environmental Memory with Reinforcement Learning in Open Quantum Systems
Abstract
Non-Markovian memory effects in open quantum systems provide valuable resources for preserving coherence and enhancing controllability. However, exploiting them requires strategies adapted to history-dependent dynamics. We introduce a reinforcement-learning framework that autonomously learns to amplify information backflow in a driven two-level system coupled to a structured reservoir. Using a reward based on the positive time derivative of the trace distance associated with the Breuer-Laine-Piilo measure, we train PPO and SAC agents and benchmark their performance against gradient-based optimal control theory (OCT). While OCT enhances a single dominant backflow peak, RL policies broaden this revival and activate additional contributions in later memory windows, producing sustained positive trace-distance growth over a longer duration. Consequently, the integrated non-Markovianity achieved by RL substantially exceeds that obtained with OCT. These results demonstrate how long-horizon, model-free learning naturally uncovers distributed-backflow strategies and highlight the potential of reinforcement learning for engineering memory effects in open quantum systems.
Turn this paper into a full lesson
ArcXiv compiles a staged curriculum from this paper: 8-12 lessons across beginner → advanced, synthesised section guides, visuals, flashcards, a quiz, exercises, and on-demand deep dives per section. Grounded in the abstract, never invented.