High-Fidelity Data-Driven Dynamics Model for Reinforcement Learning-based Control in HL-3 Tokamak
Abstract
The success of reinforcement learning (RL)-based control in tokamaks, an emerging technique for controlled nuclear fusion with improved flexibility, typically requires substantial interaction with a simulator capable of accurately evolving the high-dimensional plasma state. First-principles simulators are computationally intensive, which makes RL training slow. As an alternative, we devise an effective method to obtain a fully data-driven simulator by mitigating the compounding error that arises from its autoregressive nature. With high accuracy and appealing extrapolation capability, this high-fidelity dynamics model enables the rapid training of an RL agent that directly generates engineering-reasonable actuator commands toward the desired long-term targets of plasma configuration. Together with EFITNN, a neural-network surrogate model for the Equilibrium Fitting code, the RL agent successfully maintains 400-ms, 1 kHz trajectory control with accurate waveform tracking of the plasma current and the last closed flux surface on the HL-3 tokamak. Furthermore, it demonstrates the feasibility of zero-shot adaptation to changed triangularity targets, confirming the robustness of the developed data-driven dynamics model. Our work underscores the advantage of fully data-driven dynamics models in yielding RL-based trajectory control policies at a sufficiently fast pace, an anticipated engineering requirement in daily discharge practices for the upcoming ITER device.
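The compounding error mentioned in the abstract can be illustrated with a toy sketch. This is not the paper's model or its mitigation method: it assumes a hypothetical scalar linear system in place of the plasma state, with made-up coefficients `A_TRUE` and `A_MODEL`, and a horizon of 400 steps chosen to mirror a 400-ms trajectory at 1 kHz. It contrasts one-step prediction (the model is fed the true state at every step) with autoregressive rollout (the model is fed its own previous output).

```python
# Toy illustration only: a scalar linear system stands in for the plasma state.
# All names and values below are hypothetical, not taken from the paper.
A_TRUE = 0.95    # coefficient of the "true" dynamics
A_MODEL = 0.97   # coefficient of the "learned" model (small one-step error)
B = 1.0          # input gain shared by both systems
HORIZON = 400    # number of steps, e.g. 400 ms at 1 kHz


def step(a, x, u):
    """Advance the scalar system x' = a * x + B * u by one step."""
    return a * x + B * u


def compare_errors(horizon=HORIZON, u=1.0, x0=0.0):
    """Return per-step absolute errors for one-step and autoregressive prediction."""
    x_true = x0
    x_roll = x0
    one_step_err = []
    rollout_err = []
    for _ in range(horizon):
        # One-step prediction: the model starts from the true current state.
        pred_one = step(A_MODEL, x_true, u)
        # Autoregressive rollout: the model starts from its own last output.
        x_roll = step(A_MODEL, x_roll, u)
        # Ground truth for this step.
        x_true = step(A_TRUE, x_true, u)
        one_step_err.append(abs(pred_one - x_true))
        rollout_err.append(abs(x_roll - x_true))
    return one_step_err, rollout_err


if __name__ == "__main__":
    one_step, rollout = compare_errors()
    print(f"final one-step error: {one_step[-1]:.3f}")
    print(f"final rollout error:  {rollout[-1]:.3f}")
```

In this sketch the one-step error stays small, while the rollout error grows to be far larger over the horizon, because each prediction inherits the error of the previous one. A simulator used for RL training runs in the autoregressive regime, which is why a model that looks accurate one step at a time can still drift over a long trajectory.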