Optimal navigation in two-dimensional flows: Control theory and reinforcement learning
Abstract
Zermelo's navigation problem seeks the trajectory of minimal travel time between two points in a fluid flow. We address this problem for an agent, such as a floating drone or active particle, that is advected by a two-dimensional flow, self-propels at a fixed speed smaller than or comparable to the characteristic flow velocity, and can steer its direction. The flows considered span increasing levels of complexity, from steady solid-body rotation and a time-dependent sink vortex to the Taylor-Green flow and turbulence in the inverse energy cascade regime. Although optimal-control theory provides time-minimizing trajectories, these solutions become unstable in chaotic regimes characterized by positive finite-time Lyapunov exponents. To design robust navigation strategies, we apply reinforcement learning and compare Q-learning with a one-step actor-critic algorithm. Both methods achieve successful navigation, yielding mean travel times within 3-10% of optimal-control solutions in regular flows, while the discrepancy increases to 35-75% in time-dependent turbulent flows. Finally, we show that agents trained on coarse-grained turbulent flows generalize to the full velocity field. This robustness to incomplete flow information is essential for practical navigation in real-world oceanic and atmospheric environments.
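To make the setup concrete, the following is a minimal sketch of the agent kinematics described above, assuming the standard Zermelo form dx/dt = u(x) + v_swim (cos θ, sin θ) in a steady solid-body rotation. The function names, parameter values, explicit Euler integrator, and the "point at the target" baseline policy are all illustrative assumptions; they are not the optimal-control or reinforcement-learning methods of the paper.

```python
import math

def solid_body_rotation(x, y, omega=1.0):
    """Velocity of a steady solid-body rotation about the origin (assumed form)."""
    return -omega * y, omega * x

def step(x, y, theta, v_swim, dt, flow=solid_body_rotation):
    """One explicit Euler step of dx/dt = u(x) + v_swim * (cos theta, sin theta)."""
    ux, uy = flow(x, y)
    return (x + dt * (ux + v_swim * math.cos(theta)),
            y + dt * (uy + v_swim * math.sin(theta)))

def naive_heading(x, y, target):
    """Hypothetical baseline policy (not optimal): always point at the target."""
    return math.atan2(target[1] - y, target[0] - x)

def rollout(start, target, v_swim=0.5, dt=0.01, t_max=20.0, tol=0.05):
    """Return the arrival time, or None if the target is not reached by t_max."""
    x, y = start
    t = 0.0
    while t < t_max:
        if math.hypot(target[0] - x, target[1] - y) < tol:
            return t
        theta = naive_heading(x, y, target)
        x, y = step(x, y, theta, v_swim, dt)
        t += dt
    return None
```

A time-minimizing or learned policy would replace `naive_heading`. Because the swimming speed is at most comparable to the flow speed, this baseline can be swept away from the target, in which case `rollout` returns `None`.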