Certifying Hamilton-Jacobi Reachability Learned via Reinforcement Learning
Abstract
We present a framework to certify Hamilton--Jacobi (HJ) reachability learned by reinforcement learning (RL). Building on a discounted initial-time travel-cost formulation under which small-step RL value iteration is provably equivalent to a forward HJ equation with damping, we convert certified learning errors into calibrated inner/outer enclosures of the strict backward reachable tube. The core device is an additive-offset identity: if $W_\lambda$ solves the discounted travel-cost Hamilton--Jacobi--Bellman (HJB) equation, then $W_\varepsilon := W_\lambda + \varepsilon$ solves the same PDE with a constant offset $\lambda\varepsilon$. Consequently, a uniform value error corresponds exactly to a constant HJB offset. We establish this uniform value error via two routes: (A) a Bellman operator-residual bound, and (B) an HJB PDE-slack bound. Our framework preserves HJ-level safety semantics and is compatible with deep RL. We demonstrate the approach on a double-integrator system by formally certifying, via satisfiability modulo theories (SMT), that a value function learned through RL induces provably correct inner and outer backward-reachable set enclosures over a compact region of interest.
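The additive-offset identity can be checked in one line. The sketch below assumes a generic damped HJB form with Hamiltonian $H$ and discount rate $\lambda$; the abstract does not state the exact PDE, so this form, the symbol $H$, and the constant $\varepsilon$ are illustrative assumptions rather than the paper's own notation.

```latex
% Assumed generic form (not taken from the paper):
%   \partial_t W_\lambda + \lambda W_\lambda + H(x, \nabla_x W_\lambda) = 0 .
% Let W_\varepsilon := W_\lambda + \varepsilon with \varepsilon constant.
% A constant has zero time and space derivatives, so only the damping term changes:
\begin{align*}
\partial_t W_\varepsilon + \lambda W_\varepsilon + H(x, \nabla_x W_\varepsilon)
  &= \partial_t W_\lambda + \lambda (W_\lambda + \varepsilon) + H(x, \nabla_x W_\lambda) \\
  &= \underbrace{\partial_t W_\lambda + \lambda W_\lambda + H(x, \nabla_x W_\lambda)}_{=\,0}
     + \lambda \varepsilon \\
  &= \lambda \varepsilon .
\end{align*}
```

Under this assumed form, a uniform shift of the value function by $\varepsilon$ shows up as a constant offset $\lambda\varepsilon$ in the PDE, which is why a certified sup-norm error on the learned value can be read as a constant HJB offset.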