Learning When to Act: Communication-Efficient Reinforcement Learning via Run-Time Assurance

Tristan Schuler

Abstract

Safe reinforcement learning (RL) typically asks what an agent should do. We ask when it needs to act, and show that a single policy can jointly learn control inputs and communication-efficient timing decisions under a pointwise Lyapunov safety shield. We focus on stabilization around a known equilibrium, where CARE-based LQR backups, Lyapunov certificates, and classical Lyapunov-STC are well defined, enabling clean comparison against analytical baselines. A run-time assurance (RTA) layer overrides the policy via a one-step-ahead Lyapunov prediction and a precomputed LQR backup, providing a strictly stronger guarantee than constrained MDP methods, which enforce safety only in expectation. On an inverted pendulum, a cart-pole, and a planar quadrotor, the learned policy achieves 1.91×, 1.45×, and 3.51× higher mean inter-sample interval (MSI) than a Lyapunov-triggered baseline; a fixed-rate LQR controller at the same average rate is unstable on all three plants, showing that adaptive timing, not a lower average rate, is what makes sparsity safe. A CARE-derived Lyapunov reward transfers across environments without redesign, with a single weight w_c controlling the stability–communication tradeoff; ablations confirm that the RTA shield is essential, as removing it reduces MSI by 1.27–1.84× and degrades state norms. A preference-conditioned extension recovers the full tradeoff frontier from one model at 211 of training compute, and SAC experiments show that the results are algorithm-agnostic across discrete and continuous domains. A 12-state 3D quadrotor case study extends the framework to higher-dimensional systems where classical STC is intractable, and robustness tests under 30% mass variation and disturbances show graceful degradation, with the RTA absorbing what the learned policy cannot.
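The abstract describes the shield only at a high level, so the following is a minimal sketch under stated assumptions, not the authors' implementation. It solves the CARE for an LQR backup on a linearized inverted pendulum (NumPy-only eigenvector method on the Hamiltonian matrix), uses V(x) = xᵀPx as the Lyapunov certificate, and accepts a proposed (input, hold time) pair only if a one-step-ahead prediction under a zero-order hold does not increase V; otherwise it applies the LQR backup for one base step. The plant parameters, the non-increase test, the random stand-in policy, and the names solve_care, hold_input and rta_filter are all hypothetical.

```python
import numpy as np

# Hypothetical plant: pendulum linearized about the upright equilibrium (theta = 0).
G, L, M = 9.81, 1.0, 1.0
A = np.array([[0.0, 1.0], [G / L, 0.0]])
B = np.array([[0.0], [1.0 / (M * L**2)]])
Q, R = np.eye(2), np.eye(1)


def solve_care(A, B, Q, R):
    """Solve A^T P + P A - P B R^-1 B^T P + Q = 0 from the stable
    invariant subspace of the Hamiltonian matrix (eigenvector method)."""
    n = A.shape[0]
    S = B @ np.linalg.inv(R) @ B.T
    H = np.block([[A, -S], [-Q, -A.T]])
    eigvals, eigvecs = np.linalg.eig(H)
    stable = eigvecs[:, eigvals.real < 0]  # n eigenvectors with Re(lambda) < 0
    U1, U2 = stable[:n], stable[n:]
    P = np.real(U2 @ np.linalg.inv(U1))
    return 0.5 * (P + P.T)  # symmetrize away round-off


P = solve_care(A, B, Q, R)
K = np.linalg.inv(R) @ B.T @ P  # LQR backup gain, u = -K x
V = lambda x: float(x @ P @ x)  # quadratic Lyapunov certificate


def f(x, u):
    """Nonlinear pendulum dynamics, theta measured from upright."""
    return np.array([x[1], (G / L) * np.sin(x[0]) + u / (M * L**2)])


def hold_input(x, u, duration, dt=2e-3):
    """RK4 integration while the input u is held constant (zero-order hold)."""
    for _ in range(int(round(duration / dt))):
        k1 = f(x, u)
        k2 = f(x + 0.5 * dt * k1, u)
        k3 = f(x + 0.5 * dt * k2, u)
        k4 = f(x + dt * k3, u)
        x = x + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
    return x


def rta_filter(x, u_prop, hold_prop, base_dt):
    """Accept the proposal if the predicted V does not increase over the hold;
    otherwise override with the LQR backup for one base step."""
    if V(hold_input(x, u_prop, hold_prop)) <= V(x):
        return u_prop, hold_prop, False
    return float(-(K @ x)[0]), base_dt, True


rng = np.random.default_rng(0)
base_dt, t, x = 0.02, 0.0, np.array([0.2, 0.0])
V0, holds, overrides = V(x), [], 0
while t < 3.0:
    # Stand-in for a learned policy: random input and random hold length.
    u_prop = rng.uniform(-2.0, 2.0)
    hold_prop = base_dt * rng.integers(1, 11)
    u, hold, overridden = rta_filter(x, u_prop, hold_prop, base_dt)
    x = hold_input(x, u, hold)
    t += hold
    holds.append(hold)
    overrides += overridden
msi = float(np.mean(holds))  # mean inter-sample interval (MSI)
print(f"V: {V0:.3f} -> {V(x):.3f}, MSI = {msi:.3f} s, overrides = {overrides}")
```

Because the prediction model and the simulated plant are identical here, every accepted proposal is guaranteed not to raise V at the end of its hold interval; a real deployment would need margins for model mismatch and for excursions inside the hold interval.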
