On Safer Reinforcement Learning for Sedation and Analgesia in Intensive Care
Abstract
Pain management in intensive care usually involves complex trade-offs, since both inadequate and excessive treatment can compromise patient safety. Prior work on reinforcement learning for sedation and analgesia has explored how to optimize these interventions, but has not considered patient survival or partial observability. To investigate the risks of these design choices, we developed an offline deep reinforcement learning framework that suggests hourly medication doses based on recurrent state representations. Using retrospective data from 47,144 ICU stays in the MIMIC-IV database, we trained and evaluated behavior-regularized actor-critic models that prescribe continuous doses of opioids, propofol, benzodiazepines, and dexmedetomidine according to two goals: reduce pain or jointly reduce pain and 30-day post-discharge mortality. Although the two resulting policies were associated with lower pain, clinician agreement with the pain-only policy was positively correlated with mortality (ρ=0.119, p<0.0001), while agreement with the joint policy was negatively correlated (ρ=-0.316, p<0.0001). We found that such divergence arose from a different response to high levels of comorbidity. This suggests that valuing post-discharge outcomes could be critical for learning safer treatment policies, even if a short-term goal remains the primary objective.
Turn this paper into a full lesson
ArcXiv compiles a staged curriculum from this paper: 8-12 lessons across beginner → advanced, synthesised section guides, visuals, flashcards, a quiz, exercises, and on-demand deep dives per section. Grounded in the abstract, never invented.