Meta-Offline and Distributional Multi-Agent RL for Risk-Aware Decision-Making
Abstract
Mission critical applications, such as UAV-assisted IoT networks require risk-aware decision-making under dynamic topologies and uncertain channels. We propose meta-conservative quantile regression (M-CQR), a meta-offline distributional MARL algorithm that integrates conservative Q-learning (CQL) for safe offline learning, quantile regression DQN (QR-DQN) for risk-sensitive value estimation, and model-agnostic meta-learning (MAML) for rapid adaptation. Two variants are developed: meta-independent CQR (M-I-CQR) and meta-CTDE-CQR. In a UAV-based communication scenario, M-CTDE-CQR achieves up to 50% faster convergence and outperforms baseline MARL methods, offering improved scalability, robustness, and adaptability for risk-sensitive decision-making. Code is available at https://github.com/Eslam211/MAMetaODRL
Turn this paper into a full lesson
ArcXiv compiles a staged curriculum from this paper: 8-12 lessons across beginner → advanced, synthesised section guides, visuals, flashcards, a quiz, exercises, and on-demand deep dives per section. Grounded in the abstract, never invented.