From Theory to Practice with RAVEN-UCB: Addressing Non-Stationarity in Multi-Armed Bandits through Variance Adaptation

Abstract

The Multi-Armed Bandit (MAB) problem is challenging in non-stationary environments where reward distributions evolve dynamically. We introduce RAVEN-UCB, a novel algorithm that combines theoretical rigor with practical efficiency via variance-aware adaptation. It achieves tighter regret bounds than UCB1 and UCB-V, with gap-dependent regret of order K σ2 T / and gap-independent regret of order K T T. RAVEN-UCB incorporates three innovations: (1) variance-driven exploration using σk2 / (Nk + 1) in confidence bounds, (2) adaptive control via αt = α0 / (t + ε), and (3) constant-time recursive updates for efficiency. Experiments across non-stationary patterns - distributional changes, periodic shifts, and temporary fluctuations - in synthetic and logistics scenarios demonstrate its superiority over state-of-the-art baselines, confirming theoretical and practical robustness.

0

Turn this paper into a full lesson

ArcXiv compiles a staged curriculum from this paper: 8-12 lessons across beginner → advanced, synthesised section guides, visuals, flashcards, a quiz, exercises, and on-demand deep dives per section. Grounded in the abstract, never invented.

Discussion (0)

Sign in to join the discussion.

Loading comments…