LC-Tsallis-INF: Generalized Best-of-Both-Worlds Linear Contextual Bandits
Abstract
We investigate the linear contextual bandit problem with independent and identically distributed (i.i.d.) contexts. In this problem, we aim to develop a Best-of-Both-Worlds (BoBW) algorithm with regret upper bounds in both stochastic and adversarial regimes. We develop an algorithm based on Follow-The-Regularized-Leader (FTRL) with Tsallis entropy, referred to as the α-Linear-Contextual (LC)-Tsallis-INF. We show that its regret is at most $O(\log(T))$ in the stochastic regime under the assumption that the suboptimality gap is uniformly bounded from below, and at most $O(\sqrt{T})$ in the adversarial regime. Furthermore, our regret analysis is extended to more general regimes characterized by the margin condition with a parameter $\beta \in (1, \infty]$, which imposes a milder assumption on the suboptimality gap. We show that the proposed algorithm achieves $O\left(\log(T)^{\frac{1+\beta}{2+\beta}} T^{\frac{1}{2+\beta}}\right)$ regret under the margin condition.
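To make "FTRL with Tsallis entropy" concrete, the following is a minimal sketch of a single FTRL step on the K-armed probability simplex, assuming the cumulative loss estimates are already given. It is not LC-Tsallis-INF itself: the context-dependent loss estimator, the learning-rate schedule, and the exact scaling of the regularizer used in the paper are not stated in the abstract, so the regularizer $-\frac{1}{\eta\alpha}\sum_i p_i^{\alpha}$ and the function name `tsallis_ftrl_probs` are illustrative assumptions. Setting the gradient of the Lagrangian to zero gives $p_i = (\eta(L_i + \lambda))^{-1/(1-\alpha)}$, and the multiplier $\lambda$ is found by bisection so that the probabilities sum to one.

```python
from typing import List, Sequence


def tsallis_ftrl_probs(cum_loss: Sequence[float], eta: float, alpha: float = 0.5,
                       tol: float = 1e-12, max_iter: int = 200) -> List[float]:
    """Hypothetical helper: one FTRL step with an alpha-Tsallis regularizer.

    Solves  argmin_p <cum_loss, p> - (1 / (eta * alpha)) * sum_i p_i**alpha
    over the probability simplex (assumed scaling, for illustration only).
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    L = [float(x) for x in cum_loss]
    K = len(L)
    power = -1.0 / (1.0 - alpha)
    L_min = min(L)

    def probs(lam: float) -> List[float]:
        # Stationarity condition: p_i = (eta * (L_i + lam)) ** (-1 / (1 - alpha)).
        return [(eta * (l + lam)) ** power for l in L]

    # Bracket for lam: at lo the best arm alone has p = 1, so the sum is >= 1;
    # at hi every p_i <= 1/K, so the sum is <= 1. The sum decreases in lam.
    lo = -L_min + 1.0 / eta
    hi = -L_min + K ** (1.0 - alpha) / eta
    for _ in range(max_iter):
        if hi - lo < tol:
            break
        mid = 0.5 * (lo + hi)
        if sum(probs(mid)) > 1.0:
            lo = mid  # probabilities too large: increase lam
        else:
            hi = mid
    p = probs(0.5 * (lo + hi))
    total = sum(p)
    return [x / total for x in p]  # remove residual bisection error


p = tsallis_ftrl_probs([3.0, 1.0, 2.0], eta=0.5)
print(p)  # the arm with the lowest cumulative loss gets the largest probability
```

With $\alpha = 1/2$ the update reduces to $p_i \propto (\eta(L_i + \lambda))^{-2}$, the familiar Tsallis-INF form. In a plain multi-armed bandit loop one would sample an arm from `p`, observe its loss, and add an importance-weighted estimate to `cum_loss`; in the linear contextual setting the paper builds its loss estimates from the observed contexts instead, which this sketch does not attempt.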