Improved Analysis of the Tsallis-INF Algorithm in Stochastically Constrained Adversarial Bandits and Stochastic Bandits with Adversarial Corruptions
Abstract
We derive improved regret bounds for the Tsallis-INF algorithm of Zimmert and Seldin (2021). We show that in adversarial regimes with a $(\Delta, C, T)$ self-bounding constraint the algorithm achieves a regret bound of $O\left(\left(\sum_{i\neq i^*}\frac{1}{\Delta_i}\right)\log_+\left(\frac{(K-1)T}{\left(\sum_{i\neq i^*}\frac{1}{\Delta_i}\right)^2}\right)+\sqrt{C\left(\sum_{i\neq i^*}\frac{1}{\Delta_i}\right)\log_+\left(\frac{(K-1)T}{C\sum_{i\neq i^*}\frac{1}{\Delta_i}}\right)}\right)$, where $T$ is the time horizon, $K$ is the number of arms, $\Delta_i$ are the suboptimality gaps, $i^*$ is the best arm, $C$ is the corruption magnitude, and $\log_+(x) = \max(1, \log x)$. The regime includes stochastic bandits, stochastically constrained adversarial bandits, and stochastic bandits with adversarial corruptions as special cases. Additionally, we provide a general analysis that makes it possible to achieve the same kind of improvement for generalizations of Tsallis-INF to settings beyond multi-armed bandits.
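To make the shape of the bound concrete, the sketch below evaluates the expression inside the $O(\cdot)$ for a given set of gaps, horizon and corruption level. It is only an illustration under stated assumptions: the constants hidden by $O(\cdot)$ are dropped, the best arm is assumed to be unique so that $K-1$ equals the number of listed gaps, and the names `log_plus` and `regret_bound_order` are hypothetical helpers, not code from the paper. It does not implement Tsallis-INF itself.

```python
import math
from typing import Sequence


def log_plus(x: float) -> float:
    """log_+(x) = max(1, log x), the truncated logarithm from the abstract."""
    return max(1.0, math.log(x))


def regret_bound_order(gaps: Sequence[float], T: int, C: float = 0.0) -> float:
    """Evaluate the expression inside O(.) of the bound (hypothetical helper).

    Assumptions (illustrative only): `gaps` lists the strictly positive
    suboptimality gaps Delta_i of every arm i != i*, so K - 1 == len(gaps);
    constants hidden by O(.) are ignored.
    """
    if not gaps or any(d <= 0 for d in gaps):
        raise ValueError("need at least one strictly positive gap")
    k_minus_1 = len(gaps)
    s = sum(1.0 / d for d in gaps)  # sum over i != i* of 1 / Delta_i
    # Stochastic part: S * log_+((K-1) T / S^2)
    stochastic = s * log_plus(k_minus_1 * T / s**2)
    # Corruption part: sqrt(C * S * log_+((K-1) T / (C S))); zero when C = 0
    corruption = 0.0
    if C > 0:
        corruption = math.sqrt(C * s * log_plus(k_minus_1 * T / (C * s)))
    return stochastic + corruption


gaps = [0.1] * 9  # K = 10 arms, every suboptimal arm has gap 0.1
for C in (0.0, 100.0):
    print(C, regret_bound_order(gaps, T=10**6, C=C))
```

Setting $C = 0$ leaves only the first term, which grows logarithmically in $T$; a positive corruption magnitude adds the second term, which grows roughly like $\sqrt{C}$ up to logarithmic factors.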