A/B Testing and Best-arm Identification for Linear Bandits with Robustness to Non-stationarity
Abstract
We investigate the fixed-budget best-arm identification (BAI) problem for linear bandits in a potentially non-stationary environment. Given a finite arm set $\mathcal{X} \subset \mathbb{R}^d$, a fixed budget $T$, and an unpredictable sequence of parameters $\{\theta_t\}_{t=1}^{T}$, an algorithm will aim to correctly identify the best arm $x^* := \arg\max_{x \in \mathcal{X}} x^\top \sum_{t=1}^{T} \theta_t$ with probability as high as possible. Prior work has addressed the stationary setting where $\theta_t = \theta_1$ for all $t$ and demonstrated that the error probability decreases as $\exp(-T/\rho^*)$ for a problem-dependent constant $\rho^*$. But in many real-world A/B/n multivariate testing scenarios that motivate our work, the environment is non-stationary and an algorithm expecting a stationary setting can easily fail. For robust identification, it is well-known that if arms are chosen randomly and non-adaptively from a G-optimal design over $\mathcal{X}$ at each time then the error probability decreases as $\exp(-T \Delta_{(1)}^2 / d)$, where $\Delta_{(1)} = \min_{x \neq x^*} (x^* - x)^\top \frac{1}{T} \sum_{t=1}^{T} \theta_t$. As there exist environments where $\Delta_{(1)}^2 / d \ll 1/\rho^*$, we are motivated to propose a novel algorithm P1-RAGE that aims to obtain the best of both worlds: robustness to non-stationarity and fast rates of identification in benign settings. We characterize the error probability of P1-RAGE and demonstrate empirically that the algorithm indeed never performs worse than G-optimal design but compares favorably to the best algorithms in the stationary setting.
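To make the definitions in the abstract concrete, the following is a minimal sketch that computes the best arm and the smallest gap for a small, made-up non-stationary instance. The arm set, the parameter sequence, and all function names are hypothetical illustrations and are not taken from the paper; the exponent helper simply evaluates the quantity T Δ² / d that appears in the G-optimal design rate, ignoring any constants the abstract does not state.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def mean_parameter(thetas: Sequence[Vector]) -> List[float]:
    """Average parameter (1/T) * sum_t theta_t over the budget."""
    T = len(thetas)
    d = len(thetas[0])
    return [sum(theta[i] for theta in thetas) / T for i in range(d)]


def best_arm_and_gap(arms: Sequence[Vector],
                     thetas: Sequence[Vector]) -> Tuple[int, float]:
    """Return the index of the best arm and the smallest gap.

    The best arm maximizes x^T sum_t theta_t, which has the same
    maximizer as x^T (1/T) sum_t theta_t. The gap is the minimum over
    all other arms of (x* - x)^T (1/T) sum_t theta_t.
    """
    theta_bar = mean_parameter(thetas)
    values = [dot(x, theta_bar) for x in arms]
    best = max(range(len(arms)), key=values.__getitem__)
    gap = min(values[best] - v for i, v in enumerate(values) if i != best)
    return best, gap


def robust_exponent(T: int, gap: float, d: int) -> float:
    """T * gap^2 / d, the exponent in exp(-T * gap^2 / d), constants omitted."""
    return T * gap ** 2 / d


# Hypothetical toy instance: the parameter switches direction after round 6.
arms = [(1.0, 0.0), (0.0, 1.0), (0.7, 0.7)]
thetas = [(1.0, 0.0)] * 6 + [(0.0, 1.0)] * 4

best, gap = best_arm_and_gap(arms, thetas)
exponent = robust_exponent(T=len(thetas), gap=gap, d=2)
print(best, gap, exponent)
```

In this toy instance the third arm is never the best arm in any single round, yet it is the best arm with respect to the cumulative parameter, which is the kind of situation where a method that assumes stationarity could be misled.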