Efficient Change-Point Detection for Tackling Piecewise-Stationary Bandits

Abstract

We introduce GLR-klUCB, a novel algorithm for the piecewise i.i.d. non-stationary bandit problem with bounded rewards. This algorithm combines an efficient bandit algorithm, kl-UCB, with an efficient, parameter-free change-point detector, the Bernoulli Generalized Likelihood Ratio Test, for which we provide new theoretical guarantees of independent interest. Unlike previous non-stationary bandit algorithms using a change-point detector, GLR-klUCB does not need to be calibrated based on prior knowledge of the arms' means. We prove that this algorithm can attain $O(\sqrt{T A \Upsilon_T \log(T)})$ regret in $T$ rounds on some "easy" instances, where $A$ is the number of arms and $\Upsilon_T$ the number of change-points, without prior knowledge of $\Upsilon_T$. In contrast with recently proposed algorithms that are agnostic to $\Upsilon_T$, we perform a numerical study showing that GLR-klUCB is also very efficient in practice, beyond easy instances.
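
To make the change-point detector mentioned in the abstract concrete, here is a minimal sketch of a Bernoulli GLR statistic on a stream of rewards in [0, 1]. It compares, for every split point s, the empirical means before and after s with the overall mean, using the Bernoulli Kullback-Leibler divergence. The function names and the `threshold` argument are hypothetical choices for illustration; the abstract does not give the calibrated threshold, so it is left as a user-supplied value here rather than guessed.

```python
import math


def kl_bernoulli(p, q, eps=1e-12):
    """Bernoulli KL divergence kl(p, q); inputs are clipped to avoid log(0)."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def glr_statistic(rewards):
    """Max over split points s of
    s * kl(mean[1..s], mean[1..n]) + (n - s) * kl(mean[s+1..n], mean[1..n])."""
    n = len(rewards)
    if n < 2:
        return 0.0  # no split point exists with fewer than two samples
    total = sum(rewards)
    mean_all = total / n
    best = 0.0
    prefix = 0.0
    for s in range(1, n):
        prefix += rewards[s - 1]
        mean_left = prefix / s
        mean_right = (total - prefix) / (n - s)
        stat = (s * kl_bernoulli(mean_left, mean_all)
                + (n - s) * kl_bernoulli(mean_right, mean_all))
        best = max(best, stat)
    return best


def detects_change(rewards, threshold):
    """Flag a change when the statistic reaches the threshold.
    `threshold` is an assumption: a placeholder for the paper's calibrated value."""
    return glr_statistic(rewards) >= threshold
```

In a bandit loop, one such detector would typically be kept per arm and that arm's history reset when a change is flagged; how GLR-klUCB schedules these tests and restarts is specified in the paper, not in this sketch.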
