Stochastic Multi-armed Bandits in Constant Space

Abstract

We consider the stochastic bandit problem in the sublinear space setting, where one cannot record the win-loss record for all \(K\) arms. We give an algorithm using \(O(1)\) words of space with regret \[ \sum_{i=1}^{K} \frac{1}{\Delta_i} \log \frac{\Delta_i}{\Delta} \log T \] where \(\Delta_i\) is the gap between the best arm and arm \(i\), and \(\Delta\) is the gap between the best and the second-best arms. If the rewards are bounded away from 0 and 1, this is within an \(O(\log 1/\Delta)\) factor of the optimum regret possible without space constraints.
