The Best Arm Evades: Near-optimal Multi-pass Streaming Lower Bounds for Pure Exploration in Multi-armed Bandits
Abstract
We give a near-optimal sample-pass trade-off for pure exploration in multi-armed bandits (MABs) via multi-pass streaming algorithms: any streaming algorithm with sublinear memory that uses the optimal sample complexity of $O(n/\Delta^2)$ requires $\Omega\left(\frac{\log(1/\Delta)}{\log\log(1/\Delta)}\right)$ passes. Here, $n$ is the number of arms and $\Delta$ is the reward gap between the best and the second-best arms. Our result matches the $O(\log(1/\Delta))$-pass algorithm of Jin et al. [ICML'21] (up to lower-order terms), which uses only $O(1)$ memory, and it answers an open question posed by Assadi and Wang [STOC'20].
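To make the streaming model concrete, here is a minimal sketch of a hypothetical single-pass baseline. It is neither the algorithm of Jin et al. nor a construction from this paper. Arms arrive one at a time, memory holds a single stored "champion" arm plus the arm currently arriving, and every sample is counted. The function name `single_pass_champion` and its parameters are illustrative assumptions. The sketch also assumes that rewards lie in $[0, 1]$, that $\Delta$ is known, and that it is given a confidence parameter $\delta$. By Hoeffding's inequality, $m = \lceil 2\ln(n/\delta)/\Delta^2 \rceil$ samples per arm per comparison make each comparison involving the best arm fail with probability at most $\delta/n$. The baseline therefore finishes in one pass but uses $2m(n-1)$ samples, roughly a $\log n$ factor more than the optimal $O(n/\Delta^2)$.

```python
import math
import random
from typing import Callable, Sequence, Tuple


def single_pass_champion(
    stream: Sequence[int],
    pull: Callable[[int], float],
    gap: float,
    delta: float,
) -> Tuple[int, int, int]:
    """Hypothetical one-pass baseline (illustration only, not from the paper).

    Keeps one stored 'champion' arm and challenges it with each arriving arm.
    Assumes rewards in [0, 1] and that every non-best arm is at least `gap`
    below the best arm. Returns (chosen_arm, samples_used, passes).
    """
    n = len(stream)
    # Hoeffding on the mean of X - Y (each term in [-1, 1]): a comparison
    # between the best arm and another arm errs with probability at most
    # exp(-m * gap**2 / 2) <= delta / n; a union bound over at most n - 1
    # such comparisons gives overall failure probability below delta.
    m = math.ceil(2 * math.log(n / delta) / gap ** 2)
    samples = 0
    passes = 1  # the stream is read exactly once
    champion = stream[0]  # memory: one stored arm plus the arriving arm
    for arm in stream[1:]:
        champ_mean = sum(pull(champion) for _ in range(m)) / m
        arm_mean = sum(pull(arm) for _ in range(m)) / m
        samples += 2 * m
        if arm_mean > champ_mean:
            champion = arm  # challenger wins and replaces the stored arm
    return champion, samples, passes


if __name__ == "__main__":
    rng = random.Random(0)
    means = [0.5] * 20
    means[7] = 0.6  # hypothetical best arm, gap 0.1 to all others

    def bernoulli_pull(i: int) -> float:
        return 1.0 if rng.random() < means[i] else 0.0

    best, used, passes = single_pass_champion(
        list(range(20)), bernoulli_pull, gap=0.1, delta=1e-6
    )
    print(best, used, passes)
```

The lower bound above shows that no sublinear-memory algorithm can eliminate this extra logarithmic factor in a single pass. Achieving $O(n/\Delta^2)$ samples forces about $\log(1/\Delta)/\log\log(1/\Delta)$ passes.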