Towards Instance Optimal Bounds for Best Arm Identification
Abstract
In the classical best arm identification (Best-1-Arm) problem, we are given $n$ stochastic bandit arms, each associated with a reward distribution with an unknown mean. We would like to identify the arm with the largest mean with probability at least $1-\delta$, using as few samples as possible. Understanding the sample complexity of Best-1-Arm has attracted significant attention over the last decade. However, the exact sample complexity of the problem is still unknown. Recently, Chen and Li made the gap-entropy conjecture concerning the instance sample complexity of Best-1-Arm. Given an instance $I$, let $\mu_{[i]}$ be the $i$th largest mean and $\Delta_{[i]}=\mu_{[1]}-\mu_{[i]}$ be the corresponding gap. $H(I)=\sum_{i=2}^{n}\Delta_{[i]}^{-2}$ is the complexity of the instance. The gap-entropy conjecture states that $\Omega(H(I)\cdot(\ln\delta^{-1}+\mathsf{Ent}(I)))$ is an instance lower bound, where $\mathsf{Ent}(I)$ is an entropy-like term determined by the gaps, and that there is a $\delta$-correct algorithm for Best-1-Arm with sample complexity $O(H(I)\cdot(\ln\delta^{-1}+\mathsf{Ent}(I))+\Delta_{[2]}^{-2}\ln\ln\Delta_{[2]}^{-1})$. If the conjecture is true, we would have a complete understanding of the instance-wise sample complexity of Best-1-Arm. We make significant progress towards the resolution of the gap-entropy conjecture. For the upper bound, we provide a highly nontrivial algorithm which requires \[O(H(I)\cdot(\ln\delta^{-1}+\mathsf{Ent}(I))+\Delta_{[2]}^{-2}\ln\ln\Delta_{[2]}^{-1}\,\mathrm{polylog}(n,\delta^{-1}))\] samples in expectation. For the lower bound, we show that for any Gaussian Best-1-Arm instance with gaps of the form $2^{-k}$, any $\delta$-correct monotone algorithm requires $\Omega(H(I)\cdot(\ln\delta^{-1}+\mathsf{Ent}(I)))$ samples in expectation.
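As a small illustration of the quantities defined above, the following Python sketch computes the gaps and the instance complexity H(I) for a list of arm means. It is not part of the paper: the function names `gaps` and `instance_complexity` and the example means are hypothetical, and the sketch assumes the true means are known, which is never the case in the actual problem. The entropy-like term Ent(I) is not defined in the abstract and is therefore left out.

```python
def gaps(means):
    """Return the gaps mu_[1] - mu_[i] for i = 2..n, with means sorted in descending order."""
    ordered = sorted(means, reverse=True)
    best = ordered[0]
    return [best - m for m in ordered[1:]]


def instance_complexity(means):
    """Return H(I), the sum over i = 2..n of the inverse squared gaps."""
    deltas = gaps(means)
    if any(d <= 0 for d in deltas):
        # H(I) is only finite when the best arm is unique.
        raise ValueError("the best arm must be unique (all gaps positive)")
    return sum(d ** -2 for d in deltas)


# Hypothetical instance whose gaps have the form 2^-k, as in the lower bound.
example_means = [1.0, 0.5, 0.75, 0.5]
example_gaps = gaps(example_means)
example_H = instance_complexity(example_means)
print(example_gaps, example_H)  # [0.25, 0.5, 0.5] 24.0
```

In this example the arm with the smallest gap (0.25) contributes 16 of the total 24, which reflects how arms close to the best one dominate H(I).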