Approximate Top-m Arm Identification with Heterogeneous Reward Variances
Abstract
We study the effect of reward variance heterogeneity in the approximate top-$m$ arm identification setting. In this setting, the reward for the $i$-th arm follows a $\sigma_i^2$-sub-Gaussian distribution, and the agent needs to incorporate this knowledge to minimize the expected number of arm pulls needed to identify, out of the $n$ arms, $m$ arms with the largest means within error $\epsilon$, with probability at least $1-\delta$. We show that the worst-case sample complexity of this problem is $\Theta\left( \sum_{i=1}^{n} \frac{\sigma_i^2}{\epsilon^2} \ln\frac{1}{\delta} + \sum_{i \in \mathcal{G}^{m}} \frac{\sigma_i^2}{\epsilon^2} \ln(m) + \sum_{j \in \mathcal{G}^{l}} \frac{\sigma_j^2}{\epsilon^2} \mathrm{Ent}(\sigma^2_{\mathcal{G}^{r}}) \right)$, where $\mathcal{G}^{m}$, $\mathcal{G}^{l}$, $\mathcal{G}^{r}$ are certain specific subsets of the overall arm set $\{1, 2, \ldots, n\}$, and $\mathrm{Ent}(\cdot)$ is an entropy-like function which measures the heterogeneity of the variance proxies. The upper bound on the complexity is obtained using a divide-and-conquer style algorithm, while the matching lower bound relies on the study of a dual formulation.