Lower Bounds for Multi-armed Bandit with Non-equivalent Multiple Plays

Abstract

We study the stochastic multi-armed bandit problem with non-equivalent multiple plays, where at each step an agent chooses not only a set of arms but also their order, which influences the reward distribution. For several problem formulations with different assumptions, we derive lower bounds on the regret with the standard asymptotics O(log t) but novel coefficients, and we provide optimal algorithms, thus proving that these bounds cannot be improved.
