An Asymptotically Optimal Strategy for Constrained Multi-armed Bandit Problems

Abstract

For the stochastic multi-armed bandit (MAB) problem from a constrained model that generalizes the classical one, we show that an asymptotic optimality is achievable by a simple strategy extended from the εt-greedy strategy. We provide a finite-time lower bound on the probability of correct selection of an optimal near-feasible arm that holds for all time steps. Under some conditions, the bound approaches one as time t goes to infinity. A particular example sequence of \εt\ having the asymptotic convergence rate in the order of (1-1t)4 that holds from a sufficiently large t is also discussed.

0

Turn this paper into a lesson

ArcXiv compiles a structured reading guide from this paper's metadata: plain-English importance, contributions, prerequisite concepts, which sections to read first, flashcards, and a quiz. Grounded in the abstract, never invented.

Discussion (0)

Sign in to join the discussion.

Loading comments…