A Note on a Tight Lower Bound for MNL-Bandit Assortment Selection Models

Abstract

In this short note we consider a dynamic assortment planning problem under the capacitated multinomial logit (MNL) bandit model. We prove a tight lower bound on the accumulated regret that matches existing regret upper bounds for all parameters (time horizon T, number of items N and maximum assortment capacity K) up to logarithmic factors. Our results close an O(√K) gap between upper and lower regret bounds from existing works.
