Stochastic Top-K Subset Bandits with Linear Space and Non-Linear Feedback
Abstract
Many real-world problems, such as Social Influence Maximization, face the dilemma of choosing the best K out of N options at a given time instant. This setup can be modeled as a combinatorial bandit that chooses K out of N arms at each time step, with the aim of achieving an efficient trade-off between exploration and exploitation. This is the first work on combinatorial bandits in which the feedback received can be a non-linear function of the chosen K arms. Directly applying a multi-armed bandit algorithm requires choosing among N-choose-K options, which makes the state space large. In this paper, we present a novel algorithm that is computationally efficient and whose storage is linear in N. The proposed algorithm, which we call CMAB-SM, is a divide-and-conquer based strategy. Further, the proposed algorithm achieves a regret bound of $O(K^{1/2} N^{1/3} T^{2/3})$ for a time horizon T, which is sub-linear in all parameters T, N, and K. When applied to the problem of Social Influence Maximization, the performance of the proposed algorithm surpasses the UCB algorithm and some more sophisticated domain-specific methods.
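To make the scale argument concrete, the short sketch below compares the number of K-subsets a naive multi-armed bandit would have to treat as separate arms with the N values that linear storage implies, and evaluates the growth term of the stated regret bound. It is only an arithmetic illustration, not an implementation of CMAB-SM: the function names `num_super_arms` and `regret_bound_scale` and the values N = 50, K = 5 are assumptions chosen for the example, and constants hidden by the O-notation are ignored.

```python
import math


def num_super_arms(n: int, k: int) -> int:
    """Number of distinct K-subsets of N arms (hypothetical helper name)."""
    return math.comb(n, k)


def regret_bound_scale(n: int, k: int, t: int) -> float:
    """Growth term K^(1/2) * N^(1/3) * T^(2/3) of the stated bound.

    Constants hidden by the O-notation are ignored, so this is a scale,
    not an actual regret value.
    """
    return k ** 0.5 * n ** (1 / 3) * t ** (2 / 3)


if __name__ == "__main__":
    n, k = 50, 5  # illustrative values, not taken from the paper
    print("K-subsets a naive bandit must track:", num_super_arms(n, k))
    print("values tracked with storage linear in N:", n)

    # Sub-linearity in T: the bound divided by T shrinks as T grows.
    for t in (10**3, 10**4, 10**5):
        print(t, regret_bound_scale(n, k, t) / t)
```

With these illustrative values the naive formulation has over two million candidate subsets while linear storage keeps on the order of 50 values, and the per-round bound term decreases as the horizon grows, which is what sub-linear regret in T means.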