Multi-Agent Combinatorial-Multi-Armed-Bandit Framework for the Submodular Welfare Problem under Bandit Feedback
Abstract
We study the Submodular Welfare Problem (SWP) under bandit feedback, in which items are partitioned among agents with monotone submodular utilities so as to maximize total welfare. Classical SWP assumes full value-oracle access, under which continuous-greedy algorithms achieve a (1-1/e)-approximation. We extend this to a multi-agent combinatorial multi-armed bandit (MA-CMAB) framework in which actions are partitions, feedback is full-bandit, and agents do not communicate. Unlike prior single-agent or separable multi-agent CMAB models, our setting couples agents through shared allocation constraints. We propose an explore-then-commit strategy with randomized assignments that achieves O(T^{2/3}) regret against a (1-1/e) benchmark, the first such guarantee for the partition-based submodular welfare problem under bandit feedback.
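To make the setting concrete, the following is a minimal Python sketch of a naive explore-then-commit loop over uniformly random partitions, where only a noisy total welfare is observed each round (full-bandit feedback). It is an illustration under stated assumptions, not the algorithm analyzed in the paper: the coverage utilities, the Gaussian noise model, and every name (`coverage_utility`, `random_partition`, `noisy_welfare`, `explore_then_commit`, `num_candidates`, `pulls`) are hypothetical, and this naive variant carries no (1-1/e) or O(T^{2/3}) guarantee.

```python
import random


def coverage_utility(covers):
    """Return a monotone submodular utility: the number of distinct
    elements covered by a bundle. `covers` maps item -> set (assumed)."""
    def f(bundle):
        covered = set()
        for item in bundle:
            covered |= covers[item]
        return len(covered)
    return f


def random_partition(num_items, num_agents, rng):
    """Assign each item to exactly one agent, uniformly at random."""
    bundles = [[] for _ in range(num_agents)]
    for item in range(num_items):
        bundles[rng.randrange(num_agents)].append(item)
    return bundles


def noisy_welfare(bundles, utilities, rng, noise=0.1):
    """Full-bandit feedback: only a noisy total welfare is revealed,
    never the per-agent or per-item values (Gaussian noise is assumed)."""
    total = sum(f(b) for f, b in zip(utilities, bundles))
    return total + rng.gauss(0.0, noise)


def explore_then_commit(num_items, utilities, horizon,
                        num_candidates, pulls, seed=0):
    """Explore random partitions, then commit to the empirically best one."""
    if num_candidates * pulls > horizon:
        raise ValueError("exploration budget exceeds the horizon")
    rng = random.Random(seed)
    num_agents = len(utilities)
    candidates = [random_partition(num_items, num_agents, rng)
                  for _ in range(num_candidates)]

    rewards, means = [], []
    # Exploration phase: play each candidate partition `pulls` times.
    for partition in candidates:
        obs = [noisy_welfare(partition, utilities, rng) for _ in range(pulls)]
        rewards.extend(obs)
        means.append(sum(obs) / pulls)

    # Commit phase: play the best empirical partition for the remaining rounds.
    best = candidates[max(range(num_candidates), key=means.__getitem__)]
    for _ in range(horizon - num_candidates * pulls):
        rewards.append(noisy_welfare(best, utilities, rng))
    return best, rewards
```

For example, with two agents sharing one coverage utility over four items, `explore_then_commit(4, [f, f], horizon=200, num_candidates=5, pulls=10)` returns the committed partition and the 200 observed rewards. In ETC analyses generally, an O(T^{2/3}) rate comes from balancing the exploration cost against the estimation error incurred during the commit phase; this sketch fixes the exploration budget by hand rather than tuning it that way.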