Near-Optimal Regret for Distributed Adversarial Bandits: A Black-Box Approach
Abstract
We study distributed adversarial bandits, where N agents cooperate to minimize the global average loss while observing only their own local losses. We show that the minimax regret for this problem is Θ((ρ-1/2+K/N)T), where T is the horizon, K is the number of actions, and ρ is the spectral gap of the communication matrix. Our algorithm, based on a novel black-box reduction to bandits with delayed feedback, requires agents to communicate only through gossip. It achieves an upper bound that significantly improves over the previous best bound O(ρ-1/3(KT)2/3) of Yi and Vojnovic (2023). We complement this result with a matching lower bound, showing that the problem's difficulty decomposes into a communication cost ρ-1/4T and a bandit cost KT/N. We further demonstrate the versatility of our approach by deriving first-order and best-of-both-worlds bounds in the distributed adversarial setting. Finally, we extend our framework to distributed linear bandits in Rd, obtaining a regret bound of O((ρ-1/2+1/N)dT), achieved with only O(d) communication cost per agent and per round via a volumetric spanner.