Communication-Corruption Coupling and Verification in Cooperative Multi-Objective Bandits
Abstract
We study cooperative stochastic multi-armed bandits with vector-valued rewards under adversarial corruption and limited verification. In each of T rounds, each of N agents selects an arm, the environment generates a clean reward vector, and an adversary perturbs the observed feedback subject to a global corruption budget Γ. Performance is measured by team regret under a coordinate-wise nondecreasing, L-Lipschitz scalarization φ, covering linear, Chebyshev, and smooth monotone utilities. Our main contribution is a communication-corruption coupling: we show that a fixed environment-side budget Γ can translate into an effective corruption level ranging from Γ to NΓ, depending on whether agents share raw samples, sufficient statistics, or only arm recommendations. We formalize this via a protocol-induced multiplicity functional and prove regret bounds parameterized by the resulting effective corruption. As corollaries, raw-sample sharing can suffer an N-fold larger additive corruption penalty, whereas summary sharing and recommendation-only sharing preserve an unamplified O(Γ) term and achieve centralized-rate team regret. We further establish information-theoretic limits, including an unavoidable additive Ω(Γ) penalty and a high-corruption regime Γ = Θ(NT) where sublinear regret is impossible without clean information. Finally, we characterize how a global budget of verified observations restores learnability: verification is necessary in the high-corruption regime and sufficient once it crosses the identification threshold, with certified sharing enabling the team's regret to become independent of Γ.
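The role of the scalarization assumptions can be illustrated with a small numerical sketch. The abstract names only the families (linear, Chebyshev, smooth monotone), so everything concrete below is an assumption made for illustration: the weight vector `w`, the reference point `ref`, the particular Chebyshev form min_i w_i (x_i - ref_i), and the choice of the sup norm for measuring the perturbation. Under those assumptions, a perturbation of the reward vector shifts the scalarized value by at most the Lipschitz constant times the perturbation size, which is why a corruption budget enters regret additively.

```python
import random

def linear_scalarization(x, w):
    """phi(x) = sum_i w_i * x_i with w >= 0 (coordinate-wise nondecreasing).
    Lipschitz constant sum(w) with respect to the sup norm."""
    return sum(wi * xi for wi, xi in zip(w, x))

def chebyshev_scalarization(x, w, ref):
    """phi(x) = min_i w_i * (x_i - ref_i) with w >= 0 (coordinate-wise nondecreasing).
    Lipschitz constant max(w) with respect to the sup norm.
    This utility-style form is an assumption, not taken from the paper."""
    return min(wi * (xi - ri) for wi, xi, ri in zip(w, x, ref))

random.seed(0)
w = [0.5, 0.3, 0.2]        # hypothetical nonnegative weights
ref = [0.0, 0.0, 0.0]      # hypothetical reference point

clean = [random.uniform(0.0, 1.0) for _ in w]    # clean reward vector
delta = [random.uniform(-0.1, 0.1) for _ in w]   # hypothetical adversarial perturbation
corrupted = [c + d for c, d in zip(clean, delta)]
size = max(abs(d) for d in delta)                # sup-norm size of the perturbation

# Shift in scalarized value caused by the perturbation.
lin_shift = abs(linear_scalarization(corrupted, w) - linear_scalarization(clean, w))
cheb_shift = abs(chebyshev_scalarization(corrupted, w, ref)
                 - chebyshev_scalarization(clean, w, ref))

# Lipschitz constants for the two families under the sup norm.
L_lin = sum(w)
L_cheb = max(w)

print("linear shift within bound:", lin_shift <= L_lin * size + 1e-12)
print("Chebyshev shift within bound:", cheb_shift <= L_cheb * size + 1e-12)
```

Both checks hold for any perturbation, not only the sampled one: the linear case follows from |<w, d>| <= sum(w) * max|d_i|, and the Chebyshev case from the fact that a minimum of coordinates moves by no more than the largest coordinate change.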