The Algorithmic Advantage: How Reinforcement Learning Generates Rich Communication
Abstract
We analyze strategic communication when advice is generated by a reinforcement-learning algorithm rather than by a fully rational sender. Building on the cheap-talk framework of Crawford and Sobel (1982), an advisor adapts its messages based on payoff feedback, while a decision maker best-responds. We provide a theoretical analysis of the long-run communication outcomes induced by such reward-driven adaptation. With aligned preferences, we establish that learning robustly leads to informative communication even from uninformative initial policies. With misaligned preferences, no stable outcome exists; instead, learning generates cycles that sustain highly informative communication and payoffs exceeding those of any static equilibrium.
Turn this paper into a lesson
ArcXiv compiles a structured reading guide from this paper's metadata: plain-English importance, contributions, prerequisite concepts, which sections to read first, flashcards, and a quiz. Grounded in the abstract, never invented.