No Coin Left Behind: Maximizing Strategic Surplus Against No-Regret Dynamics
Abstract
We investigate the strategic surplus that a clairvoyant optimizer can obtain against a Follow-the-Regularized-Leader (FTRL) learner with constant step size η in n × m two-player zero-sum games played over T rounds. In contrast with prior analysis, we show that the extraction of such regret-scale surplus is an inherent feature of the FTRL family, rather than an artifact of specific instantiations. First, for a fixed max-min optimizer, we establish a sweeping law of order Ω(N_sub/η), proving that utility surplus scales with the number N_sub of the learner's suboptimal actions and vanishes in their absence. Second, for an alternating optimizer, a surplus of Ω(ηT/poly(n,m)) can be guaranteed regardless of the equilibrium structure, with high probability, in random games. Our analysis uncovers a sharp geometric dichotomy: non-steep regularizers allow the optimizer to realize the maximal transient surplus via finite-time elimination of suboptimal actions, whereas steep regularizers introduce a vanishing tail correction that can delay surplus saturation. Finally, we discuss whether this leverage persists under bilateral payoff uncertainty and propose a susceptibility measure quantifying which regularizers are most vulnerable to learner-aware strategic steering.
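To make the 1/η scaling concrete, here is a minimal Python sketch. It is a toy illustration under stated assumptions, not the paper's construction. It assumes the negative-entropy regularizer (a steep regularizer, for which FTRL reduces to a softmax over cumulative payoffs), a uniform initial strategy, and an optimizer that plays one fixed strategy every round, so the learner sees the same per-action payoff vector each round. The payoff values and the function names `entropic_ftrl_strategy` and `transient_loss` are hypothetical. In a zero-sum game, whatever payoff the learner forgoes by putting mass on suboptimal actions goes to the optimizer, so the quantity tracked below stands in for the transient surplus.

```python
import math

def entropic_ftrl_strategy(cum_payoffs, eta):
    """FTRL with a negative-entropy regularizer: softmax of eta * cumulative payoffs."""
    m = max(cum_payoffs)  # subtract the max for numerical stability
    weights = [math.exp(eta * (c - m)) for c in cum_payoffs]
    total = sum(weights)
    return [w / total for w in weights]

def transient_loss(payoffs, eta, rounds):
    """Total payoff the learner forgoes relative to always playing its best action.

    `payoffs` is the learner's per-action payoff against a fixed optimizer
    strategy (an assumption of this sketch), so it is the same every round.
    """
    best = max(payoffs)
    cum = [0.0] * len(payoffs)
    loss = 0.0
    for _ in range(rounds):
        x = entropic_ftrl_strategy(cum, eta)
        # Mass on each suboptimal action, weighted by that action's payoff gap.
        loss += sum(p * (best - u) for p, u in zip(x, payoffs))
        cum = [c + u for c, u in zip(cum, payoffs)]
    return loss

if __name__ == "__main__":
    payoffs = [1.0, 0.5, 0.0]  # hypothetical: one optimal and two suboptimal actions
    for eta in (0.1, 0.05):
        print(eta, transient_loss(payoffs, eta, rounds=5000))
```

In this toy setting, halving η roughly doubles the accumulated loss, and for small η the total comes out close to ln(3)/η for the three-action example. Because the entropy regularizer is steep, the mass on suboptimal actions decays exponentially but never reaches zero, which is the tail behavior the abstract contrasts with finite-time elimination under non-steep regularizers.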