Optimal Gap-Dependent Regret for Private Stochastic Decision-Theoretic Online Learning
Abstract
We study stochastic decision-theoretic online learning with full information under event-level pure ε-differential privacy. A COLT open problem of Hu and Mehta asks for the optimal gap-dependent regret rate in this setting. For K actions, losses in [0,1], and a unique best action separated from the second-best action by a gap Δ, the known lower bound is of order \(\log K/\min\{\Delta,\varepsilon\}\), or equivalently, up to universal constants, of order
\[ \frac{\log K}{\Delta} + \frac{\log K}{\varepsilon}. \]
We give a horizon-free pure-DP algorithm and prove the explicit regret bound
\[ \mathrm{Reg}_T \le 1000 \cdot \left( \frac{\log K}{\Delta} + \frac{\log K}{\varepsilon} \right) \]
for every horizon T. The numerical constant is not optimized. The algorithm partitions time into blocks of exponentially increasing size, plays a single action throughout each block, and chooses the action for the next block by applying an exponential mechanism to a data-independent random prefix of the previous block. The random prefix converts block regret into a sum, over all prefix lengths, of softmax selection errors. A single entropy-potential argument controls all privacy-dominated large-gap actions at total cost \(O(\log K/\varepsilon)\).
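The block structure described in the abstract can be sketched in Python. This is a minimal, hypothetical illustration, not the paper's algorithm: the doubling block lengths 2^b, the uniform distribution of the prefix length over {1, ..., block length}, the uniformly random first action, the temperature ε/2, and the names `private_blocked_learner` and `exponential_mechanism` are all assumptions. Under these assumptions, each round's loss vector enters at most one exponential-mechanism call and the prefix lengths are drawn independently of the data. The standard exponential-mechanism guarantee (sensitivity 1 for cumulative losses built from losses in [0,1]) together with parallel composition then suggests event-level ε-DP for the sequence of played actions.

```python
import numpy as np


def exponential_mechanism(cum_loss, epsilon, rng):
    """Sample an action with probability proportional to exp(-epsilon * loss / 2).

    Assumption: changing one round's loss vector (entries in [0, 1]) moves each
    cumulative loss by at most 1, so this is the standard epsilon-DP
    exponential mechanism with sensitivity 1.
    """
    scores = -epsilon * np.asarray(cum_loss, dtype=float) / 2.0
    scores -= scores.max()              # shift for a numerically stable softmax
    probs = np.exp(scores)
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))


def private_blocked_learner(losses, epsilon, seed=0):
    """Hypothetical sketch of a blocked pure-DP learner (not the paper's exact method).

    losses: array of shape (T, K) with entries in [0, 1] (full information).
    Assumed: block b has length 2**b (truncated at T), the first block plays a
    uniformly random action, and the prefix length is uniform on
    {1, ..., block length}, drawn independently of the losses.
    Returns the action played in each round.
    """
    rng = np.random.default_rng(seed)
    T, K = losses.shape
    actions = np.empty(T, dtype=int)
    action = int(rng.integers(K))       # no data yet: start uniformly at random
    start, b = 0, 0
    while start < T:
        end = min(start + 2 ** b, T)
        actions[start:end] = action      # a single action for the whole block
        # Data-independent random prefix of the block that was just played.
        prefix_len = int(rng.integers(1, end - start + 1))
        prefix_loss = losses[start:start + prefix_len].sum(axis=0)
        action = exponential_mechanism(prefix_loss, epsilon, rng)
        start, b = end, b + 1
    return actions


# Usage on a synthetic Bernoulli instance with gap Delta = 0.2.
rng = np.random.default_rng(1)
T, K = 4096, 5
means = np.array([0.3, 0.5, 0.5, 0.5, 0.5])
losses = (rng.random((T, K)) < means).astype(float)
acts = private_blocked_learner(losses, epsilon=1.0)
pseudo_regret = float((means[acts] - means.min()).sum())
print(pseudo_regret)
```

The printed pseudo-regret is measured against the arm means; the sketch only illustrates the mechanics of blocks, random prefixes, and exponential-mechanism selection, and makes no claim to reproduce the paper's bound or constant.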