Near-Optimal Algorithms for Differentially Private Online Learning in a Stochastic Environment
Abstract
In this paper, we study differentially private online learning problems in a stochastic environment under both bandit and full information feedback. For differentially private stochastic bandits, we propose both UCB and Thompson Sampling-based algorithms that are anytime and achieve the optimal $O\left(\sum_{j: \Delta_j > 0} \frac{\ln(T)}{\min\{\Delta_j, \epsilon\}}\right)$ instance-dependent regret bound, where $T$ is the finite learning horizon, $\Delta_j$ denotes the suboptimality gap between the optimal arm and a suboptimal arm $j$, and $\epsilon$ is the required privacy parameter. For the differentially private full information setting with stochastic rewards, we show an $\Omega\left(\frac{\ln(K)}{\min\{\Delta_{\min}, \epsilon\}}\right)$ instance-dependent regret lower bound and an $\Omega\left(\sqrt{T \ln(K)} + \frac{\ln(K)}{\epsilon}\right)$ minimax lower bound, where $K$ is the total number of actions and $\Delta_{\min}$ denotes the minimum suboptimality gap among all the suboptimal actions. For the same differentially private full information setting, we also present an $\epsilon$-differentially private algorithm whose instance-dependent regret and worst-case regret match our respective lower bounds up to an extra $\log(T)$ factor.
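To get a feel for how the privacy parameter enters these rates, the short Python sketch below evaluates the expressions inside the bandit bound, $\sum_{j: \Delta_j > 0} \ln(T) / \min\{\Delta_j, \epsilon\}$, and the full-information minimax bound, $\sqrt{T \ln(K)} + \ln(K)/\epsilon$, with all constants dropped. It is an illustration only, not an implementation of the paper's algorithms: the function names, the gap values, and the horizon are assumptions chosen for the example.

```python
import math


def bandit_regret_rate(gaps, epsilon, horizon):
    """Evaluate sum over arms with gap > 0 of ln(T) / min(gap, epsilon).

    Constants hidden by the O(.) notation are dropped, so the value only
    indicates how the rate scales; it is not an actual regret figure.
    """
    if epsilon <= 0 or horizon < 1:
        raise ValueError("epsilon must be positive and horizon at least 1")
    return sum(math.log(horizon) / min(gap, epsilon) for gap in gaps if gap > 0)


def full_info_minimax_rate(num_actions, epsilon, horizon):
    """Evaluate sqrt(T ln K) + ln(K) / epsilon, constants dropped."""
    if epsilon <= 0 or horizon < 1 or num_actions < 1:
        raise ValueError("epsilon, horizon and num_actions must be positive")
    log_k = math.log(num_actions)
    return math.sqrt(horizon * log_k) + log_k / epsilon


# Hypothetical instance: one optimal arm (gap 0) and three suboptimal arms.
gaps = [0.0, 0.1, 0.2, 0.5]
horizon = 10_000

for epsilon in (1.0, 0.1, 0.01):
    rate = bandit_regret_rate(gaps, epsilon, horizon)
    print(f"epsilon={epsilon}: bandit rate ~ {rate:.1f}")
```

In this toy instance, an $\epsilon$ larger than every gap leaves each term at $\ln(T)/\Delta_j$, the familiar non-private scaling, while an $\epsilon$ smaller than every gap turns each term into $\ln(T)/\epsilon$, so the privacy requirement rather than the gaps determines the rate.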