Near-Optimal Algorithms for Differentially Private Online Learning in a Stochastic Environment

Nidhi Hegde

Abstract

In this paper, we study differentially private online learning problems in a stochastic environment under both bandit and full information feedback. For differentially private stochastic bandits, we propose both UCB and Thompson Sampling-based algorithms that are anytime and achieve the optimal $O\left(\sum_{j:\Delta_j>0} \frac{\log(T)}{\min\{\Delta_j, \varepsilon\}}\right)$ instance-dependent regret bound, where $T$ is the finite learning horizon, $\Delta_j$ denotes the suboptimality gap between the optimal arm and a suboptimal arm $j$, and $\varepsilon$ is the required privacy parameter. For the differentially private full information setting with stochastic rewards, we show an $\Omega\left(\frac{\log(K)}{\min\{\Delta_{\min}, \varepsilon\}}\right)$ instance-dependent regret lower bound and an $\Omega\left(\sqrt{T\log(K)} + \frac{\log(K)}{\varepsilon}\right)$ minimax lower bound, where $K$ is the total number of actions and $\Delta_{\min}$ denotes the minimum suboptimality gap among all the suboptimal actions. For the same differentially private full information setting, we also present an $\varepsilon$-differentially private algorithm whose instance-dependent regret and worst-case regret match our respective lower bounds up to an extra $\log(T)$ factor.
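
To make the role of the privacy parameter concrete, the following is a minimal sketch that evaluates the shape of the bandit regret bound, assuming it reads as the sum over suboptimal arms of log(T) / min{Δ_j, ε} with all constants dropped. The function name `private_bandit_bound` and the example gaps are hypothetical and chosen only for illustration; this is not the paper's algorithm, only arithmetic on the stated bound.

```python
import math


def private_bandit_bound(gaps, horizon, epsilon):
    """Evaluate sum_{j: gap_j > 0} log(T) / min(gap_j, epsilon).

    Constants hidden by the O(.) notation are ignored (assumption), so the
    value is only meaningful for comparing settings, not as an actual regret.
    """
    return sum(
        math.log(horizon) / min(gap, epsilon)
        for gap in gaps
        if gap > 0  # the optimal arm (gap 0) contributes nothing
    )


# Hypothetical instance: one optimal arm and two suboptimal arms.
gaps = [0.0, 0.1, 0.5]
for eps in (1.0, 0.05):
    print(eps, private_bandit_bound(gaps, horizon=10_000, epsilon=eps))
```

When ε is larger than every gap, the expression reduces to the familiar non-private sum of log(T) / Δ_j; when ε is smaller than a gap, that arm's term is governed by ε instead, so stronger privacy inflates the bound.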
