Thompson Sampling under Bernoulli Rewards with Local Differential Privacy

Ming Li

Thompson Sampling under Bernoulli Rewards with Local Differential Privacy

Abstract

This paper investigates the problem of regret minimization for multi-armed bandit (MAB) problems with local differential privacy (LDP) guarantee. Given a fixed privacy budget ε, we consider three privatizing mechanisms under Bernoulli scenario: linear, quadratic and exponential mechanisms. Under each mechanism, we derive stochastic regret bound for Thompson Sampling algorithm. Finally, we simulate to illustrate the convergence of different mechanisms under different privacy budgets.

0

Turn this paper into a lesson

ArcXiv compiles a structured reading guide from this paper's metadata: plain-English importance, contributions, prerequisite concepts, which sections to read first, flashcards, and a quiz. Grounded in the abstract, never invented.

Or compile a full topic from this idea

Discussion (0)

Sign in to join the discussion.

Loading comments…