Locally Differentially-Private Randomized Response for Discrete Distribution Learning
Abstract
We consider a setup in which confidential i.i.d. samples X_1, …, X_n from an unknown finite-support distribution p are passed through n copies of a discrete privatization channel (a.k.a. mechanism) producing outputs Y_1, …, Y_n. The channel law guarantees a local differential privacy of ε. Subject to a prescribed privacy level ε, the optimal channel should be designed such that an estimate of the source distribution based on the channel outputs Y_1, …, Y_n converges as fast as possible to the exact value p. For this purpose we study the convergence to zero of three distribution distance metrics: f-divergence, mean-squared error and total variation. We derive the respective normalized first-order terms of convergence (as n → ∞), which for a given target privacy ε represent a rule-of-thumb factor by which the sample size must be augmented so as to achieve the same estimation accuracy as that of a non-randomizing channel. We formulate the privacy-fidelity trade-off problem as being that of minimizing said first-order term under a privacy constraint ε. We further identify a scalar quantity that captures the essence of this trade-off, and prove bounds and data-processing inequalities on this quantity. For some specific instances of the privacy-fidelity trade-off problem, we derive inner and outer bounds on the optimal trade-off curve.
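To make the setup concrete, the following is a minimal sketch of one standard ε-locally-differentially-private channel, k-ary randomized response, together with the usual unbiased estimate of p obtained by inverting the channel. The abstract does not say which channel the paper analyzes or finds optimal, so this is an illustration only; the function names `krr_channel`, `privatize` and `estimate` are hypothetical, and symbols are assumed to be encoded as integers 0, …, k-1 with ε > 0.

```python
import math
import random


def krr_channel(k, eps):
    """Channel matrix Q[x][y] = P(Y = y | X = x) of k-ary randomized response."""
    e = math.exp(eps)
    keep = e / (e + k - 1)    # probability of reporting the true symbol
    flip = 1.0 / (e + k - 1)  # probability of each specific other symbol
    return [[keep if x == y else flip for y in range(k)] for x in range(k)]


def privatize(x, k, eps, rng):
    """Pass one confidential sample x through the channel."""
    e = math.exp(eps)
    if rng.random() < e / (e + k - 1):
        return x
    # Otherwise report one of the other k - 1 symbols uniformly at random.
    y = rng.randrange(k - 1)
    return y if y < x else y + 1


def estimate(ys, k, eps):
    """Unbiased estimate of p from channel outputs.

    Uses P(Y = y) = flip + p_y * (keep - flip) and solves for p_y,
    replacing P(Y = y) by the empirical frequency of y.
    """
    e = math.exp(eps)
    keep = e / (e + k - 1)
    flip = 1.0 / (e + k - 1)
    n = len(ys)
    counts = [0] * k
    for y in ys:
        counts[y] += 1
    return [(c / n - flip) / (keep - flip) for c in counts]


if __name__ == "__main__":
    rng = random.Random(0)
    p = [0.5, 0.3, 0.2]  # assumed source distribution, for illustration
    xs = rng.choices(range(len(p)), weights=p, k=20000)
    ys = [privatize(x, len(p), 1.0, rng) for x in xs]
    print(estimate(ys, len(p), 1.0))
```

In this channel the ratio Q[x][y] / Q[x'][y] never exceeds e^ε, which is the ε-local differential privacy condition. The estimate always sums to one, but individual entries can fall slightly below zero for small n or small ε; projecting onto the probability simplex would be an extra step not shown here.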