Private Distribution Learning with Public Data: The View from Sample Compression

Abstract

We study the problem of private distribution learning with access to public data. In this setup, which we refer to as public-private learning, the learner is given public and private samples drawn from an unknown distribution p belonging to a class Q, with the goal of outputting an estimate of p while adhering to privacy constraints (here, pure differential privacy) only with respect to the private samples. We show that the public-private learnability of a class Q is connected to the existence of a sample compression scheme for Q, as well as to an intermediate notion we refer to as list learning. Leveraging this connection: (1) approximately recovers previous results on Gaussians over Rd; and (2) leads to new ones, including sample complexity upper bounds for arbitrary k-mixtures of Gaussians over Rd, results for agnostic and distribution-shift resistant learners, as well as closure properties for public-private learnability under taking mixtures and products of distributions. Finally, via the connection to list learning, we show that for Gaussians in Rd, at least d public samples are necessary for private learnability, which is close to the known upper bound of d+1 public samples.

0

Turn this paper into a lesson

ArcXiv compiles a structured reading guide from this paper's metadata: plain-English importance, contributions, prerequisite concepts, which sections to read first, flashcards, and a quiz. Grounded in the abstract, never invented.

Discussion (0)

Sign in to join the discussion.

Loading comments…