Differentially Private Clustering in Data Streams

Abstract

Clustering problems (such as k-means and k-median) are fundamental unsupervised machine learning primitives, and streaming clustering algorithms have been studied extensively. However, as data privacy has become a central concern in many real-world applications, non-private clustering algorithms are often not applicable. In this work, we provide the first differentially private algorithms for k-means and k-median clustering of d-dimensional Euclidean data points over a stream of length at most T, using space sublinear in T, in the continual release setting, where the algorithm is required to output a clustering at every timestep. We achieve (1) an O(1)-multiplicative approximation with O(k^{1.5} · poly(d, log(T))) space and poly(k, d, log(T)) additive error, or (2) a (1+γ)-multiplicative approximation with O_γ(poly(k, 2^{O_γ(d)}, log(T))) space for any γ > 0 and poly(k, 2^{O_γ(d)}, log(T)) additive error. Our main technical contribution is a differentially private clustering framework for data streams that requires only an offline DP coreset or clustering algorithm as a black box.
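
As a reading aid, the guarantees above can be written out in the usual form for clustering with both multiplicative and additive error. This is a sketch of the standard formulation, not a statement taken from the paper: the symbols X_t (the points seen up to timestep t), C_t (the k centers output at timestep t), α, β, and OPT are notation assumed here, and whether the bound holds in expectation or with high probability is not specified in the abstract.

```latex
% Clustering cost of a center set C on a point set X (assumed notation):
%   p = 2 gives k-means, p = 1 gives k-median.
\[
  \mathrm{cost}_p(C; X) \;=\; \sum_{x \in X} \min_{c \in C} \lVert x - c \rVert_2^{\,p},
  \qquad
  \mathrm{OPT}_{k,p}(X) \;=\; \min_{|C| = k} \mathrm{cost}_p(C; X).
\]

% An (alpha, beta)-approximation in the continual release setting:
% the bound is required at every timestep t <= T.
\[
  \mathrm{cost}_p(C_t; X_t) \;\le\; \alpha \cdot \mathrm{OPT}_{k,p}(X_t) \;+\; \beta
  \qquad \text{for all } t \in \{1, \dots, T\}.
\]

% Result (1): alpha = O(1),         beta = poly(k, d, log T).
% Result (2): alpha = 1 + gamma,    beta = poly(k, 2^{O_gamma(d)}, log T).
```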
