Developing Non-Stochastic Privacy-Preserving Policies Using Agglomerative Clustering
Abstract
We consider a non-stochastic privacy-preserving problem in which an adversary aims to infer sensitive information S from publicly accessible data X without using statistics. We consider the problem of generating and releasing a quantization X̂ of X to minimize the privacy leakage of S to X̂ while maintaining a certain level of utility (or, inversely, bounding the quantization loss). The variables S and X are treated as bounded and non-probabilistic, but are otherwise general. We consider two existing non-stochastic privacy measures, namely the maximum uncertainty reduction L0(S → X̂) and the refined information I*(S; X̂) (also called the maximin information) of S. For each privacy measure, we propose a corresponding agglomerative clustering algorithm that converges to a locally optimal quantization solution X̂ by iteratively merging elements in the alphabet of X. To instantiate the solution to this problem, we consider two specific utility measures: the worst-case resolution of X by observing X̂ and the maximal distortion of the released data X̂. We show that the value of the maximin information I*(S; X̂) can be determined by dividing the confusability graph into connected subgraphs. Hence, I*(S; X̂) can be reduced by merging nodes that connect subgraphs. The relation to probabilistic information-theoretic privacy is also studied by noting that the Gács-Körner common information is the stochastic version of I* and indicates the attainability of statistical indistinguishability.
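The connected-subgraph statement can be illustrated with a minimal sketch. It is not the authors' algorithm. It assumes that the joint range of the sensitive variable and the released symbol is given as a finite set of compatible pairs, that two released symbols are adjacent in the confusability graph when they are compatible with a common sensitive value, and that the maximin information equals the base-2 logarithm of the number of connected components. The function names, the `mapping` argument, and the toy data are hypothetical.

```python
import math
from collections import defaultdict


def count_components(joint_range):
    """Count connected components of the (assumed) confusability graph.

    joint_range: iterable of (s, x) pairs that can occur together.
    Two released symbols x are linked when they share a sensitive value s.
    """
    by_s = defaultdict(list)
    symbols = set()
    for s, x in joint_range:
        by_s[s].append(x)
        symbols.add(x)

    # Union-find over the released symbols.
    parent = {x: x for x in symbols}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for xs in by_s.values():
        root = find(xs[0])
        for x in xs[1:]:
            parent[find(x)] = root

    return len({find(x) for x in symbols})


def maximin_information(joint_range):
    """Assumed definition: log2 of the number of connected components."""
    return math.log2(count_components(joint_range))


def quantize(joint_range, mapping):
    """Relabel released symbols; symbols mapped to the same label are merged."""
    return {(s, mapping.get(x, x)) for s, x in joint_range}


# Toy joint range: 'a' and 'b' share s = 0, so they form one component;
# 'c' and 'd' are isolated, giving three components in total.
joint = {(0, "a"), (0, "b"), (1, "c"), (2, "d")}
before = maximin_information(joint)

# Merging 'c' and 'd' joins two components, leaving two in total.
merged = quantize(joint, {"c": "cd", "d": "cd"})
after = maximin_information(merged)
```

In this toy case, merging two symbols that lie in different components lowers the component count from three to two, so the computed value drops. Merging two symbols already in the same component (for example 'a' and 'b') would leave it unchanged.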