Information-theoretic Estimation of the Risk of Privacy Leaks
Abstract
Recent work [Liu2016] has shown that dependencies between items in a dataset can lead to privacy leaks. We extend this concept to privacy-preserving transformations, considering a broader set of dependencies captured by correlation metrics. Specifically, we measure the correlation between the original data and their noisy responses from a randomizer as an indicator of potential privacy breaches. This paper aims to leverage information-theoretic measures, such as the Maximal Information Coefficient (MIC), to estimate privacy leaks and to derive novel, computationally efficient privacy leak estimators. We extend the ρ₁-to-ρ₂ privacy breach formulation [Evfimievski2003] to incorporate entropy, mutual information, and the degree of anonymity, yielding a more comprehensive measure of privacy risk. Our proposed hybrid metric can identify correlation dependencies between attributes in the dataset, which serve as a proxy for privacy leak vulnerabilities. The metric provides a computationally efficient worst-case measure of privacy loss and uses the inherent characteristics of the data to prevent privacy breaches.
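The abstract does not spell out the estimators, so the following is only a minimal sketch of three of the ingredients it names. It assumes a single discrete attribute with a known prior and a hypothetical k-ary randomized-response randomizer. It computes the exact mutual information between the original value and its randomized report, a normalised-entropy degree of anonymity (posterior entropy divided by log2 N, one common definition), and the upward ρ₁-to-ρ₂ breach test in the style of [Evfimievski2003]. The sketch works from known distributions rather than estimating MIC from samples, and it is not the paper's hybrid metric. All function names and parameters (`keep`, `prop`, `rho1`, `rho2`) are illustrative assumptions.

```python
import math
from typing import List, Sequence, Set

Channel = List[List[float]]  # channel[x][y] = P(Y = y | X = x)


def entropy(p: Sequence[float]) -> float:
    """Shannon entropy in bits; zero-probability outcomes contribute nothing."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)


def randomized_response_channel(k: int, keep: float) -> Channel:
    """Hypothetical k-ary randomizer (an assumption, not the paper's): report the
    true value with probability `keep`, otherwise a uniformly random value."""
    return [[keep * (x == y) + (1 - keep) / k for y in range(k)] for x in range(k)]


def output_marginal(prior: Sequence[float], channel: Channel) -> List[float]:
    """P(Y = y) = sum_x P(X = x) * P(Y = y | X = x)."""
    return [sum(prior[x] * channel[x][y] for x in range(len(prior)))
            for y in range(len(channel[0]))]


def mutual_information(prior: Sequence[float], channel: Channel) -> float:
    """Exact I(X; Y) in bits between an original value X and its noisy report Y."""
    p_y = output_marginal(prior, channel)
    mi = 0.0
    for x, px in enumerate(prior):
        for y, py in enumerate(p_y):
            joint = px * channel[x][y]
            if joint > 0:
                mi += joint * math.log2(joint / (px * py))
    return mi


def posterior(prior: Sequence[float], channel: Channel, y: int) -> List[float]:
    """P(X = x | Y = y) by Bayes' rule."""
    joint = [prior[x] * channel[x][y] for x in range(len(prior))]
    total = sum(joint)
    if total == 0:
        raise ValueError(f"output {y} has zero probability")
    return [j / total for j in joint]


def degree_of_anonymity(post: Sequence[float]) -> float:
    """Posterior entropy normalised by log2(N): 1 means the report reveals
    nothing about X, 0 means X is fully identified. Assumes N >= 2."""
    return entropy(post) / math.log2(len(post))


def rho_breach(prior: Sequence[float], channel: Channel, prop: Set[int],
               rho1: float, rho2: float) -> bool:
    """Upward rho1-to-rho2 breach: P[X in prop] <= rho1 a priori, yet some
    output y with P(Y = y) > 0 raises P[X in prop | Y = y] to >= rho2."""
    if sum(prior[x] for x in prop) > rho1:
        return False
    for y, py in enumerate(output_marginal(prior, channel)):
        if py > 0 and sum(posterior(prior, channel, y)[x] for x in prop) >= rho2:
            return True
    return False


if __name__ == "__main__":
    prior = [0.7, 0.2, 0.1]  # made-up distribution over three attribute values
    for keep in (0.0, 0.5, 1.0):
        ch = randomized_response_channel(3, keep)
        print(keep, round(mutual_information(prior, ch), 4),
              rho_breach(prior, ch, {2}, rho1=0.2, rho2=0.5))
```

As `keep` grows, the mutual information moves from 0 (the report is independent of the original value) toward H(X) (the report is the value itself). Among the three settings tried, the rare value 2 triggers a ρ₁-to-ρ₂ breach with ρ₁ = 0.2 and ρ₂ = 0.5 only at `keep = 1.0`. This illustrates, under the stated assumptions, why correlation between original and randomized data can serve as a signal of leakage.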