Differentially Private Kernel Density Estimation
Abstract
We introduce a refined differentially private (DP) data structure for kernel density estimation (KDE), offering not only improved privacy-utility tradeoff but also better efficiency over prior results. Specifically, we study the mathematical problem: given a similarity function f (or DP KDE) and a private dataset X ⊂ Rd, our goal is to preprocess X so that for any query y∈Rd, we approximate Σx ∈ X f(x, y) in a differentially private fashion. The best previous algorithm for f(x,y) =\| x - y \|1 is the node-contaminated balanced binary tree by [Backurs, Lin, Mahabadi, Silwal, and Tarnawski, ICLR 2024]. Their algorithm requires O(nd) space and time for preprocessing with n=|X|. For any query point, the query time is d n, with an error guarantee of (1+α)-approximation and ε-1 α-0.5 d1.5 R 1.5 n. In this paper, we improve the best previous result [Backurs, Lin, Mahabadi, Silwal, and Tarnawski, ICLR 2024] in three aspects: - We reduce query time by a factor of α-1 n. - We improve the approximation ratio from α to 1. - We reduce the error dependence by a factor of α-0.5. From a technical perspective, our method of constructing the search tree differs from previous work [Backurs, Lin, Mahabadi, Silwal, and Tarnawski, ICLR 2024]. In prior work, for each query, the answer is split into α-1 n numbers, each derived from the summation of n values in interval tree countings. In contrast, we construct the tree differently, splitting the answer into n numbers, where each is a smart combination of two distance values, two counting values, and y itself. We believe our tree structure may be of independent interest.
Turn this paper into a full lesson
ArcXiv compiles a staged curriculum from this paper: 8-12 lessons across beginner → advanced, synthesised section guides, visuals, flashcards, a quiz, exercises, and on-demand deep dives per section. Grounded in the abstract, never invented.