Differentially Private Sparse Linear Regression with Heavy-tailed Responses

Abstract

As a fundamental problem in machine learning and differential privacy (DP), DP linear regression has been extensively studied. However, most existing methods focus primarily on either regular data distributions or low-dimensional cases with irregular data. To address these limitations, this paper provides a comprehensive study of DP sparse linear regression with heavy-tailed responses in high-dimensional settings. In the first part, we introduce the DP-IHT-H method, which leverages the Huber loss and private iterative hard thresholding to achieve an estimation error bound of \( \tilde{O}\left( (s^*)^{\frac{1}{2}} \cdot \left(\frac{\log d}{n}\right)^{\frac{\zeta}{1+\zeta}} + (s^*)^{\frac{1+2\zeta}{2+2\zeta}} \cdot \left(\frac{\log^2 d}{n\varepsilon}\right)^{\frac{\zeta}{1+\zeta}} \right) \) under the (ε, δ)-DP model, where n is the sample size, d is the dimensionality, s* is the sparsity of the parameter, and ζ ∈ (0, 1] characterizes the tail heaviness of the data. In the second part, we propose DP-IHT-L, which further improves the error bound under additional assumptions on the response and achieves \( \tilde{O}\left(\frac{(s^*)^{3/2} \log d}{n\varepsilon}\right) \). Compared to the first result, this bound is independent of the tail parameter ζ. Finally, through experiments on synthetic and real-world datasets, we demonstrate that our methods outperform standard DP algorithms designed for "regular" data.
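To make the two ingredients named in the abstract concrete, here is a minimal Python sketch of iterative hard thresholding driven by the Huber-loss gradient, with Gaussian noise added at each step. It is an illustration under stated assumptions, not the paper's DP-IHT-H algorithm. The function names, the parameters `tau`, `step`, `iters`, and `noise_std`, and the choice to add noise before thresholding are all hypothetical. In particular, `noise_std` is not calibrated to any (ε, δ) budget, so this sketch carries no privacy guarantee.

```python
import numpy as np


def huber_grad(residual, tau):
    """Derivative of the Huber loss in the residual: identity on [-tau, tau], clipped outside."""
    return np.clip(residual, -tau, tau)


def hard_threshold(theta, s):
    """Keep the s largest-magnitude entries of theta and set the rest to zero."""
    out = np.zeros_like(theta)
    if s > 0:
        keep = np.argsort(np.abs(theta))[-s:]
        out[keep] = theta[keep]
    return out


def noisy_iht_huber(X, y, s, tau=1.0, step=0.5, iters=50, noise_std=0.0, seed=0):
    """Noisy IHT on the Huber loss (illustrative only).

    noise_std is a hypothetical knob; it is NOT calibrated to an
    (epsilon, delta)-DP guarantee.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    theta = np.zeros(d)
    for _ in range(iters):
        residual = X @ theta - y
        # Clipping the residual bounds the influence of heavy-tailed responses.
        grad = X.T @ huber_grad(residual, tau) / n
        noise = rng.normal(0.0, noise_std, size=d)
        # Gradient step, perturbation, then projection onto s-sparse vectors.
        theta = hard_threshold(theta - step * grad + noise, s)
    return theta
```

The clipped residual is what makes the Huber gradient robust to heavy-tailed responses: a single extreme response can shift each coordinate of the gradient by at most `tau` times the corresponding feature value, divided by n. A private variant would additionally need bounded features, a private selection step, and noise calibrated to the resulting sensitivity, none of which this sketch attempts.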


