Hidden-State Privacy Has an Empty Middle
Abstract
Of 1,536 Gaussian release covariances we tested for single-layer hidden-state privacy, zero achieve both moderate utility and moderate privacy against an adaptive retrieval attacker. We prove a complementary Fisher-ball lower bound: every full-rank Gaussian release at O(1) Fisher utility admits a direction whose Mahalanobis signal grows linearly in hidden width, ruling out uniform Gaussian safety in the class and matching the empirical empty middle. The diagonal inverse-Fisher release Σ_diag(K) = (2K/d) diag(1/F_ii) is the unique minimax-optimal diagonal mechanism at first-order KL budget K and the only release with worst-attacker top-1 0.001 at every point of a 32 model-layer grid, but it sits on a privacy/utility edge rather than filling the middle. A generalized-eigen mechanism reaching 13× Pareto reduction under Euclidean retrieval collapses to 100% top-1 under the adaptive Mahalanobis attacker, and a full-trajectory sequence inverter recovers 94% of clean GPT-2 prefixes but 0% under Σ_diag. A split-memory transformer trained from scratch reaches G_Mah ∈ [20, 33] at 90M and maintains a 6–24× advantage over same-budget GPT baselines from 30M to 1B at a fixed-token language-modeling loss penalty; pretrained models top out at 9.3. These results reframe hidden-state release from mechanism-design within the Gaussian class to architecture or release co-design.
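The diagonal inverse-Fisher release formula can be checked numerically. The sketch below is illustrative only and is not the paper's code: it assumes that "first-order KL budget" means the expected quadratic cost (1/2) tr(F Σ), uses a synthetic positive-definite matrix as a stand-in for the Fisher matrix F, and includes a generic Mahalanobis nearest-candidate retrieval as a hypothetical stand-in for a retrieval attacker. All function names are made up for this example.

```python
import numpy as np


def diag_inverse_fisher_cov(fisher_diag, kl_budget):
    """Diagonal entries of Sigma_diag(K) = (2K/d) * diag(1 / F_ii)."""
    fisher_diag = np.asarray(fisher_diag, dtype=float)
    if np.any(fisher_diag <= 0):
        raise ValueError("Fisher diagonal entries must be positive")
    d = fisher_diag.size
    return (2.0 * kl_budget / d) / fisher_diag


def first_order_kl(fisher, cov_diag):
    """Assumed first-order KL cost, 0.5 * tr(F @ Sigma), for diagonal Sigma."""
    return 0.5 * float(np.sum(np.diag(fisher) * cov_diag))


def release(hidden, cov_diag, rng):
    """Add zero-mean Gaussian noise with diagonal covariance cov_diag."""
    return hidden + rng.normal(size=hidden.shape) * np.sqrt(cov_diag)


def mahalanobis_top1(released, candidates, cov_diag):
    """Hypothetical retrieval: index of the candidate closest to `released`
    in Mahalanobis distance under the (diagonal) release covariance."""
    diff = candidates - released
    scores = np.sum(diff * diff / cov_diag, axis=1)
    return int(np.argmin(scores))


rng = np.random.default_rng(0)
d = 8                                   # toy hidden width
A = rng.normal(size=(d, d))
fisher = A @ A.T + d * np.eye(d)        # synthetic SPD stand-in for F
K = 0.5                                 # toy KL budget

cov_diag = diag_inverse_fisher_cov(np.diag(fisher), K)
kl = first_order_kl(fisher, cov_diag)   # equals K under the stated assumption

candidates = rng.normal(size=(5, d))    # toy candidate hidden states
noisy = release(candidates[2], cov_diag, rng)
guess = mahalanobis_top1(noisy, candidates, cov_diag)
print(f"first-order KL = {kl:.6f}, retrieved index = {guess}")
```

Under the stated assumption the budget check is exact: each coordinate contributes F_ii · (2K/d)/F_ii = 2K/d, so (1/2) tr(F Σ_diag) = (1/2) · d · (2K/d) = K regardless of the off-diagonal entries of F.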