A Unified Theory of Random Projection for Influence Functions
Abstract
Influence functions and related data attribution scores take the form of $g^\top F^{-1} g'$, where $F \succeq 0$ is a curvature operator. In modern overparametrized models, forming or inverting $F \in \mathbb{R}^{d \times d}$ is prohibitive, motivating scalable influence computation via random projection with a sketch $P \in \mathbb{R}^{m \times d}$. This practice is commonly justified via the Johnson-Lindenstrauss (JL) lemma, which ensures approximate preservation of Euclidean geometry for a fixed dataset. However, JL does not address how sketching behaves under inversion. Furthermore, there is no existing theory that explains how sketching interacts with other widely used techniques, such as ridge regularization and structured curvature approximations. We develop a unified theory characterizing when projection provably preserves influence functions. When $g, g' \in \mathrm{range}(F)$, we show that: 1) Unregularized projection: exact preservation holds iff $P$ is injective on $\mathrm{range}(F)$, which necessitates $m \geq \mathrm{rank}(F)$; 2) Regularized projection: ridge regularization fundamentally alters the sketching barrier, with approximation guarantees governed by the effective dimension of $F$ at the regularization scale; 3) Factorized influence: for Kronecker-factored curvatures $F = A \otimes E$, the guarantees continue to hold for decoupled sketches $P = P_A \otimes P_E$, even though such sketches exhibit row correlations that violate i.i.d. assumptions. Beyond this range-restricted setting, we analyze out-of-range test gradients and quantify a leakage term that arises when test gradients have components in $\ker(F)$. This yields guarantees for influence queries on general test points. Overall, this work develops a novel theory that characterizes when projection provably preserves influence and provides principled guidance for choosing the sketch size in practice.
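The unregularized claim can be checked numerically on a toy problem. The following is a minimal sketch, not the paper's implementation. It assumes the sketched score is computed as $(Pg)^\top (P F P^\top)^{+} (Pg')$ with a Gaussian sketch, uses the pseudoinverse in place of $F^{-1}$ because the toy $F$ is rank-deficient, and takes the effective dimension to be the standard quantity $\mathrm{tr}\big(F (F + \lambda I)^{-1}\big)$, which may differ from the paper's exact definition. All function names and sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

d, r = 50, 5  # ambient dimension and rank(F); toy sizes chosen for illustration

# Build a rank-r PSD curvature F = U diag(eigs) U^T.
U, _ = np.linalg.qr(rng.standard_normal((d, r)))  # orthonormal basis of range(F)
eigs = rng.uniform(1.0, 2.0, size=r)              # positive eigenvalues on range(F)
F = (U * eigs) @ U.T

# Gradients restricted to range(F), matching the range-restricted setting.
g = U @ rng.standard_normal(r)
g_prime = U @ rng.standard_normal(r)


def influence(F, g, g_prime, rcond=1e-10):
    """g^T F^+ g', using the pseudoinverse because F may be singular."""
    return float(g @ np.linalg.pinv(F, rcond=rcond) @ g_prime)


def sketched_influence(P, F, g, g_prime, rcond=1e-10):
    """Assumed sketched score: (P g)^T (P F P^T)^+ (P g')."""
    return influence(P @ F @ P.T, P @ g, P @ g_prime, rcond)


def effective_dimension(F, lam):
    """Standard ridge effective dimension tr(F (F + lam I)^{-1}) (assumed definition)."""
    eigvals = np.clip(np.linalg.eigvalsh(F), 0.0, None)  # clip round-off negatives
    return float(np.sum(eigvals / (eigvals + lam)))


exact = influence(F, g, g_prime)

P_large = rng.standard_normal((2 * r, d))  # m = 10 >= rank(F): injective on range(F) a.s.
P_small = rng.standard_normal((2, d))      # m = 2 < rank(F): cannot be injective on range(F)

approx_large = sketched_influence(P_large, F, g, g_prime)
approx_small = sketched_influence(P_small, F, g, g_prime)
eff_dim = effective_dimension(F, lam=0.1)

print(f"exact influence:            {exact:.6f}")
print(f"sketched, m >= rank(F):     {approx_large:.6f}")
print(f"sketched, m <  rank(F):     {approx_small:.6f}")
print(f"effective dimension (0.1):  {eff_dim:.3f}  (rank(F) = {r})")
```

With $m \geq \mathrm{rank}(F)$ the sketched and exact scores should agree up to floating-point error, while the $m < \mathrm{rank}(F)$ sketch generally does not. The effective dimension never exceeds $\mathrm{rank}(F)$ and shrinks as $\lambda$ grows, which is the sense in which ridge regularization can relax the sketch-size requirement.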