Delta-Audit: Explaining What Changes When Models Change
Abstract
Model updates (new hyperparameters, kernels, depths, solvers, or data) change performance, but the reason often remains opaque. We introduce Delta-Attribution (Δ-Attribution), a model-agnostic framework that explains what changed between versions A and B by differencing per-feature attributions: Δφ(x) = φ_B(x) − φ_A(x). We evaluate Δφ with a Δ-Attribution Quality Suite covering magnitude/sparsity (L1, Top-k, entropy), agreement/shift (rank-overlap@10, Jensen-Shannon divergence), behavioural alignment (Delta Conservation Error, DCE; Behaviour-Attribution Coupling, BAC; CO), and robustness (noise, baseline sensitivity, grouped occlusion). Instantiated via fast occlusion/clamping in standardized space with a class-anchored margin and baseline averaging, we audit 45 settings: five classical families (Logistic Regression, SVC, Random Forests, Gradient Boosting, kNN), three datasets (Breast Cancer, Wine, Digits), and three A/B pairs per family. Findings. Inductive-bias changes yield large, behaviour-aligned deltas (e.g., SVC poly→rbf on Breast Cancer: BAC≈0.998, DCE≈6.6; Random Forest feature-rule swap on Digits: BAC≈0.997, DCE≈7.5), while "cosmetic" tweaks (SVC gamma=scale vs. auto, kNN search) show rank-overlap@10=1.0 and DCE≈0. The largest redistribution appears for deeper GB on Breast Cancer (JSD≈0.357). Δ-Attribution offers a lightweight update audit that complements accuracy by distinguishing benign changes from behaviourally meaningful or risky reliance shifts.
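To make the differencing idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes each model version is a plain callable returning a scalar score, uses a single all-zeros baseline instead of the baseline averaging and class-anchored margin mentioned in the abstract, and runs on two hypothetical toy linear models. All function names (occlusion_attribution, delta_attribution, conservation_gap, rank_overlap_at_k, jsd) are illustrative. The conservation check in particular is only one plausible reading of a conservation error; the abstract does not give the exact DCE definition.

```python
import math
from typing import Callable, List, Sequence

Model = Callable[[Sequence[float]], float]  # feature vector -> scalar score


def occlusion_attribution(model: Model, x: Sequence[float],
                          baseline: Sequence[float]) -> List[float]:
    """phi_i = f(x) - f(x with feature i clamped to its baseline value)."""
    fx = model(x)
    phi = []
    for i in range(len(x)):
        x_occ = list(x)
        x_occ[i] = baseline[i]  # clamp one feature at a time
        phi.append(fx - model(x_occ))
    return phi


def delta_attribution(model_a: Model, model_b: Model, x: Sequence[float],
                      baseline: Sequence[float]) -> List[float]:
    """Delta-phi(x) = phi_B(x) - phi_A(x), computed feature by feature."""
    phi_a = occlusion_attribution(model_a, x, baseline)
    phi_b = occlusion_attribution(model_b, x, baseline)
    return [b - a for a, b in zip(phi_a, phi_b)]


def conservation_gap(model_a: Model, model_b: Model, x: Sequence[float],
                     baseline: Sequence[float]) -> float:
    """ASSUMPTION: |sum_i Delta-phi_i - Delta-f|, where Delta-f is the change
    in (score at x minus score at baseline) from A to B. Not the paper's
    confirmed DCE formula."""
    dphi = delta_attribution(model_a, model_b, x, baseline)
    df = (model_b(x) - model_b(baseline)) - (model_a(x) - model_a(baseline))
    return abs(sum(dphi) - df)


def rank_overlap_at_k(phi_a: Sequence[float], phi_b: Sequence[float], k: int) -> float:
    """Fraction of shared features among the top-k by absolute attribution."""
    def top_k(phi: Sequence[float]) -> set:
        order = sorted(range(len(phi)), key=lambda i: abs(phi[i]), reverse=True)
        return set(order[:k])
    return len(top_k(phi_a) & top_k(phi_b)) / k


def normalize_abs(phi: Sequence[float]) -> List[float]:
    """Turn attributions into a distribution over features via |phi| / sum|phi|."""
    total = sum(abs(v) for v in phi)
    if total == 0:
        return [1.0 / len(phi)] * len(phi)
    return [abs(v) / total for v in phi]


def jsd(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence (natural log), bounded above by log(2)."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(u: Sequence[float], v: Sequence[float]) -> float:
        return sum(a * math.log(a / b) for a, b in zip(u, v) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def linear(weights: Sequence[float]) -> Model:
    """Hypothetical toy model: a weighted sum of the features."""
    return lambda x: sum(w * v for w, v in zip(weights, x))


# Version A relies on feature 0; version B shifts its reliance to feature 1.
model_a = linear([1.0, 0.0, 0.5])
model_b = linear([0.2, 1.0, 0.5])
x = [1.0, 2.0, -1.0]
baseline = [0.0, 0.0, 0.0]

phi_a = occlusion_attribution(model_a, x, baseline)
phi_b = occlusion_attribution(model_b, x, baseline)
dphi = delta_attribution(model_a, model_b, x, baseline)

print("delta attribution:", dphi)
print("conservation gap:", conservation_gap(model_a, model_b, x, baseline))
print("rank-overlap@2:", rank_overlap_at_k(phi_a, phi_b, 2))
print("JSD:", jsd(normalize_abs(phi_a), normalize_abs(phi_b)))
```

For linear models, single-feature occlusion recovers each term w_i * x_i exactly, so the conservation gap in this toy case is zero up to floating-point error. With nonlinear models and interacting features the gap would generally be nonzero, which is the situation an alignment metric is meant to quantify.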