Borrowed Geometry: Cross-Distribution Head-Importance Fingerprints of Frozen Pretrained Gemma 4 31B
Abstract
Frozen, unmodified Gemma 4 31B weights, pretrained exclusively on text, transfer through a thin trainable interface to non-text modalities that the substrate has never processed. On the L24--L29 slice (192 attention heads), two signals jointly classify four heads (L26.28, L27.28, L27.2, L27.3) as top-tier: an English-text TxtCopy attention probe (95 sentences), and per-head ablation impact on four non-language token-pattern tasks (binary copy, associative recall, 1D cellular automaton Rule 90, binary addition). The slice-level joint coincidence is significant under a hypergeometric null (P = 0.0013, N=192, K=38, n=4) and survives multiplicity-aware permutation tests (PV4 = 0.013). Pretrained Gemma L26 reaches 60.22% on OGBench cube-double-play-task1 vs ~1% for random-init Gemma (+59pt at n=3); a FrozenRandom-GPT2 control with correct 1/dk scaling also fails. Head-level causal validation: zeroing L26.28 in the trained cube-task1 IQL agent drops success from 63.3% to 10.0%, vs a drop to 46.7% for a layer-matched low-TxtCopy negative control (3.2× specificity at n=30; n=5 paired-t p=0.039). A full L26 sweep places L26.28 at rank 4 of 32. Honest negatives: within-L26 Spearman ρ(TxtCopy, drop) = +0.37 (the opposite of a within-layer causal reading); single-head activation patching does not transfer the matching variable; the 4 named heads alone do not suffice on any task; Walker2d-DT and scene-task1 recruit L24, outside the named slice, and show null head-ablation specificity. We frame the contribution as a cross-distribution importance fingerprint at the slice level, plus head-level causal evidence on one cross-modality target.
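The two headline statistics can be checked by hand. The sketch below is an illustration, not the authors' code, and it rests on two assumptions the abstract does not spell out: that the hypergeometric P is the probability that all n=4 drawn heads fall inside the K=38 top-tier heads out of N=192, and that "3.2× specificity" is the ratio of the success-rate drop for L26.28 to the drop for the negative control. The function and variable names are hypothetical.

```python
from math import comb


def hypergeom_pmf(k: int, N: int, K: int, n: int) -> float:
    """P(exactly k of n draws land in a marked set of size K out of N)."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


# Values quoted in the abstract.
N, K, n = 192, 38, 4

# Assumption: the reported P is the chance that all four heads are top-tier.
p_all_four = hypergeom_pmf(4, N, K, n)
print(f"{p_all_four:.4f}")  # 0.0013

# Assumption: specificity = (drop for target head) / (drop for control head).
baseline, after_target, after_control = 63.3, 10.0, 46.7
specificity = (baseline - after_target) / (baseline - after_control)
print(f"{specificity:.1f}x")  # 3.2x
```

Under these readings both quoted numbers are reproduced, which suggests the hypergeometric null treats the four heads as a single draw of size 4 without replacement from the 192-head slice.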