Feature Repulsion and Spectral Lock-in: An Empirical Study of Two-Layer Network Grokking
Abstract
Tian (2025) proves a repulsion theorem (Theorem 6) for the matrix $B = (F^\top F + \eta I)^{-1}$ during the interactive feature-learning stage of grokking: similar features have negative off-diagonal entries in $B$, producing an effective repulsive force that drives them apart. However, the theorem does not specify when this mechanism becomes empirically observable, nor whether it leaves a measurable spectral signature in the parameter updates. We test this directly on Tian's modular addition setup ($M = 71$, $K = 2048$, MSE loss) and observe a clear structure-mechanism dissociation. The predicted sign rule holds robustly on the top-200 most-similar feature pairs across activations (empirical sign-match rising from 0.865 to 0.985 for $\sigma(x) = x^2$ across 5 seeds, and saturating at 1.000 for $\sigma = \mathrm{ReLU}$). However, the spectral signature in the parameter updates is strongly activation-dependent. With $\sigma(x) = x^2$, a simple slope detector on the rolling eigengap $\sigma_2 / \sigma_3$ of $W$ fires in 15/15 grokking seeds at epoch 174 (IQR [173, 174]) and in 0/15 non-grokking controls, with a 229× late-stage magnitude separation; the spectrum is rank-2. In contrast, with $\sigma = \mathrm{ReLU}$, the detector never fires and the spectrum remains effectively rank-1. This dissociation aligns with the distinction in Tian's Theorem 5 between focused (power-law) and spreading (ReLU) memorization: while the sign structure of $B$ depends only on $F^\top F$, how feature repulsion translates into weight updates depends critically on the activation derivative $\sigma'$.
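The sign rule described above can be illustrated on a toy example. The following is a minimal sketch, not the paper's measurement code: it assumes features are the columns of a matrix `F`, ranks pairs by cosine similarity, and reports the fraction of the `top_k` most similar pairs whose off-diagonal entry of $B = (F^\top F + \eta I)^{-1}$ is negative. The function names, the similarity ranking, and the toy feature construction are all hypothetical choices made for illustration.

```python
import numpy as np


def repulsion_matrix(F, eta):
    """Return B = (F^T F + eta * I)^{-1}, where the columns of F are features."""
    K = F.shape[1]
    return np.linalg.inv(F.T @ F + eta * np.eye(K))


def sign_match_rate(F, eta, top_k):
    """Fraction of the top_k most cosine-similar feature pairs with B_ij < 0.

    Assumption: "most similar" means highest cosine similarity between
    feature columns; the paper's exact selection protocol may differ.
    """
    B = repulsion_matrix(F, eta)
    norms = np.linalg.norm(F, axis=0)
    S = (F.T @ F) / np.outer(norms, norms)       # cosine similarity matrix
    iu, ju = np.triu_indices(F.shape[1], k=1)    # each unordered pair once
    order = np.argsort(-S[iu, ju])[:top_k]       # most similar pairs first
    return float(np.mean(B[iu[order], ju[order]] < 0))


def paired_features(n_pairs=3, eps=0.1):
    """Toy features (hypothetical): n_pairs near-duplicate pairs in orthogonal planes."""
    F = np.zeros((2 * n_pairs, 2 * n_pairs))
    for p in range(n_pairs):
        F[2 * p, 2 * p] = 1.0          # first feature of the pair
        F[2 * p, 2 * p + 1] = 1.0      # second feature: almost the same direction
        F[2 * p + 1, 2 * p + 1] = eps  # small orthogonal perturbation
    return F


if __name__ == "__main__":
    F = paired_features()
    print(sign_match_rate(F, eta=0.01, top_k=3))
```

In this toy construction each near-duplicate pair contributes a 2×2 block $\begin{pmatrix} 1+\eta & 1 \\ 1 & 1+\epsilon^2+\eta \end{pmatrix}$ to $F^\top F + \eta I$, whose inverse has off-diagonal entry $-1/\det < 0$, so every similar pair satisfies the sign rule. Real feature matrices are not block-diagonal, which is why the empirical sign-match rate is measured rather than assumed.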