Grokking as Dimensional Phase Transition in Neural Networks
Abstract
Neural network grokking -- the abrupt memorization-to-generalization transition -- challenges our understanding of learning dynamics. Through finite-size scaling of gradient avalanche dynamics across eight model scales, we find that grokking is a dimensional phase transition: effective dimensionality D crosses from sub-diffusive (subcritical, D < 1) to super-diffusive (supercritical, D > 1) at generalization onset, exhibiting self-organized criticality (SOC). Crucially, D reflects gradient field geometry, not network architecture: synthetic i.i.d. Gaussian gradients maintain D ≈ 1 regardless of graph topology, while real training exhibits dimensional excess from backpropagation correlations. The grokking-localized D(t) crossing -- robust across topologies -- offers new insight into the trainability of overparameterized networks.