Circuit Synchronization Precedes Generalization: A Causal Precursor to Grokking
Abstract
Grokking is the delayed generalisation phenomenon where a transformer trained on modular arithmetic abruptly transitions from near-chance to near-perfect validation accuracy. It has been attributed to a Fourier-based algorithmic circuit, but its timing, causal structure, and controllability remain poorly understood. We introduce the Frequency Synchronization Degree (FSD), a normalised, permutation-tested metric for Fourier circuit synchronisation requiring no prior knowledge of the circuit. Across nine modular addition configurations (five primes, three seeds), FSD reaches its post-grokking level 500 to 3000 steps before grokking (mean lead 1722 steps, every configuration positive, sign-test p approx 0.004), and synchronises before a restricted-logit loss baseline in all nine cases, making it the earliest available predictor. We give direct causal evidence that the inter-phase gap is a regularisation phenomenon: forking training at the FSD-ceiling step and varying weight decay lambda produces monotonically earlier grokking, with delta-t proportional to 1/lambda. This law replicates across three primes (R-squared 0.89 to 0.99 on seed-averaged delta-t); per-run R-squared is unstable due to the chaotic transition, so we report error bars rather than single runs. Grokking occurs at a near-constant memorisation norm across lambda, grounding the constant in a threshold mechanism. This is not an artefact of applying a Fourier detector to a Fourier circuit: on the non-abelian group S5, a basis-faithful generalisation of FSD precedes grokking on all six seeds, while the original Fourier FSD does not. Using the FSD ceiling to schedule a weight-decay increase also accelerates grokking over a fixed schedule without destabilising training. An attention-only variant groks with a strong FSD precursor while an MLP-only model never groks.
Turn this paper into a lesson
ArcXiv compiles a structured reading guide from this paper's metadata: plain-English importance, contributions, prerequisite concepts, which sections to read first, flashcards, and a quiz. Grounded in the abstract, never invented.