Features have life history. And we should care
Abstract
Features in language models have life history: they emerge, persist, and die during training, yet the importance of that history remains largely unexplored. We find evidence of a persistent representational backbone, which we identify in Pythia-160M and -410M as the carrier scaffold: 50 sparse features with stable life histories, around which the model's representational structure organises. It has four properties. (i) It assembles early: features emerge, die, and reorganise 40× faster in the first 1% of training than afterwards, and the scaffold is already largely fixed by then. (ii) It is load-bearing: joint cross-layer ablation identifies the carriers as far more load-bearing than any count-matched non-scaffold population, a gap invisible to per-firing single-feature methods. (iii) Function precedes direction: which features will become carriers is already predictable from training-onset firing patterns alone, correctly distinguishing future carriers from non-carriers in 4 of 5 cases, before the geometry has settled. (iv) It seeds subsequent development: by the end of training, scaffold carriers have recruited 64% of all active features into the scaffold hierarchy. Life history is consistent with a two-phase account of training: selection appears to largely determine the scaffold in the first 1%; the remaining 99% appears to calibrate geometry around a substrate already set.