Distribution-Aware Feature Selection for SAEs
Abstract
Sparse autoencoders (SAEs) decompose neural activations into interpretable features. A widely adopted variant, the TopK SAE, reconstructs each token from its K most active latents. However, this approach is inefficient because it gives every token the same budget of K latents, even though some tokens carry more information than others. BatchTopK addresses this limitation by selecting the top activations across a batch of tokens. This improves average reconstruction but risks an "activation lottery," where rare high-magnitude features crowd out more informative but lower-magnitude ones. To address this issue, we introduce Sampled-SAE: we score the columns (representing features) of the batch activation matrix (via L2 norm or entropy), forming a candidate pool of size Kl, and then apply Top-K selection across the batch, restricted to that pool of features. Varying l traces a spectrum between batch-level and token-specific selection. At l=1, tokens draw only from K globally influential features, while larger l expands the pool toward standard BatchTopK and admits more token-specific features across the batch. Small l thus enforces global consistency; large l favors fine-grained reconstruction. On Pythia-160M, no single value of l is optimal across all metrics: the best choice depends on the trade-off between shared structure, reconstruction fidelity, and downstream performance. Sampled-SAE thus reframes BatchTopK as a tunable, distribution-aware family.
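The two-stage selection described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes that the candidate pool is formed by taking the K*l highest-scoring feature columns, that scoring uses the column L2 norm (the entropy variant is omitted), and that the batch-level step keeps the batch_size*K largest activations, as in BatchTopK. The function name `sampled_batch_topk` and its parameters are hypothetical.

```python
import numpy as np


def sampled_batch_topk(acts, k, l):
    """Illustrative two-stage selection (assumptions noted in comments).

    acts: (batch, n_features) array of non-negative latent activations.
    k:    average number of active latents per token.
    l:    pool multiplier; the candidate pool holds k * l features.
    Returns the sparsified activations and the indices of the pooled features.
    """
    batch, n_features = acts.shape

    # Stage 1: score each feature (column) over the batch.
    # Assumption: L2 norm scoring; the abstract also mentions entropy.
    col_scores = np.linalg.norm(acts, axis=0)

    # Assumption: the pool is the k * l highest-scoring columns.
    pool_size = min(k * l, n_features)
    pool = np.argsort(col_scores)[::-1][:pool_size]

    # Zero out every feature outside the candidate pool.
    restricted = np.zeros_like(acts)
    restricted[:, pool] = acts[:, pool]

    # Stage 2: BatchTopK-style selection, keeping the batch * k largest
    # activations across the whole batch (not k per token).
    flat = restricted.ravel()
    n_keep = min(batch * k, flat.size)
    keep_idx = np.argpartition(flat, -n_keep)[-n_keep:]

    sparse = np.zeros_like(flat)
    sparse[keep_idx] = flat[keep_idx]
    return sparse.reshape(acts.shape), pool


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    acts = np.abs(rng.normal(size=(8, 32)))  # toy stand-in for SAE latents
    for l in (1, 4, 16):
        sparse, pool = sampled_batch_topk(acts, k=2, l=l)
        used = np.flatnonzero(sparse.any(axis=0))
        print(f"l={l}: pool size {len(pool)}, features actually used {len(used)}")
```

Under these assumptions, l=1 forces every token to draw from the same K features, while a pool large enough to cover all features reduces the procedure to plain BatchTopK.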