Beyond the Hard Budget: Sparsity Regularizers for More Interpretable Top-k Sparse Autoencoders

Abstract

Sparse autoencoders (SAEs) have become a leading tool for interpreting the representations of vision foundation models, decomposing their polysemantic activations into a larger set of sparse, more monosemantic features. The Top-k SAE, a now-standard variant, enforces sparsity architecturally through its activation function, retaining only the k most active latents per input. Because it was designed precisely to avoid the ℓ1 penalty used by earlier SAEs and its known drawbacks, it has not been combined with an explicit sparsity regularizer, despite retaining limitations of its own, such as a budget k that is fixed regardless of input complexity and a tendency to overfit to the training value of k. We introduce two sparsity regularizers compatible with the Top-k architecture, both acting on the activations before the Top-k selection: an ℓ1 penalty on the unselected (off-support) units, and a scale-invariant ℓ1/ℓ2-ratio penalty that concentrates the code onto fewer effective units. Both penalties are applied only to the batch-active units, those selected by the Top-k operator at least once within the batch. Across two datasets, three vision foundation models, and a range of k, both regularizers consistently improve monosemanticity at no cost to reconstruction quality. The ℓ1/ℓ2 penalty further concentrates information into fewer latents, making reconstruction more robust to the inference-time choice of k and improving small-budget linear probing. Our central finding is that hard architectural sparsity and soft sparsity regularization are complementary rather than mutually exclusive.
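The two penalties described above can be illustrated with a small NumPy sketch. This is not the authors' implementation: the abstract does not specify how the pre-activations are rectified, how the ratio is aggregated, or how the terms are weighted in the loss. The sketch assumes absolute values of the pre-Top-k activations, a per-sample ratio averaged over the batch, and hypothetical helper names (topk_mask, sparsity_penalties).

```python
import numpy as np

def topk_mask(pre, k):
    """Boolean mask marking the k largest pre-activations in each row."""
    idx = np.argpartition(-pre, k - 1, axis=1)[:, :k]
    mask = np.zeros(pre.shape, dtype=bool)
    np.put_along_axis(mask, idx, True, axis=1)
    return mask

def sparsity_penalties(pre, k, eps=1e-12):
    """Return (off-support l1, l1/l2 ratio, batch-active mask).

    pre: array of shape (batch, n_latents), activations BEFORE Top-k.
    Assumption: magnitudes are taken with abs(); the paper may rectify
    or normalize differently.
    """
    mask = topk_mask(pre, k)
    # Batch-active units: selected by Top-k for at least one sample.
    batch_active = mask.any(axis=0)
    mag = np.abs(pre)

    # l1 penalty on units NOT selected for a sample, restricted to
    # batch-active units.
    off_support = mag * (~mask) * batch_active
    l1_off = off_support.sum(axis=1).mean()

    # l1/l2 ratio over batch-active units, computed per sample.
    # Rescaling pre by a positive constant leaves it (nearly) unchanged.
    a = mag[:, batch_active]
    ratio = (a.sum(axis=1) / (np.sqrt((a ** 2).sum(axis=1)) + eps)).mean()
    return l1_off, ratio, batch_active

# Toy batch: 2 samples, 4 latents, k = 1 (illustrative values only).
pre = np.array([[3.0, 1.0, 0.5, 0.0],
                [0.2, 4.0, 0.1, 0.0]])
l1_off, ratio, batch_active = sparsity_penalties(pre, k=1)
print(l1_off, ratio, batch_active)
```

In a training loop, terms like these would presumably be added to the reconstruction loss with small coefficients; the coefficients and their schedule are not given in the abstract. Restricting both terms to batch-active units means latents that never win the Top-k selection in a batch receive no penalty from that batch.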
