Feature Learning in Wide Neural Networks under $μ$P: Identifiability and Sparse-Dictionary Decomposition of the Mean-Field Limit

Akmal Xodarev

Abstract

We establish four structural results for feature learning in wide two-layer neural networks under the Maximal Update Parametrization (μP). First, we prove global existence and uniqueness of the mean-field limit of noisy gradient descent under μP, identifying the maximal admissible weight $w^*$ on the moment sequence of the initialization as the reciprocal parameter-moment-growth boundary, and hence the largest weighted moment class propagated by the flow. The finite-particle approximation has uniform-in-time squared-Wasserstein rate $O(N^{-1})$. Second, we characterize identifiability of the mean-field limit: two admissible parameter measures induce the same network function in $L^2$ exactly when their active components agree modulo the finite-rank realization symmetry of the architecture. The orbit depth $D^*_{\mathrm{orb}}$ is separated from the moment-variety depth $D^*_{\mathrm{var}}$. Third, under the Barron-Hermite target condition, the active support of the long-time limit measure admits a sparse-dictionary decomposition: it is supported on at most $S^*$ atoms modulo finite-rank realization symmetry, with $S^*$ bounded by an explicit coefficient-threshold number. Fourth, we derive the total feature-learning-error decomposition into statistical, optimization, propagation-of-chaos, and sparse-residual components, with a target-dependent Hermite/Barron tail replacing any initialization-only residual. The four results are tied together by an architectural identity: the triple $(w^*, D^*_{\mathrm{orb}}, S^*)$ -- the maximal admissible weight, the orbit identifiability depth, and the sparse-dictionary depth at which the target is realizable -- is the natural learning cell of the architecture-data pair $(\sigma, \rho)$. The proofs are self-contained except for standard results from μP and mean-field Langevin theory.
