Closed-form $\ell_r$ norm scaling with data for overparameterized linear regression and diagonal linear networks under $\ell_p$ bias
Abstract
For overparameterized linear regression with isotropic Gaussian design and the minimum-$\ell_p$ interpolator, $p \in (1,2]$, we give a unified, high-probability characterization of how the family of parameter norms $\{\lVert \widehat{w}_p \rVert_r\}_{r \in [1,p]}$ scales with sample size. We resolve this basic but previously open question through a simple dual-ray analysis, which reveals a competition between a signal *spike* and a *bulk* of null coordinates in $X^\top Y$, yielding closed-form predictions for (i) a data-dependent transition $n_\star$ (the "elbow"), and (ii) a universal threshold $r_\star = 2(p-1)$ that separates the norms $\lVert \widehat{w}_p \rVert_r$ that plateau from those that continue to grow with an explicit exponent. This unified solution resolves the scaling of *all* $\ell_r$ norms within the family $r \in [1,p]$ under $\ell_p$-biased interpolation, and explains in one picture which norms saturate and which increase as $n$ grows. We then study diagonal linear networks (DLNs) trained by gradient descent. By calibrating the initialization scale $\alpha$ to an effective $p_{\mathrm{eff}}(\alpha)$ via the DLN separable potential, we show empirically that DLNs inherit the same elbow/threshold laws, providing a predictive bridge between explicit and implicit bias. Given that many generalization proxies depend on $\lVert \widehat{w}_p \rVert_r$, our results suggest that their predictive power will depend sensitively on which $\ell_r$ norm is used.
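To make the objects in the abstract concrete, the following is a minimal sketch of how one might measure the norm family numerically. It covers only the $p = 2$ member, where the minimum-norm interpolator has a closed form via the pseudoinverse; for $p < 2$ a convex solver would be needed, and that is not shown. The one-sparse ground truth `w_star`, the noise level `sigma`, the dimension `d`, and the function names are all illustrative assumptions, not taken from the paper.

```python
import numpy as np


def min_l2_interpolator(X, y):
    """Minimum-l2-norm solution of X w = y (the p = 2 case only)."""
    return np.linalg.pinv(X) @ y


def lr_norm(w, r):
    """l_r norm of a vector, for r >= 1."""
    return float(np.sum(np.abs(w) ** r) ** (1.0 / r))


def norm_family(n, d=400, sigma=0.1, rs=(1.0, 1.5, 2.0), seed=0):
    """Return {r: ||w_hat||_r} and the max interpolation residual for n < d samples."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, d))          # isotropic Gaussian design
    w_star = np.zeros(d)
    w_star[0] = 1.0                          # assumed: a single signal coordinate
    y = X @ w_star + sigma * rng.standard_normal(n)  # assumed noise model
    w_hat = min_l2_interpolator(X, y)
    residual = float(np.max(np.abs(X @ w_hat - y)))
    return {r: lr_norm(w_hat, r) for r in rs}, residual


if __name__ == "__main__":
    # Sweep the sample size and print each norm in the family r in [1, 2].
    for n in (25, 50, 100, 200):
        norms, residual = norm_family(n)
        summary = {r: round(v, 3) for r, v in norms.items()}
        print(n, summary, f"residual={residual:.1e}")
```

Plotting each norm against $n$ on log-log axes is one way to look for the elbow and for the plateau-versus-growth split that the abstract describes; the sketch itself makes no claim about where those occur.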