On the Optimal Weighted $\ell_2$ Regularization in Overparameterized Linear Regression
Abstract
We consider the linear model $y = X\beta_\star + \varepsilon$ with $X \in \mathbb{R}^{n \times p}$ in the overparameterized regime $p > n$. We estimate $\beta_\star$ via generalized (weighted) ridge regression: $\hat{\beta}_\lambda = (X^T X + \lambda \Sigma_w)^{\dagger} X^T y$, where $\Sigma_w$ is the weighting matrix. Under a random design setting with general data covariance $\Sigma_x$ and anisotropic prior on the true coefficients $\mathbb{E}\,\beta_\star \beta_\star^T = \Sigma_\beta$, we provide an exact characterization of the prediction risk $\mathbb{E}(y - x^T \hat{\beta}_\lambda)^2$ in the proportional asymptotic limit $p/n \to \gamma \in (1, \infty)$. Our general setup leads to a number of interesting findings. We outline precise conditions that decide the sign of the optimal setting $\lambda_{\mathrm{opt}}$ for the ridge parameter $\lambda$ and confirm the implicit $\ell_2$ regularization effect of overparameterization, which theoretically justifies the surprising empirical observation that $\lambda_{\mathrm{opt}}$ can be negative in the overparameterized regime. We also characterize the double descent phenomenon for principal component regression (PCR) when both $X$ and $\beta_\star$ are anisotropic. Finally, we determine the optimal weighting matrix $\Sigma_w$ for both the ridgeless ($\lambda \to 0$) and optimally regularized ($\lambda = \lambda_{\mathrm{opt}}$) cases, and demonstrate the advantage of the weighted objective over standard ridge regression and PCR.
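The weighted ridge estimator defined in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' code: the function name `weighted_ridge`, the problem sizes, the noise level, and the choice of an identity weighting matrix are all assumptions made for illustration, and the inverse is taken as a Moore-Penrose pseudo-inverse.

```python
import numpy as np

def weighted_ridge(X, y, Sigma_w, lam):
    """Hypothetical helper: beta_hat = pinv(X^T X + lam * Sigma_w) @ X^T y."""
    A = X.T @ X + lam * Sigma_w
    # The pseudo-inverse keeps the expression defined even if A is singular.
    return np.linalg.pinv(A) @ (X.T @ y)

# Illustrative overparameterized problem (p > n); all values are assumptions.
rng = np.random.default_rng(0)
n, p = 50, 100
X = rng.standard_normal((n, p))
beta_star = rng.standard_normal(p) / np.sqrt(p)
y = X @ beta_star + 0.1 * rng.standard_normal(n)

# With Sigma_w = I the estimator reduces to standard ridge regression.
Sigma_w = np.eye(p)
beta_hat = weighted_ridge(X, y, Sigma_w, lam=1.0)

# Equivalent dual form of standard ridge, solved in n dimensions.
beta_dual = X.T @ np.linalg.solve(X @ X.T + 1.0 * np.eye(n), y)
```

With an identity weighting matrix and a positive $\lambda$, `beta_hat` and `beta_dual` should agree up to numerical error; swapping in a non-identity positive semidefinite `Sigma_w` gives the weighted variant the abstract studies.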