High-Dimensional Regression with Binary Coefficients. Estimating Squared Error and a Phase Transition
Abstract
We consider a sparse linear regression model $Y=X\beta^*+W$, where $X$ has Gaussian entries, $W$ is a noise vector with mean-zero Gaussian entries, and $\beta^*$ is a binary vector with support size (sparsity) $k$. Using a novel conditional second moment method, we obtain an approximation, tight up to a multiplicative constant, of the optimal squared error $\min_{\beta}\|Y-X\beta\|_2$, where the minimization is over all $k$-sparse binary vectors $\beta$. The approximation reveals interesting structural properties of the underlying regression problem. In particular, a) We establish that $n^*=2k\log p/\log(2k/\sigma^2+1)$ is a phase transition point with the following "all-or-nothing" property. When $n$ exceeds $n^*$, $(2k)^{-1}\|\beta_2-\beta^*\|_0\approx 0$, and when $n$ is below $n^*$, $(2k)^{-1}\|\beta_2-\beta^*\|_0\approx 1$, where $\beta_2$ is the optimal solution achieving the smallest squared error. With this we prove that $n^*$ is the asymptotic information-theoretic threshold for recovering $\beta^*$. b) We compute the squared error for an intermediate problem $\min_{\beta}\|Y-X\beta\|_2$, where the minimization is restricted to vectors $\beta$ with $\|\beta-\beta^*\|_0=2k\zeta$, for $\zeta\in[0,1]$. We show that the lower-bound part $\Gamma(\zeta)$ of the estimate, which corresponds to the estimate based on the first moment method, undergoes a phase transition at three different thresholds: first at $n_{\mathrm{inf},1}=\sigma^2\log p$, which is the information-theoretic bound for recovering $\beta^*$ when $k=1$ and $\sigma$ is large, then at $n^*$, and finally at $n_{\mathrm{LASSO/CS}}$. c) We establish a certain Overlap Gap Property (OGP) on the space of all binary vectors $\beta$ when $n\le ck\log p$ for a sufficiently small constant $c$. We conjecture that the OGP is the source of algorithmic hardness of solving the minimization problem $\min_{\beta}\|Y-X\beta\|_2$ in the regime $n<n_{\mathrm{LASSO/CS}}$.
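As a rough illustration of the quantities in the abstract, the following Python sketch evaluates the threshold $n^*=2k\log p/\log(2k/\sigma^2+1)$ and solves $\min_{\beta}\|Y-X\beta\|_2$ by exhaustive search on a tiny instance. It is not the paper's method: the function names (`n_star`, `simulate`, `brute_force`, `normalized_hamming`) are hypothetical, and the sketch assumes i.i.d. standard normal entries for $X$ and i.i.d. $N(0,\sigma^2)$ noise, which the abstract does not spell out. The all-or-nothing behavior is an asymptotic statement, so a small simulation like this need not reproduce it.

```python
import itertools
import math
import random


def n_star(k, p, sigma2):
    """Threshold n* = 2k log p / log(2k/sigma^2 + 1) as stated in the abstract."""
    return 2 * k * math.log(p) / math.log(2 * k / sigma2 + 1)


def simulate(n, p, k, sigma2, rng):
    """Draw X, a k-sparse binary beta* (returned as its support), and Y = X beta* + W.

    Assumption: X has i.i.d. N(0, 1) entries and W has i.i.d. N(0, sigma2) entries.
    """
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    support = sorted(rng.sample(range(p), k))
    sigma = math.sqrt(sigma2)
    Y = [sum(row[j] for j in support) + rng.gauss(0.0, sigma) for row in X]
    return X, Y, support


def brute_force(X, Y, k):
    """Minimize ||Y - X beta||_2 over all k-sparse binary beta by enumeration.

    Cost grows like (p choose k), so this is only usable for very small p.
    """
    p = len(X[0])
    best_err, best_support = math.inf, None
    for support in itertools.combinations(range(p), k):
        sq = sum((y - sum(row[j] for j in support)) ** 2 for row, y in zip(X, Y))
        err = math.sqrt(sq)
        if err < best_err:
            best_err, best_support = err, support
    return best_support, best_err


def normalized_hamming(support_hat, support_star, k):
    """(2k)^{-1} ||beta_hat - beta*||_0, computed from the two supports."""
    return len(set(support_hat) ^ set(support_star)) / (2 * k)


if __name__ == "__main__":
    rng = random.Random(0)
    p, k, sigma2 = 10, 2, 1.0
    print("n* =", n_star(k, p, sigma2))
    for n in (2, 30):
        X, Y, support = simulate(n, p, k, sigma2, rng)
        support_hat, err = brute_force(X, Y, k)
        print(n, err, normalized_hamming(support_hat, support, k))
```

The normalized distance takes values in $[0,1]$ because two $k$-sparse binary vectors differ in at most $2k$ coordinates, which is why the abstract scales $\|\beta_2-\beta^*\|_0$ by $(2k)^{-1}$.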