Thresholded Lasso for high dimensional variable selection
Abstract
Given $n$ noisy samples with $p$ dimensions, where $n \ll p$, we show that the multi-step thresholding procedure based on the Lasso, which we call the Thresholded Lasso, can accurately estimate a sparse vector $\beta \in \mathbb{R}^p$ in a linear model $Y = X\beta + \epsilon$, where $X_{n \times p}$ is a design matrix normalized to have column $\ell_2$-norm $\sqrt{n}$, and $\epsilon \sim N(0, \sigma^2 I_n)$. We show that under the restricted eigenvalue (RE) condition, it is possible to achieve the $\ell_2$ loss within a logarithmic factor of the ideal mean square error one would achieve with an oracle while selecting a sufficiently sparse model, hence achieving sparse oracle inequalities; the oracle would supply perfect information about which coordinates are non-zero and which are above the noise level. We also show that for the Gauss-Dantzig selector (Candès-Tao 07), if $X$ obeys a uniform uncertainty principle, one will achieve the sparse oracle inequalities as above, while allowing at most $s_0$ irrelevant variables in the model in the worst case, where $s_0 \le s$ is the smallest integer such that for $\lambda = \sqrt{2 \log p / n}$, $\sum_{i=1}^{p} \min(\beta_i^2, \lambda^2 \sigma^2) \le s_0 \lambda^2 \sigma^2$. Our simulation results on the Thresholded Lasso match our theoretical analysis excellently.
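To make the definition of s0 concrete, the following is a minimal sketch that computes it for a known coefficient vector. It reads the threshold as λ = sqrt(2 log p / n) and the sum as Σ min(β_i², λ²σ²). The function name `effective_sparsity` and the example values of n, p, σ and β are hypothetical and chosen only for illustration; they do not come from the paper.

```python
import math


def effective_sparsity(beta, n, sigma):
    """Smallest integer s0 with sum_i min(beta_i^2, lam^2 sigma^2) <= s0 * lam^2 sigma^2.

    Hypothetical helper (name and signature are assumptions), with
    lam = sqrt(2 * log(p) / n) and p = len(beta).
    """
    p = len(beta)
    lam_sq = 2.0 * math.log(p) / n      # lambda^2
    unit = lam_sq * sigma ** 2          # lambda^2 * sigma^2, the noise-level cap
    # Divide each term by the cap so a coordinate above the noise level
    # contributes exactly 1.0 and smaller ones contribute a fraction.
    total = sum(min(b * b / unit, 1.0) for b in beta)
    return math.ceil(total)


# Illustrative values only: 3 coordinates well above the noise level,
# 2 small non-zero coordinates below it, and the rest exactly zero.
n, p, sigma = 200, 1000, 1.0
beta = [1.0] * 3 + [0.1] * 2 + [0.0] * (p - 5)
s = sum(1 for b in beta if b != 0.0)    # number of non-zero coordinates
s0 = effective_sparsity(beta, n, sigma)
print(s, s0)  # prints "5 4"
```

In this example the two small coordinates together count for less than one "unit" of signal, so s0 = 4 is smaller than the nominal sparsity s = 5, which illustrates why s0 ≤ s.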