Least Absolute Gradient Selector: Statistical Regression via Pseudo-Hard Thresholding
Abstract
Variable selection in linear models plays a pivotal role in modern statistics. Hard-thresholding methods such as $\ell_0$ regularization are theoretically ideal but computationally infeasible. In this paper, we propose a new approach to this challenging yet interesting problem, called LAGS, short for "least absolute gradient selector", which mimics the discrete selection process of $\ell_0$ regularization. To estimate $\beta$ under the influence of noise, we nevertheless consider the following convex program: \[ \hat{\beta} = \arg\min_{\beta} \frac{1}{n}\|X^T(y - X\beta)\|_1 + \lambda_n \sum_{i=1}^{p} w_i(y;X;n)\,|\beta_i| \] Here $\lambda_n > 0$ controls the sparsity, $w_i > 0$ is the weight on $\beta_i$ and depends on $y$, $X$ and $n$, and $n$ is the sample size. Surprisingly, we show in the paper, both geometrically and analytically, that LAGS enjoys two attractive properties: (1) with strategically chosen $w_i$, LAGS demonstrates the discrete selection behavior and hard-thresholding property of $\ell_0$ regularization; we call this property "pseudo-hard thresholding"; (2) asymptotically, LAGS is consistent and capable of discovering the true model; nonasymptotically, LAGS is capable of identifying the sparsity in the model, and the prediction error of the coefficients is bounded at the noise level up to a logarithmic factor, $\log p$, where $p$ is the number of predictors. Computationally, LAGS can be solved efficiently by convex programming routines, since it is convex, or by the simplex algorithm after recasting it as a linear program. Numerical simulation shows that LAGS is superior to soft-thresholding methods in terms of mean squared error and parsimony of the model.
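The recasting as a linear program mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `lags_lp`, the auxiliary variables `u` and `v`, and the use of SciPy's `linprog` with the HiGHS solver are assumptions made here. The abstract does not specify the weights $w_i$, so the demo uses the hypothetical choice $w_i = 1/|x_i^T y|$ on a noise-free design with orthonormal columns, where the objective separates by coordinate.

```python
import numpy as np
from scipy.optimize import linprog


def lags_lp(X, y, lam, w):
    """Solve min_b (1/n)*||X^T(y - X b)||_1 + lam * sum_i w_i*|b_i| as an LP.

    Hypothetical helper: the weights w are supplied by the caller because
    the abstract does not say how they are chosen.
    """
    n, p = X.shape
    G = X.T @ X  # Gram matrix
    z = X.T @ y  # correlations of the predictors with the response
    I = np.eye(p)
    Z = np.zeros((p, p))

    # Decision vector is [b, u, v], with |z - G b| <= u and |b| <= v.
    c = np.concatenate([np.zeros(p), np.full(p, 1.0 / n), lam * np.asarray(w)])
    A_ub = np.block([
        [-G, -I, Z],   # z - G b <= u
        [G, -I, Z],    # G b - z <= u
        [I, Z, -I],    # b <= v
        [-I, Z, -I],   # -b <= v
    ])
    b_ub = np.concatenate([-z, z, np.zeros(p), np.zeros(p)])
    bounds = [(None, None)] * p + [(0, None)] * (2 * p)  # b free; u, v >= 0

    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:p]


# Demo on a noise-free design with orthonormal columns (an assumption
# made for illustration, so that X^T y equals the true coefficients).
rng = np.random.default_rng(0)
n, p = 20, 4
X, _ = np.linalg.qr(rng.standard_normal((n, p)))
beta_true = np.array([3.0, 0.1, -2.0, 0.05])
y = X @ beta_true

w = 1.0 / np.abs(X.T @ y)  # hypothetical weights, not taken from the paper
lam = 1.0 / n              # with these weights, the cutoff is |x_i^T y| = n * lam = 1
beta_hat = lags_lp(X, y, lam, w)
print(np.round(beta_hat, 6))
```

In this separable setting each coordinate is either kept at its unpenalized value or set exactly to zero, depending on whether $|x_i^T y|$ exceeds $n\lambda$. That illustrates the discrete, hard-thresholding-like selection the abstract describes, under the assumed weights only.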