A Risk Ratio Comparison of l0 and l1 Penalized Regression

Abstract

There has been an explosion of interest in using l1-regularization in place of l0-regularization for feature selection. We present theoretical results showing that while l1-penalized linear regression never outperforms l0-regularization by more than a constant factor, in some cases using an l1 penalty is infinitely worse than using an l0 penalty. We also compare algorithms for solving these two problems and show that although solutions can be found efficiently for the l1 problem, the "optimal" l1 solutions are often inferior to l0 solutions found using classic greedy stepwise regression. Furthermore, we show that solutions obtained by solving the convex l1 problem can be improved by selecting the best of the l1 models (for different regularization penalties) using an l0 criterion. In other words, an approximate solution to the right problem can be better than the exact solution to the wrong problem.
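
The last idea in the abstract, choosing among l1 fits with an l0 criterion, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' procedure: the coordinate-descent lasso solver, the synthetic data, the penalty grid `lams`, and the l0 penalty `lam0 = 2 log p` (with unit noise variance) are all hypothetical choices made for the example, and the l0 score is evaluated on the lasso coefficients themselves rather than on a refit.

```python
import math
import random


def soft_threshold(z, t):
    """Shrink z toward zero by t; the lasso coordinate update."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, n_iter=200):
    """Approximately minimize 0.5*||y - X b||^2 + lam*||b||_1 (cyclic coordinate descent)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X b, with b = 0 initially
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Inner product of column j with the residual that excludes feature j.
            rho = sum(X[i][j] * resid[i] for i in range(n)) + col_sq[j] * beta[j]
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new_bj
    return beta


def l0_score(X, y, beta, lam0):
    """l0 criterion: residual sum of squares plus lam0 per nonzero coefficient."""
    rss = sum((yi - sum(x * b for x, b in zip(row, beta))) ** 2
              for row, yi in zip(X, y))
    return rss + lam0 * sum(1 for b in beta if b != 0.0)


def best_l1_model_by_l0(X, y, lams, lam0):
    """Fit one lasso model per l1 penalty and keep the one with the lowest l0 score."""
    fits = [lasso_cd(X, y, lam) for lam in lams]
    scores = [l0_score(X, y, b, lam0) for b in fits]
    best = min(range(len(lams)), key=lambda k: scores[k])
    return lams[best], fits[best], scores


# Synthetic data (hypothetical): 8 features, only features 0 and 3 matter.
rng = random.Random(0)
n, p = 50, 8
true_beta = [3.0, 0.0, 0.0, -2.0, 0.0, 0.0, 0.0, 0.0]
X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
y = [sum(x * b for x, b in zip(row, true_beta)) + rng.gauss(0.0, 1.0) for row in X]

lams = [0.5 * 2 ** k for k in range(10)]  # grid of l1 penalties (assumed)
lam0 = 2.0 * math.log(p)                  # assumed l0 penalty, unit noise variance
best_lam, best_beta, scores = best_l1_model_by_l0(X, y, lams, lam0)
print("chosen l1 penalty:", best_lam)
print("selected features:", [j for j, b in enumerate(best_beta) if b != 0.0])
```

The l1 fits supply a short list of candidate models cheaply, and the l0 score only ranks them, so no combinatorial search over subsets is needed. Refitting each candidate support by ordinary least squares before scoring would be a natural variant.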
