False Variable Selection Rates in Regression

Abstract

There has been recent interest in extending the ideas of False Discovery Rates (FDR) to variable selection in regression settings. Traditionally, the FDR in these settings has been defined in terms of the coefficients of the full regression model. Recent papers have struggled to control this quantity when the predictors are correlated. This paper shows that the full-model definition of FDR exhibits unintuitive and potentially undesirable behavior in the presence of correlated predictors. We propose a new false selection error criterion, the False Variable Rate (FVR), that avoids these problems and behaves more intuitively. We discuss the behavior of this criterion, compare it with the traditional FDR, and present guidelines for determining which is appropriate in a particular setting. Finally, we present a simple estimation procedure for FVR in stepwise variable selection, analyze the performance of this estimator, and draw connections to recent estimators in the literature.
