A Note on High Dimensional Linear Regression with Interactions
Abstract
The problem of interaction selection has recently attracted much attention in high dimensional data analysis. This note aims to address and clarify several fundamental issues in interaction selection for linear regression models, especially when the input dimension p is much larger than the sample size n. We first discuss issues such as a valid way of defining importance for the main effects and interaction effects, the invariance principle, and the strong heredity condition. Then we focus on two-stage methods, which are computationally attractive for large p problems but are regarded as heuristic in the literature. We revisit the counterexample of Turlach (2004) and provide new insight to justify two-stage methods from a theoretical perspective. Finally, we suggest some new strategies for interaction selection under the marginality principle, followed by a numerical example.