EnsembleIV: Creating Instrumental Variables from Ensemble Learners for Robust Statistical Inference with ML-Generated Variables
Abstract
Despite increasing popularity in empirical studies, the integration of machine learning generated variables into regression models for statistical inference suffers from the measurement error problem, which can bias estimation and threaten the validity of inferences. In this paper, we develop a novel approach to alleviate associated estimation biases. Our proposed approach, EnsembleIV, creates valid and strong instrumental variables from weak learners in an ensemble model, and uses them to obtain consistent estimates that are robust against the measurement error problem. Our empirical evaluations, using both synthetic and real-world datasets, show that EnsembleIV can effectively reduce estimation biases across several common regression specifications, and can be combined with modern deep learning techniques when dealing with unstructured data.
Turn this paper into a lesson
ArcXiv compiles a structured reading guide from this paper's metadata: plain-English importance, contributions, prerequisite concepts, which sections to read first, flashcards, and a quiz. Grounded in the abstract, never invented.