Automatic Locally Robust GMM with Machine-Learning-Generated Regressors
Abstract
Machine-learning (ML) methods now routinely generate regressors used in subsequent econometric analyses, such as estimated propensity scores, control-function residuals, imputed covariates, learned proxies, or low-dimensional embeddings of high-dimensional data. As these ML-generated regressors become ubiquitous, the lack of general inference methods for models that use them has become a critical limitation. Standard plug-in and Double ML procedures ignore how generated regressors enter later stages, leading to large biases and invalid inference. We develop a three-step locally robust GMM framework for inference with ML-generated regressors. A key new insight is downstream local robustness: by a functional chain rule, moment functions that are constructed to be orthogonal to the second step eliminate the complicated indirect (conditioning) effects from the ML-generated regressors. We show how to implement this automatically by estimating the associated Riesz representers through cross-fitted auxiliary regressions, allowing for generic non-Donsker ML in both early steps. In leading treatment-effect and counterfactual settings, simulations demonstrate severe bias in existing methods and bias reductions of 85-95% using our procedures.
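The cross-fitting device mentioned in the abstract can be illustrated generically. The sketch below is not the paper's three-step estimator and does not estimate a Riesz representer; it only shows the sample-splitting pattern, in which each observation's first-step prediction comes from a model fitted on the other folds. The function name `cross_fit_predictions`, the fold count, and the use of ordinary least squares as a stand-in for an ML learner are all assumptions made for illustration.

```python
import numpy as np

def cross_fit_predictions(X, y, n_folds=5, seed=0):
    """Out-of-fold predictions from a least-squares first step.

    Hypothetical helper: least squares stands in for any ML learner.
    """
    n = X.shape[0]
    rng = np.random.default_rng(seed)
    # Randomly partition the observation indices into n_folds folds.
    folds = np.array_split(rng.permutation(n), n_folds)
    Xc = np.column_stack([np.ones(n), X])  # add an intercept column
    y_hat = np.empty(n)
    for held_out in folds:
        # Fit only on observations outside the held-out fold.
        train = np.setdiff1d(np.arange(n), held_out)
        beta, *_ = np.linalg.lstsq(Xc[train], y[train], rcond=None)
        # Predict for the held-out fold with the model that never saw it.
        y_hat[held_out] = Xc[held_out] @ beta
    return y_hat, folds

# Simulated, noiseless linear data (purely illustrative).
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = 1.0 + X @ np.array([1.0, 2.0, 3.0])
y_hat, folds = cross_fit_predictions(X, y)
```

In a generated-regressor setting, `y_hat` would play the role of the first-step output (for example, an estimated propensity score) that later steps take as an input; the point of the out-of-fold construction is that no observation's generated regressor depends on its own outcome.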