Debiased Machine Learning U-statistics
Abstract
We propose a method to debias estimators based on U-statistics with machine-learning (ML) first steps. Standard plug-in estimators often suffer from regularization and model-selection biases, leading to invalid inference. We characterize orthogonal adjustment terms for two-step U-statistics, construct Debiased Machine Learning (DML) U-estimators, develop a cross-fitting algorithm, and establish a general asymptotic theory for inference with ML first steps. We illustrate the methodology with applications to Inequality of Opportunity (IOp), the Area Under the ROC Curve (AUC), and conditional moment restrictions. For the latter setting, we introduce Kernel-DML, a class of orthogonal estimators based on identification-preserving kernel distance criteria, and apply it to semiparametric production function estimation. Using European survey data, we provide debiased ML-based estimates of income IOp and show that the widely used Conditional Inference Forest plug-in approach can substantially underestimate IOp. Moreover, plug-in results vary markedly across ML methods, whereas the debiased estimates are considerably more stable.
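As a reminder of what a U-statistic looks like in one of the applications above, the empirical AUC is a two-sample U-statistic: it averages the kernel h(s, t) = 1{s > t} + 0.5 * 1{s = t} over every pair of a positive-class score s and a negative-class score t. The sketch below is only this standard plug-in estimator. The function name `auc_u_statistic` is hypothetical, and the code does not include the orthogonal adjustment term or the cross-fitting step the paper develops. If the scores come from a fitted ML model, this becomes the kind of plug-in two-step U-statistic that the abstract warns can be biased.

```python
from itertools import product


def auc_u_statistic(scores_pos, scores_neg):
    """Plug-in empirical AUC as a two-sample U-statistic.

    Hypothetical helper: averages h(s, t) = 1{s > t} + 0.5 * 1{s == t}
    over all (positive, negative) score pairs. No debiasing is applied.
    """
    if not scores_pos or not scores_neg:
        raise ValueError("both samples must be non-empty")
    total = 0.0
    # Visit every positive/negative pair once (order-(1,1) two-sample kernel).
    for s, t in product(scores_pos, scores_neg):
        if s > t:
            total += 1.0
        elif s == t:
            total += 0.5  # ties count as half a concordant pair
    return total / (len(scores_pos) * len(scores_neg))


# Usage: 3 of the 4 pairs are concordant, so the AUC is 0.75.
print(auc_u_statistic([0.8, 0.4], [0.6, 0.2]))
```

The double loop makes the U-statistic structure explicit, at O(n1 * n0) cost. A rank-based formula gives the same value faster for large samples.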