Estimating Representative Causal Effects with Double Machine Learning

Abstract

Double Machine Learning is widely used to estimate treatment effects from non-experimental data. The "residuals-on-residuals" regression (RORR) is especially popular for its simplicity and computational tractability. However, with heterogeneous treatment effects, the proper interpretation of RORR may not be well understood. We show that, for non-binary treatments with continuous dose-response functions, RORR estimates a conditional variance-weighted average of derivatives evaluated at treatment values not in the observed dataset. This estimand does not equal the Average Causal Derivative (ACD) in general. Hence, even if all units share the same dose-response function, RORR does not estimate an average treatment effect in the population represented by the sample. We propose an alternative estimator for the ACD that is well suited to the large datasets found in applied data science settings. We demonstrate the pitfalls of RORR and the favorable properties of the proposed estimator through an illustrative numerical example and with real-world data from Netflix. Our methodology is used by default in Netflix's observational causal inference platform, where it regularly powers causal research and decision-making at scale.
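For readers unfamiliar with the "residuals-on-residuals" regression named above, the following is a minimal sketch of the generic procedure, not the paper's implementation or its proposed ACD estimator. It assumes ordinary least squares for both nuisance models and two-fold cross-fitting; the function name `rorr`, its parameters, and the simulated data are hypothetical and chosen only for illustration. The simulation uses a constant linear effect of 2, the benign case where RORR recovers the slope; it does not reproduce the heterogeneous or nonlinear settings the abstract warns about.

```python
import numpy as np


def rorr(y, t, x, n_folds=2, seed=0):
    """Cross-fitted residuals-on-residuals estimate.

    Assumption for illustration: both nuisance functions, E[Y|X] and E[T|X],
    are fit by ordinary least squares on an intercept plus x.
    """
    n = len(y)
    rng = np.random.default_rng(seed)
    folds = rng.permutation(n) % n_folds           # random fold assignment
    design = np.column_stack([np.ones(n), x])      # intercept + covariates
    y_res = np.empty(n)
    t_res = np.empty(n)
    for k in range(n_folds):
        train, test = folds != k, folds == k
        # Fit nuisance models on the training folds only.
        beta_y, *_ = np.linalg.lstsq(design[train], y[train], rcond=None)
        beta_t, *_ = np.linalg.lstsq(design[train], t[train], rcond=None)
        # Residualize outcome and treatment on the held-out fold.
        y_res[test] = y[test] - design[test] @ beta_y
        t_res[test] = t[test] - design[test] @ beta_t
    # Slope of the no-intercept regression of outcome residuals
    # on treatment residuals.
    return float(t_res @ y_res / (t_res @ t_res))


# Hypothetical simulated data: continuous treatment, constant effect of 2.
rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)
t = 0.5 * x + rng.normal(size=n)
y = 2.0 * t + x + rng.normal(size=n)

estimate = rorr(y, t, x)
print(f"RORR estimate: {estimate:.3f}")
```

In practice the OLS nuisance fits would typically be replaced by flexible machine learning models; the final residual-on-residual step is unchanged.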
