Demystifying and avoiding the OLS "weighting problem": Unmodeled heterogeneity and straightforward solutions

Abstract

Researchers frequently estimate treatment effects by regressing outcomes (Y) on treatment (D) and covariates (X). Even without unobserved confounding, the coefficient on D yields a conditional-variance-weighted average of strata-wise effects, not the average treatment effect. Scholars have proposed characterizing the severity of these weights, evaluating the resulting biases, or changing investigators' target estimand to the conditional-variance-weighted effect. We aim to demystify these weights, clarifying how they arise, what they represent, and how to avoid them. Specifically, these weights reflect misspecification bias from unmodeled treatment-effect heterogeneity. Rather than diagnosing or tolerating them, we recommend avoiding the issue altogether by relaxing the standard regression assumption of "single linearity" to one of "separate linearity" (of each potential outcome in the covariates), which accommodates heterogeneity. Numerous methods, including regression imputation (g-computation), interacted regression, and mean balancing weights, satisfy this assumption. In many settings, the efficiency cost of avoiding this weighting problem altogether will be modest and worthwhile.
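To make the weighting concrete, the sketch below simulates a hypothetical design with one discrete covariate (three strata) whose treatment propensities and treatment effects differ. It is an illustration under stated assumptions, not the authors' code, and the function names (`simulate`, `ols_coef_on_d`, `interacted_coef_on_d`, and so on) are invented for this example. With saturated stratum dummies, the "single linearity" regression of Y on D and X returns exactly the average of within-stratum effects weighted by n_x * p_x * (1 - p_x), the stratum size times the conditional variance of D. A fully interacted regression with centered covariates, one way to impose "separate linearity", instead returns the stratum-size-weighted average, which here coincides with regression imputation (g-computation).

```python
import numpy as np


def simulate(n=20_000, seed=0):
    """Hypothetical data: 3 strata with different propensities and effects."""
    rng = np.random.default_rng(seed)
    x = rng.integers(0, 3, size=n)               # stratum label 0, 1, 2
    propensity = np.array([0.5, 0.9, 0.1])[x]    # assumed P(D=1 | X)
    effect = np.array([1.0, 3.0, 5.0])[x]        # assumed tau(X)
    d = (rng.random(n) < propensity).astype(float)
    y = 2.0 * x + effect * d + rng.normal(size=n)
    return x, d, y


def stratum_dummies(x):
    """One indicator column per stratum label 0..max(x)."""
    return (x[:, None] == np.arange(x.max() + 1)).astype(float)


def ols_coef_on_d(x, d, y):
    """'Single linearity': Y ~ D + stratum dummies, no interactions."""
    design = np.column_stack([d, stratum_dummies(x)])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta[0]


def interacted_coef_on_d(x, d, y):
    """'Separate linearity': Y ~ 1 + D + Z + D*Z with Z centered at its mean."""
    z = stratum_dummies(x)[:, 1:]                # drop reference stratum
    z = z - z.mean(axis=0)                       # centering makes coef on D an ATE estimate
    design = np.column_stack([np.ones_like(d), d, z, d[:, None] * z])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta[1]


def strata_summaries(x, d, y):
    """Per-stratum size, treated share, and difference in means."""
    strata = np.unique(x)
    n = np.array([np.sum(x == s) for s in strata])
    p = np.array([d[x == s].mean() for s in strata])
    tau = np.array([
        y[(x == s) & (d == 1)].mean() - y[(x == s) & (d == 0)].mean()
        for s in strata
    ])
    return n, p, tau


def variance_weighted(n, p, tau):
    """Conditional-variance-weighted average of stratum effects."""
    w = n * p * (1 - p)
    return np.sum(w * tau) / np.sum(w)


def stratified_ate(n, p, tau):
    """Average of stratum effects weighted by stratum size (ATE estimate)."""
    return np.sum(n * tau) / np.sum(n)


x, d, y = simulate()
n, p, tau = strata_summaries(x, d, y)
print("OLS coef on D:       ", ols_coef_on_d(x, d, y))
print("variance-weighted:   ", variance_weighted(n, p, tau))
print("interacted coef on D:", interacted_coef_on_d(x, d, y))
print("stratified ATE:      ", stratified_ate(n, p, tau))
```

In this hypothetical design the population ATE is (1 + 3 + 5) / 3 = 3. Because p(1 - p) is 0.25 in stratum 0 but only 0.09 in strata 1 and 2, the OLS coefficient targets 0.97 / 0.43, roughly 2.26, pulled toward the low-effect stratum. The first two printed values agree exactly (the Frisch-Waugh-Lovell residual of D is D - p_x), as do the last two. Neither equivalence depends on sample size; only the gap between the two pairs reflects the unmodeled heterogeneity.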
