Semiparametric Difference-in-Differences with Potentially Many Control Variables
Abstract
This paper discusses difference-in-differences (DID) estimation when there exist many control variables, potentially more than the sample size. In this case, traditional estimation methods, which require a limited number of variables, do not work. One may consider using statistical or machine learning (ML) methods. However, by the well-known theory of inference of ML methods proposed in Chernozhukov et al. (2018), directly applying ML methods to the conventional semiparametric DID estimators will cause significant bias and make these DID estimators fail to be sqrtN-consistent. This article proposes three new DID estimators for three different data structures, which are able to shrink the bias and achieve sqrtN-consistency and asymptotic normality with mean zero when applying ML methods. This leads to straightforward inferential procedures. In addition, I show that these new estimators have the small bias property (SBP), meaning that their bias will converge to zero faster than the pointwise bias of the nonparametric estimator on which it is based.
Turn this paper into a lesson
ArcXiv compiles a structured reading guide from this paper's metadata: plain-English importance, contributions, prerequisite concepts, which sections to read first, flashcards, and a quiz. Grounded in the abstract, never invented.