Two-stage Least Squares with Clustered Data under the Local Average Treatment Effect Framework

Abstract

To estimate the causal effect of an endogenous treatment using clustered data, the canonical two-stage least squares (2sls) estimates a linear regression of the outcome on treatment status using an instrumental variable (IV) and conducts inference with cluster-robust standard errors. When both the treatment and the IV vary within clusters, an alternative two-stage least squares with fixed effects (2sfe) additionally includes cluster indicators in the regression, thereby incorporating cluster information into point estimation as well. This paper studies the trade-off between these approaches within the local average treatment effect (LATE) framework. When clusters are homogeneous, we show that both approaches yield valid large-sample inference for the LATE, and that 2sfe is more efficient than canonical 2sls only when the variation in cluster-specific effects dominates idiosyncratic variation and the IV has sufficient within-cluster variation. When clusters are heterogeneous, we show that 2sfe identifies a weighted average of cluster-specific LATEs, whereas the canonical 2sls generally does not. We further propose a test for detecting cluster heterogeneity.
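As a rough illustration of the two point estimators contrasted above, the following sketch computes canonical 2sls and 2sfe on simulated clustered data with a single binary IV. It is not taken from the paper. The data-generating process (complier, never-taker, and always-taker shares, a homogeneous effect of 2.0, and the cluster-effect scale) and the helper names `iv_slope`, `demean_by_cluster`, `est_2sls`, and `est_2sfe` are assumptions made for illustration only. The 2sfe estimate is obtained by demeaning all variables within clusters, which by the Frisch-Waugh-Lovell theorem is equivalent to including cluster indicators. Cluster-robust standard errors are omitted.

```python
import numpy as np

def iv_slope(y, d, z):
    """Just-identified IV slope with an intercept: cov(z, y) / cov(z, d)."""
    zc = z - z.mean()
    return float(zc @ (y - y.mean()) / (zc @ (d - d.mean())))

def demean_by_cluster(x, cluster):
    """Within transformation: subtract each cluster's mean from its members."""
    _, idx = np.unique(cluster, return_inverse=True)
    means = np.bincount(idx, weights=x) / np.bincount(idx)
    return x - means[idx]

# Hypothetical simulated data (all parameter values are assumptions).
rng = np.random.default_rng(0)
G, n = 200, 20                                   # clusters, units per cluster
cluster = np.repeat(np.arange(G), n)
alpha = rng.normal(0.0, 2.0, G)[cluster]         # cluster-specific effects
z = rng.integers(0, 2, G * n).astype(float)      # IV varies within clusters
# Compliance types: 0 = complier, 1 = never-taker, 2 = always-taker.
types = rng.choice(3, size=G * n, p=[0.6, 0.2, 0.2])
d = np.where(types == 0, z, (types == 2).astype(float))
# Always-takers have a higher baseline outcome, so treatment is endogenous.
y = alpha + 2.0 * d + 1.0 * (types == 2) + rng.normal(0.0, 1.0, G * n)

# Canonical 2sls: IV regression of y on d without cluster indicators.
est_2sls = iv_slope(y, d, z)

# 2sfe: the same IV regression after removing cluster means from y, d, and z.
est_2sfe = iv_slope(
    demean_by_cluster(y, cluster),
    demean_by_cluster(d, cluster),
    demean_by_cluster(z, cluster),
)

print(f"2sls: {est_2sls:.3f}, 2sfe: {est_2sfe:.3f}")
```

In this homogeneous-cluster design both estimates should land near the simulated effect of 2.0. The cluster effects are deliberately large relative to the idiosyncratic noise, which is the regime in which the abstract indicates 2sfe can be more efficient.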
