Robust Design-Based Estimation and Inference for Stratified Randomized Trials with Varying Cluster Sizes

Abstract

Clustered randomized controlled trials are often stratified or pair-matched to improve covariate balance and efficiency. Sample average treatment effects (SATEs) are commonly estimated by averaging stratum-level treatment-control mean contrasts -- an approach that is natural and widely used. We show that, in stratified clustered trials with heterogeneous cluster sizes, such estimators need not be consistent for the SATE. They can converge to the wrong limit even under correct randomization and without model misspecification. The source is a covariance between cluster sizes and treatment effects: stratumwise averaging mis-weights clusters in a way that produces bias of constant order, regardless of sample size. We study the Hájek (ratio) estimator as a robust alternative. By aggregating outcomes within treatment groups before taking their difference, it remains consistent in clustered trials that grow by increasing strata sizes or the number of strata. Despite that, its use in design-based analyses of clustered trials has been limited by the lack of variance estimators. We develop a design-based variance estimator that applies to any number of strata of any size, and show that it is asymptotically conservative, a property that holds even when some strata contain only a single treated or control unit. We also present tests improving the coverage of Wald tests when the number of clusters is moderate. The framework extends naturally to covariate-adjusted estimators via a variance orthogonality property.
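The contrast between the two estimators can be illustrated with a small simulation. The sketch below is not the paper's implementation: the function names, the synthetic pair-matched design, and the equal weighting of stratum contrasts are all assumptions made for illustration. Each pair holds one small and one large cluster, and only large clusters respond to treatment, so cluster size and treatment effect covary.

```python
import numpy as np

def hajek_estimate(totals, sizes, treated):
    """Pool outcome totals and sizes within each arm, then difference the ratios."""
    t = treated.astype(bool)
    return totals[t].sum() / sizes[t].sum() - totals[~t].sum() / sizes[~t].sum()

def stratum_average_estimate(totals, sizes, treated, strata):
    """Average the within-stratum treated-minus-control contrasts of cluster means.

    Assumption: strata are weighted equally; the abstract does not fix a weighting.
    """
    means = totals / sizes
    t = treated.astype(bool)
    contrasts = []
    for s in np.unique(strata):
        in_s = strata == s
        contrasts.append(means[in_s & t].mean() - means[in_s & ~t].mean())
    return float(np.mean(contrasts))

# Synthetic pair-matched design (hypothetical data, not from the paper).
rng = np.random.default_rng(0)
n_pairs = 2000
strata = np.repeat(np.arange(n_pairs), 2)
sizes = np.tile([10.0, 100.0], n_pairs)          # one small, one large cluster per pair
effect = np.where(sizes == 100.0, 1.0, 0.0)      # only large clusters respond
baseline = 2.0                                   # common control outcome

# Randomize exactly one cluster per pair to treatment.
pick = rng.integers(0, 2, size=n_pairs)
treated = np.zeros(2 * n_pairs, dtype=int)
treated[2 * np.arange(n_pairs) + pick] = 1

# Observed cluster-level outcome totals under the realized assignment.
totals = sizes * (baseline + effect * treated)

# Individual-level SATE: size-weighted average of cluster effects.
sate = (sizes * effect).sum() / sizes.sum()
hajek = hajek_estimate(totals, sizes, treated)
strat = stratum_average_estimate(totals, sizes, treated, strata)

print(f"SATE              = {sate:.3f}")
print(f"Hajek estimate    = {hajek:.3f}")
print(f"Stratum averaging = {strat:.3f}")
```

In this setup the SATE is 10/11. The stratum-averaged contrast settles near 0.5 however many pairs are added, while the Hájek estimate approaches the SATE as the number of pairs grows.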
