Propensity Score Propagation: A General Framework for Design-Based Inference with Unknown Propensity Scores
Abstract
Design-based inference, also known as randomization-based or finite-population inference, provides a principled framework for trustworthy statistical inference by attributing randomness solely to the design mechanism (e.g., treatment assignment, survey sampling, or missingness), without imposing super-population distributional or modeling assumptions on outcome data. From Fisher's and Neyman's seminal work to the recent resurgence of design-based inference, this perspective has played a central role in causal inference, survey sampling, and missing data analysis. However, a fundamental obstacle has limited its use in many modern applications: existing design-based inference theory typically relies on known propensity scores (i.e., known design probabilities), whereas propensity scores are usually unknown in observational studies, real-world survey settings, and missing data problems. We propose propensity score propagation, a general framework for valid design-based inference with unknown propensity scores. The framework introduces a regeneration-and-union procedure that propagates uncertainty from propensity score estimation into downstream design-based inference without imposing super-population outcome assumptions. It accommodates both parametric and nonparametric propensity score models, integrates seamlessly with existing design-based methods developed under known propensity scores, and applies broadly across design-based inference problems. Theoretical results and simulation studies show that the proposed framework achieves nominal coverage, even when existing approaches exhibit substantial under-coverage.
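To make the known-propensity-score setting concrete, the following is a minimal sketch of the classical Horvitz-Thompson (inverse-probability-weighted) estimator of a finite-population average treatment effect. It is not the proposed propagation framework. It assumes independent Bernoulli assignment with known propensity scores, and the toy potential outcomes `y1`, `y0`, the scores `e`, and both function names are hypothetical, chosen only for illustration. Potential outcomes are held fixed and the only randomness is the assignment, so enumerating every assignment shows that the estimator is design-unbiased.

```python
from itertools import product


def ht_ate(z, y_obs, e):
    """Horvitz-Thompson estimate of the finite-population ATE.

    z: 0/1 treatment indicators, y_obs: observed outcomes,
    e: known propensity scores P(z_i = 1), each strictly in (0, 1).
    """
    n = len(z)
    return sum(
        zi * yi / ei - (1 - zi) * yi / (1 - ei)
        for zi, yi, ei in zip(z, y_obs, e)
    ) / n


def design_expectation(y1, y0, e):
    """Exact expectation of the estimator over the assignment mechanism.

    Potential outcomes y1, y0 are treated as fixed constants; the sum runs
    over all 2^n assignments of an independent Bernoulli design.
    """
    total = 0.0
    for z in product([0, 1], repeat=len(e)):
        # Probability of this assignment vector under the design.
        prob = 1.0
        for zi, ei in zip(z, e):
            prob *= ei if zi else 1 - ei
        # Only the potential outcome matching the assignment is observed.
        y_obs = [a if zi else b for zi, a, b in zip(z, y1, y0)]
        total += prob * ht_ate(z, y_obs, e)
    return total


# Hypothetical toy population of four units (illustrative values only).
y1 = [3.0, 1.0, 4.0, 2.0]
y0 = [1.0, 0.0, 2.0, 2.0]
e = [0.2, 0.5, 0.7, 0.4]

true_ate = sum(a - b for a, b in zip(y1, y0)) / len(y1)
expected = design_expectation(y1, y0, e)
print(true_ate, expected)
```

The two printed values agree up to floating-point rounding. That agreement depends on `e` being the true design probabilities; when the scores must be estimated, this guarantee no longer holds automatically, which is the gap the abstract describes.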