Doubly robust integration of nonprobability and probability survey data

Abstract

Doubly robust estimators of the population mean (or prevalence) of an outcome have been proposed for integrating outcome and covariate data from a nonprobability sample with covariate data from a probability survey. These estimators combine inverse probability weighting with mass imputation. However, how to combine these doubly robust estimators with a Horvitz-Thompson or Hajek estimator that uses only outcome data from the probability survey has received limited attention. In this paper, we first review previously proposed doubly robust estimators that use outcome data only from the nonprobability sample. We extend these estimators to enable estimation of domain (subpopulation) means (or prevalences), possibly using data from individuals outside the domain to improve estimation when the domain is small. We then consider how to combine this doubly robust estimator with a Horvitz-Thompson or Hajek estimator that uses only the probability survey data. We describe efficient combined estimators and provide formulae for their repeated-sampling variances and for estimators of these variances. We also investigate the asymptotic relative efficiencies of the combined estimators compared with their two component estimators, and carry out a simulation study to assess their relative efficiencies in finite samples. These relative efficiencies depend on the ratio of the variances of the two component estimators and on how predictive the covariates are of the outcome.
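The two building blocks named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' estimator. It assumes one common form of the doubly robust mean: a normalized inverse-propensity-weighted mean of the residuals y - m(x) over the nonprobability sample A, plus a mass-imputation term given by the design-weighted (Hajek) mean of the predictions m(x) over the probability sample B. It then combines that estimate with a Hajek estimate from B by the standard minimum-variance linear combination, lam = (V2 - C) / (V1 + V2 - 2C), where C is the covariance between the two estimators. C is generally nonzero because both estimators use sample B. The function names (`dr_mean`, `hajek_mean`, `combine`), the propensities, the predictions, the variances and the covariance are all hypothetical inputs taken as given. The sketch does not estimate them.

```python
from typing import Sequence, Tuple


def hajek_mean(y: Sequence[float], d: Sequence[float]) -> float:
    """Hajek estimator: design-weighted mean of y over the probability sample."""
    return sum(di * yi for di, yi in zip(d, y)) / sum(d)


def dr_mean(
    y_a: Sequence[float],
    m_a: Sequence[float],
    pi_a: Sequence[float],
    m_b: Sequence[float],
    d_b: Sequence[float],
) -> float:
    """Doubly robust mean combining IPW (sample A) with mass imputation (sample B).

    y_a:  outcomes observed in the nonprobability sample A
    m_a:  outcome-model predictions m(x) for units in A
    pi_a: participation propensities for units in A (assumed already estimated)
    m_b:  outcome-model predictions m(x) for units in the probability sample B
    d_b:  design weights for units in B
    """
    # Normalized inverse-propensity-weighted mean of the residuals y - m(x) in A.
    w_a = [1.0 / p for p in pi_a]
    residual_term = sum(w * (y - m) for w, y, m in zip(w_a, y_a, m_a)) / sum(w_a)
    # Mass-imputation term: design-weighted mean of the predictions m(x) in B.
    return residual_term + hajek_mean(m_b, d_b)


def combine(
    est1: float, est2: float, var1: float, var2: float, cov: float = 0.0
) -> Tuple[float, float, float]:
    """Minimum-variance linear combination lam * est1 + (1 - lam) * est2.

    Returns (combined estimate, lam, variance of the combination).
    """
    denom = var1 + var2 - 2.0 * cov
    if denom <= 0:
        raise ValueError("var1 + var2 - 2*cov must be positive")
    lam = (var2 - cov) / denom
    combined = lam * est1 + (1.0 - lam) * est2
    var = lam**2 * var1 + (1.0 - lam) ** 2 * var2 + 2.0 * lam * (1.0 - lam) * cov
    return combined, lam, var


# Toy, hypothetical inputs (not data from the paper).
dr_est = dr_mean(
    y_a=[2.0, 3.0, 5.0],
    m_a=[2.5, 3.0, 4.5],
    pi_a=[0.5, 0.25, 0.5],
    m_b=[3.0, 4.0],
    d_b=[10.0, 30.0],
)  # 3.75
hajek_est = hajek_mean(y=[3.0, 5.0], d=[10.0, 30.0])  # 4.5
combined, lam, var = combine(dr_est, hajek_est, var1=0.25, var2=0.75)
# combined = 3.9375, lam = 0.75, var = 0.1875
```

With the covariance set to zero, as in this toy call, the weight lam depends only on the ratio var2 / var1, and the combined variance (0.1875) is below both component variances. This loosely mirrors the abstract's point that relative efficiency depends on the ratio of the component variances. In practice, the covariance induced by the shared probability sample would need to be estimated and passed as `cov`.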
