Ecological Regression with Partial Identification

Abstract

Ecological inference (EI) is the process of learning about individual behavior from aggregate data. We study a partially identified linear contextual effects model for EI and describe how to estimate the district-level parameter, averaged over many precincts, in the presence of the non-identified contextual-effect parameter. This may be regarded as a first attempt in this venerable literature to limit the scope of the key form of non-identifiability in EI. To study the operating characteristics of our model, we have amassed the largest collection of data with known ground truth ever applied to evaluate solutions to the EI problem. We collect and study 459 datasets from a variety of fields, including public health, political science, and sociology. The datasets contain a total of 2,370,854 geographic units (e.g., precincts), with an average of 5,165 geographic units per dataset. Our replication data are publicly available via the Harvard Dataverse (Jiang et al. 2018) and may serve as a useful resource for future researchers. For all real datasets in our collection that fit our proposed rules, our approach reduces the width of the Duncan and Davis (1953) deterministic bound by about 45% on average, while still capturing the true district-level parameter more than 97% of the time.
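For readers unfamiliar with the deterministic bound that serves as the baseline above, the following is a minimal sketch of the standard Duncan and Davis bound for a 2x2 ecological table. It is not the partially identified model proposed in the paper. The function names and the `(n, x, t)` input layout are assumptions made for illustration: `n` is a unit's population, `x` the fraction of that population in the group of interest, and `t` the fraction with the outcome.

```python
def duncan_davis_bounds(x, t):
    """Deterministic bounds on the outcome rate within the group, for one unit.

    x: fraction of the unit's population in the group (0 < x <= 1)
    t: fraction of the unit's population with the outcome (0 <= t <= 1)
    """
    # Even if everyone outside the group has the outcome, at least
    # t - (1 - x) of the population must be group members with the outcome.
    lower = max(0.0, (t - (1.0 - x)) / x)
    # At most all outcome-holders belong to the group.
    upper = min(1.0, t / x)
    return lower, upper


def district_bounds(units):
    """Aggregate unit bounds to the district level.

    units: iterable of (n, x, t) tuples (hypothetical layout).
    Each unit is weighted by its group population n * x.
    """
    total = 0.0
    lo_sum = 0.0
    hi_sum = 0.0
    for n, x, t in units:
        if x <= 0:
            continue  # a unit with no group members carries no information
        lo, hi = duncan_davis_bounds(x, t)
        weight = n * x
        total += weight
        lo_sum += weight * lo
        hi_sum += weight * hi
    if total == 0:
        raise ValueError("no unit contains any group members")
    return lo_sum / total, hi_sum / total
```

The width `upper - lower` of the district-level interval is the quantity whose average reduction the abstract reports.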
