Pseudo-R² statistics under complex sampling

Abstract

Model summaries based on the ratio of fitted and null likelihoods have been proposed for generalised linear models, reducing to the familiar R² coefficient of determination in the Gaussian model with identity link. In this note I show how to define the Cox–Snell and Nagelkerke summaries under arbitrary probability sampling designs, giving a design-consistent estimator of the population model summary. I also show that for logistic regression models under case–control sampling the usual Cox–Snell and Nagelkerke R² are not design-consistent, but are systematically larger than would be obtained with a cross-sectional or cohort sample, even in settings where the weighted and unweighted logistic regression estimators are similar or identical.
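
As a rough illustration of the two summaries named above, the sketch below computes Cox–Snell and Nagelkerke values for a binary outcome from fitted probabilities. The unweighted case is the standard likelihood-ratio form. The weighted variant is an assumption of this sketch, not a formula taken from the paper: it uses weighted log-likelihoods and lets the sum of the sampling weights stand in for the population size. The function name `weighted_pseudo_r2` and its arguments are hypothetical.

```python
import numpy as np


def weighted_pseudo_r2(y, p_fitted, weights=None):
    """Return (Cox-Snell, Nagelkerke) pseudo-R^2 for a binary outcome.

    y         : 0/1 outcomes (must contain both classes)
    p_fitted  : fitted probabilities, strictly between 0 and 1
    weights   : sampling weights; None means equal weights (classical case)

    Assumption of this sketch: log-likelihoods are weighted sums, and the
    sum of the weights plays the role of the sample size n.
    """
    y = np.asarray(y, dtype=float)
    p = np.asarray(p_fitted, dtype=float)
    w = np.ones_like(y) if weights is None else np.asarray(weights, dtype=float)

    n_hat = w.sum()                   # estimated population size
    p_null = np.sum(w * y) / n_hat    # weighted intercept-only fit

    def loglik(prob):
        # Weighted Bernoulli log-likelihood
        return np.sum(w * (y * np.log(prob) + (1.0 - y) * np.log1p(-prob)))

    ll_fit = loglik(p)
    ll_null = loglik(np.full_like(y, p_null))

    # Cox-Snell: 1 - (L0 / L1)^(2/n), written on the log scale
    cox_snell = 1.0 - np.exp(-2.0 * (ll_fit - ll_null) / n_hat)
    # Nagelkerke: rescale by the largest value Cox-Snell can attain
    max_cox_snell = 1.0 - np.exp(2.0 * ll_null / n_hat)
    return cox_snell, cox_snell / max_cox_snell


# Illustrative, made-up inputs (not data from the paper)
y = np.array([0, 0, 1, 1, 0, 1])
p = np.array([0.2, 0.3, 0.7, 0.8, 0.4, 0.6])
cs, nag = weighted_pseudo_r2(y, p)
```

Because the weighted log-likelihoods and the weight total scale together, multiplying every weight by the same constant leaves both values unchanged in this sketch. Weights that differ between cases and controls, as under case–control sampling, can change the result.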
