A Bayesian Prevalence Incidence Cure model for estimating survival using Electronic Health Records with incomplete baseline diagnoses

Abstract

Retrospective cohorts can be extracted from Electronic Health Records (EHR) to study prevalence, time until disease or event occurrence and cure proportion in real world scenarios. However, EHR are collected for patient care rather than research, so typically have complexities, such as patients with missing baseline disease status. Prevalence-Incidence (PI) models, which use a two-component mixture model to account for this missing data, have been proposed. However, PI models are biased in settings in which some individuals will never experience the endpoint (they are 'cured'). To address this, we propose a Prevalence Incidence Cure (PIC) model, a 3 component mixture model that combines the PI model framework with a cure model. Our PIC model enables estimation of the prevalence, time-to-incidence, and the cure proportion, and allows for covariates to affect these. We adopt a Bayesian inference approach, and focus on the interpretability of the prior. We show in a simulation study that the PIC model has smaller bias than a PI model for the survival probability; and compare inference under vague, informative and misspecified priors. We illustrate our model using a dataset of 1964 patients undergoing treatment for Diabetic Macular Oedema, demonstrating improved fit under the PIC model.

0

Turn this paper into a full lesson

ArcXiv compiles a staged curriculum from this paper: 8-12 lessons across beginner → advanced, synthesised section guides, visuals, flashcards, a quiz, exercises, and on-demand deep dives per section. Grounded in the abstract, never invented.

Discussion (0)

Sign in to join the discussion.

Loading comments…