Survival analysis under label shift
Abstract
Let P represent the source population with complete data, containing covariate Z and response T, and Q the target population, where only the covariate Z is available. We consider a setting with both label shift and label censoring. Label shift assumes that the marginal distribution of T differs between P and Q, while the conditional distribution of Z given T remains the same. Label censoring refers to the case where the response T in P is subject to random censoring. Our goal is to leverage information from the label-shifted and label-censored source population P to conduct statistical inference in the target population Q. We propose a parametric model for T given Z in Q and estimate the model parameters by maximizing an approximate likelihood. This allows for statistical inference in Q and accommodates a range of classical survival models. Under the label shift assumption, the likelihood depends not only on the unknown parameters but also on the unknown distribution of T in P and Z in Q, which we estimate nonparametrically. The asymptotic properties of the estimator are rigorously established and the effectiveness of the method is demonstrated through simulations and a real data application. This work is the first to combine survival analysis with label shift, offering a new research direction in this emerging topic.
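To see why the likelihood involves the distribution of T in P and of Z in Q, the label shift assumption can be combined with Bayes' rule. The following is a sketch of that step only, not the paper's derivation. It assumes all densities exist and ignores censoring. The notation is introduced here for illustration: f(t | z; θ) is the parametric model for T given Z in Q, p_T is the marginal density of T in P, and q_Z is the marginal density of Z in Q.

```latex
\begin{align*}
  % Label shift: the conditional law of Z given T is shared by P and Q.
  p_P(z \mid t) &= p_Q(z \mid t), \\
  % Bayes' rule in Q, with the parametric model f for T given Z.
  p_Q(z \mid t) &= \frac{f(t \mid z; \theta)\, q_Z(z)}
                        {\int f(t \mid z'; \theta)\, q_Z(z')\, dz'}, \\
  % Joint density of an uncensored source observation (Z, T) drawn from P.
  p_P(z, t) &= p_T(t)\, p_P(z \mid t)
             = p_T(t)\, \frac{f(t \mid z; \theta)\, q_Z(z)}
                             {\int f(t \mid z'; \theta)\, q_Z(z')\, dz'}.
\end{align*}
```

The last line shows that the density of a source observation depends on θ only through f, while p_T and q_Z enter as unknown functions. This matches the abstract's statement that these two distributions have to be estimated alongside the model parameters.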