Learning Stable Predictors from Weak Supervision under Distribution Shift
Abstract
Learning from weak, proxy, or relative supervision is common when ground-truth labels are unavailable, but robustness under distribution shift remains poorly understood because the supervision mechanism itself may change across environments. We formalize this phenomenon as supervision drift, defined as changes in P(y | x, c) across contexts, and study it in CRISPR-Cas13d transcriptomic perturbation experiments where guide efficacy is inferred indirectly from RNA-seq responses. Using publicly available data spanning two human cell lines and multiple post-induction timepoints, we construct a controlled non-IID benchmark with explicit domain (cell line) and temporal shifts, while reusing a fixed weak-label construction across all contexts to avoid changing targets. Across linear and tree-based models, weak supervision supports meaningful learning in-domain (ridge R² = 0.356, Spearman ρ = 0.442) and partial cross-cell-line transfer (ρ ≈ 0.40). In contrast, temporal transfer collapses across all model classes considered, yielding negative R² and weak or near-zero ρ (ridge R² = -0.145, ρ = 0.008; XGBoost R² = -0.155, ρ = 0.056; random forest R² = -0.322, ρ = 0.139). Additional robustness analyses using externally recomputed weak labels, shift-score quantification, and simple mitigation baselines preserve the same qualitative pattern. Feature-label association and feature-importance analyses remain relatively stable across cell lines but change sharply over time, indicating that failures arise from supervision drift rather than model capacity or simple covariate shift. These results show that strong in-domain performance under weak supervision can be misleading and motivate feature stability as a lightweight diagnostic for non-transferability before deployment.
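The feature-stability idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the stability statistic, so the choice below (Pearson correlation between per-feature label-association vectors from two contexts) is an assumption, as are all function names and the synthetic data. The simulated "drifted" context keeps the feature distribution fixed and changes only how the label depends on the features, which mimics a change in P(y | x, c) without covariate shift.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def feature_label_associations(X, y):
    """Per-feature correlation with the (weak) label; X is a list of rows."""
    n_features = len(X[0])
    return [pearson([row[j] for row in X], y) for j in range(n_features)]


def stability_score(X_a, y_a, X_b, y_b):
    """Hypothetical diagnostic: agreement of association vectors across two contexts.

    Values near 1 suggest the label depends on the features in a similar way in
    both contexts; low or negative values suggest the supervision has drifted.
    """
    assoc_a = feature_label_associations(X_a, y_a)
    assoc_b = feature_label_associations(X_b, y_b)
    return pearson(assoc_a, assoc_b)


def simulate_context(weights, n=500, seed=0):
    """Synthetic context: standard-normal features, linear label plus noise."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in weights] for _ in range(n)]
    y = [sum(w * x for w, x in zip(weights, row)) + rng.gauss(0, 0.5) for row in X]
    return X, y


# Assumed toy setup: same labeling rule in two contexts vs. a sign-flipped rule.
source_X, source_y = simulate_context([1.0, -0.8, 0.5, 0.0], seed=1)
same_X, same_y = simulate_context([1.0, -0.8, 0.5, 0.0], seed=2)
drift_X, drift_y = simulate_context([-1.0, 0.8, -0.5, 0.0], seed=3)

stable = stability_score(source_X, source_y, same_X, same_y)
drifted = stability_score(source_X, source_y, drift_X, drift_y)
print(f"stable contexts:  {stable:.3f}")
print(f"drifted context:  {drifted:.3f}")
```

Because the features in all three synthetic contexts are drawn from the same distribution, a low score here cannot be attributed to covariate shift. On real data the two effects are harder to separate, and a rank-based statistic or a model-based feature importance could be substituted for the plain correlation used in this sketch.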