Decision-Making under Miscalibration

Abstract

ML-based predictions are used to inform consequential decisions about individuals. How should we use predictions (e.g., risk of heart attack) to inform downstream binary classification decisions (e.g., undergoing a medical procedure)? When the risk estimates are perfectly calibrated, the answer is well understood: a classification problem's cost structure induces an optimal treatment threshold j*. In practice, however, some amount of miscalibration is unavoidable, raising a fundamental question: how should one use potentially miscalibrated predictions to inform binary decisions? We formalize a natural (distribution-free) solution concept: given anticipated miscalibration of α, we propose using the threshold j that minimizes the worst-case regret over all α-miscalibrated predictors, where the regret is the difference in clinical utility between using the threshold in question and using the optimal threshold in hindsight. We provide closed-form expressions for j when miscalibration is measured using both expected and maximum calibration error, which reveal that it indeed differs from j* (the optimal threshold under perfect calibration). We validate our theoretical findings on real data, demonstrating that there are natural cases in which making decisions using j improves the clinical utility.
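
To make the perfectly calibrated baseline concrete, the sketch below computes the threshold induced by a simple cost structure. It is an illustration under stated assumptions, not the paper's method: it assumes correct decisions cost nothing, a false positive costs `cost_fp`, and a false negative costs `cost_fn`. The function names and the cost parameterization are hypothetical and do not come from the abstract. It does not implement the robust threshold j, whose closed-form expressions are not given in the abstract.

```python
def optimal_threshold(cost_fp: float, cost_fn: float) -> float:
    """Treatment threshold under perfect calibration (assumed cost model).

    Treating a person with risk p has expected cost (1 - p) * cost_fp;
    not treating has expected cost p * cost_fn. Treating is at least as
    good exactly when p >= cost_fp / (cost_fp + cost_fn).
    """
    if cost_fp < 0 or cost_fn < 0 or cost_fp + cost_fn == 0:
        raise ValueError("costs must be non-negative and not both zero")
    return cost_fp / (cost_fp + cost_fn)


def expected_cost(p: float, treat: bool, cost_fp: float, cost_fn: float) -> float:
    """Expected cost of a decision when p is the true (calibrated) risk."""
    return (1.0 - p) * cost_fp if treat else p * cost_fn


def decide(p: float, threshold: float) -> bool:
    """Treat whenever the predicted risk reaches the threshold."""
    return p >= threshold


if __name__ == "__main__":
    # Hypothetical costs: a missed case is four times as costly as an
    # unnecessary procedure.
    t = optimal_threshold(cost_fp=1.0, cost_fn=4.0)
    for p in (0.1, 0.2, 0.5):
        print(p, decide(p, t), expected_cost(p, decide(p, t), 1.0, 4.0))
```

At the threshold itself the two decisions have equal expected cost, which is why a small calibration error near the threshold can flip the decision at little cost, while a systematic error can shift many decisions to the wrong side.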
