Trustworthy Predictive Distributions for Tail Events with Semiparametric Diagnostic Transport Maps
Abstract
Machine learning forecast systems are moving beyond point predictions to full predictive distributions for future outcomes y conditional on complex inputs x. However, these distributions are often locally miscalibrated, especially for high-stakes tail events, where accurate uncertainty quantification is most needed to establish trust in models. Local miscalibration occurs because training data often lack examples of low-frequency events. The goal of this paper is to describe a simple yet flexible framework that produces interpretable and robust predictive distributions that are easy to fit and may outperform high-complexity forecasting systems when training examples are limited. With this goal in mind, we introduce a semiparametric version of the Local Amortized Diagnostic and Reshaping (LADaR) framework. It posits a covariate-dependent parametric model for a diagnostic transport map, regressed nonparametrically on the inputs, that describes how to correct tail probabilities across the feature space to match calibration data. These maps provide the user with local, real-time diagnostics and a recalibrated predictive distribution through an interpretable composition with the base model. We apply these semiparametric diagnostic transport maps to short-term tropical cyclone intensity forecasting to detect evolutionary modes linked to local miscalibration in the National Hurricane Center's forecasts and to improve predictions for severe weather hazards.
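The idea of recalibrating through a composition with the base model can be illustrated with a minimal sketch. The code below is not the paper's method: it assumes a toy Gaussian setup, and it replaces the covariate-dependent parametric map with a single global, empirical map estimated from probability integral transform (PIT) values. All names (`base`, `truth`, `diagnostic_map`, `recalibrated_cdf`) and all numbers are hypothetical and chosen only for illustration.

```python
import bisect
import random
from statistics import NormalDist

random.seed(0)

# Hypothetical setup: the base forecast is N(0, 1), but outcomes actually follow
# N(0, 2^2), so the base model understates tail probabilities.
base = NormalDist(mu=0.0, sigma=1.0)
truth = NormalDist(mu=0.0, sigma=2.0)

# Calibration data: outcomes drawn from the true distribution.
y_cal = [random.gauss(0.0, 2.0) for _ in range(20000)]

# PIT values: the base CDF evaluated at each observed outcome, sorted for lookup.
pit_sorted = sorted(base.cdf(y) for y in y_cal)


def diagnostic_map(gamma):
    """Empirical estimate of P(PIT <= gamma).

    Assumption: one global map with no dependence on inputs x, unlike the
    covariate-dependent map described in the abstract. A calibrated base
    model would give the identity map.
    """
    return bisect.bisect_right(pit_sorted, gamma) / len(pit_sorted)


def recalibrated_cdf(y):
    """Reshape the forecast by composing the map with the base CDF."""
    return diagnostic_map(base.cdf(y))


# Compare the probability of exceeding a tail threshold under each distribution.
y_tail = 3.0
base_tail = 1.0 - base.cdf(y_tail)
recal_tail = 1.0 - recalibrated_cdf(y_tail)
true_tail = 1.0 - truth.cdf(y_tail)
print(f"P(Y > {y_tail}): base={base_tail:.4f} "
      f"recalibrated={recal_tail:.4f} true={true_tail:.4f}")
```

In this toy case the deviation of `diagnostic_map` from the identity acts as the diagnostic, and the composed CDF moves the tail probability toward the true value. A local version would let the map vary with x, which is the part this sketch deliberately leaves out.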