What an Amortized X-ray Posterior Cannot See: Gain Shifts, Silent Miscalibration, and Where Nested Sampling Still Earns Its Cost
Abstract
Neural posterior estimation (NPE) gives X-ray spectral fits a posterior in milliseconds instead of the minutes that nested sampling costs, but without nested sampling's calibration guarantee or goodness-of-fit information. The simulation-based inference (SBI) literature has trust diagnostics for this gap, but they have not been benchmarked on X-ray spectra. We provide the first such benchmark on one real XMM-Newton EPIC-pn response: a 5-parameter absorbed continuum across three count regimes (~100, 1000, and 10000 counts), four misspecification families, and nested sampling on the exact Poisson likelihood as the reference. A posterior-predictive check catches an unmodeled 6.4 keV line (ROC AUC 0.97 above ~1000 counts); when missed, the line biases the photon index by +0.20 at bright counts. A 3% detector gain shift distorts the continuum yet stays at chance (AUC ~0.50, 36 cells) for all three per-spectrum scores; only nested sampling's evidence flags it (Delta log Z ~ -7.8 at medium counts). Separately, one flow passed every recovery check yet was miscalibrated (marginal coverage deviation 0.113); reseeds and an uncapped retrain trace this to single-flow undertraining, not the count regime, and split-conformal recalibration repaired it (0.113 -> 0.026). Recovery metrics therefore do not certify calibration, and a fast amortized posterior still needs an evidence-based check in the loop.
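The abstract reports that split-conformal recalibration reduced one flow's marginal coverage deviation from 0.113 to 0.026, but it does not specify the conformity score or the exact deviation metric. The sketch below is only an illustration under stated assumptions, not the paper's pipeline. It uses a toy Gaussian simulator in place of the flow and the spectra, a CQR-style score s = max(lo - theta, theta - hi) on equal-tailed credible intervals, and a "coverage deviation" defined as the mean absolute gap between empirical and nominal coverage over a grid of credibility levels. All function names (`simulate`, `central_interval`, `conformal_margin`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(n, overconfidence=0.4, n_samples=400):
    """Hypothetical toy stand-in for NPE output: theta ~ N(0,1), x = theta + N(0,1).
    The exact posterior is N(x/2, 1/2); this 'flow' reports a too-narrow width."""
    theta = rng.normal(size=n)
    x = theta + rng.normal(size=n)
    post_sd = np.sqrt(0.5) * overconfidence  # miscalibrated (overconfident) width
    samples = x[:, None] / 2 + post_sd * rng.normal(size=(n, n_samples))
    return theta, samples

def central_interval(samples, level):
    """Equal-tailed credible interval per row of posterior samples."""
    a = (1 - level) / 2
    lo, hi = np.quantile(samples, [a, 1 - a], axis=1)  # each of shape (n,)
    return lo, hi

def coverage(theta, lo, hi):
    """Fraction of true parameters falling inside their intervals."""
    return np.mean((theta >= lo) & (theta <= hi))

def conformal_margin(theta, lo, hi, level):
    """Split-conformal margin: finite-sample-corrected quantile of the
    CQR-style score s = max(lo - theta, theta - hi) on a calibration set."""
    scores = np.sort(np.maximum(lo - theta, theta - hi))
    n = len(scores)
    k = int(np.ceil((n + 1) * level))
    return scores[min(k, n) - 1]

levels = np.linspace(0.1, 0.9, 9)
th_cal, s_cal = simulate(2000)    # calibration split
th_test, s_test = simulate(2000)  # held-out test split

raw_dev, conf_dev = [], []
for level in levels:
    lo_c, hi_c = central_interval(s_cal, level)
    margin = conformal_margin(th_cal, lo_c, hi_c, level)
    lo_t, hi_t = central_interval(s_test, level)
    raw_dev.append(abs(coverage(th_test, lo_t, hi_t) - level))
    conf_dev.append(abs(coverage(th_test, lo_t - margin, hi_t + margin) - level))

print(f"mean |coverage - nominal|: raw {np.mean(raw_dev):.3f}, "
      f"conformal {np.mean(conf_dev):.3f}")
```

In this toy setting, the margin is fitted on one split and evaluated on another, so the corrected intervals reach near-nominal coverage at each level. The underlying posterior stays unchanged: the correction only widens (or, for an underconfident flow, narrows) each interval. This is why such a correction can repair calibration, yet it cannot detect misspecification such as a gain shift.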