Radiomics-Guided Vision Transformers for Survival Analysis

Abstract

Vision Transformers (ViTs) have shown strong empirical performance on high-dimensional medical imaging data, yet their behavior under survival objectives and the interpretability of their attention mechanisms remain poorly understood. Using shallow ViTs, we design controlled experiments showing that token-level attention dynamics can recover outcome-relevant regions and that attention-based thresholding enables effective token pruning, improving both interpretability and predictive performance. We also study pretrained deep ViTs for survival analysis and propose a radiomics-guided hybrid model that integrates pixel-based embeddings with interpretable radiomic features through a multimodal Cox framework and contrastive alignment. Applied to a COVID-19 chest X-ray cohort with a composite endpoint of ICU admission or mortality, the proposed approach achieves competitive discrimination while providing clinically meaningful attention maps and feature-group importance.
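To make the "Cox framework" concrete, the following is a minimal sketch of the standard negative Cox log partial likelihood, the kind of objective a survival model typically minimizes. The abstract does not state the exact loss, the tie handling, or how image and radiomic branches are fused, so this is an assumption-laden illustration. The function name `cox_neg_log_partial_likelihood` and its arguments are hypothetical, and `risk_scores` simply stands in for whatever scalar log-risk the model outputs per patient.

```python
import math

def cox_neg_log_partial_likelihood(risk_scores, times, events):
    """Negative Cox log partial likelihood, averaged over observed events.

    risk_scores: one scalar log-risk per patient (hypothetical model output)
    times:       follow-up time per patient
    events:      1 if the event was observed, 0 if censored
    Ties are handled Breslow-style: the risk set at time t contains
    every patient whose follow-up time is >= t.
    """
    n_events = sum(events)
    if n_events == 0:
        return 0.0  # no observed events, so the partial likelihood is empty

    total = 0.0
    for i, (t_i, e_i) in enumerate(zip(times, events)):
        if not e_i:
            continue  # censored patients only contribute through risk sets
        # Risk set: patients still under observation at time t_i.
        at_risk = [risk_scores[j] for j, t_j in enumerate(times) if t_j >= t_i]
        # Numerically stable log-sum-exp over the risk set.
        m = max(at_risk)
        log_sum = m + math.log(sum(math.exp(r - m) for r in at_risk))
        total += risk_scores[i] - log_sum
    return -total / n_events

# Toy usage: three patients, all with observed events.
times = [1.0, 2.0, 3.0]
events = [1, 1, 1]
concordant = cox_neg_log_partial_likelihood([2.0, 1.0, 0.0], times, events)
discordant = cox_neg_log_partial_likelihood([0.0, 1.0, 2.0], times, events)
```

In this toy case the loss is lower when higher scores are assigned to patients with earlier events (`concordant`) than when the ordering is reversed (`discordant`), which is the behavior a discrimination-oriented survival objective should reward.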
