Bayesian Integration of Nonlinear Incomplete Clinical Data

Abstract

Multimodal clinical data are characterized by high dimensionality, heterogeneous representations, and structured missingness, posing significant challenges for predictive modeling, data integration, and interpretability. We propose BIONIC (Bayesian Integration of Nonlinear Incomplete Clinical data), a unified probabilistic framework that integrates heterogeneous multimodal data under missingness through a joint generative-discriminative latent architecture. BIONIC uses pretrained embeddings for complex modalities such as medical images and clinical text, while incorporating structured clinical variables directly within a Bayesian multimodal formulation. The proposed framework enables robust learning in partially observed and semi-supervised settings by explicitly modeling modality-level and variable-level missingness, as well as missing labels. We evaluate BIONIC on three multimodal clinical and biomedical datasets, demonstrating strong and consistent discriminative performance compared to representative multimodal baselines, particularly under incomplete data scenarios. Beyond predictive accuracy, BIONIC provides intrinsic interpretability through its latent structure, enabling population-level analysis of modality relevance and supporting clinically meaningful insight.

0

Turn this paper into a lesson

ArcXiv compiles a structured reading guide from this paper's metadata: plain-English importance, contributions, prerequisite concepts, which sections to read first, flashcards, and a quiz. Grounded in the abstract, never invented.

Discussion (0)

Sign in to join the discussion.

Loading comments…