RareCollab: an LLM-powered framework for multimodal reasoning in Mendelian disease diagnosis
Abstract
Rare disease diagnosis increasingly relies on integrating genomic, phenotypic, and transcriptomic evidence, yet these signals remain difficult to reconcile within a common interpretive framework. Here we present RareCollab, an LLM-powered framework for multimodal reasoning in Mendelian disease diagnosis that integrates more than 100 diagnostic evidence signals across DNA, RNA, phenotype, curated variant-level knowledge, and in-silico pathogenicity evidence. This design enables large language models to operate as calibrated, interpretable reasoning modules rather than as a single end-to-end ranker. We applied RareCollab to 890 patients from three cohorts, including 119 Undiagnosed Diseases Network probands with paired DNA and RNA data, constituting a large, systematic benchmark for multimodal rare disease diagnosis under paired genomic and transcriptomic evaluation. In this real-world multimodal benchmark, RareCollab ranked 94% of diagnostic genes within the top 10. Across recall thresholds from top 1 to top 10, it consistently outperformed proprietary phenotype-driven LLM baselines, including Claude Sonnet 4.6 and GPT-5-mini, by more than 25% on average and surpassed established state-of-the-art variant prioritization methods by 11% to 24%. RareCollab also reshapes the diagnostic role of RNA evidence, which contributed to prioritization of the diagnostic gene in 35% of cases (42/119). Together, these results establish RareCollab as a scalable and interpretable framework for multimodal rare disease diagnosis.
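The top-k figures reported above can be read as a recall measure: the fraction of patients whose diagnostic gene appears among the first k genes of a ranked list. The following is a minimal sketch of that computation under stated assumptions, not the authors' evaluation code. The function name `top_k_recall`, the dictionary layout, and the patient and gene identifiers are all hypothetical and invented for illustration, and the sketch assumes one diagnostic gene per patient.

```python
from typing import Dict, List


def top_k_recall(rankings: Dict[str, List[str]], truth: Dict[str, str], k: int) -> float:
    """Fraction of patients whose diagnostic gene is among their top-k ranked genes.

    rankings: patient ID -> genes ordered from most to least likely (hypothetical layout).
    truth:    patient ID -> the single diagnostic gene (an assumption of this sketch).
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if not truth:
        raise ValueError("truth must contain at least one patient")
    hits = sum(
        1
        for patient, gene in truth.items()
        if gene in rankings.get(patient, [])[:k]  # a missing ranking counts as a miss
    )
    return hits / len(truth)


# Toy data, invented for illustration only (not drawn from any cohort).
rankings = {
    "P1": ["GENE_A", "GENE_B", "GENE_C"],
    "P2": ["GENE_D", "GENE_E", "GENE_F"],
    "P3": ["GENE_G", "GENE_H", "GENE_I"],
    "P4": ["GENE_J", "GENE_K", "GENE_L"],
}
truth = {"P1": "GENE_A", "P2": "GENE_E", "P3": "GENE_I", "P4": "GENE_Z"}

# Recall at each threshold from top 1 to top 3.
recall_curve = {k: top_k_recall(rankings, truth, k) for k in (1, 2, 3)}
print(recall_curve)  # {1: 0.25, 2: 0.5, 3: 0.75}
```

In this toy example, recall rises with k because a larger cutoff can only add hits, which is why comparisons across methods are usually reported over a range of thresholds rather than at a single k.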