OpenPros: A Large-Scale Dataset for Limited View Prostate Ultrasound Computed Tomography
Abstract
Prostate cancer is one of the most prevalent and deadly cancers among men, motivating the development of accurate and accessible imaging technologies for early detection. Ultrasound computed tomography (USCT) reconstructs quantitative tissue parameters such as speed-of-sound (SOS) and is a promising low-cost alternative to existing modalities. However, prostate USCT remains challenging due to limited-angle acquisition, strong tissue heterogeneity, bone-induced wave distortion, and the lack of large-scale, anatomically realistic datasets for method development and evaluation. We introduce OPENPROS, the first large-scale benchmark dataset for limited-angle prostate USCT, designed to systematically evaluate machine learning methods for quantitative inverse problems. OPENPROS contains over 280,000 paired samples of realistic 2D SOS maps and corresponding ultrasound full-waveform data, generated from anatomically accurate 3D digital prostate models derived from 4 clinical MRI/CT scans and 62 ex vivo prostate specimens with experimental ultrasound measurements. Wave propagation is simulated under clinically realistic configurations using open-source finite-difference time-domain and Runge-Kutta solvers. We provide standardized training, in-distribution, and out-of-distribution benchmarks and evaluate representative deep learning baselines. While learning-based methods substantially improve inference speed and reconstruction accuracy over physics-based approaches, the results highlight persistent challenges in robustness, generalization, and high-resolution reconstruction quality. By publicly releasing OPENPROS, we establish a rigorous benchmark to support research in inverse problems, physics-guided learning, and operator learning, and to bridge the gap between machine learning research and practical USCT deployment. The dataset is available at https://open-pros.github.io/.
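The abstract states that the full-waveform data are produced by finite-difference time-domain simulation over 2D SOS maps. The sketch below illustrates what one such forward simulation involves. It is a minimal NumPy solver for the constant-density 2D acoustic wave equation, and it is not the OpenPros solver. The grid spacing, time step, Ricker source, zero-valued (reflecting) boundaries, one-sided transducer layout, and the function names `ricker` and `simulate_waveforms` are all illustrative assumptions.

```python
import numpy as np


def ricker(t, f0):
    """Ricker wavelet with peak frequency f0 (Hz), delayed by 1/f0 seconds."""
    arg = (np.pi * f0 * (t - 1.0 / f0)) ** 2
    return (1.0 - 2.0 * arg) * np.exp(-arg)


def simulate_waveforms(sos, dx, dt, nt, src, receivers, f0):
    """Simulate receiver traces for one point source over a 2D SOS map.

    Second-order FDTD for the constant-density acoustic wave equation
    p_tt = c^2 (p_xx + p_zz). Edge nodes stay at zero (reflecting
    boundaries); a realistic solver would use absorbing layers instead.

    sos: (nz, nx) speed-of-sound map in m/s; src: (iz, ix) source index;
    receivers: list of (iz, ix) indices. Returns an (nt, n_receivers) array.
    """
    # CFL stability bound for this 2D scheme: c_max * dt / dx <= 1 / sqrt(2)
    if sos.max() * dt / dx > 1.0 / np.sqrt(2.0):
        raise ValueError("CFL condition violated: reduce dt or increase dx")
    nz, nx = sos.shape
    coef = (sos * dt / dx) ** 2
    p_prev = np.zeros((nz, nx))
    p_curr = np.zeros((nz, nx))
    wavelet = ricker(np.arange(nt) * dt, f0)
    rz = np.array([r[0] for r in receivers])
    rx = np.array([r[1] for r in receivers])
    data = np.zeros((nt, len(receivers)))
    for it in range(nt):
        # 5-point Laplacian; the 1/dx^2 factor is folded into coef
        lap = np.zeros((nz, nx))
        lap[1:-1, 1:-1] = (p_curr[2:, 1:-1] + p_curr[:-2, 1:-1]
                           + p_curr[1:-1, 2:] + p_curr[1:-1, :-2]
                           - 4.0 * p_curr[1:-1, 1:-1])
        # Leapfrog update: p^{n+1} = 2 p^n - p^{n-1} + (c dt / dx)^2 * lap
        p_next = 2.0 * p_curr - p_prev + coef * lap
        # Point source scaled like the Laplacian term (arbitrary amplitude)
        p_next[src] += coef[src] * wavelet[it]
        data[it] = p_next[rz, rx]
        p_prev, p_curr = p_curr, p_next
    return data


# Hypothetical example: 5 cm x 5 cm grid, water-like background with a
# faster circular inclusion, probed by a one-sided (limited-view) array.
dx, dt, nt, f0 = 0.5e-3, 1.0e-7, 600, 250e3
zz, xx = np.mgrid[0:101, 0:101]
sos = np.full((101, 101), 1500.0)
sos[(zz - 60) ** 2 + (xx - 50) ** 2 < 15 ** 2] = 1600.0
receivers = [(5, ix) for ix in range(10, 91, 10)]
data = simulate_waveforms(sos, dx, dt, nt, (5, 50), receivers, f0)
print(data.shape)  # (600, 9)
```

Each `sos` map paired with its simulated `data` array is a single-source analogue of the SOS-map/waveform pairs the abstract describes. A learned inversion model is then trained to map `data` back to `sos`. Placing every transducer on one side of the domain mimics the limited-view geometry that makes the inverse problem hard.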