Physical Simulators as Do-Operators: Causal Discovery under Latent Confounders for AI-for-Science
Abstract
Existing interventional causal discovery methods (IGSP, DCDI, ENCO) assume causal sufficiency, i.e., no latent confounders, and rely on virtual interventions in synthetic simulators. In AI-for-Science settings such as molecular design and materials science, latent confounders are ubiquitous, and real interventions (e.g., physics-based simulations) cost hours to days per data point. We propose CFM-SD (Causal Flow Matching with Simulation Data), which treats first-principles physical simulators as do-operators in Pearl's interventional calculus, allowing it to handle latent confounders and real interventional data simultaneously. Theoretically, the causal structure over d variables is identifiable with O(d) single-variable interventions, which is the minimum under physical realizability constraints. In intrinsic evaluation on synthetic data (γ = 0.2 to 0.8), CFM-SD achieves an average F1 of 0.800, compared with 0.127 to 0.562 across all baselines. In extrinsic evaluation on real scientific data, CFM-SD reduces bias by 57 to 58% in molecular toxicity prediction and battery electrolyte optimization, demonstrating practical value beyond synthetic benchmarks.
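Why a simulator can serve as a do-operator under latent confounding can be illustrated with a toy linear model. The sketch below is not CFM-SD. It is a minimal illustration under assumed settings: a hidden variable U drives both X and Y, and the hypothetical function `simulate_do` stands in for a physics simulator that sets X externally. The constants `BETA` and `GAMMA` (true effect and confounding strength) are illustrative choices and are not taken from the paper. Regressing Y on observational X absorbs the confounding path, while regressing on simulator outputs recovers the causal effect.

```python
import random

BETA = 1.0   # assumed true causal effect of X on Y (illustrative)
GAMMA = 0.8  # assumed strength of the latent confounder U (illustrative)


def sample_observational(rng):
    """Draw (x, y) from the confounded system; U is never recorded."""
    u = rng.gauss(0.0, 1.0)                         # latent confounder
    x = GAMMA * u + rng.gauss(0.0, 1.0)             # X depends on U
    y = BETA * x + GAMMA * u + rng.gauss(0.0, 1.0)  # Y depends on X and U
    return x, y


def simulate_do(x, rng):
    """Hypothetical simulator acting as do(X = x).

    X is fixed by the experimenter, so it no longer depends on U,
    but U still influences Y exactly as in the observational system.
    """
    u = rng.gauss(0.0, 1.0)
    return BETA * x + GAMMA * u + rng.gauss(0.0, 1.0)


def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    var = sum((a - mx) ** 2 for a in xs)
    return cov / var


def main(n=20000, seed=0):
    rng = random.Random(seed)

    # Observational data: slope is biased by the open path X <- U -> Y.
    obs = [sample_observational(rng) for _ in range(n)]
    obs_slope = ols_slope([x for x, _ in obs], [y for _, y in obs])

    # Interventional data: the experimenter chooses X, cutting the U -> X edge.
    xs = [rng.uniform(-2.0, 2.0) for _ in range(n)]
    ys = [simulate_do(x, rng) for x in xs]
    do_slope = ols_slope(xs, ys)

    return obs_slope, do_slope


if __name__ == "__main__":
    obs_slope, do_slope = main()
    print(f"observational slope:  {obs_slope:.3f}")
    print(f"interventional slope: {do_slope:.3f}")
```

In this toy model the observational slope concentrates around BETA + GAMMA² / (GAMMA² + 1) = 1 + 0.64 / 1.64 ≈ 1.39, while the interventional slope concentrates around BETA = 1.0. The gap is the confounding bias that an intervention removes, which is also why each simulator call is valuable when it costs hours to days.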