MultiPUFFIN: A Multimodal Domain-Constrained Foundation Model for Molecular Property Prediction of Small Molecules

Abstract

MultiPUFFIN is a domain-informed multimodal foundation model for predicting thermophysical properties of small molecules, addressing a critical gap in chemical engineering, drug discovery, and materials science. Existing molecular foundation models pretrain on millions of molecules to learn general-purpose representations, but their standard MLP output layers impose no physical constraints: vapor pressure predictions may violate monotonic temperature dependence, and viscosity curves may lack the functional form required by process simulators. Domain-informed approaches that guarantee thermodynamic consistency have remained limited to single properties and small datasets, whereas multimodal foundation models have focused on biological activity rather than thermophysical properties. MultiPUFFIN fills this gap by fusing SMILES sequences, 2D molecular graphs, and 3D conformer geometries through bidirectional cross-modal attention and gated fusion, supplemented by auxiliary encoders for experimental conditions and molecular descriptors. The backbone is pretrained on 500,000 unlabeled PubChem molecules using three complementary self-supervised objectives. A condition-aware refinement stack of five conditioners (temperature, pH, pressure, polymorph, and measurement method) routes each property to a four-head tournament that selects the best-performing thermodynamically informed head for that property. MultiPUFFIN achieves a mean test R² of 0.784 and outperforms fine-tuned ChemBERTa-2 on all nine properties despite training on roughly 2,000× fewer labeled molecules.
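The abstract does not give the functional forms of the four heads or the exact tournament criterion, so the following is only a minimal sketch under stated assumptions. It illustrates how an output head can guarantee the monotonic vapor-pressure behaviour that an unconstrained MLP cannot: it assumes a hypothetical two-parameter head using the integrated Clausius-Clapeyron form, log10 P = A - B/T (the Antoine equation with C = 0), where a softplus forces B > 0 so that d(log10 P)/dT = B/T² > 0 for any network weights. It also assumes that the "tournament" simply keeps, per property, the head with the highest validation R². All function and head names (`clausius_clapeyron_head`, `tournament_select`, `"unconstrained_mlp"`) are illustrative, not taken from the paper.

```python
import math
from typing import Dict, List, Sequence


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x)); the output is always positive."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def clausius_clapeyron_head(a: float, b_raw: float, temps_k: Sequence[float]) -> List[float]:
    """Hypothetical constrained head: log10 P(T) = A - B / T with B > 0.

    `a` and `b_raw` stand in for unconstrained backbone outputs for one molecule
    (an assumption, not the paper's API). Passing b_raw through softplus forces
    B > 0, so d(log10 P)/dT = B / T**2 > 0 and the predicted curve rises
    monotonically with temperature regardless of the learned weights.
    """
    b = softplus(b_raw)
    return [a - b / t for t in temps_k]


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination R^2 = 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    ss_tot = sum((yt - mean) ** 2 for yt in y_true)
    return 1.0 - ss_res / ss_tot


def tournament_select(val_preds: Dict[str, Sequence[float]], y_val: Sequence[float]) -> str:
    """Assumed selection rule: keep the head with the highest validation R^2."""
    return max(val_preds, key=lambda name: r2_score(y_val, val_preds[name]))


if __name__ == "__main__":
    temps = [300.0, 325.0, 350.0, 375.0]
    # Even with a negative raw slope output, the constrained curve stays monotonic.
    curve = clausius_clapeyron_head(a=7.0, b_raw=-2.0, temps_k=temps)
    assert all(lo < hi for lo, hi in zip(curve, curve[1:]))

    # Toy validation targets for one property and two hypothetical heads.
    y_val = [1.0, 2.0, 3.0, 4.0]
    preds = {
        "unconstrained_mlp": [1.5, 1.5, 3.5, 3.5],   # R^2 = 0.8
        "clausius_clapeyron": [1.1, 2.0, 2.9, 4.0],  # R^2 = 0.996
    }
    print(tournament_select(preds, y_val))  # prints "clausius_clapeyron"
```

The design point is that the physical constraint lives in the head's parameterization rather than in a penalty term, so it holds by construction at inference time; a per-property selection step then lets properties without such a clean functional form fall back to whichever head validates best.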
