Closing the Prior-Posterior Loop: Self-Reflective Molecular Design with Analysis-Driven LLM Iteration

Abstract

Can a general-purpose large language model design molecules with the precision of a seasoned chemist? Current LLM-based frameworks rely on scalar feedback loops (generate, score, reject) that amount to informed trial and error. Here we show that replacing a single number with the full physicochemical rationale from first-principles calculations transforms the LLM from a stochastic sampler into a causal reasoner. Our system couples retrieval-augmented generation (RAG) with a self-reflection module that feeds orbital energies, atomic charges, and electron densities, rather than compressed scores, back into the design loop. On HOMO-LUMO gap targets from 2.0 to 5.0 eV, this structure-property-relationship (SPR) reflection achieves a deviation as low as 0.0014 eV with a 100% success rate under the SPR+RAG configuration, and it consistently outperforms scalar-feedback and non-reflective baselines in both median and mean deviation. The framework also generalizes to dipole-moment design, synthetic accessibility optimization, and molecular docking, and it remains robust across seven distinct LLM backbones. These results establish a new paradigm: when the model understands not only that a molecule fails but why, iterative molecular design becomes genuinely mechanistic.
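To make the contrast between scalar and structured feedback concrete, here is a minimal Python sketch. It assumes the first-principles results (HOMO/LUMO energies and per-atom partial charges) have already been computed by some external quantum-chemistry code. The class `QuantumAnalysis` and the functions `scalar_feedback` and `spr_feedback` are hypothetical illustrations of the idea, not the authors' implementation; electron densities are omitted for brevity, and the numbers in the usage note are illustrative rather than computed.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class QuantumAnalysis:
    """Hypothetical container for first-principles results on one candidate."""
    smiles: str
    homo_ev: float
    lumo_ev: float
    # Atom index -> partial charge in units of e (assumed precomputed elsewhere).
    atomic_charges: Dict[int, float] = field(default_factory=dict)

    @property
    def gap_ev(self) -> float:
        # HOMO-LUMO gap = LUMO energy minus HOMO energy.
        return self.lumo_ev - self.homo_ev


def scalar_feedback(a: QuantumAnalysis, target_ev: float) -> str:
    """Scalar-style baseline: only the absolute deviation reaches the LLM."""
    return f"{a.smiles}: gap deviation = {abs(a.gap_ev - target_ev):.3f} eV"


def spr_feedback(a: QuantumAnalysis, target_ev: float, top_k: int = 3) -> str:
    """SPR-style feedback: expose the quantities behind the score so the
    LLM can reason about why the gap misses the target."""
    delta = a.gap_ev - target_ev  # signed error: positive means gap too large
    # Report the most strongly charged atoms first (largest |charge|).
    charged = sorted(a.atomic_charges.items(), key=lambda kv: abs(kv[1]), reverse=True)[:top_k]
    charge_text = ", ".join(f"atom {i}: {q:+.2f} e" for i, q in charged)
    return (
        f"{a.smiles}: HOMO = {a.homo_ev:.2f} eV, LUMO = {a.lumo_ev:.2f} eV, "
        f"gap = {a.gap_ev:.2f} eV (gap - target = {delta:+.2f} eV). "
        f"Most polarized atoms: {charge_text}."
    )
```

For a candidate such as `QuantumAnalysis("CCO", -6.70, -1.10, {0: -0.40, 1: 0.10, 2: 0.05})` with a 3.0 eV target, `scalar_feedback` returns only `2.600 eV`, whereas `spr_feedback` also reports both frontier-orbital energies, the sign of the error, and the most polarized atoms, which is the kind of causal context the SPR reflection step is meant to supply.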
