MOOSE-Chem3: Toward Experiment-Guided Hypothesis Ranking via Simulated Experimental Feedback

Abstract

Hypothesis ranking is vital for automated scientific discovery, especially in cost-intensive, throughput-limited natural science domains. Current methods focus on pre-experiment ranking, relying solely on language model reasoning without empirical feedback. We introduce experiment-guided ranking, which prioritizes hypotheses based on feedback from prior tests. Due to the impracticality of real experiments, we propose a simulator grounded in domain-specific concepts that models hypothesis performance as a function of similarity to a hidden ground truth, perturbed by noise. Validated against 124 hypotheses with experimentally reported outcomes, the simulator approximates real results with consistent trend alignment. Although deviations exist, they mimic wet-lab noise, promoting more robust ranking strategies. We frame experiment-guided ranking as a sequential decision-making problem and propose an in-context reinforcement learning (ICRL) framework. Our LLM-based policy decomposes hypotheses into functional elements, clusters them by mechanistic roles, and prioritizes recombinations based on feedback. Experiments show our approach significantly outperforms pre-experiment baselines and strong ablations. Our toolkit, comprising the simulator and ICRL framework, enables systematic research on experiment-guided ranking, with the policy serving as a strong proof of concept.

0

Turn this paper into a full lesson

ArcXiv compiles a staged curriculum from this paper: 8-12 lessons across beginner → advanced, synthesised section guides, visuals, flashcards, a quiz, exercises, and on-demand deep dives per section. Grounded in the abstract, never invented.

Discussion (0)

Sign in to join the discussion.

Loading comments…