Data-model Coevolution as the Architectural Principle for AI-Native Materials Databases
Abstract
AI-native approaches are reshaping computational materials discovery into iterative data-model coevolution cycles. However, most existing materials databases remain fundamentally data-centric: predictive models sit outside the database state, and data growth is decoupled from model updates. Here we formalize data-model coevolution as the architectural basis of AI-native materials databases, in which data and predictive models evolve together through endogenous generation-evaluation-refinement cycles. Using the Li-P-S ternary system as a demonstrative prototype, we generated approximately 70,000 candidate structures, more than 10,000 of which satisfy the stable-unique-novel (S.U.N.) criterion, with local chemical environments saturating rapidly and energy distributions stabilizing. The workflow autonomously identified chemically plausible phases and motifs absent from the Materials Project (MP) and Alexandria databases, including a stable Li₂PS₃ phase, the (PS₃)₃³⁻ trimer, the (P₃S₈)³⁻ ring, two isomers of the (P₂S₈)²⁻ ring, and polymeric (PS₄)ₙⁿ⁻ chains. Within two to three iterations, the integrated predictive models converged to high precision at low first-principles cost, and the resulting data-model state can be queried directly for atomistic and electronic-structure properties within the same unified framework. Data-model states can be reused and extended across related chemical systems, enabling scalable, continuous accumulation of computational materials knowledge. These results demonstrate that data-model coevolution is a practical architectural principle for AI-era materials data infrastructure.
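To make the stable-unique-novel (S.U.N.) criterion concrete, it can be sketched as a simple three-part filter over generated candidates. This is a minimal illustration under stated assumptions, not the authors' implementation: each structure is reduced to a hypothetical string `fingerprint` (a real pipeline would compare structures with a structure-matching tool such as pymatgen's `StructureMatcher`), the reference set stands in for known MP and Alexandria entries, and the default stability tolerance `hull_tol=0.1` eV/atom is an assumed value. The `Candidate` class and `sun_filter` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A generated structure, reduced to the fields the filter needs (hypothetical schema)."""
    formula: str         # reduced chemical formula
    fingerprint: str     # stand-in for a structure-matching key (assumption)
    e_above_hull: float  # energy above the convex hull, eV/atom


def sun_filter(candidates, reference_fingerprints, hull_tol=0.1):
    """Keep candidates that are stable, unique, and novel.

    stable: e_above_hull <= hull_tol (the tolerance is an assumed value)
    unique: only the first occurrence of each fingerprint in the generated set
    novel:  fingerprint absent from the reference databases
    """
    seen = set()
    kept = []
    for cand in candidates:
        if cand.fingerprint in seen:
            continue  # duplicate of an earlier generated structure
        seen.add(cand.fingerprint)
        if cand.e_above_hull > hull_tol:
            continue  # not stable under the chosen tolerance
        if cand.fingerprint in reference_fingerprints:
            continue  # already present in the reference databases
        kept.append(cand)
    return kept


# Toy input for illustration only; these are not results from the paper.
pool = [
    Candidate("Li2PS3", "fp-1", 0.00),
    Candidate("Li2PS3", "fp-1", 0.00),  # duplicate of fp-1
    Candidate("Li3PS4", "fp-2", 0.00),  # already in the reference set
    Candidate("LiPS3", "fp-3", 0.35),   # above the hull tolerance
]
print([c.fingerprint for c in sun_filter(pool, {"fp-2"})])  # ['fp-1']
```

Checking uniqueness before stability and novelty means each duplicate is discarded once, whatever its other properties; in a coevolution loop, the surviving candidates would be the ones passed on for first-principles evaluation and model refinement.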