Representation of compounds for machine-learning prediction of physical properties
Abstract
The representations of a compound, called "descriptors" or "features", play an essential role in constructing a machine-learning model of its physical properties. In this study, we adopt a procedure for generating a systematic set of descriptors from simple elemental and structural representations. We first apply it to a large dataset of cohesive energies for about 18000 compounds computed by density functional theory (DFT). The resulting kernel ridge regression model has a prediction error of 0.041 eV/atom, which is close to the "chemical accuracy" of 1 kcal/mol (0.043 eV/atom). We also apply the procedure to two smaller datasets: the lattice thermal conductivity (LTC) of 110 compounds computed by DFT, and the experimental melting temperature of 248 compounds. We examine how the descriptor sets affect the efficiency of Bayesian optimization, in addition to the accuracy of the kernel ridge regression models. The descriptor sets show good predictive performance in both respects.
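As a rough illustration of the workflow the abstract describes, the sketch below builds compound descriptors from elemental representations and fits a kernel ridge regression model. The abstract does not say how the descriptors are formed, so the composition-weighted mean and standard deviation used here are an assumption, as are the Gaussian kernel, the hyperparameter values, and every function name. The elemental data are atomic numbers and standard atomic weights. The target values are synthetic and exist only to exercise the code; they are not cohesive energies.

```python
import numpy as np

# Elemental representations: (atomic number, standard atomic weight).
ELEMENTS = {
    "Li": (3, 6.94),
    "O": (8, 15.999),
    "F": (9, 18.998),
    "Na": (11, 22.990),
    "Mg": (12, 24.305),
    "Cl": (17, 35.45),
}

def compound_descriptor(composition):
    """Composition-weighted mean and standard deviation of each elemental
    property (an assumed descriptor form, chosen for illustration)."""
    symbols = list(composition)
    weights = np.array([composition[s] for s in symbols], dtype=float)
    weights /= weights.sum()
    props = np.array([ELEMENTS[s] for s in symbols], dtype=float)
    mean = weights @ props
    std = np.sqrt(weights @ (props - mean) ** 2)
    return np.concatenate([mean, std])

def gaussian_kernel(A, B, gamma):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * sq_dist)

def fit_kernel_ridge(X, y, gamma, lam):
    """Solve (K + lam * I) alpha = y for the dual coefficients alpha."""
    K = gaussian_kernel(X, X, gamma)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def predict(X_train, alpha, X_new, gamma):
    """Predict targets for X_new from the fitted dual coefficients."""
    return gaussian_kernel(X_new, X_train, gamma) @ alpha

# A handful of binary compounds, given as element -> count.
compounds = [
    {"Na": 1, "Cl": 1},
    {"Mg": 1, "O": 1},
    {"Li": 1, "F": 1},
    {"Na": 1, "F": 1},
    {"Li": 1, "Cl": 1},
    {"Mg": 1, "Cl": 2},
]
raw = np.array([compound_descriptor(c) for c in compounds])
X = (raw - raw.mean(axis=0)) / raw.std(axis=0)  # standardize each descriptor

# Synthetic target, NOT real property data: a smooth function of two descriptors.
y = np.sin(X[:, 0]) + 0.5 * X[:, 2]

gamma, lam = 0.5, 1e-3  # illustrative hyperparameters, not tuned
alpha = fit_kernel_ridge(X, y, gamma, lam)
rmse = np.sqrt(np.mean((predict(X, alpha, X, gamma) - y) ** 2))
print(f"training RMSE on synthetic target: {rmse:.2e}")

# Unit check for the "chemical accuracy" threshold: 1 kcal/mol in eV.
# 1 kcal = 4184 J; 1 eV per particle corresponds to 96485.33212 J/mol.
EV_PER_KCAL_MOL = 4184.0 / 96485.33212
print(f"1 kcal/mol = {EV_PER_KCAL_MOL:.4f} eV")
```

In practice the regularization strength and kernel width would be chosen by cross-validation, and the error would be measured on held-out compounds rather than on the training set as in this toy run.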