A DNN Biophysics Model with Topological and Electrostatic Features
Abstract
In this project, we present a deep neural network (DNN)-based biophysics model that uses multi-scale and uniform topological and electrostatic features to predict protein properties such as Coulomb and solvation energies. The topological features are generated using element-specific persistent homology (ESPH) on a selection of heavy atoms or carbon atoms. The electrostatic features are generated using a novel Cartesian treecode, which incorporates the underlying electrostatic interactions to further improve the model's predictions. These features are uniform in number for proteins of varying sizes, so the widely available protein structure databases can be used to train the network. They are also multi-scale, allowing users to balance resolution against computational cost. The optimal model trained on more than 17,000 proteins for predicting Coulomb energy achieves an MSE of approximately 0.024, a MAPE of 0.073, and an R² of 0.976. Meanwhile, the optimal model trained on more than 4,000 proteins for predicting solvation energy achieves an MSE of approximately 0.064, a MAPE of 0.081, and an R² of 0.926, showing the efficiency and fidelity of these features in representing protein structure and the force field. The feature generation algorithms also have the potential to serve as general tools for assisting machine-learning-based prediction of protein properties and functions.
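The abstract does not say how the ESPH features are binned or which filtration is used, so the following is only a minimal sketch of why such features can be "uniform in number" across proteins of different sizes. It assumes 0-dimensional (Betti-0) persistence of a Vietoris-Rips filtration on each element's atoms, sampled on a fixed radius grid. The names `h0_death_times`, `betti0_curve`, `esph_features`, `ELEMENTS`, and `RADII`, as well as the grid values, are hypothetical illustrations and not the authors' implementation; the treecode electrostatic features are not modeled here.

```python
import itertools

import numpy as np

# Hypothetical element channels and filtration grid (in angstroms); not from the paper.
ELEMENTS = ("C", "N", "O")
RADII = np.linspace(0.0, 8.0, 17)


def h0_death_times(coords):
    """0-dimensional persistence of a Vietoris-Rips filtration.

    Every atom is born at scale 0. Edges are processed in order of increasing
    length (Kruskal-style union-find); each edge that merges two components
    kills one of them at that edge length. Returns the n-1 finite death times.
    Convention assumed here: filtration value = edge length (some use half of it).
    """
    n = len(coords)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    edges = sorted(
        (float(np.linalg.norm(coords[i] - coords[j])), i, j)
        for i, j in itertools.combinations(range(n), 2)
    )
    deaths = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            deaths.append(d)
    return deaths


def betti0_curve(coords, radii):
    """Number of connected components alive at each radius in `radii`."""
    if len(coords) == 0:
        return np.zeros(len(radii))
    deaths = np.array(h0_death_times(coords))
    # Components alive at scale r = n - (number of merges with death time <= r).
    return np.array(
        [len(coords) - np.count_nonzero(deaths <= r) for r in radii], dtype=float
    )


def esph_features(elements, coords, channels=ELEMENTS, radii=RADII):
    """Concatenate one Betti-0 curve per element channel into a fixed-length vector.

    Atoms whose element is not in `channels` are ignored.
    """
    elements = np.asarray(elements)
    coords = np.asarray(coords, dtype=float)
    blocks = [betti0_curve(coords[elements == e], radii) for e in channels]
    return np.concatenate(blocks)


# Two toy "proteins" of very different sizes map to vectors of the same length.
rng = np.random.default_rng(0)
small = esph_features(["C", "C", "N", "O"], rng.normal(size=(4, 3)) * 3.0)
large = esph_features(rng.choice(list("CNOS"), size=200), rng.normal(size=(200, 3)) * 10.0)
print(small.shape, large.shape)
```

Because every structure maps to `len(ELEMENTS) * len(RADII)` numbers, proteins with 4 atoms or 200 atoms can share one network input layer. Choosing a coarser or finer `RADII` grid is one plausible way to trade resolution against cost in the spirit of the multi-scale claim; ESPH as usually defined also includes higher-dimensional (Betti-1, Betti-2) barcodes, which this sketch omits.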