In Silico Functional Profiling of Engineered Small Molecules: A Machine Learning Approach Leveraging PubChem Identifiers (CIDSID ML model)

Abstract

The article introduces a concept for a time- and cost-effective methodological framework leveraging machine learning (ML) models for both early-stage drug development and clinical trial support. The rationale for this approach is the inherent scalability and speed enabled by using pre-calculated data embedded in existing PubChem identifiers (CID and SID), thereby eliminating the computationally intensive step of on-the-fly molecular descriptor generation. The approach was effectively demonstrated across four diverse bioassays: antagonists of the human D3 dopamine receptor, Rab9 promoter activators, small-molecule inhibitors of CHOP, and antagonists of the human M1 muscarinic receptor. A comparison, based on Matthews correlation coefficient (MCC), was conducted between the CIDSID ML model, the MORGAN2-based ML model, and the RDKit-transformed SMILES model for these four case studies, revealing that no method is universally superior in terms of performance. Furthermore, the CIDSID model averaged a rapid execution time of only 3.3 seconds; the ML models relying on explicit structural descriptors, such as MORGAN2 and RDKit-transformed SMILES, demonstrated high computational costs, with processing times averaging 106.0 and 109.6 seconds, respectively. While negligible for a single ML model, these times would cause a significant difference in computational resource consumption when scaled across a framework involving over a million model builds. Moreover, the CIDSID ML model achieved strong average performance metrics: Accuracy of 83.52%, Precision of 89.62%, Recall of 75.65%, F1-Score of 81.93% and ROC of 83.53%.
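To make the scaling argument concrete, the short sketch below extrapolates the three reported average run times to one million model runs. It is only back-of-the-envelope arithmetic under two assumptions that are not in the abstract: the runs execute strictly one after another, and the count is exactly 1,000,000. The function name `total_days` and the dictionary keys are illustrative.

```python
SECONDS_PER_DAY = 86_400

# Average per-model run times reported in the abstract, in seconds.
avg_seconds = {
    "CIDSID": 3.3,
    "MORGAN2": 106.0,
    "RDKit-transformed SMILES": 109.6,
}


def total_days(seconds_per_model, n_models=1_000_000):
    """Total sequential wall-clock time in days (assumes no parallelism)."""
    return seconds_per_model * n_models / SECONDS_PER_DAY


for name, secs in avg_seconds.items():
    # Ratio relative to the CIDSID average shows the per-run slowdown.
    ratio = secs / avg_seconds["CIDSID"]
    print(f"{name}: {total_days(secs):.1f} days ({ratio:.1f}x CIDSID)")
```

Under these assumptions the CIDSID approach would need roughly 38 days of sequential compute, against about 1,227 and 1,269 days for the two descriptor-based approaches. Parallel execution would shrink the wall-clock figures but not the ratio of total compute consumed.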
