Aptamer-protein interaction prediction model based on transformer
Abstract
Aptamers are single-stranded DNA or RNA molecules, or short peptides, with unique tertiary structures that bind selectively to specific targets. They have great potential in detection and medical applications. Here, we present SelfTrans-Ensemble, a deep learning model that integrates sequence-based and structure-based models to extract multi-scale features for predicting aptamer-protein interactions (APIs). The model employs two pre-trained models, ProtBert and RNA-FM, to encode protein and aptamer sequences, along with features generated from primary sequence and secondary structural information. To address the imbalance in the aptamer dataset, we incorporated short RNA-protein interaction data into the training set. This resulted in a training accuracy of 98.9% and a test accuracy of 88.0%, demonstrating the model's effectiveness in predicting APIs. Additionally, analysis using molecular simulation indicated that SelfTrans-Ensemble is sensitive to aptamer sequence mutations. We anticipate that SelfTrans-Ensemble can offer a more efficient and rapid process for aptamer screening.
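The abstract does not say how the primary-sequence features are computed or how the sub-models' outputs are combined. As a rough illustration only, the sketch below assumes k-mer frequencies as one possible primary-sequence feature and plain probability averaging (soft voting) as one possible way to combine models. The function names `kmer_frequencies` and `average_ensemble`, the choice of `k`, and the example sequence and probabilities are all hypothetical and are not taken from the paper.

```python
from itertools import product


def kmer_frequencies(sequence, k=3, alphabet="ACGU"):
    """Return normalized k-mer frequencies in a fixed, lexicographic k-mer order.

    Assumption: k-mer composition stands in for the paper's unspecified
    primary-sequence features.
    """
    # Map DNA letters onto the RNA alphabet so both aptamer types are handled.
    sequence = sequence.upper().replace("T", "U")
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(sequence) - k + 1):
        window = sequence[i:i + k]
        if window in counts:  # skip windows containing ambiguous bases such as N
            counts[window] += 1
            total += 1
    return [counts[km] / total if total else 0.0 for km in kmers]


def average_ensemble(probabilities):
    """Average interaction probabilities from several sub-models.

    Assumption: soft voting; the paper may combine its models differently.
    """
    if not probabilities:
        raise ValueError("need at least one probability")
    return sum(probabilities) / len(probabilities)


# Hypothetical aptamer sequence and hypothetical sub-model outputs.
features = kmer_frequencies("GGUACGUAGC", k=2)
score = average_ensemble([0.9, 0.7])
```

In a setup like this, a hand-crafted vector such as `features` would typically be concatenated with the embeddings produced by the pre-trained encoders before classification. Whether SelfTrans-Ensemble does so is not stated in the abstract.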