Chinese Physics B ›› 2025, Vol. 34 ›› Issue (8): 088709-088709. doi: 10.1088/1674-1056/adea9b

Special topic: A celebration of the 90th Anniversary of the Birth of Bolin Hao


RLsite: Integrating 3D-CNN and BiLSTM for RNA-ligand binding site prediction

Yan Zou(邹艳), Lang Yang(杨浪), Yanhui Liu(刘艳辉), and Yuyu Feng(冯玉宇)†   

  1. School of Physics, Guizhou University, Guiyang 550000, China
  • Received: 2025-05-06 Revised: 2025-06-22 Accepted: 2025-07-02 Online: 2025-07-17 Published: 2025-07-21
  • Contact: Yuyu Feng E-mail: fengyy@gzu.edu.cn
  • Supported by:
    Project supported by the National Natural Science Foundation of China (Grant No. 12204118) and the Guizhou University Talent Fund (Grant No. [2022]30).


Abstract: Accurate identification of RNA-ligand binding sites is essential for elucidating RNA function and advancing structure-based drug discovery. Here, we present RLsite, a novel deep learning framework that integrates energy-, structure-, and sequence-based features to predict nucleotide-level binding sites with high accuracy. RLsite leverages energy-based three-dimensional representations, obtained from atomic probe interactions using a pre-trained ITScore-NL potential, and models their contextual features through a 3D convolutional neural network (3D-CNN) augmented with self-attention. In parallel, structure-based features, including network properties, Laplacian norm, and solvent-accessible surface area, together with sequence-based evolutionary constraint scores, are mapped along the RNA sequence and used as sequential descriptors. These descriptors are modeled using a bidirectional long short-term memory (BiLSTM) network enhanced with multi-head self-attention. By effectively fusing these complementary modalities, RLsite achieves robust and precise binding site prediction. Extensive evaluations across four diverse RNA-ligand benchmark datasets demonstrate that RLsite consistently outperforms state-of-the-art methods in terms of precision, recall, Matthews correlation coefficient (MCC), area under the curve (AUC), and overall robustness. Notably, on a particularly challenging test set composed of RNA structures containing junctions, RLsite surpasses the second-best method by 7.3% in precision, 3.4% in recall, 7.5% in MCC, and 10.8% in AUC, highlighting its potential as a powerful tool for RNA-targeted molecular design.

Key words: RNA-ligand, binding site prediction, deep learning, self-attention


Classification codes:

  • 87.14.gn (RNA)
  • 87.15.A- (Theory, modeling, and computer simulation)
  • 87.15.B- (Structure of biomolecules)
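
The abstract reports nucleotide-level precision, recall, and Matthews correlation coefficient (MCC). As a reminder of how these three quantities follow from binary per-nucleotide labels, here is a minimal sketch in plain Python. It is not the authors' evaluation code: the function name `binding_site_metrics`, the zero-division convention (returning 0.0), and the example labels are all assumptions made for illustration. AUC is left out because it needs predicted scores rather than hard labels.

```python
import math


def binding_site_metrics(y_true, y_pred):
    """Precision, recall and MCC for binary per-nucleotide labels (1 = binding)."""
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must have the same length")

    # Confusion-matrix counts over all nucleotides.
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    # Assumed convention: an undefined ratio is reported as 0.0.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {"precision": precision, "recall": recall, "mcc": mcc}


# Hypothetical labels for a 10-nucleotide RNA (made up, not from the paper).
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 0, 0, 1, 0]
metrics = binding_site_metrics(y_true, y_pred)
print(metrics)
```

Because binding nucleotides are usually a minority class, MCC is informative here: it uses all four confusion-matrix counts, so a predictor that labels every nucleotide as non-binding scores 0 rather than benefiting from the many true negatives.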