Chin. Phys. B, 2020, Vol. 29(10): 108704    DOI: 10.1088/1674-1056/abb303
Special Issue: SPECIAL TOPIC — Modeling and simulations for the structures and functions of proteins and nucleic acids
Topical Review—Modeling and simulations for the structures and functions of proteins and nucleic acids  

Computational prediction of RNA tertiary structures using machine learning methods

Bin Huang(黄斌)1,2, Yuanyang Du(杜渊洋)1,2, Shuai Zhang(张帅)1,2, Wenfei Li(李文飞)1,2, Jun Wang(王骏)1,2, and Jian Zhang(张建)1,2,†
1 National Laboratory of Solid State Microstructures, School of Physics, Collaborative Innovation Center of Advanced Microstructures, Nanjing University, Nanjing 210093, China
2 Institute for Brain Sciences, Kuang Yaming Honors School, Nanjing University, Nanjing 210093, China

RNAs play crucial and versatile roles in biological processes. Computational prediction approaches can help to understand RNA structures and their stabilizing factors, thus providing information on their functions and facilitating the design of new RNAs. Machine learning (ML) techniques have made tremendous progress in many fields in the past few years. Although their usage in protein-related fields has a long history, the use of ML methods in predicting RNA tertiary structures is new and rare. Here, we review recent advances in using ML methods for RNA structure prediction and discuss the advantages and limitations, as well as the difficulties and potential, of these approaches when applied in the field.

Keywords: RNA structure prediction, RNA scoring function, knowledge-based potentials, machine learning, convolutional neural networks
Received: 27 June 2020; Revised: 22 August 2020; Published: 05 October 2020
PACS: 87.15.B- (Structure of biomolecules) (RNA); 07.05.Mh (Neural networks, fuzzy logic, artificial intelligence)
* Project supported by the National Natural Science Foundation of China (Grant Nos. 11774158, 11974173, 11774157, and 11934008).

Bin Huang(黄斌), Yuanyang Du(杜渊洋), Shuai Zhang(张帅), Wenfei Li(李文飞), Jun Wang(王骏), and Jian Zhang(张建)†, Computational prediction of RNA tertiary structures using machine learning methods, 2020 Chin. Phys. B 29 108704

The architecture of the multilayer perceptron used in the work.[23] It contains a single hidden layer. The inputs are structural features, and the output is a score that indicates the quality of the structural candidates.
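As a rough illustration of the caption above, the following sketch runs the forward pass of a perceptron with one hidden layer that maps a vector of structural features to a single score. It is not the model of Ref. [23]: the layer sizes, the tanh and sigmoid activations, the random untrained weights, the feature values, and the names `init_mlp` and `score_structure` are all assumptions made for illustration.

```python
import math
import random


def init_mlp(n_inputs, n_hidden, seed=0):
    """Random weights for a one-hidden-layer perceptron (sizes are illustrative)."""
    rng = random.Random(seed)
    return {
        "w1": [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
               for _ in range(n_hidden)],
        "b1": [0.0] * n_hidden,
        "w2": [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)],
        "b2": 0.0,
    }


def score_structure(params, features):
    """Forward pass: features -> hidden layer (tanh) -> sigmoid score in (0, 1)."""
    hidden = [
        math.tanh(sum(w * x for w, x in zip(row, features)) + b)
        for row, b in zip(params["w1"], params["b1"])
    ]
    z = sum(w * h for w, h in zip(params["w2"], hidden)) + params["b2"]
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical feature vectors for two structural candidates (not real data).
params = init_mlp(n_inputs=4, n_hidden=3)
candidates = {
    "decoy_a": [0.2, 1.5, -0.3, 0.8],
    "decoy_b": [1.1, 0.4, 0.9, -0.6],
}
scores = {name: score_structure(params, feats) for name, feats in candidates.items()}
best = max(scores, key=scores.get)  # candidate ranked highest by the untrained net
print(scores, best)
```

With untrained weights the ranking is arbitrary; in practice the weights would be fitted so that the score tracks a measure of structural quality.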

The architecture of the CNN used in the work.[26] Not all convolutional layers are shown owing to space limitations. Each cube represents a 3D image. The input layer has three channels, analogous to the RGB channels of 2D images. The output is a single score indicating how closely the input structure resembles the native structure.
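The data flow described in this caption (a three-channel 3D image in, a single score out) can be sketched with one naive 3D convolutional layer followed by global averaging and a linear output. This is only a toy under stated assumptions, not the network of Ref. [26]: the grid size, kernel size, number of filters, ReLU activation, sigmoid output, random weights, and all function names are hypothetical.

```python
import math
import random


def make_grid(channels, size, rng):
    """Random stand-in for a voxelized structure, indexed grid[c][x][y][z]."""
    return [[[[rng.random() for _ in range(size)] for _ in range(size)]
             for _ in range(size)] for _ in range(channels)]


def conv3d_relu(grid, kernels, biases):
    """Valid 3D convolution (stride 1, no padding) followed by ReLU.

    grid: [C_in][N][N][N]; kernels: [C_out][C_in][K][K][K].
    """
    c_in, n, k = len(grid), len(grid[0]), len(kernels[0][0])
    m = n - k + 1  # output edge length
    out = []
    for kern, bias in zip(kernels, biases):
        fmap = [[[0.0] * m for _ in range(m)] for _ in range(m)]
        for x in range(m):
            for y in range(m):
                for z in range(m):
                    acc = bias
                    for c in range(c_in):
                        for i in range(k):
                            for j in range(k):
                                for l in range(k):
                                    acc += kern[c][i][j][l] * grid[c][x + i][y + j][z + l]
                    fmap[x][y][z] = max(0.0, acc)
        out.append(fmap)
    return out


def global_mean(fmap):
    """Average one 3D feature map down to a single number."""
    vals = [v for plane in fmap for row in plane for v in row]
    return sum(vals) / len(vals)


def cnn_score(grid, kernels, biases, w_out, b_out):
    """Conv layer -> global average per filter -> linear layer -> sigmoid score."""
    features = [global_mean(f) for f in conv3d_relu(grid, kernels, biases)]
    z = sum(w * f for w, f in zip(w_out, features)) + b_out
    return 1.0 / (1.0 + math.exp(-z))


rng = random.Random(0)
grid = make_grid(channels=3, size=6, rng=rng)  # three input channels
kernels = [make_grid(channels=3, size=3, rng=rng) for _ in range(4)]  # 4 filters
biases = [0.0] * 4
w_out = [rng.uniform(-1.0, 1.0) for _ in range(4)]
score = cnn_score(grid, kernels, biases, w_out, b_out=0.0)
print(score)
```

A real network of this kind would stack several such layers and learn the kernels from data; the nested loops here only make the channel and voxel indexing explicit.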

Dataset | 3dRNAscore | KB | RASP | Rosetta | CNN model
Dataset-I | 84/85 | 80/85 | 79/85 | 53/85 | 62/85
Dataset-II | 17/20 | 20/20 | 12/20 | 12/20 | 19/20
Dataset-III | 5/18 | 1/18 | 4/18 | 13/18
The performance of different scoring functions. In each cell, the first number is the number of RNAs that are correctly identified, and the second is the total number of RNAs in the dataset.[26] Bold numbers indicate the best result for each dataset.
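To make the fractions in the table easier to compare, the short sketch below converts them to success rates and picks the best scoring function per dataset. It uses only the Dataset-I and Dataset-II rows, which list a value for each of the five scoring functions; the variable names are illustrative.

```python
from fractions import Fraction

# Counts copied from the Dataset-I and Dataset-II rows of the table.
methods = ["3dRNAscore", "KB", "RASP", "Rosetta", "CNN model"]
counts = {
    "Dataset-I": ([84, 80, 79, 53, 62], 85),
    "Dataset-II": ([17, 20, 12, 12, 19], 20),
}

# Success rate = correctly identified RNAs / total RNAs in the dataset.
rates = {
    dataset: {m: Fraction(hit, total) for m, hit in zip(methods, hits)}
    for dataset, (hits, total) in counts.items()
}
best = {dataset: max(r, key=r.get) for dataset, r in rates.items()}

for dataset, r in rates.items():
    line = ", ".join(f"{m}: {float(v):.1%}" for m, v in r.items())
    print(f"{dataset}: {line} (best: {best[dataset]})")
```

On these two rows the best entries are 3dRNAscore (84/85) for Dataset-I and KB (20/20) for Dataset-II.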

