Computational prediction of RNA tertiary structures using machine learning methods
Bin Huang(黄斌)1,2, Yuanyang Du(杜渊洋)1,2, Shuai Zhang(张帅)1,2, Wenfei Li(李文飞)1,2, Jun Wang (王骏)1,2, and Jian Zhang(张建)1,2,†
1National Laboratory of Solid State Microstructures, School of Physics, Collaborative Innovation Center of Advanced Microstructures, Nanjing University, Nanjing 210093, China
2Institute for Brain Sciences, Kuang Yaming Honors School, Nanjing University, Nanjing 210093, China
RNAs play crucial and versatile roles in biological processes. Computational prediction approaches can help to understand RNA structures and their stabilizing factors, thus providing information on their functions and facilitating the design of new RNAs. Machine learning (ML) techniques have made tremendous progress in many fields in the past few years. Although their use in protein-related fields has a long history, the use of ML methods in predicting RNA tertiary structures is new and rare. Here, we review recent advances in using ML methods for RNA structure prediction and discuss the advantages and limitations, as well as the difficulties and potential, of these approaches when applied in the field.
* Project supported by the National Natural Science Foundation of China (Grant Nos. 11774158, 11974173, 11774157, and 11934008).
Cite this article:
Bin Huang(黄斌), Yuanyang Du(杜渊洋), Shuai Zhang(张帅), Wenfei Li(李文飞), Jun Wang (王骏), and Jian Zhang(张建)† Computational prediction of RNA tertiary structures using machine learning methods 2020 Chin. Phys. B 29 108704
Fig. 1. The architecture of the multilayer perceptron used in Ref. [23]. It contains a single hidden layer. The inputs are structural features, and the output is a score that indicates the quality of the structural candidates.
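As a rough illustration of the kind of network described in the Fig. 1 caption, the following minimal Python sketch implements a perceptron with one hidden layer that maps a feature vector to a single score. It is not the model of Ref. [23]: the number of features, the hidden width, the tanh activation, the random (untrained) weights, and the candidate names are all assumptions made for illustration, as is the convention that a lower score means a better candidate.

```python
import math
import random


def init_mlp(n_inputs, n_hidden, seed=0):
    """Create random weights for one hidden layer and a scalar output.

    The sizes and the Gaussian initialisation are illustrative assumptions.
    """
    rng = random.Random(seed)
    w1 = [[rng.gauss(0.0, 0.1) for _ in range(n_inputs)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    w2 = [rng.gauss(0.0, 0.1) for _ in range(n_hidden)]
    b2 = 0.0
    return w1, b1, w2, b2


def mlp_score(features, params):
    """Forward pass: structural features -> single hidden layer -> one score."""
    w1, b1, w2, b2 = params
    # Hidden layer with tanh activation (the activation choice is an assumption).
    hidden = [
        math.tanh(sum(w * x for w, x in zip(row, features)) + b)
        for row, b in zip(w1, b1)
    ]
    # Linear read-out to a scalar score.
    return sum(w * h for w, h in zip(w2, hidden)) + b2


# Hypothetical feature vectors for two structural candidates.
params = init_mlp(n_inputs=4, n_hidden=8)
candidates = {
    "candidate_a": [0.2, 1.5, -0.3, 0.9],
    "candidate_b": [1.1, 0.4, 0.0, -0.7],
}
scores = {name: mlp_score(feats, params) for name, feats in candidates.items()}

# Assumption: lower score = better candidate (the caption does not specify).
ranking = sorted(scores, key=scores.get)
print(ranking)
```

In practice the weights would be fitted on structures of known quality; the sketch only shows how features flow through the single hidden layer to produce one number per candidate.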
Fig. 2. The architecture of the CNN used in Ref. [26]. Note that not all convolutional layers are shown due to space limitations. Each cube represents a 3D image. The input layer has three channels, similar to the RGB channels in 2D images. The output is a single score, indicating the similarity of the input structure to the native structure.
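To make the data flow in the Fig. 2 caption concrete, the sketch below passes a three-channel 3D grid through one convolutional layer and reduces it to a single score. It is a toy stand-in, not the network of Ref. [26]: the grid size, kernel size, number of kernels, the ReLU and global-average-pooling steps, the helper names, and the random weights are all assumptions, and the real model stacks many more layers.

```python
import random


def conv3d_valid(volume, kernel):
    """'Valid' 3D cross-correlation of a multi-channel volume with one kernel.

    volume: nested lists indexed [channel][z][y][x]
    kernel: nested lists indexed [channel][dz][dy][dx], cubic in space
    Returns a single-channel feature map indexed [z][y][x].
    """
    n_channels = len(volume)
    depth, height, width = len(volume[0]), len(volume[0][0]), len(volume[0][0][0])
    k = len(kernel[0])  # edge length of the cubic kernel
    out = []
    for z in range(depth - k + 1):
        plane = []
        for y in range(height - k + 1):
            row = []
            for x in range(width - k + 1):
                acc = 0.0
                for c in range(n_channels):
                    for dz in range(k):
                        for dy in range(k):
                            for dx in range(k):
                                acc += (volume[c][z + dz][y + dy][x + dx]
                                        * kernel[c][dz][dy][dx])
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out


def cnn_score(volume, kernels, out_weights, out_bias=0.0):
    """Conv layer -> ReLU -> global average pooling -> linear read-out."""
    pooled = []
    for kernel in kernels:
        fmap = conv3d_valid(volume, kernel)
        values = [max(0.0, v) for plane in fmap for row in plane for v in row]
        pooled.append(sum(values) / len(values))
    return sum(w * p for w, p in zip(out_weights, pooled)) + out_bias


def random_tensor(shape, rng):
    """Nested lists of Gaussian noise with the given shape (illustrative helper)."""
    if not shape:
        return rng.gauss(0.0, 0.1)
    return [random_tensor(shape[1:], rng) for _ in range(shape[0])]


rng = random.Random(0)
volume = random_tensor((3, 4, 4, 4), rng)      # 3 channels on a hypothetical 4x4x4 grid
kernels = random_tensor((4, 3, 2, 2, 2), rng)  # 4 hypothetical kernels of size 2x2x2
out_weights = random_tensor((4,), rng)
score = cnn_score(volume, kernels, out_weights)
print(f"score = {score:.4f}")
```

The three input channels play the same role as RGB planes in a 2D image: each kernel sums over all channels at every grid position, so the first layer already mixes the per-channel information before the result is reduced to one score.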
| | 3dRNAscore | KB | RASP | Rosetta | CNN model |
|---|---|---|---|---|---|
| Dataset-I | **84/85** | 80/85 | 79/85 | 53/85 | 62/85 |
| Dataset-II | 17/20 | **20/20** | 12/20 | 12/20 | 19/20 |
| Dataset-III | 5/18 | – | 1/18 | 4/18 | **13/18** |

Table 1. The performance of different scoring functions. In each cell, the first number is the number of RNAs that are correctly identified, and the second is the total RNAs in the dataset.[26] The bold number indicates the best one among the same dataset.
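The counts in Table 1 can be converted to success rates for easier comparison across datasets of different sizes. The short Python sketch below does this using only the numbers reported in the table; the variable and function names are illustrative, and the missing KB entry for Dataset-III is represented as `None`.

```python
from fractions import Fraction

METHODS = ["3dRNAscore", "KB", "RASP", "Rosetta", "CNN model"]

# (correctly identified, total) per method, copied from Table 1.
# None marks the cell for which no value is reported.
TABLE = {
    "Dataset-I": [(84, 85), (80, 85), (79, 85), (53, 85), (62, 85)],
    "Dataset-II": [(17, 20), (20, 20), (12, 20), (12, 20), (19, 20)],
    "Dataset-III": [(5, 18), None, (1, 18), (4, 18), (13, 18)],
}


def success_rates(row):
    """Map each method with a reported value to its exact success rate."""
    return {
        method: Fraction(*cell)
        for method, cell in zip(METHODS, row)
        if cell is not None
    }


best_by_dataset = {}
for dataset, row in TABLE.items():
    rates = success_rates(row)
    best = max(rates, key=rates.get)
    best_by_dataset[dataset] = best
    summary = ", ".join(f"{m}: {float(r):.1%}" for m, r in rates.items())
    print(f"{dataset}: {summary} (best: {best})")
```

Using `Fraction` keeps the comparison exact, so the best method per dataset is determined without floating-point rounding.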
[1] Mercer T R, Dinger M E, Mattick J S 2009 Nat. Rev. Genet. 10 155 DOI: 10.1038/nrg2521
Sponer J, Bussi G, Krepl M, Banas P, Bottaro S, Cunha R A, Gil-Ley A, Pinamonti G, Poblete S, Jurecka P, Walter N G, Otyepka M 2018 Chem. Rev. 118 4177 DOI: 10.1021/acs.chemrev.7b00427
Goodfellow I, Bengio Y, Courville A 2016 Deep Learning (Adaptive Computation and Machine Learning) (Cambridge: The MIT Press) pp. 197–200
[13] Silver D, Huang A, Maddison C J, Guez A, Sifre L, van den Driessche G, Schrittwieser J, Antonoglou I, Panneershelvam V, Lanctot M, Dieleman S, Grewe D, Nham H, Kalchbrenner N, Sutskever I, Lillicrap T, Leach M, Kavukcuoglu K, Graepel T, Hassabis D 2016 Nature 529 484 DOI: 10.1038/nature16961
[14] Alipanahi B, Delong A, Weirauch M T, Frey B J 2015 Nat. Biotechnol. 33 831 DOI: 10.1038/nbt.3300
[15] Zhou J, Troyanskaya O G 2015 Nat. Methods 12 931 DOI: 10.1038/nmeth.3547
Shi Y Z, Jin L, Feng C J, Tan Y L, Tan Z J 2018 PLoS Comput. Biol. 14 e1006222 DOI: 10.1371/journal.pcbi.1006222
[49] Jin L, Tan Y L, Wu Y, Wang X, Shi Y Z, Tan Z J 2019 RNA 25 1532 DOI: 10.1261/rna.071662.119
[50] Wang J M, Cieplak P, Li J, Wang J, Cai Q, Hsieh M J, Lei H X, Luo R, Duan Y 2011 J. Phys. Chem. B 115 3100 DOI: 10.1021/jp1121382
[51] Li Y, Li H, Pickard F C, Narayanan B, Sen F G, Chan M, Sankaranarayanan S, Brooks B R, Roux B 2017 J. Chem. Theory Comput. 13 4492 DOI: 10.1021/acs.jctc.7b00521
[52] Bereau T, DiStasio R A, Tkatchenko A, Lilienfeld O A 2018 J. Chem. Phys. 148 241706 DOI: 10.1063/1.5009502
Kalvari I, Argasinska J, Quinones-Olvera N, Nawrocki E P, Rivas E, Eddy S R, Bateman A, Finn R D, Petrov A 2018 Nucleic Acids Res. 46 D335 DOI: 10.1093/nar/gkx1038
[71] Wang J X, Nelson Z K, Tirumala D, Soyer H, Leibo J Z, Munos R, Blundell C, Kumaran D, Botvinick M 2017 arXiv:1611.05763v3 DOI: 10.1145/3386252