Chinese Physics B ›› 2018, Vol. 27 ›› Issue (2): 020503. doi: 10.1088/1674-1056/27/2/020503

Special topic: Soft matter and biological physics

Optimizing the atom types of proteins through iterative knowledge-based potentials

Xin-Xiang Wang(汪心享), Sheng-You Huang(黄胜友)   

  1. School of Physics, Huazhong University of Science and Technology, Wuhan 430074, China
  • Received: 2017-09-28 Revised: 2017-12-30 Online: 2018-02-05 Published: 2018-02-05
  • Corresponding author: Sheng-You Huang E-mail: huangsy@hust.edu.cn
  • Supported by:
    Project supported by the National Natural Science Foundation of China (Grant No. 31670724), the National Key Research and Development Program of China (Grant Nos. 2016YFC1305800 and 2016YFC1305805), and the Startup Grant of Huazhong University of Science and Technology, China.

Abstract: Knowledge-based scoring functions have been widely used for protein structure prediction and for modeling protein-small molecule and protein-nucleic acid interactions, in which one critical step is to find an appropriate representation of protein structures. A key issue is to determine the minimal protein representation, which is important not only for developing scoring functions but also for understanding the physics of protein folding. Despite significant progress in simplifying residues into reduced alphabets, few studies have addressed the optimal number of atom types for proteins. Here, we have investigated the atom typing issue by classifying the 167 heavy atoms of proteins through 11 schemes with 1 to 20 atom types based on their physicochemical and functional environments. For each atom typing scheme, a statistical mechanics-based iterative method was used to extract atomic distance-dependent potentials from protein structures. The atomic distance-dependent pair potentials for the different schemes were illustrated by several typical atom pairs with different physicochemical properties. The derived potentials were also evaluated for native structure recognition on a high-resolution test set of 148 diverse proteins. The success rate as a function of the number of atom types showed a crossover around the scheme of four atom types, which means that four atom types may be used when investigating the basic folding mechanism of proteins. However, a close examination of typical potentials revealed that 14 atom types were needed to describe protein interactions at the atomic level. The present study will be beneficial for the development of protein-related scoring functions and the understanding of folding mechanisms.

Key words: atom types, knowledge-based potentials, statistical mechanics, iteration

Classification codes (PACS):

  • 05.20.-y (Classical statistical mechanics)
  • 87.14.E- (Proteins)
  • 87.15.ad (Analytical theories)
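
The abstract mentions a statistical mechanics-based iterative method for extracting distance-dependent potentials, but this page does not give the update rule. The sketch below shows one generic scheme of that kind and should not be read as the authors' implementation. It assumes each structure is summarized by a normalized histogram of pair distances. The potential is adjusted until the Boltzmann-weighted histogram of an ensemble (native plus decoys) approaches the native one. The toy histograms, the function names, the step size, and the reduced units (kT = 1) are all hypothetical.

```python
import math


def boltzmann_weights(energies, kT=1.0):
    """Return normalized Boltzmann probabilities for a list of energies."""
    e_min = min(energies)  # shift energies for numerical stability
    weights = [math.exp(-(e - e_min) / kT) for e in energies]
    total = sum(weights)
    return [w / total for w in weights]


def energy(hist, potential):
    """Score a structure: sum over distance bins of occupancy times potential."""
    return sum(h * u for h, u in zip(hist, potential))


def iterate_potential(native, decoys, n_iter=200, step=0.5, kT=1.0):
    """Iteratively refine a binned pair potential (assumed update rule).

    Each iteration compares the distribution predicted by the current
    potential with the observed (native) distribution and raises the
    potential in bins that are over-predicted, lowering it elsewhere.
    """
    ensemble = [native] + decoys
    potential = [0.0] * len(native)  # start from a flat potential
    for _ in range(n_iter):
        probs = boltzmann_weights([energy(h, potential) for h in ensemble], kT)
        predicted = [
            sum(p * h[b] for p, h in zip(probs, ensemble))
            for b in range(len(potential))
        ]
        potential = [
            u + step * kT * (pred - obs)
            for u, pred, obs in zip(potential, predicted, native)
        ]
    return potential


# Hypothetical toy data: four distance bins, one native and three decoys.
NATIVE = [0.10, 0.60, 0.20, 0.10]
DECOYS = [
    [0.30, 0.30, 0.20, 0.20],
    [0.25, 0.25, 0.25, 0.25],
    [0.40, 0.20, 0.20, 0.20],
]

potential = iterate_potential(NATIVE, DECOYS)
energies = [energy(h, potential) for h in [NATIVE] + DECOYS]
probs = boltzmann_weights(energies)
```

In this toy setting the bin that is enriched in the native histogram ends up with a negative (attractive) potential, and the native structure receives the lowest score in the ensemble. That ranking is the criterion behind the native structure recognition test described in the abstract.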