Abstract: Knowledge-based scoring functions have been widely used for protein structure prediction and for modeling protein-small molecule and protein-nucleic acid interactions, in which one critical step is to find an appropriate representation of protein structures. A key issue is to determine the minimal protein representation, which is important not only for developing scoring functions but also for understanding the physics of protein folding. Despite significant progress in simplifying residues into reduced alphabets, few studies have addressed the optimal number of atom types for proteins. Here, we investigated the atom typing issue by classifying the 167 heavy atoms of proteins through 11 schemes with 1 to 20 atom types based on their physicochemical and functional environments. For each atom typing scheme, a statistical mechanics-based iterative method was used to extract atomic distance-dependent potentials from protein structures. The atomic distance-dependent pair potentials for the different schemes were illustrated by several typical atom pairs with different physicochemical properties. The derived potentials were also evaluated for native structure recognition on a high-resolution test set of 148 diverse proteins. The success rate as a function of the number of atom types showed a crossover around the scheme with four atom types, suggesting that four atom types may be used when investigating the basic folding mechanism of proteins. However, a close examination of typical potentials revealed that 14 atom types were needed to describe protein interactions at the atomic level. The present study will be beneficial for the development of protein-related scoring functions and the understanding of folding mechanisms.
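As a rough illustration of what an iterative extraction of a distance-dependent potential can look like, the following Python sketch adjusts a binned pair potential until the distribution it predicts matches an observed one. It is a toy under stated assumptions, not the authors' implementation: the update rule (potential shifted by the difference between predicted and observed distributions), the step size, the energy unit `KT`, and the use of a simple Boltzmann distribution in place of a real decoy ensemble are all assumptions introduced here, and the input numbers are made up for illustration.

```python
import math

KT = 1.0  # energy unit; an assumption of this sketch


def boltzmann_distribution(potential):
    """Toy stand-in for the distribution predicted by the current potential.

    A real method would compute this from an ensemble of structures; here
    each distance bin is simply weighted by exp(-u / KT) and normalised.
    """
    weights = [math.exp(-u / KT) for u in potential]
    total = sum(weights)
    return [w / total for w in weights]


def iterate_potential(g_obs, step=0.5, n_iter=2000, tol=1e-10):
    """Iteratively refine a binned pair potential (hypothetical update rule).

    g_obs: observed, normalised pair distribution over distance bins.
    Each iteration raises the potential where the predicted distribution is
    too high and lowers it where the predicted distribution is too low.
    """
    u = [0.0] * len(g_obs)  # start from a flat potential
    for _ in range(n_iter):
        g_pred = boltzmann_distribution(u)
        diffs = [gp - go for gp, go in zip(g_pred, g_obs)]
        if max(abs(d) for d in diffs) < tol:
            break  # predicted and observed distributions agree
        u = [ui + step * KT * d for ui, d in zip(u, diffs)]
    return u


# Made-up observed distribution over four distance bins (illustrative only).
observed = [0.1, 0.2, 0.4, 0.3]
potential = iterate_potential(observed)
predicted = boltzmann_distribution(potential)
```

In this toy setting the converged potential reproduces the observed distribution, and bins that are observed more often end up with lower energy. With real protein structures the predicted distribution would depend on all atom pairs at once, which is the reason an iterative scheme is needed rather than a one-step inversion.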
Funding: Project supported by the National Natural Science Foundation of China (Grant No. 31670724), the National Key Research and Development Program of China (Grant Nos. 2016YFC1305800 and 2016YFC1305805), and the Startup Grant of Huazhong University of Science and Technology, China.
Corresponding author:
Sheng-You Huang
E-mail: huangsy@hust.edu.cn
Cite this article:
Xin-Xiang Wang (汪心享), Sheng-You Huang (黄胜友). Optimizing the atom types of proteins through iterative knowledge-based potentials. Chin. Phys. B, 2018, 27(2): 020503.