Chinese Physics B, 2010, Vol. 19, Issue (11): 110502-110201. doi: 10.1088/1674-1056/19/11/110502


Prediction of protein binding sites using physical and chemical descriptors and the support vector machine regression method

Jiang Fan (江凡)1, Sun Zhong-Hua (孙重华)2

  1. Beijing National Laboratory for Condensed Matter Physics, Institute of Physics, Chinese Academy of Sciences, Beijing 100190, China
  2. Beijing National Laboratory for Condensed Matter Physics, Institute of Physics, Chinese Academy of Sciences, Beijing 100190, China; Graduate School of the Chinese Academy of Sciences, Beijing 100049, China
  • Received: 2010-06-25 Revised: 2010-07-07 Online: 2010-11-15 Published: 2010-11-15
  • Supported by:
    Project supported by the National Natural Science Foundation of China (Grant Nos. 10674172 and 10874229).


Sun Zhong-Hua(孙重华)a)b) and Jiang Fan(江凡)a)†   

  1. a Beijing National Laboratory for Condensed Matter Physics, Institute of Physics, Chinese Academy of Sciences, Beijing 100190, China; b Graduate School of the Chinese Academy of Sciences, Beijing 100049, China

Abstract: In this paper we define a new continuous variable, the core-ratio, to describe the probability that a residue lies in a binding site, replacing the earlier binary (0 or 1) description of interface residues. Because this target is continuous, the support vector machine regression method can be used to fit the core-ratio value and predict protein binding sites. We also design a new group of physical and chemical descriptors to characterize the binding sites. The new descriptors, computed with an averaging procedure, are more effective. Our tests show that the support vector regression (SVR) method gives much better prediction results than the support vector classification method.

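The abstract does not give the definition of the core-ratio, the descriptors, or the averaging procedure, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a hypothetical per-residue descriptor that is averaged over spatial neighbours within a distance cutoff, and a hypothetical threshold that turns a continuous per-residue score (such as a regression output) into a list of predicted binding-site residues. The function names, the cutoff, the threshold, and the toy data are all illustrative assumptions.

```python
import math


def neighbour_average(coords, values, cutoff):
    """Average a per-residue descriptor over all residues within `cutoff`.

    Assumption: each residue is represented by one 3D point and the
    residue itself counts as one of its own neighbours.
    """
    averaged = []
    for ci in coords:
        total, count = 0.0, 0
        for cj, vj in zip(coords, values):
            if math.dist(ci, cj) <= cutoff:
                total += vj
                count += 1
        averaged.append(total / count)
    return averaged


def predict_sites(scores, threshold):
    """Return indices of residues whose continuous score reaches `threshold`.

    Assumption: a higher score means the residue is more likely to be
    part of a binding site.
    """
    return [i for i, s in enumerate(scores) if s >= threshold]


# Toy data (hypothetical): four residues on a line, one far from the others.
coords = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (6.0, 0.0, 0.0), (30.0, 0.0, 0.0)]
descriptor = [1.0, 0.0, 1.0, 0.0]

smoothed = neighbour_average(coords, descriptor, cutoff=4.0)
sites = predict_sites(smoothed, threshold=0.5)
```

In this sketch the averaged values stand in for the continuous score. In a regression setting, the score would instead come from a fitted model, and the threshold would be chosen by cross-validation.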

Key words: protein binding site, support vector machine regression, cross-validation, neighbour residue

Classification codes (PACS):

  • 87.14.E- (Proteins)
  • 87.15.A- (Theory, modeling, and computer simulation)
  • 87.15.K- (Molecular interactions; membrane-protein interactions)
  • 87.15.N- (Properties of solutions of macromolecules)