Prediction of protein binding sites using physical and chemical descriptors and the support vector machine regression method
Sun Zhong-Hua(孙重华)a)b) and Jiang Fan(江凡)a)†
a Beijing National Laboratory for Condensed Matter Physics, Institute of Physics, Chinese Academy of Sciences, Beijing 100190, China; b Graduate School of the Chinese Academy of Sciences, Beijing 100049, China
Abstract This paper defines a new continuous variable, the core-ratio, to describe the probability that a residue lies in a binding site, replacing the previous binary (0 or 1) description of interface residues. The core-ratio value can therefore be fitted with the support vector machine regression method and used to predict protein binding sites. We also design a new group of physical and chemical descriptors to characterize the binding sites; with an averaging procedure applied, the new descriptors are more effective. Our test shows that the support vector regression (SVR) method gives much better prediction results than the support vector classification method.
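The two ideas in the abstract, averaging a descriptor over neighbouring residues and working with a continuous per-residue score instead of a 0/1 label, can be illustrated with a minimal sketch. The abstract does not give the paper's averaging procedure, core-ratio definition, or any cutoff, so everything here is an assumption for illustration: the function names `window_average` and `to_binary_sites`, the sequence-window averaging, the 0.5 threshold, and the sample numbers are all hypothetical.

```python
from typing import List, Sequence


def window_average(values: Sequence[float], half_width: int = 1) -> List[float]:
    """Average each value with its neighbours within `half_width` positions.

    Assumption: neighbours are taken along the sequence. The paper's own
    averaging procedure is not described in the abstract and may differ.
    """
    n = len(values)
    averaged = []
    for i in range(n):
        lo = max(0, i - half_width)          # clip the window at the start
        hi = min(n, i + half_width + 1)      # clip the window at the end
        window = values[lo:hi]
        averaged.append(sum(window) / len(window))
    return averaged


def to_binary_sites(scores: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Turn continuous per-residue scores into 0/1 labels.

    Assumption: the threshold is a hypothetical cutoff, not a value
    taken from the paper.
    """
    return [1 if s >= threshold else 0 for s in scores]


# Hypothetical per-residue descriptor values (not data from the paper).
descriptor = [0.0, 3.0, 6.0, 3.0, 0.0]
smoothed = window_average(descriptor, half_width=1)

# Hypothetical continuous scores, such as a regression model might output.
predicted_scores = [0.1, 0.4, 0.8, 0.6, 0.2]
binary_sites = to_binary_sites(predicted_scores, threshold=0.5)
```

A continuous target keeps the distinction between, say, 0.4 and 0.1 that a binary label would discard, which is one plausible reason a regression fit could carry more information than a classification fit; the thresholding step is only needed when a yes/no site prediction is wanted at the end.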
Fund: Project supported by the National Natural Science Foundation of China (Grant Nos. 10674172 and 10874229).
Cite this article:
Sun Zhong-Hua(孙重华) and Jiang Fan(江凡) Prediction of protein binding sites using physical and chemical descriptors and the support vector machine regression method 2010 Chin. Phys. B 19 110502