Protein structural classification and family identification by multifractal analysis and wavelet spectrum

doi:10.1088/1674-1056/20/1/010505

Chinese Physics B, 2011, Vol. 20, Issue 1: 010505

Zhu Shao-Ming (朱少茗)^a,b, Yu Zu-Guo (喻祖国)^a,b,†, and Vo Anh^a

^a School of Mathematical Sciences, Queensland University of Technology, GPO Box 2434, Brisbane, Q 4001, Australia; ^b School of Mathematics and Computational Science, Xiangtan University, Hunan 411105, China

Received: 2010-04-09; Revised: 2010-09-10; Online: 2011-01-15; Published: 2011-01-15

Supported by: Project supported by the Australian Research Council (Grant No. DP0559807), a Research Capacity Building Award at QUT, Scientific Research Fund of Hunan Provincial Education Department of China (Grant No. 06C826), the Chinese Program for New Century Excellent Talents in University (Grant No. NCET-08-06867), the Hunan Provincial Natural Science Foundation of China (Grant No. 10JJ7001), the Program for Furong Scholars of Hunan Province of China, and the Aid Program for Science and Technology Innovative Research Team in Higher Educational Institutions of Hunan Province of China.

Abstract: Family identification is helpful for predicting protein functions. The literature indicates that long sequences of base pairs or amino acids are needed to study patterns in biological sequences. Since most protein sequences are relatively short, we randomly concatenate (link) protein sequences from the same family or superfamily to form longer protein sequences. The 6-letter model, the 12-letter model, the 20-letter model, the revised Schneider and Wrede hydrophobicity scale, solvent accessibility, and stochastic standard state accessibility are used to convert the linked protein sequences into numerical sequences. Multifractal and wavelet analyses are then performed on these numerical sequences. The parameters from these analyses can be used to construct parameter spaces in which each linked protein is represented by a point. The four structural classes of proteins, namely the $\alpha$, $\beta$, $\alpha+\beta$ and $\alpha/\beta$ classes, are then distinguished in these parameter spaces. The Fisher linear discriminant algorithm is used to assess the discriminant accuracy, and the numerical results indicate that the accuracies are satisfactory in separating these classes. We also find that linked proteins from the same family or superfamily tend to group together and can be separated from other linked proteins. The methods are therefore helpful for identifying the family of an unknown protein.

Key words: protein family, multifractal analysis, wavelet spectrum

PACS: 05.45.Df (Fractals), 64.60.al (Fractal and multifractal systems), 87.15.bd (Secondary structure)

Cite this article: Zhu Shao-Ming (朱少茗), Yu Zu-Guo (喻祖国), and Vo Anh. Protein structural classification and family identification by multifractal analysis and wavelet spectrum[J]. Chin. Phys. B, 2011, 20(1): 010505.
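
The abstract describes representing each linked protein as a point in a parameter space and using the Fisher linear discriminant to assess how well classes separate. The following is a minimal two-class, two-parameter sketch of that general technique, not the authors' implementation. The helper names (`fisher_direction`, `classify`, `accuracy`, `make_cloud`) are hypothetical, the two coordinates stand in for any two analysis parameters, and the point clouds are synthetic illustrations, not data from the paper.

```python
import random


def dot(u, v):
    """Inner product of two 2-D vectors."""
    return u[0] * v[0] + u[1] * v[1]


def mean_vec(points):
    """Component-wise mean of a list of 2-D points."""
    n = len(points)
    return [sum(p[0] for p in points) / n, sum(p[1] for p in points) / n]


def scatter(points, m):
    """Entries (sxx, sxy, syy) of the 2x2 scatter matrix about the mean m."""
    sxx = sum((p[0] - m[0]) ** 2 for p in points)
    sxy = sum((p[0] - m[0]) * (p[1] - m[1]) for p in points)
    syy = sum((p[1] - m[1]) ** 2 for p in points)
    return sxx, sxy, syy


def fisher_direction(class_a, class_b):
    """Return the Fisher direction w = Sw^{-1} (ma - mb) and a midpoint threshold."""
    ma, mb = mean_vec(class_a), mean_vec(class_b)
    axx, axy, ayy = scatter(class_a, ma)
    bxx, bxy, byy = scatter(class_b, mb)
    # Within-class scatter matrix Sw is the sum of the two class scatters.
    sxx, sxy, syy = axx + bxx, axy + bxy, ayy + byy
    det = sxx * syy - sxy * sxy
    dx, dy = ma[0] - mb[0], ma[1] - mb[1]
    # Closed-form inverse of a symmetric 2x2 matrix applied to (dx, dy).
    w = [(syy * dx - sxy * dy) / det, (-sxy * dx + sxx * dy) / det]
    # Threshold halfway between the projected class means.
    threshold = 0.5 * (dot(w, ma) + dot(w, mb))
    return w, threshold


def classify(point, w, threshold):
    """Assign 'A' if the projection exceeds the threshold, else 'B'."""
    return "A" if dot(w, point) > threshold else "B"


def accuracy(class_a, class_b, w, threshold):
    """Fraction of points assigned to their own class (resubstitution)."""
    hits = sum(classify(p, w, threshold) == "A" for p in class_a)
    hits += sum(classify(p, w, threshold) == "B" for p in class_b)
    return hits / (len(class_a) + len(class_b))


def make_cloud(rng, centre, sigma, n):
    """Synthetic Gaussian cloud of n points; illustrative data only."""
    return [(rng.gauss(centre[0], sigma), rng.gauss(centre[1], sigma))
            for _ in range(n)]


rng = random.Random(0)
class_a = make_cloud(rng, (3.0, 3.0), 0.5, 50)
class_b = make_cloud(rng, (0.0, 0.0), 0.5, 50)
w, threshold = fisher_direction(class_a, class_b)
print(f"resubstitution accuracy: {accuracy(class_a, class_b, w, threshold):.2f}")
```

The accuracy here is measured on the same points used to fit the direction, so it is an optimistic estimate. Whether the paper uses resubstitution or a held-out evaluation is not stated in the abstract.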
