Exploring the relationship between fractal features and bacterial essential genes
Yong-Ming Yu(余永明)1, Li-Cai Yang(杨立才)1, Qian Zhou(周茜)2,3, Lu-Lu Zhao(赵璐璐)1, Zhi-Ping Liu(刘治平)1
1 Department of Biomedical Engineering, Shandong University, Jinan 250061, China;
2 Province-Ministry Joint Key Laboratory of Electromagnetic Field and Electrical Apparatus Reliability, Hebei University of Technology, Tianjin 300130, China;
3 Department of Biomedical Engineering, Hebei University of Technology, Tianjin 300130, China
Abstract Essential genes are indispensable for the survival of an organism under optimal conditions. Rapid and accurate identification of new essential genes is of great theoretical and practical significance, and exploring features with predictive power is fundamental to it. Here, we calculate six fractal features from primary gene and protein sequences and then explore their relationship with gene essentiality using statistical analysis and machine-learning-based methods. The models are applied to all currently available identified genes in 27 bacteria from the Database of Essential Genes (DEG). We find that the fractal features of essential genes generally differ from those of non-essential genes. The fractal features are used to fit the parameters of two machine learning classifiers: Naïve Bayes and Random Forest. The area under the curve (AUC) of both classifiers shows that each fractal feature individually discriminates satisfactorily between essential and non-essential genes. Although significant correlations exist among the fractal features, gene essentiality can also be reliably predicted by various combinations of them. Thus, the fractal features analyzed in our study can be used not only to construct a good essentiality classifier on their own, but also as significant contributors to computational tools for identifying essential genes.
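
As an illustration of what it means for a single feature to be discriminative, the following minimal sketch computes an AUC directly from feature values, assuming AUC here denotes the area under the ROC curve. It uses the rank interpretation of that quantity: the probability that a randomly chosen essential gene has a higher feature value than a randomly chosen non-essential gene, with ties counted as one half. The function name `auc_single_feature` and the numbers are hypothetical; they are not taken from the paper or from DEG, and this is not the authors' pipeline.

```python
def auc_single_feature(pos_scores, neg_scores):
    """Area under the ROC curve for one feature used as a score.

    Computed as the fraction of (positive, negative) pairs in which
    the positive has the higher value; ties contribute one half.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical values of one fractal feature (made up for illustration).
essential = [0.62, 0.71, 0.55, 0.80]
non_essential = [0.40, 0.58, 0.35, 0.50]

auc = auc_single_feature(essential, non_essential)
print(f"AUC = {auc:.4f}")
```

An AUC of 0.5 corresponds to a feature with no discriminative power, and values approaching 1 indicate that the feature alone separates the two classes well. The pairwise loop is quadratic in the number of genes; a rank-based implementation would be preferable for genome-scale data.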
Received: 19 December 2015
Revised: 27 February 2016
Accepted manuscript online:
PACS: 05.45.Df (Fractals); 87.14.G- (Nucleic acids)
Fund: Project supported by the Shandong Provincial Natural Science Foundation, China (Grant No. ZR2014FM022).
Corresponding author: Li-Cai Yang, E-mail: yanglc@sdu.edu.cn
Cite this article: Yong-Ming Yu(余永明), Li-Cai Yang(杨立才), Qian Zhou(周茜), Lu-Lu Zhao(赵璐璐), Zhi-Ping Liu(刘治平) Exploring the relationship between fractal features and bacterial essential genes 2016 Chin. Phys. B 25 060503