Chinese Physics B ›› 2016, Vol. 25 ›› Issue (6): 060503. doi: 10.1088/1674-1056/25/6/060503
Yong-Ming Yu(余永明), Li-Cai Yang(杨立才), Qian Zhou(周茜), Lu-Lu Zhao(赵璐璐), Zhi-Ping Liu(刘治平)
Abstract:
Essential genes are indispensable for the survival of an organism under optimal conditions. Rapid and accurate identification of new essential genes is of great theoretical and practical significance, and finding features with predictive power is fundamental to this task. Here, we calculate six fractal features from primary gene and protein sequences and then explore their relationship with gene essentiality using statistical analysis and machine learning-based methods. The models are applied to all the currently available identified genes in 27 bacteria from the database of essential genes (DEG). We find that the fractal features of essential genes generally differ from those of non-essential genes. The fractal features are then used to fit the parameters of two machine learning classifiers: Naïve Bayes and Random Forest. The areas under the curve (AUC) of both classifiers show that each fractal feature, taken individually, discriminates satisfactorily between essential and non-essential genes. Although significant correlations exist among the fractal features, gene essentiality can also be reliably predicted by various combinations of them. Thus, the fractal features analyzed in our study can be used alone to construct a good essentiality classifier, and they can also contribute significantly to computational tools for identifying essential genes.
CLC number: (Fractals)
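
The abstract judges each fractal feature by its AUC. As a minimal sketch of what that number measures, the Python snippet below computes the AUC of a single feature directly from its definition: the probability that a randomly chosen essential gene scores higher than a randomly chosen non-essential gene, with ties counted as one half. The function name `auc_single_feature` and the toy values are assumptions made for illustration; they are not the authors' code or data, and the paper's AUCs come from trained Naïve Bayes and Random Forest classifiers rather than from raw feature values.

```python
def auc_single_feature(scores, labels):
    """Rank-based AUC of one feature.

    scores: feature value per gene (hypothetical fractal feature).
    labels: 1 for essential, 0 for non-essential.
    Returns P(essential score > non-essential score), ties counted as 0.5.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one essential and one non-essential gene")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy values invented for illustration only (not data from the paper).
feature = [1.62, 1.58, 1.71, 1.40, 1.45, 1.60]
essential = [1, 1, 1, 0, 0, 0]

auc = auc_single_feature(feature, essential)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 means the feature carries no ranking information. A value below 0.5 means the feature tends to be lower in essential genes, so its discriminative power is better read as `max(auc, 1 - auc)`.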