Chinese Physics B ›› 2016, Vol. 25 ›› Issue (6): 060503. doi: 10.1088/1674-1056/25/6/060503

• GENERAL •

Exploring the relationship between fractal features and bacterial essential genes

Yong-Ming Yu(余永明)1, Li-Cai Yang(杨立才)1, Qian Zhou(周茜)2,3, Lu-Lu Zhao(赵璐璐)1, Zhi-Ping Liu(刘治平)1   

  1 Department of Biomedical Engineering, Shandong University, Jinan 250061, China;
  2 Province-Ministry Joint Key Laboratory of Electromagnetic Field and Electrical Apparatus Reliability, Hebei University of Technology, Tianjin 300130, China;
  3 Department of Biomedical Engineering, Hebei University of Technology, Tianjin 300130, China
  • Received: 2015-12-19 Revised: 2016-02-27 Online: 2016-06-05 Published: 2016-06-05
  • Contact: Li-Cai Yang, E-mail: yanglc@sdu.edu.cn
  • Supported by: the Shandong Provincial Natural Science Foundation, China (Grant No. ZR2014FM022).

Abstract:

Essential genes are indispensable for the survival of an organism under optimal conditions. Rapid and accurate identification of new essential genes is of great theoretical and practical significance, and finding features with predictive power is fundamental to this task. Here, we calculate six fractal features from primary gene and protein sequences and then explore their relationship with gene essentiality using statistical analysis and machine learning-based methods. The models are applied to all currently available identified genes in 27 bacteria from the Database of Essential Genes (DEG). We find that the fractal features of essential genes generally differ from those of non-essential genes. The fractal features are used to determine the parameters of two machine learning classifiers: Naïve Bayes and Random Forest. The areas under the curve (AUC) of both classifiers show that each fractal feature, taken individually, discriminates satisfactorily between essential and non-essential genes. Although significant correlations exist among the fractal features, gene essentiality can also be reliably predicted by various combinations of them. Thus, the fractal features analyzed in our study can be used not only to construct a good essentiality classifier on their own, but also as significant contributors to computational tools for identifying essential genes.

Key words: fractal features, bacteria, essential gene, machine learning

Classification codes (PACS):

  • 05.45.Df (Fractals)
  • 87.14.G- (Nucleic acids)
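
The abstract judges each fractal feature by the area under the curve (AUC). As a minimal sketch of what that number measures, the snippet below computes AUC directly from its rank-based definition: the probability that a randomly chosen essential gene scores higher than a randomly chosen non-essential gene, with ties counted as one half. This is not the authors' pipeline, whose AUC values come from Naïve Bayes and Random Forest classifiers. The function name `auc_from_scores` and the toy feature values are hypothetical and were invented for illustration; they are not data from the paper.

```python
def auc_from_scores(scores, labels):
    """Rank-based AUC: P(score of a positive > score of a negative), ties count 1/2.

    scores: feature values (or classifier scores), one per gene
    labels: 1 for essential, 0 for non-essential
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # tie counts as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical toy values of a single feature (NOT from the paper).
feature = [1.62, 1.58, 1.55, 1.49, 1.51, 1.44, 1.40, 1.47]
labels = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = essential, 0 = non-essential

auc = auc_from_scores(feature, labels)
print(f"AUC of the toy feature: {auc}")
```

An AUC of 0.5 corresponds to a feature with no discriminative power, and 1.0 to perfect separation. For a feature whose values tend to be lower in essential genes, the raw AUC falls below 0.5 and the feature is still informative once its sign is flipped.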