Chinese Physics B, 2015, Vol. 24, Issue 12: 128202. doi: 10.1088/1674-1056/24/12/128202

• SPECIAL TOPIC: 8th IUPAP International Conference on Biological Physics

Computational prediction of over-annotated protein-coding genes in the genome of Agrobacterium tumefaciens strain C58

Yu Jia-Feng (于家峰)a b, Sui Tian-Xiang (隋天翔)a d, Wang Hong-Mei (王红梅)c, Wang Chun-Ling (王春玲)c, Jing Li (荆莉)c, Wang Ji-Hua (王吉华)a c   

  a Shandong Provincial Key Laboratory of Functional Macromolecular Biophysics, Institute of Biophysics, Dezhou University, Dezhou 253023, China;
  b State Key Laboratory of Bioelectronics, Southeast University, Nanjing 210096, China;
  c College of Physics and Electronic Information, Dezhou University, Dezhou 253023, China;
  d College of Life Science, Shandong Normal University, Jinan 250014, China
  • Received: 2015-01-22; Revised: 2015-04-02; Online: 2015-12-05; Published: 2015-12-05
  • Contact: Yu Jia-Feng, E-mail: jfyu1979@126.com
  • Supported by:
    Project supported by the National Natural Science Foundation of China (Grant Nos. 61302186 and 61271378) and by funding from the State Key Laboratory of Bioelectronics of Southeast University.

Abstract: Agrobacterium tumefaciens strain C58 is a pathogen that can cause tumors in some dicotyledonous plants. Ever since the genome of A. tumefaciens strain C58 was sequenced, the annotation quality of its protein-coding genes has been questioned repeatedly, because the annotation varies greatly among different databases. In this paper, the questionable hypothetical genes were re-predicted by integrating the TN curve and Z curve methods. As a result, 30 genes originally annotated as “hypothetical” were identified as non-coding sequences. When the re-prediction program was tested 10 times on data sets composed of genes of known function, a mean accuracy of 99.99% and a mean Matthews correlation coefficient of 0.9999 were obtained. Further sequence analysis and COG analysis showed that the re-annotation results were reliable. This work provides an efficient tool and data resources for future studies of A. tumefaciens strain C58.

Key words: Agrobacterium tumefaciens strain C58, protein-coding gene, genome re-annotation, graphical representation

Classification numbers:

  • 82.39.Pj (Nucleic acids, DNA and RNA bases)
  • 87.14.gk (DNA)
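
The abstract reports its results as a mean accuracy and a mean Matthews correlation coefficient (MCC). As a minimal sketch of how these two metrics are commonly computed from a binary confusion matrix (coding vs. non-coding), assuming the standard textbook definitions: the function names and the counts below are hypothetical illustrations, not values or code from the paper.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all sequences that were classified correctly."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    return (tp + tn) / total


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient for a binary classifier.

    Returns 0.0 when any marginal sum is zero, a common convention
    for the otherwise undefined 0/0 case.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Hypothetical counts, chosen only to exercise the formulas.
    tp, tn, fp, fn = 90, 85, 15, 10
    print("accuracy:", accuracy(tp, tn, fp, fn))
    print("MCC:", matthews_corrcoef(tp, tn, fp, fn))
```

Unlike accuracy, the MCC uses all four cells of the confusion matrix, so it stays informative when the positive and negative sets differ in size.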