Chin. Phys. B, 2025, Vol. 34(8): 088704    DOI: 10.1088/1674-1056/add508
Special Issue: SPECIAL TOPIC — A celebration of the 90th Anniversary of the Birth of Bolin Hao

CVTree for 16S rRNA: Constructing taxonomy-compatible all-species living tree effectively and efficiently

Yi-Fei Lu(卢逸飞)2, Xiao-Yang Zhi(职晓阳)2,†, and Guang-Hong Zuo(左光宏)1,‡
1 Wenzhou Institute, University of Chinese Academy of Sciences, Wenzhou 325001, China;
2 Yunnan Institute of Microbiology, Key Laboratory of Microbial Diversity in Southwest China of Ministry of Education, School of Life Sciences, Yunnan University, Kunming 650091, China
Abstract  The composition vector tree (CVTree) method, developed under the leadership of Professor Hao Bailin, is an alignment-free algorithm for constructing phylogenetic trees. Although initially designed for studying prokaryotic evolution based on whole genomes, it has proved broadly applicable across diverse biological systems and gene sequences. In this study, we employed two CVTree methods, InterList and Hao, to investigate the phylogeny and taxonomy of prokaryotes based on the 16S rRNA sequences from the All-Species Living Tree Project. We constructed a comprehensive phylogenetic tree that incorporates the majority of species described to date and compared it with the taxonomy of prokaryotes. We also compared the performance of CVTree with that of multiple sequence alignment-based approaches. Our results show that the CVTree methods run one to three orders of magnitude faster than conventional alignment methods while maintaining high consistency with established taxonomic relationships, even outperforming some multiple sequence alignment methods. These findings confirm CVTree's effectiveness and efficiency not only for whole-genome evolutionary studies but also for gene-based phylogenetic and taxonomic investigations.
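To make the term "alignment-free" concrete, the following is a minimal Python sketch of a generic k-mer composition comparison. It is an illustration only, not the InterList or Hao method of CVTree: it omits any background correction those methods may apply, and the function names (`kmer_counts`, `cosine_dissimilarity`, `distance_matrix`) and the default `k=3` are hypothetical choices made for this example.

```python
from collections import Counter
from math import sqrt


def kmer_counts(seq, k):
    """Count overlapping k-mers in a sequence (case-insensitive)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cosine_dissimilarity(a, b):
    """Return (1 - cosine similarity) / 2 for two sparse count vectors.

    For non-negative counts the result lies between 0 (same composition)
    and 0.5 (no shared k-mers).
    """
    dot = sum(count * b[kmer] for kmer, count in a.items() if kmer in b)
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.5  # a sequence shorter than k: treat as uncorrelated
    return (1 - dot / (norm_a * norm_b)) / 2


def distance_matrix(seqs, k=3):
    """Build a symmetric dissimilarity matrix from a {name: sequence} dict."""
    names = list(seqs)
    vectors = [kmer_counts(seqs[name], k) for name in names]
    n = len(names)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = cosine_dissimilarity(vectors[i], vectors[j])
            matrix[i][j] = matrix[j][i] = d
    return names, matrix


if __name__ == "__main__":
    # Toy sequences, invented for illustration only.
    toy = {"a": "ACGTACGT", "b": "ACGTTCGT", "c": "GGGGCCCC"}
    names, matrix = distance_matrix(toy, k=3)
    for name, row in zip(names, matrix):
        print(name, [round(d, 3) for d in row])
```

No alignment step is needed: each sequence is reduced to a vector once, and every pairwise comparison is a single sparse dot product. A matrix of this kind could then be passed to a distance-based tree builder such as neighbor joining.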
Keywords:  phylogenetic tree      taxonomy      16S rRNA      ratio of entropy reduction  
Received:  29 March 2025      Revised:  21 April 2025      Accepted manuscript online:  07 May 2025
PACS:  87.15.Qt (Sequence analysis)  
  87.18.Wd (Genomics)  
  87.19.lo (Information theory)  
Fund: GHZ thanks the Wenzhou Institute, University of Chinese Academy of Sciences (Grant No. WIUCASQD2021042).
Corresponding Authors:  Xiao-Yang Zhi, Guang-Hong Zuo     E-mail:  xyzhi@ynu.edu.cn; ghzuo@ucas.ac.cn

Cite this article: 

Yi-Fei Lu(卢逸飞), Xiao-Yang Zhi(职晓阳), and Guang-Hong Zuo(左光宏) CVTree for 16S rRNA: Constructing taxonomy-compatible all-species living tree effectively and efficiently 2025 Chin. Phys. B 34 088704
