Special Issue: SPECIAL TOPIC - A celebration of the 90th Anniversary of the Birth of Bolin Hao
CVTree for 16S rRNA: Constructing taxonomy-compatible all-species living tree effectively and efficiently
Yi-Fei Lu(卢逸飞)2, Xiao-Yang Zhi(职晓阳)2,†, and Guang-Hong Zuo(左光宏)1,‡
1 Wenzhou Institute, University of Chinese Academy of Sciences, Wenzhou 325001, China
2 Yunnan Institute of Microbiology, Key Laboratory of Microbial Diversity in Southwest China of Ministry of Education, School of Life Sciences, Yunnan University, Kunming 650091, China
Abstract The composition vector tree (CVTree) method, developed under the leadership of Professor Hao Bailin, is an alignment-free algorithm for constructing phylogenetic trees. Although initially designed for studying prokaryotic evolution from whole genomes, it has proved broadly applicable across diverse biological systems and gene sequences. In this study, we applied two CVTree methods, InterList and Hao, to investigate the phylogeny and taxonomy of prokaryotes using the 16S rRNA sequences from the All-Species Living Tree Project. We constructed a comprehensive phylogenetic tree that incorporates the majority of species known to science and compared it with the taxonomy of prokaryotes. We also compared the performance of CVTree with that of multiple sequence alignment-based approaches. Our results show that the CVTree methods run one to three orders of magnitude faster than conventional alignment methods while maintaining high consistency with established taxonomic relationships, even outperforming some multiple sequence alignment methods. These findings confirm that CVTree is effective and efficient not only for whole-genome evolutionary studies but also for gene-based phylogenetic and taxonomic investigations.
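To make the alignment-free idea concrete, the following minimal Python sketch illustrates the classic composition-vector formulation that the Hao method is generally understood to follow: count k-mers, subtract a Markov-model prediction built from the (k-1)- and (k-2)-mer frequencies, and turn the cosine correlation C of two vectors into a dissimilarity D = (1 - C)/2. It is an illustration under stated assumptions, not the CVTree software itself. The function names, the toy sequences, and the choice k = 3 are hypothetical, and the InterList method is not covered here.

```python
from collections import Counter
from math import sqrt


def kmer_freqs(seq, k):
    """Relative frequencies of all overlapping k-mers in seq."""
    total = len(seq) - k + 1
    if total <= 0:
        return {}
    counts = Counter(seq[i:i + k] for i in range(total))
    return {kmer: n / total for kmer, n in counts.items()}


def composition_vector(seq, k):
    """Sparse composition vector: (f - f0) / f0 wherever f0 > 0.

    f0 is the Markov-model prediction
    f(a1..a_{k-1}) * f(a2..a_k) / f(a2..a_{k-1}).
    Components with f0 == 0 are zero and simply not stored.
    """
    if k < 3:
        raise ValueError("k must be at least 3 for this background model")
    f_k = kmer_freqs(seq, k)
    f_k1 = kmer_freqs(seq, k - 1)
    f_k2 = kmer_freqs(seq, k - 2)

    # Index (k-1)-mers by their first k-2 letters to find overlapping pairs.
    by_prefix = {}
    for s in f_k1:
        by_prefix.setdefault(s[:-1], []).append(s)

    vec = {}
    for p in f_k1:
        mid = p[1:]  # shared (k-2)-mer between prefix p and suffix s
        for s in by_prefix.get(mid, []):
            kmer = p + s[-1]
            f0 = f_k1[p] * f_k1[s] / f_k2[mid]
            # Unobserved k-mers with f0 > 0 get the value -1.
            vec[kmer] = (f_k.get(kmer, 0.0) - f0) / f0
    return vec


def cv_distance(u, v):
    """Dissimilarity D = (1 - C) / 2, with C the cosine correlation."""
    dot = sum(u[x] * v[x] for x in u.keys() & v.keys())
    norm_u = sqrt(sum(a * a for a in u.values()))
    norm_v = sqrt(sum(a * a for a in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.5  # assumption: treat an undefined correlation as C = 0
    return (1.0 - dot / (norm_u * norm_v)) / 2.0


if __name__ == "__main__":
    # Toy sequences, purely illustrative (not real 16S rRNA data).
    seq_a = "ACGTACGGTCAGTTACGATCGGATTCA"
    seq_b = "ACGTTCGGACAGTAACGTTCGCATTGA"
    d = cv_distance(composition_vector(seq_a, 3), composition_vector(seq_b, 3))
    print(f"dissimilarity: {d:.4f}")
```

A pairwise matrix of such dissimilarities can then be passed to a distance-based tree builder such as neighbor joining. No sequence alignment is needed at any step, which is where the speed advantage reported in the abstract comes from.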
Received: 29 March 2025
Revised: 21 April 2025
Accepted manuscript online: 07 May 2025
PACS: 87.15.Qt (Sequence analysis); 87.18.Wd (Genomics); 87.19.lo (Information theory)
Fund: GHZ thanks the Wenzhou Institute, University of Chinese Academy of Sciences (Grant No. WIUCASQD2021042).
Corresponding Authors: Xiao-Yang Zhi, Guang-Hong Zuo
E-mail: xyzhi@ynu.edu.cn; ghzuo@ucas.ac.cn
Cite this article:
Yi-Fei Lu(卢逸飞), Xiao-Yang Zhi(职晓阳), and Guang-Hong Zuo(左光宏) CVTree for 16S rRNA: Constructing taxonomy-compatible all-species living tree effectively and efficiently 2025 Chin. Phys. B 34 088704