Curation and featurization of multiple topological materials databases
Yuqing He(贺雨晴)1, Matteo Giantomassi2, Gian-Marco Rignanese2,3,4,†, and Hongming Weng(翁红明)1,‡
1 Beijing National Laboratory for Condensed Matter Physics and Institute of Physics, Chinese Academy of Sciences, Beijing 100190, China
2 Institute of Condensed Matter and Nanosciences, UCLouvain, Chemin des Étoiles 8, 1348 Louvain-la-Neuve, Belgium
3 WEL Research Institute, Avenue Pasteur 6, 1300 Wavre, Belgium
4 School of Materials Science and Engineering, Northwestern Polytechnical University, Xi'an 710072, China
Abstract: The discovery of topological materials has advanced rapidly due to high-throughput computation and machine learning, but research progress is hampered by inconsistent classification standards and fragmented data resources. Existing databases differ in computational methods, material coverage, and labeling criteria, making it difficult to compare findings across studies. To overcome these challenges, we present a unified topological materials dataset that systematically combines and reconciles two major databases: Materiae and the Topological Materials Database. This dataset provides consistent topological classifications for 35608 materials, accessible through the Materials Galaxy platform for interactive exploration and available for bulk download via MatElab. We describe the featurization methodology that converts crystal structures into 4710 machine-learning-ready descriptors and present a comprehensive analysis of topological material distributions. This work serves as a complete guide for accessing, utilizing, and interpreting this unified resource, designed to enable reproducible machine learning applications and accelerate the discovery of topological materials.
Received: 27 October 2025
Revised: 11 December 2025
Accepted manuscript online: 12 December 2025
PACS: 07.05.Kf (Data analysis: algorithms and implementation; data management)
Fund: Y. H. and H. W. acknowledge financial support from the National Key Research and Development Program of China (Grant No. 2022YFA1403800), the National Natural Science Foundation of China (Grant Nos. 12188101 and 11925408), and the Chinese Academy of Sciences (Grant No. XDB33000000).
Corresponding Authors: Gian-Marco Rignanese, E-mail: gian-marco.rignanese@uclouvain.be; Hongming Weng, E-mail: hmweng@iphy.ac.cn
Cite this article: Yuqing He(贺雨晴), Matteo Giantomassi, Gian-Marco Rignanese, and Hongming Weng(翁红明), Curation and featurization of multiple topological materials databases, 2026 Chin. Phys. B 35 050701