Special Issue:
Featured Column — DATA PAPER
Curation and featurization of multiple topological materials databases
Yuqing He(贺雨晴)1, Matteo Giantomassi2, Gian-Marco Rignanese2,3,4,†, and Hongming Weng(翁红明)1,‡
1 Beijing National Laboratory for Condensed Matter Physics and Institute of Physics, Chinese Academy of Sciences, Beijing 100190, China
2 Institute of Condensed Matter and Nanosciences, UCLouvain, Chemin des Étoiles 8, 1348 Louvain-la-Neuve, Belgium
3 WEL Research Institute, Avenue Pasteur 6, 1300 Wavre, Belgium
4 School of Materials Science and Engineering, Northwestern Polytechnical University, Xi'an 710072, China
Abstract: The discovery of topological materials has advanced rapidly thanks to high-throughput computation and machine learning, but progress is hampered by inconsistent classification standards and fragmented data resources. Existing databases differ in computational methods, material coverage, and labeling criteria, which makes findings difficult to compare across studies. To overcome these challenges, we present a unified topological materials dataset that systematically combines and reconciles two major databases: Materiae and the Topological Materials Database. The dataset provides consistent topological classifications for 35608 materials; it is accessible through the Materials Galaxy platform for interactive exploration and available for bulk download via MatElab. We describe the featurization methodology that converts crystal structures into 4710 machine-learning-ready descriptors and present a comprehensive analysis of topological material distributions. This work serves as a complete guide to accessing, using, and interpreting this unified resource, which is designed to enable reproducible machine learning applications and accelerate the discovery of topological materials.
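As an illustration only, and not the authors' pipeline, reconciling topological labels from two sources could be sketched as below. The material identifiers, the label strings, and the `reconcile` helper are all hypothetical; the sketch assumes both sources have already been mapped to a shared identifier and a shared label vocabulary.

```python
from typing import Dict, Optional


def reconcile(labels_a: Dict[str, str],
              labels_b: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Merge two label tables keyed by a shared (hypothetical) material ID.

    - present in only one source: keep that source's label
    - present in both and equal: keep the common label
    - present in both and different: mark as None (conflict to resolve later)
    """
    merged: Dict[str, Optional[str]] = {}
    for key in sorted(set(labels_a) | set(labels_b)):
        a = labels_a.get(key)
        b = labels_b.get(key)
        if a is None or b is None:
            merged[key] = a if a is not None else b
        elif a == b:
            merged[key] = a
        else:
            merged[key] = None  # the two sources disagree
    return merged


# Toy records; IDs and labels are made up for illustration.
source_a = {"mat-001": "trivial", "mat-002": "topological insulator"}
source_b = {"mat-002": "semimetal", "mat-003": "trivial"}
unified = reconcile(source_a, source_b)
conflicts = [k for k, v in unified.items() if v is None]
```

Marking disagreements explicitly, rather than silently preferring one source, keeps the conflicting entries available for later inspection.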
Received: 27 October 2025
Revised: 11 December 2025
Accepted manuscript online: 12 December 2025
PACS: 07.05.Kf (Data analysis: algorithms and implementation; data management)
Fund: We extend our gratitude to N. Regnault for providing the data from the Topological Materials Database, which was instrumental in the data curation process. Y. H. and H. W. acknowledge financial support from the National Key Research and Development Program of China (Grant No. 2022YFA1403800), the National Natural Science Foundation of China (Grant Nos. 12188101 and 11925408), and the Chinese Academy of Sciences (Grant No. XDB33000000). H. W. also acknowledges support from the New Cornerstone Science Foundation through the XPLORER PRIZE.
Corresponding Authors: Gian-Marco Rignanese, Hongming Weng
E-mail: gian-marco.rignanese@uclouvain.be; hmweng@iphy.ac.cn
Cite this article: Yuqing He(贺雨晴), Matteo Giantomassi, Gian-Marco Rignanese, and Hongming Weng(翁红明), Curation and featurization of multiple topological materials databases, 2026 Chin. Phys. B 35 050701