Chin. Phys. B, 2026, Vol. 35(5): 050701    DOI: 10.1088/1674-1056/ae2bf2
DATA PAPER

Curation and featurization of multiple topological materials databases

Yuqing He(贺雨晴)1, Matteo Giantomassi2, Gian-Marco Rignanese2,3,4,†, and Hongming Weng(翁红明)1,‡
1 Beijing National Laboratory for Condensed Matter Physics and Institute of Physics, Chinese Academy of Sciences, Beijing 100190, China;
2 Institute of Condensed Matter and Nanosciences, UCLouvain, Chemin des Étoiles 8, 1348 Louvain-la-Neuve, Belgium;
3 WEL Research Institute, Avenue Pasteur 6, 1300 Wavre, Belgium;
4 School of Materials Science and Engineering, Northwestern Polytechnical University, Xi'an 710072, China
Abstract  The discovery of topological materials has advanced rapidly due to high-throughput computation and machine learning, but research progress is hampered by inconsistent classification standards and fragmented data resources. Existing databases differ in computational methods, material coverage, and labeling criteria, making it difficult to compare findings across studies. To overcome these challenges, we present a unified topological materials dataset that systematically combines and reconciles two major databases: Materiae and the Topological Materials Database. This dataset provides consistent topological classifications for 35608 materials, accessible through the Materials Galaxy platform for interactive exploration and available for bulk download via MatElab. We describe the featurization methodology that converts crystal structures into 4710 machine-learning-ready descriptors and present a comprehensive analysis of topological material distributions. This work serves as a complete guide for accessing, utilizing, and interpreting this unified resource, designed to enable reproducible machine learning applications and accelerate the discovery of topological materials.
Keywords:  data, topological materials, machine learning
Received:  27 October 2025      Revised:  11 December 2025      Accepted manuscript online:  12 December 2025
PACS:  07.05.Kf (Data analysis: algorithms and implementation; data management)  
Fund: Y. H. and H. W. acknowledge financial support from the National Key Research and Development Program of China (Grant No. 2022YFA1403800), the National Natural Science Foundation of China (Grant Nos. 12188101 and 11925408), and the Chinese Academy of Sciences (Grant No. XDB33000000).
Corresponding Authors:  Gian-Marco Rignanese, E-mail: gian-marco.rignanese@uclouvain.be; Hongming Weng, E-mail: hmweng@iphy.ac.cn

Cite this article: 

Yuqing He(贺雨晴), Matteo Giantomassi, Gian-Marco Rignanese, and Hongming Weng(翁红明). Curation and featurization of multiple topological materials databases. Chin. Phys. B, 2026, 35(5): 050701.
