Chin. Phys. B, 2026, Vol. 35(5): 050701    DOI: 10.1088/1674-1056/ae2bf2
Special Issue: Featured Column — DATA PAPER

Curation and featurization of multiple topological materials databases

Yuqing He(贺雨晴)1, Matteo Giantomassi2, Gian-Marco Rignanese2,3,4,†, and Hongming Weng(翁红明)1,‡
1 Beijing National Laboratory for Condensed Matter Physics and Institute of Physics, Chinese Academy of Sciences, Beijing 100190, China;
2 Institute of Condensed Matter and Nanosciences, UCLouvain, Chemin des Étoiles 8, 1348 Louvain-la-Neuve, Belgium;
3 WEL Research Institute, Avenue Pasteur 6, 1300 Wavre, Belgium;
4 School of Materials Science and Engineering, Northwestern Polytechnical University, Xi'an 710072, China
Abstract  The discovery of topological materials has advanced rapidly due to high-throughput computation and machine learning, but research progress is hampered by inconsistent classification standards and fragmented data resources. Existing databases differ in computational methods, material coverage, and labeling criteria, making it difficult to compare findings across studies. To overcome these challenges, we present a unified topological materials dataset that systematically combines and reconciles two major databases: Materiae and the Topological Materials Database. This dataset provides consistent topological classifications for 35608 materials, accessible through the Materials Galaxy platform for interactive exploration and available for bulk download via MatElab. We describe the featurization methodology that converts crystal structures into 4710 machine-learning-ready descriptors and present a comprehensive analysis of topological material distributions. This work serves as a complete guide for accessing, utilizing, and interpreting this unified resource, designed to enable reproducible machine learning applications and accelerate the discovery of topological materials.
Keywords:  data      topological materials      machine learning  
Received:  27 October 2025      Revised:  11 December 2025      Accepted manuscript online:  12 December 2025
PACS:  07.05.Kf (Data analysis: algorithms and implementation; data management)  
Fund: We extend our gratitude to N. Regnault for providing the data from the Topological Materials Database, which was instrumental in the data curation process. Y. H. and H. W. acknowledge financial support from the National Key Research and Development Program of China (Grant No. 2022YFA1403800), the National Natural Science Foundation of China (Grant Nos. 12188101 and 11925408), and the Chinese Academy of Sciences (Grant No. XDB33000000). H. W. also acknowledges support from the New Cornerstone Science Foundation through the XPLORER PRIZE.
Corresponding Authors:  Gian-Marco Rignanese, Hongming Weng     E-mail:  gian-marco.rignanese@uclouvain.be; hmweng@iphy.ac.cn

Cite this article: 

Yuqing He(贺雨晴), Matteo Giantomassi, Gian-Marco Rignanese, and Hongming Weng(翁红明). Curation and featurization of multiple topological materials databases. 2026 Chin. Phys. B 35 050701
