Chinese Physics B, 2026, Vol. 35, Issue (5): 050701. doi: 10.1088/1674-1056/ae2bf2

Curation and featurization of multiple topological materials databases

Yuqing He(贺雨晴)1, Matteo Giantomassi2, Gian-Marco Rignanese2,3,4,†, and Hongming Weng(翁红明)1,‡   

    1 Beijing National Laboratory for Condensed Matter Physics and Institute of Physics, Chinese Academy of Sciences, Beijing 100190, China;
    2 Institute of Condensed Matter and Nanosciences, UCLouvain, Chemin des Étoiles 8, 1348 Louvain-la-Neuve, Belgium;
    3 WEL Research Institute, Avenue Pasteur 6, 1300 Wavre, Belgium;
    4 School of Materials Science and Engineering, Northwestern Polytechnical University, Xi'an 710072, China
  • Received: 2025-10-27 Revised: 2025-12-11 Accepted: 2025-12-12 Published: 2026-04-29
  • Corresponding authors: Gian-Marco Rignanese, E-mail: gian-marco.rignanese@uclouvain.be; Hongming Weng, E-mail: hmweng@iphy.ac.cn
  • Supported by:
    Y. H. and H. W. acknowledge financial support from the National Key Research and Development Program of China (Grant No. 2022YFA1403800), the National Natural Science Foundation of China (Grant Nos. 12188101 and 11925408), and the Chinese Academy of Sciences (Grant No. XDB33000000).

Abstract: The discovery of topological materials has advanced rapidly due to high-throughput computation and machine learning, but research progress is hampered by inconsistent classification standards and fragmented data resources. Existing databases differ in computational methods, material coverage, and labeling criteria, making it difficult to compare findings across studies. To overcome these challenges, we present a unified topological materials dataset that systematically combines and reconciles two major databases: Materiae and the Topological Materials Database. This dataset provides consistent topological classifications for 35608 materials, accessible through the Materials Galaxy platform for interactive exploration and available for bulk download via MatElab. We describe the featurization methodology that converts crystal structures into 4710 machine-learning-ready descriptors and present a comprehensive analysis of topological material distributions. This work serves as a complete guide for accessing, utilizing, and interpreting this unified resource, designed to enable reproducible machine learning applications and accelerate the discovery of topological materials.

Key words: data, topological materials, machine learning

Classification code (PACS): 07.05.Kf (Data analysis: algorithms and implementation; data management)