Chinese Physics B, 2026, Vol. 35, Issue (5): 050701-050701. doi: 10.1088/1674-1056/ae2bf2

Special topic: Featured Column - DATA PAPER

Curation and featurization of multiple topological materials databases

Yuqing He(贺雨晴)1, Matteo Giantomassi2, Gian-Marco Rignanese2,3,4,†, and Hongming Weng(翁红明)1,‡   

    1 Beijing National Laboratory for Condensed Matter Physics and Institute of Physics, Chinese Academy of Sciences, Beijing 100190, China;
    2 Institute of Condensed Matter and Nanosciences, UCLouvain, Chemin des Étoiles 8, 1348 Louvain-la-Neuve, Belgium;
    3 WEL Research Institute, Avenue Pasteur 6, 1300 Wavre, Belgium;
    4 School of Materials Science and Engineering, Northwestern Polytechnical University, Xi'an 710072, China
  • Received: 2025-10-27; Revised: 2025-12-11; Accepted: 2025-12-12; Online: 2026-04-24; Published: 2026-04-29
  • Contact: Gian-Marco Rignanese, Hongming Weng. E-mail: gian-marco.rignanese@uclouvain.be; hmweng@iphy.ac.cn
  • Supported by:
    We extend our gratitude to N. Regnault for providing the data from the Topological Materials Database, which was instrumental in the data curation process. Y. H. and H. W. acknowledge financial support from the National Key Research and Development Program of China (Grant No. 2022YFA1403800), the National Natural Science Foundation of China (Grant Nos. 12188101 and 11925408), and the Chinese Academy of Sciences (Grant No. XDB33000000). H. W. also acknowledges support from the New Cornerstone Science Foundation through the XPLORER PRIZE.

Abstract: The discovery of topological materials has advanced rapidly due to high-throughput computation and machine learning, but research progress is hampered by inconsistent classification standards and fragmented data resources. Existing databases differ in computational methods, material coverage, and labeling criteria, making it difficult to compare findings across studies. To overcome these challenges, we present a unified topological materials dataset that systematically combines and reconciles two major databases: Materiae and the Topological Materials Database. This dataset provides consistent topological classifications for 35608 materials, accessible through the Materials Galaxy platform for interactive exploration and available for bulk download via MatElab. We describe the featurization methodology that converts crystal structures into 4710 machine-learning-ready descriptors and present a comprehensive analysis of topological material distributions. This work serves as a complete guide for accessing, utilizing, and interpreting this unified resource, designed to enable reproducible machine learning applications and accelerate the discovery of topological materials.

Key words: data, topological materials, machine learning

Classification code: 07.05.Kf (Data analysis: algorithms and implementation; data management)
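
The abstract describes combining and reconciling topological labels from two databases. As a rough illustration only, the Python sketch below merges two label maps and flags disagreements instead of resolving them silently. The identifiers, the labels, and the agree/conflict/single-source rule are all assumptions made for this example; they are not the authors' procedure and do not reflect the actual schema of Materiae or the Topological Materials Database.

```python
from collections import Counter

# Hypothetical records: keys are made-up structure identifiers, values are
# made-up topological labels. Neither reflects the real schema of either
# database named in the abstract.
source_a = {"id-001": "TI", "id-002": "trivial", "id-003": "semimetal"}
source_b = {"id-002": "trivial", "id-003": "TI", "id-004": "semimetal"}


def reconcile(a, b):
    """Merge two label maps under an assumed rule: keep agreeing labels, flag the rest."""
    merged = {}
    for key in sorted(set(a) | set(b)):
        label_a, label_b = a.get(key), b.get(key)
        if label_a is not None and label_b is not None:
            if label_a == label_b:
                merged[key] = {"label": label_a, "status": "agree"}
            else:
                # Disagreements stay visible so they can be reviewed later.
                merged[key] = {"label": None, "status": "conflict"}
        else:
            # Only one source covers this entry; keep its label but mark it.
            only = label_a if label_a is not None else label_b
            merged[key] = {"label": only, "status": "single_source"}
    return merged


merged = reconcile(source_a, source_b)
status_counts = Counter(entry["status"] for entry in merged.values())
print(dict(status_counts))
```

Keeping a status field alongside each label is one possible design choice: a downstream machine-learning workflow could then train only on entries where the sources agree, or treat conflicting entries as a separate set to inspect.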