Chinese Physics B, 2022, Vol. 31, Issue (5): 056302. doi: 10.1088/1674-1056/ac5d2d

Evaluation of performance of machine learning methods in mining structure-property data of halide perovskite materials

Ruoting Zhao(赵若廷)1, Bangyu Xing(邢邦昱)1, Huimin Mu(穆慧敏)2, Yuhao Fu(付钰豪)2,3, and Lijun Zhang(张立军)1,3,†   

    1 State Key Laboratory of Integrated Optoelectronics, Key Laboratory of Automobile Materials of MOE, Jilin Provincial International Cooperation Key Laboratory of High-Efficiency Clean Energy Materials, Electron Microscopy Center, and School of Materials Science and Engineering, Jilin University, Changchun 130012, China;
    2 State Key Laboratory of Superhard Materials, College of Physics, Jilin University, Changchun 130012, China;
    3 International Center of Computational Method and Software, Jilin University, Changchun 130012, China
  • Received: 2022-02-19 Revised: 2022-03-12 Online: 2022-05-14 Published: 2022-05-05
  • Corresponding author: Lijun Zhang, E-mail: lijun_zhang@jlu.edu.cn
  • Supported by:
    Project supported by the National Natural Science Foundation of China (Grant Nos. 62125402 and 92061113). Calculations were performed in part at the high-performance computing center of Jilin University.

Abstract: With the rapid development of artificial intelligence and machine learning (ML) methods, materials science is entering the era of data-driven materials informatics. ML models are the most crucial component, bridging material structure and material properties. The prediction performance of different ML methods varies considerably across material systems. Herein, we evaluate three categories of models (linear, kernel, and nonlinear methods), covering twelve ML algorithms commonly used in the materials field, with halide perovskites chosen as an example system for assessing the fitting performance of the different models. We constructed a dataset of 540 halide perovskites and 72 features, with formation energy and bandgap as target properties. We find that the different categories of ML models show similar trends for the two target properties. For the formation energy, the difference between the models is large, with the coefficient of determination (R²) ranging from 0.69 to 0.953. For the bandgap, the fitting performance of the models is closer, with R² ranging from 0.941 to 0.997. The nonlinear-ensemble models show the best fitting performance for both the formation energy and the bandgap. This indicates that nonlinear-ensemble models, constructed by combining multiple weak learners, effectively describe the nonlinear relationship between material features and target property. In addition, the extreme gradient boosting decision tree model gives the best results among all the models and identifies two new descriptors that are crucial for formation energy and bandgap. Our work provides useful guidance for selecting effective machine learning methods in data-mining studies of specific material systems. The dataset supporting the findings of this study is available in Science Data Bank at https://www.doi.org/10.11922/sciencedb.01611.

Key words: machine learning, material informatics, first-principles calculations, halide perovskites
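
The abstract ranks models by the coefficient of determination, R² = 1 - SS_res / SS_tot. As a minimal sketch of how that metric separates a linear fit from a model that captures a nonlinear relationship, the following uses only the Python standard library. The toy data, the helper names `r2_score` and `fit_line`, and the single-feature setup are assumptions made for illustration; they are not the paper's dataset, features, or code.

```python
from statistics import mean


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def fit_line(x, y):
    """Ordinary least-squares fit y ~ a*x + b for a single feature."""
    x_bar, y_bar = mean(x), mean(y)
    cov = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    var = sum((xi - x_bar) ** 2 for xi in x)
    a = cov / var
    b = y_bar - a * x_bar
    return a, b


# Hypothetical toy data (not from the paper): a purely quadratic target
# sampled symmetrically around zero, so a straight line cannot follow it.
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [xi ** 2 for xi in x]

# Linear model: the best-fit line.
a, b = fit_line(x, y)
linear_pred = [a * xi + b for xi in x]

# Stand-in for a nonlinear model that reproduces the quadratic trend exactly.
quadratic_pred = [xi ** 2 for xi in x]

r2_linear = r2_score(y, linear_pred)
r2_quadratic = r2_score(y, quadratic_pred)
print(f"linear R2 = {r2_linear:.3f}, nonlinear R2 = {r2_quadratic:.3f}")
```

On this deliberately symmetric toy set the best-fit line is flat, so it does no better than predicting the mean, while the nonlinear stand-in fits every point. Real comparisons, such as those summarized in the abstract, would score each model on held-out data rather than on the points used for fitting.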


Classification codes:

  • 63.20.dk (First-principles theory)
  • 71.15.Mb (Density functional theory, local density approximation, gradient and other corrections)
  • 73.22.-f (Electronic structure of nanoscale materials and related systems)