Chin. Phys. B, 2022, Vol. 31(5): 056302    DOI: 10.1088/1674-1056/ac5d2d
DATA PAPER

Evaluation of performance of machine learning methods in mining structure—property data of halide perovskite materials

Ruoting Zhao(赵若廷)1, Bangyu Xing(邢邦昱)1, Huimin Mu(穆慧敏)2, Yuhao Fu(付钰豪)2,3, and Lijun Zhang(张立军)1,3,†
1 State Key Laboratory of Integrated Optoelectronics, Key Laboratory of Automobile Materials of MOE, Jilin Provincial International Cooperation Key Laboratory of High-Efficiency Clean Energy Materials, Electron Microscopy Center, and School of Materials Science and Engineering, Jilin University, Changchun 130012, China;
2 State Key Laboratory of Superhard Materials, College of Physics, Jilin University, Changchun 130012, China;
3 International Center of Computational Method and Software, Jilin University, Changchun 130012, China
Abstract  With the rapid development of artificial intelligence and machine learning (ML) methods, materials science is entering the era of data-driven materials informatics. ML models serve as the crucial component, bridging material structure and material properties. Different ML methods can differ considerably in prediction performance for a given material system. Herein, we evaluated twelve ML algorithms commonly used in the materials field, grouped into three categories (linear, kernel, and nonlinear methods). Halide perovskites were chosen as an example system to evaluate the fitting performance of the different models. We constructed a dataset of 540 halide perovskites and 72 features, with formation energy and bandgap as target properties. We found that the different categories of ML models show similar trends for the two target properties. The difference between models is large for the formation energy, with the coefficient of determination (R²) ranging from 0.69 to 0.953. The fitting performance of the models is closer for the bandgap, with R² ranging from 0.941 to 0.997. The nonlinear-ensemble models show the best fitting performance for both the formation energy and the bandgap. This indicates that a nonlinear-ensemble model, constructed by combining multiple weak learners, effectively describes the nonlinear relationship between material features and target property. In addition, the extreme gradient boosting decision tree model gives the best results among all the models and identifies two new descriptors that are crucial for formation energy and bandgap. Our work provides useful guidance for the selection of effective machine learning methods in data-mining studies of specific material systems. The dataset that supports the findings of this study is available in Science Data Bank, with the link
Keywords:  machine learning      material informatics      first-principles calculations      halide perovskites  
Received:  19 February 2022      Revised:  12 March 2022      Accepted manuscript online: 
PACS: (First-principles theory)  
  71.15.Mb (Density functional theory, local density approximation, gradient and other corrections)  
  73.22.-f (Electronic structure of nanoscale materials and related systems)  
Fund: Project supported by the National Natural Science Foundation of China (Grant Nos. 62125402 and 92061113). Calculations were performed in part at the high-performance computing center of Jilin University.
Corresponding Authors:  Lijun Zhang,     E-mail:
About author:  2022-3-14
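
The abstract compares models by their coefficient of determination, R² = 1 - SS_res / SS_tot, where SS_res is the sum of squared prediction errors and SS_tot is the sum of squared deviations of the targets from their mean. As a minimal sketch of that metric and of ranking models by it, the Python below uses only the standard library. The function names, the model labels `model_a` and `model_b`, and all numerical values are hypothetical illustrations; they are not the authors' code, models, or data.

```python
from statistics import mean


def r2_score(y_true, y_pred):
    """Coefficient of determination: R^2 = 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    y_bar = mean(y_true)
    # Residual sum of squares: error of the predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: spread of the targets around their mean.
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all target values are equal")
    return 1.0 - ss_res / ss_tot


def rank_models(y_true, predictions):
    """Return (name, R^2) pairs sorted from best to worst R^2."""
    scores = {name: r2_score(y_true, y_pred) for name, y_pred in predictions.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy target values and predictions (hypothetical, not from the paper's dataset).
y_true = [1.0, 2.0, 3.0, 4.0]
predictions = {
    "model_a": [1.1, 1.9, 3.2, 3.8],  # close to the targets
    "model_b": [2.5, 2.5, 2.5, 2.5],  # always predicts the mean
}

ranking = rank_models(y_true, predictions)
for name, score in ranking:
    print(f"{name}: R^2 = {score:.3f}")
```

A model that always predicts the mean of the targets scores R² = 0, and a perfect model scores R² = 1, which is why values such as 0.953 or 0.997 indicate a close fit.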

Cite this article: 

Ruoting Zhao(赵若廷), Bangyu Xing(邢邦昱), Huimin Mu(穆慧敏), Yuhao Fu(付钰豪), and Lijun Zhang(张立军) Evaluation of performance of machine learning methods in mining structure—property data of halide perovskite materials 2022 Chin. Phys. B 31 056302
