Evaluation of performance of machine learning methods in mining structure-property data of halide perovskite materials
Ruoting Zhao(赵若廷)1, Bangyu Xing(邢邦昱)1, Huimin Mu(穆慧敏)2, Yuhao Fu(付钰豪)2,3, and Lijun Zhang(张立军)1,3,†
1 State Key Laboratory of Integrated Optoelectronics, Key Laboratory of Automobile Materials of MOE, Jilin Provincial International Cooperation Key Laboratory of High-Efficiency Clean Energy Materials, Electron Microscopy Center, and School of Materials Science and Engineering, Jilin University, Changchun 130012, China
2 State Key Laboratory of Superhard Materials, College of Physics, Jilin University, Changchun 130012, China
3 International Center of Computational Method and Software, Jilin University, Changchun 130012, China
Abstract With the rapid development of artificial intelligence and machine learning (ML) methods, materials science is entering the era of data-driven materials informatics. ML models are the most crucial component, bridging material structure and material properties. The prediction performance of different ML methods varies considerably across material systems. Herein, we evaluated three categories of models (linear, kernel, and nonlinear methods), covering twelve ML algorithms commonly used in the materials field. Halide perovskites were chosen as the example system for evaluating the fitting performance of the different models. We constructed a dataset of 540 halide perovskites and 72 features, with formation energy and bandgap as target properties. We found that the different categories of ML models show similar trends for both target properties. For the formation energy, the models differ widely, with the coefficient of determination (R²) ranging from 0.69 to 0.953. For the bandgap, the fitting performance of the models is closer, with R² ranging from 0.941 to 0.997. The nonlinear-ensemble model shows the best fitting performance for both the formation energy and the bandgap. This indicates that the nonlinear-ensemble model, constructed by combining multiple weak learners, effectively describes the nonlinear relationship between material features and target property. In addition, the extreme gradient boosting decision tree model gives the best results among all the models and identifies two new descriptors that are crucial for formation energy and bandgap. Our work provides useful guidance for selecting effective machine learning methods in data-mining studies of specific material systems. The dataset that supports the findings of this study is available in Science Data Bank at https://www.doi.org/10.11922/sciencedb.01611.
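The coefficient of determination used above to rank the models can be illustrated with a minimal sketch. This is not the authors' code: the function name `r2_score` and the four bandgap values are hypothetical, chosen only to show how the metric separates a close fit from a baseline that always predicts the mean.

```python
from statistics import mean


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    y_bar = mean(y_true)
    # Residual sum of squares: error left after the model's prediction.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: spread of the targets around their mean.
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot


# Hypothetical bandgap values in eV, made up for illustration only.
y_true = [1.0, 2.0, 3.0, 4.0]
pred_close = [1.1, 1.9, 3.2, 3.8]      # a model that tracks the targets
pred_baseline = [2.5, 2.5, 2.5, 2.5]   # always predicts the target mean

scores = {
    "close_fit": r2_score(y_true, pred_close),
    "mean_baseline": r2_score(y_true, pred_baseline),
}
print(scores)
```

A value near 1 means the model explains almost all of the variance in the target, while the mean-predicting baseline scores 0, which is why a spread such as 0.69 to 0.953 reflects a large difference in fitting quality.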
Received: 19 February 2022
Revised: 12 March 2022
Accepted manuscript online:
PACS:
63.20.dk (First-principles theory)
71.15.Mb (Density functional theory, local density approximation, gradient and other corrections)
73.22.-f (Electronic structure of nanoscale materials and related systems)
Fund: Project supported by the National Natural Science Foundation of China (Grant Nos. 62125402 and 92061113). Calculations were performed in part at the high-performance computing center of Jilin University.
Corresponding Author: Lijun Zhang, E-mail: lijun_zhang@jlu.edu.cn
About author: 2022-3-14
Cite this article:
Ruoting Zhao(赵若廷), Bangyu Xing(邢邦昱), Huimin Mu(穆慧敏), Yuhao Fu(付钰豪), and Lijun Zhang(张立军), Evaluation of performance of machine learning methods in mining structure-property data of halide perovskite materials, 2022 Chin. Phys. B 31 056302