1 State Key Laboratory of Integrated Optoelectronics, Key Laboratory of Automobile Materials of MOE, Jilin Provincial International Cooperation Key Laboratory of High-Efficiency Clean Energy Materials, Electron Microscopy Center, and School of Materials Science and Engineering, Jilin University, Changchun 130012, China; 2 State Key Laboratory of Superhard Materials, College of Physics, Jilin University, Changchun 130012, China; 3 International Center of Computational Method and Software, Jilin University, Changchun 130012, China
Abstract With the rapid development of artificial intelligence and machine learning (ML) methods, materials science is entering the era of data-driven materials informatics. ML models are the most crucial component, bridging material structure and material properties, yet the prediction performance of different ML methods varies considerably across material systems. Herein, we evaluate twelve ML algorithms commonly used in the materials field, grouped into three categories (linear, kernel, and nonlinear methods), using halide perovskites as an example system. We constructed a dataset of 540 halide perovskites described by 72 features, with formation energy and bandgap as target properties. We found that the different categories of ML models show similar trends for both target properties. For the formation energy, the models differ widely, with the coefficient of determination (R²) ranging from 0.69 to 0.953. For the bandgap, the fitting performance of the models is closer, with R² ranging from 0.941 to 0.997. The nonlinear-ensemble model shows the best fitting performance for both the formation energy and the bandgap, indicating that a model constructed by combining multiple weak learners effectively describes the nonlinear relationship between material features and the target property. Among all the models, the extreme gradient boosting decision tree model gives the best results and identifies two new descriptors that are crucial for formation energy and bandgap. Our work provides useful guidance for selecting effective machine learning methods in data-mining studies of specific material systems. The dataset supporting the findings of this study is available in Science Data Bank at https://www.doi.org/10.11922/sciencedb.01611.
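As an illustration of the metric used above to rank the models, the coefficient of determination can be computed as in the following minimal sketch. The function name `r2_score`, the two model labels, and all numerical values are hypothetical toy inputs chosen for illustration; they are not taken from the paper's dataset or code.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error left after the model's prediction
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when the target is constant")
    return 1.0 - ss_res / ss_tot


# Toy target values and predictions (hypothetical, NOT the paper's data)
y_true = [1.0, 2.0, 3.0, 4.0]
pred_a = [1.1, 1.9, 3.2, 3.8]  # predictions from a closer-fitting model
pred_b = [2.0, 2.0, 3.0, 3.0]  # predictions from a coarser model

scores = {
    "model_a": r2_score(y_true, pred_a),
    "model_b": r2_score(y_true, pred_b),
}
best = max(scores, key=scores.get)  # model with the highest R^2
print(scores, best)
```

A value of 1 corresponds to a perfect fit, and a value of 0 corresponds to a model no better than predicting the mean of the target, which is why values closer to 1 indicate better fitting performance.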
(Electronic structure of nanoscale materials and related systems)
Fund: Project supported by the National Natural Science Foundation of China (Grant Nos. 62125402 and 92061113). Calculations were performed in part at the high-performance computing center of Jilin University.
Ruoting Zhao (赵若廷), Bangyu Xing (邢邦昱), Huimin Mu (穆慧敏), Yuhao Fu (付钰豪), and Lijun Zhang (张立军), Evaluation of performance of machine learning methods in mining structure-property data of halide perovskite materials, 2022 Chin. Phys. B 31 056302