Special Issue:
Featured Column — COMPUTATIONAL PROGRAMS FOR PHYSICS
MatChat: A large language model and application service platform for materials science
Zi-Yi Chen(陈子逸)1,2,†, Fan-Kai Xie(谢帆恺)3,4,†, Meng Wan(万萌)1,†, Yang Yuan(袁扬)1,2, Miao Liu(刘淼)3,5,6,‡, Zong-Guo Wang(王宗国)1,2,§, Sheng Meng(孟胜)3,5, and Yan-Gang Wang(王彦棡)1,2
1 Computer Network Information Center, Chinese Academy of Sciences, Beijing 100083, China; 2 University of Chinese Academy of Sciences, Beijing 100049, China; 3 Institute of Physics, Chinese Academy of Sciences, Beijing 100190, China; 4 School of Physical Sciences, University of Chinese Academy of Sciences, Beijing 100190, China; 5 Songshan Lake Materials Laboratory, Dongguan 523808, China; 6 Center of Materials Science and Optoelectronics Engineering, University of Chinese Academy of Sciences, Beijing 100049, China
Abstract: The prediction of chemical synthesis pathways plays a pivotal role in materials science research. Challenges such as the complexity of synthesis pathways and the lack of comprehensive datasets currently hinder accurate prediction of these chemical processes. However, recent advances in generative artificial intelligence (GAI), including automated text generation and question-answering systems, together with fine-tuning techniques, have made it practical to deploy large AI models tailored to specific domains. In this study, we fine-tune the LLaMA2-7B model on 13,878 structured materials-knowledge records. The resulting specialized model, named MatChat, focuses on predicting inorganic material synthesis pathways. MatChat shows strong proficiency in generating and reasoning with materials-science knowledge. Although MatChat requires further refinement to meet diverse material design needs, this work demonstrates its reasoning capabilities and innovative potential in materials science. MatChat is now accessible online, and both the model and its application framework are available as open source. This study lays a foundation for collaborative innovation in integrating generative AI into materials science.
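The abstract describes fine-tuning LLaMA2-7B on structured materials-knowledge records but does not specify the record format. As a rough illustration only, the sketch below shows one way such records could be turned into instruction/response pairs for supervised fine-tuning. The `SynthesisRecord` fields, the prompt template, and the JSON Lines output are hypothetical assumptions, not MatChat's actual data schema or training pipeline.

```python
import json
from dataclasses import dataclass, field


@dataclass
class SynthesisRecord:
    """Hypothetical schema for one structured synthesis record (assumption)."""
    target: str                       # target compound, e.g. "BaTiO3"
    precursors: list[str]             # starting materials
    operations: list[str] = field(default_factory=list)  # ordered processing steps


def to_instruction_example(rec: SynthesisRecord) -> dict[str, str]:
    """Map one record to an instruction/response pair (hypothetical template)."""
    instruction = f"Propose a synthesis pathway for the inorganic material {rec.target}."
    steps = "; ".join(rec.operations) if rec.operations else "no operations recorded"
    response = f"Precursors: {', '.join(rec.precursors)}. Procedure: {steps}."
    return {"instruction": instruction, "response": response}


def to_jsonl(records: list[SynthesisRecord]) -> str:
    """Serialize records as JSON Lines, one training example per line."""
    return "\n".join(
        json.dumps(to_instruction_example(r), ensure_ascii=False) for r in records
    )


if __name__ == "__main__":
    # Illustrative record only; not taken from the MatChat dataset.
    demo = SynthesisRecord(
        target="BaTiO3",
        precursors=["BaCO3", "TiO2"],
        operations=["ball-mill the precursors", "calcine in air"],
    )
    print(to_jsonl([demo]))
```

Keeping each example as a self-contained JSON object per line makes the training set easy to stream and to split; a real pipeline would also need the model's own chat template and tokenizer, which are not shown here.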
Received: 11 October 2023
Revised: 18 October 2023
Accepted manuscript online: 19 October 2023
PACS:
81.05.Zx (New materials: theory, design, and fabrication)
01.50.hv (Computer software and software reviews)
81.16.Be (Chemical synthesis methods)
Fund: This work was supported by the Informatization Plan of the Chinese Academy of Sciences (Grant No. CASWX2023SF-0101), the Key Research Program of Frontier Sciences, CAS (Grant No. ZDBS-LY-7025), the Youth Innovation Promotion Association CAS (Grant No. 2021167), and the Strategic Priority Research Program of Chinese Academy of Sciences (Grant No. XDB33020000).
Corresponding Authors:
Miao Liu, Zong-Guo Wang
E-mail: mliu@iphy.ac.cn; wangzg@cnic.cn
Cite this article:
Zi-Yi Chen(陈子逸), Fan-Kai Xie(谢帆恺), Meng Wan(万萌), Yang Yuan(袁扬), Miao Liu(刘淼), Zong-Guo Wang(王宗国), Sheng Meng(孟胜), and Yan-Gang Wang(王彦棡). MatChat: A large language model and application service platform for materials science. 2023 Chin. Phys. B 32 118104