Chin. Phys. B, 2026, Vol. 35(1): 010301    DOI: 10.1088/1674-1056/ae1118
SPECIAL TOPIC: AI + Physical Science

Unveiling the physical meaning of transformer attention in neural network quantum states: A conditional mutual information perspective

Tianyu Ruan(阮天雨)1,2,†, Bowen Kan(阚博文)3,4,†, Yixuan Sun(孙艺轩)5,†, Honghui Shang(商红慧)5,‡, Shihua Zhang(张世华)1,2,§, and Jinlong Yang(杨金龙)5,¶
1 Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China;
2 School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China;
3 Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China;
4 University of Chinese Academy of Sciences, Beijing 101408, China;
5 State Key Laboratory of Precision and Intelligent Chemistry, University of Science and Technology of China, Anhui 230026, China
Abstract  Transformer-based neural-network quantum states (NNQS) have shown great promise in representing quantum many-body ground states, offering high flexibility and accuracy. However, the interpretability of such models remains limited, especially when it comes to connecting network components to physically meaningful quantities. We propose that the attention mechanism, a central module in transformer architectures, explicitly models the conditional information flow between orbitals. Intuitively, as the transformer learns to predict orbital configurations by optimizing an energy functional, it approximates the conditional probability distribution $p(x_n|x_1,\ldots,x_{n-1})$ and thereby implicitly encodes the conditional mutual information (CMI) among orbitals. This suggests a natural correspondence between attention maps and CMI structures in quantum systems. To probe this idea, we compare weighted attention scores from trained transformer wavefunction ansätze with CMI matrices across several representative small molecules. In most cases, we observe a positive rank-level correlation (Kendall's tau) between attention and CMI, suggesting that the learned attention can reflect physically relevant orbital dependencies. This study provides a quantitative link between transformer attention and CMI in the NNQS setting and is a step toward explainable deep learning in quantum chemistry, pointing to opportunities for interpreting attention as a proxy for physical correlations.
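The rank-level comparison described in the abstract can be illustrated with a minimal sketch. It assumes that the attention map is causal, so only entries with $i > j$ are compared, and it uses the tie-free tau-a form of Kendall's tau. The abstract does not specify the attention weighting, the pair selection, or the tau variant used by the authors. The matrices `attention` and `cmi` and the helpers `kendall_tau_a` and `lower_triangle` are hypothetical, and the numbers are toy values, not results from the paper.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / number of pairs.

    Tied pairs count as neither concordant nor discordant.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / n_pairs


def lower_triangle(matrix):
    """Flatten the strictly lower-triangular entries (i > j), row by row."""
    return [matrix[i][j] for i in range(len(matrix)) for j in range(i)]


# Hypothetical toy data for 4 orbitals (assumption: NOT values from the paper).
# Row i holds the scores that orbital i assigns to earlier orbitals j < i.
attention = [
    [0.0, 0.0, 0.0, 0.0],
    [0.9, 0.0, 0.0, 0.0],
    [0.2, 0.7, 0.0, 0.0],
    [0.1, 0.3, 0.6, 0.0],
]
cmi = [
    [0.00, 0.00, 0.00, 0.00],
    [0.50, 0.00, 0.00, 0.00],
    [0.05, 0.30, 0.00, 0.00],
    [0.10, 0.02, 0.40, 0.00],
]

tau = kendall_tau_a(lower_triangle(attention), lower_triangle(cmi))
print(f"Kendall tau-a between attention and CMI: {tau:.3f}")
```

A value near 1 means the two matrices rank orbital pairs in nearly the same order, a value near 0 means no rank agreement, and a negative value means the orderings tend to be reversed. Because only the ordering matters, the comparison is insensitive to any monotonic rescaling of the attention scores.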
Keywords: attention mechanism, quantum chemistry, many-body Schrödinger equation, entanglement entropy
Received:  30 July 2025      Revised:  05 October 2025      Accepted manuscript online:  09 October 2025
PACS:  03.65.Ud (Entanglement and quantum nonlocality)  
  03.67.-a (Quantum information)  
  07.05.Mh (Neural networks, fuzzy logic, artificial intelligence)  
Fund: This work was supported by the National Natural Science Foundation of China (Grant No. T2222026), the CAS Project for Young Scientists in Basic Research (Grant No. YSBR-034) and the Robotic AIScientist Platform of the Chinese Academy of Sciences.
Corresponding Authors:  Honghui Shang, Shihua Zhang, Jinlong Yang     E-mail:  shh@ustc.edu.cn;zsh@amss.ac.cn;jlyang@ustc.edu.cn

Cite this article: 

Tianyu Ruan(阮天雨), Bowen Kan(阚博文), Yixuan Sun(孙艺轩), Honghui Shang(商红慧), Shihua Zhang(张世华), and Jinlong Yang(杨金龙) Unveiling the physical meaning of transformer attention in neural network quantum states: A conditional mutual information perspective 2026 Chin. Phys. B 35 010301
