SPECIAL TOPIC — AI + Physical Science
Unveiling the physical meaning of transformer attention in neural network quantum states: A conditional mutual information perspective
Tianyu Ruan(阮天雨)1,2,†, Bowen Kan(阚博文)3,4,†, Yixuan Sun(孙艺轩)5,†, Honghui Shang(商红慧)5,‡, Shihua Zhang(张世华)1,2,§, and Jinlong Yang(杨金龙)5,¶
1 Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China; 2 School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China; 3 Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; 4 University of Chinese Academy of Sciences, Beijing 101408, China; 5 State Key Laboratory of Precision and Intelligent Chemistry, University of Science and Technology of China, Hefei 230026, China
Abstract Transformer-based neural-network quantum states (NNQS) have shown great promise in representing quantum many-body ground states, offering high flexibility and accuracy. However, the interpretability of such models remains limited, especially in connecting network components to physically meaningful quantities. We propose that the attention mechanism — a central module in transformer architectures — explicitly models the conditional information flow between orbitals. Intuitively, as the transformer learns to predict orbital configurations by optimizing an energy functional, it approximates the conditional probability distribution $p(x_n|x_1,\ldots,x_{n-1})$ and thereby implicitly encodes the conditional mutual information (CMI) among orbitals. This suggests a natural correspondence between attention maps and CMI structures in quantum systems. To probe this idea, we compare weighted attention scores from trained transformer wavefunction ansätze with CMI matrices across several representative small molecules. In most cases, we observe a positive rank-level correlation (Kendall's tau) between attention and CMI, suggesting that the learned attention reflects physically relevant orbital dependencies. This study establishes a quantitative link between transformer attention and conditional mutual information in the NNQS setting, a step toward explainable deep learning in quantum chemistry that points to attention as a proxy for physical correlations.
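For concreteness, the two quantities being linked can be written down explicitly. The following is a minimal sketch under standard definitions; the abstract does not state the exact conditioning set used in the paper, so conditioning on all remaining orbitals is an assumption here. An autoregressive transformer ansatz factorizes the Born distribution over orbital occupations $x = (x_1, \ldots, x_N)$ as
$$p(x) = |\psi(x)|^2 = \prod_{n=1}^{N} p(x_n \mid x_1, \ldots, x_{n-1}),$$
and the CMI between orbitals $i$ and $j$ given the remaining orbitals $x_{\setminus ij}$ is
$$I(x_i ; x_j \mid x_{\setminus ij}) = H(x_i \mid x_{\setminus ij}) + H(x_j \mid x_{\setminus ij}) - H(x_i, x_j \mid x_{\setminus ij}),$$
where $H$ denotes the Shannon entropy of the sampled configuration distribution.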
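The comparison the abstract describes (attention scores versus CMI entries, scored by Kendall's tau) can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' code: the head-averaging and symmetrization of attention, the restriction to off-diagonal orbital pairs, and the placeholder inputs attn and cmi are all assumptions made here for demonstration.

import numpy as np
from scipy.stats import kendalltau

def aggregate_attention(attn):
    # attn: (n_heads, n_orbitals, n_orbitals) softmax weights from a trained
    # transformer ansatz (hypothetical input). Average over heads, then
    # symmetrize, since CMI is symmetric in its two orbital arguments.
    mean = attn.mean(axis=0)
    return 0.5 * (mean + mean.T)

def attention_cmi_tau(attn, cmi):
    # Kendall's tau over off-diagonal (i < j) orbital pairs only, so the
    # self-attention diagonal does not distort the rank correlation.
    a = aggregate_attention(attn)
    iu = np.triu_indices_from(a, k=1)
    return kendalltau(a[iu], cmi[iu])

# Toy usage with random placeholders (4 heads, 6 orbitals):
rng = np.random.default_rng(0)
attn = rng.dirichlet(np.ones(6), size=(4, 6))  # each row is a softmax distribution
cmi = np.abs(rng.normal(size=(6, 6)))
cmi = 0.5 * (cmi + cmi.T)                      # a CMI matrix is symmetric
tau, p_value = attention_cmi_tau(attn, cmi)
print(f"Kendall tau = {tau:.3f}, p = {p_value:.3f}")

A positive tau on real data would indicate that orbital pairs receiving high attention also tend to carry high CMI, which is the rank-level agreement the paper reports.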
Received: 30 July 2025
Revised: 05 October 2025
Accepted manuscript online: 09 October 2025
PACS: 03.65.Ud (Entanglement and quantum nonlocality); 03.67.-a (Quantum information); 07.05.Mh (Neural networks, fuzzy logic, artificial intelligence)
Fund: This work was supported by the National Natural Science Foundation of China (Grant No. T2222026), the CAS Project for Young Scientists in Basic Research (Grant No. YSBR-034), and the Robotic AI Scientist Platform of the Chinese Academy of Sciences.
Corresponding Authors:
Honghui Shang, Shihua Zhang, Jinlong Yang
E-mail: shh@ustc.edu.cn; zsh@amss.ac.cn; jlyang@ustc.edu.cn
Cite this article:
Tianyu Ruan(阮天雨), Bowen Kan(阚博文), Yixuan Sun(孙艺轩), Honghui Shang(商红慧), Shihua Zhang(张世华), and Jinlong Yang(杨金龙) Unveiling the physical meaning of transformer attention in neural network quantum states: A conditional mutual information perspective 2026 Chin. Phys. B 35 010301