Chinese Physics B ›› 2026, Vol. 35 ›› Issue (1): 010301. doi: 10.1088/1674-1056/ae1118

Unveiling the physical meaning of transformer attention in neural network quantum states: A conditional mutual information perspective

Tianyu Ruan(阮天雨)1,2,†, Bowen Kan(阚博文)3,4,†, Yixuan Sun(孙艺轩)5,†, Honghui Shang(商红慧)5,‡, Shihua Zhang(张世华)1,2,§, and Jinlong Yang(杨金龙)5,¶   

    1 Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China;
    2 School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China;
    3 Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China;
    4 University of Chinese Academy of Sciences, Beijing 101408, China;
    5 State Key Laboratory of Precision and Intelligent Chemistry, University of Science and Technology of China, Anhui 230026, China
  • Received: 2025-07-30  Revised: 2025-10-05  Accepted: 2025-10-09  Published: 2025-12-29
  • Contact: Honghui Shang, Shihua Zhang, Jinlong Yang  E-mail: shh@ustc.edu.cn; zsh@amss.ac.cn; jlyang@ustc.edu.cn
  • Supported by:
    This work was supported by the National Natural Science Foundation of China (Grant No. T2222026), the CAS Project for Young Scientists in Basic Research (Grant No. YSBR-034), and the Robotic AI Scientist Platform of the Chinese Academy of Sciences.

Abstract: Transformer-based neural-network quantum states (NNQS) have shown great promise in representing quantum many-body ground states, offering high flexibility and accuracy. However, the interpretability of such models remains limited, especially in connecting network components to physically meaningful quantities. We propose that the attention mechanism, a central module in transformer architectures, explicitly models the conditional information flow between orbitals. Intuitively, as the transformer learns to predict orbital configurations by optimizing an energy functional, it approximates the conditional probability distribution $p(x_n|x_1,\ldots,x_{n-1})$ and thereby implicitly encodes the conditional mutual information (CMI) among orbitals. This suggests a natural correspondence between attention maps and CMI structures in quantum systems. To probe this idea, we compare weighted attention scores from trained transformer wavefunction ansätze with CMI matrices across several representative small molecules. In most cases, we observe a positive rank-level correlation (Kendall's tau) between attention and CMI, suggesting that the learned attention reflects physically relevant orbital dependencies. This study provides a quantitative link between transformer attention and conditional mutual information in the NNQS setting. Our results represent a step toward explainable deep learning in quantum chemistry, pointing to opportunities for interpreting attention as a proxy for physical correlations.
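As an illustration only, the rank-level comparison described in the abstract can be sketched in a few lines of Python. The snippet below is not the authors' code: the attention and CMI matrices (`attn`, `cmi`) are hypothetical placeholders, and the comparison simply applies scipy.stats.kendalltau to the off-diagonal entries of two matrices defined over the same orbitals.

    import numpy as np
    from scipy.stats import kendalltau

    def attention_cmi_tau(attn, cmi):
        # Compare only off-diagonal (i != j) orbital pairs; the diagonal
        # carries no pairwise dependence information.
        n = attn.shape[0]
        mask = ~np.eye(n, dtype=bool)
        tau, p_value = kendalltau(attn[mask], cmi[mask])
        return tau, p_value

    # Hypothetical 4-orbital example: `cmi` is correlated with `attn`
    # by construction, so tau should come out positive.
    rng = np.random.default_rng(0)
    attn = rng.random((4, 4))
    cmi = attn + 0.1 * rng.random((4, 4))
    tau, p = attention_cmi_tau(attn, cmi)
    print(f"Kendall's tau = {tau:.3f} (p = {p:.3g})")

A positive tau means that orbital pairs receiving high attention also tend to carry high conditional mutual information, which is the correspondence the abstract reports for most of the molecules studied.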

Key words: attention mechanism, quantum chemistry, many-body Schrödinger equation, entanglement entropy

PACS: 03.65.Ud (Entanglement and quantum nonlocality); 03.67.-a (Quantum information); 07.05.Mh (Neural networks, fuzzy logic, artificial intelligence)