1 Beijing Academy of Quantum Information Sciences, Beijing 100193, China; 2 State Key Laboratory of Low Dimensional Quantum Physics, Department of Physics, Tsinghua University, Beijing 100084, China; 3 Institute of Physics, Chinese Academy of Sciences, Beijing 100190, China; 4 Frontier Science Center for Quantum Information, Beijing 100184, China; 5 School of Physical Sciences, University of Chinese Academy of Sciences, Beijing 100190, China; 6 CAS Center for Excellence in Topological Quantum Computation, University of Chinese Academy of Sciences, Beijing 100190, China; 7 School of Mathematical Sciences, Nankai University, Tianjin 300071, China
Abstract With the rapid advancement of quantum computing, hybrid quantum-classical machine learning has shown numerous potential applications at the current stage and is expected to be achievable in the noisy intermediate-scale quantum (NISQ) era. Quantum reinforcement learning, an important branch of this field, has recently been shown to solve standard benchmark environments, with formally provable theoretical advantages over its classical counterparts. However, despite the progress of quantum processors and the emergence of quantum computing clouds, implementations of quantum reinforcement learning algorithms based on parameterized quantum circuits (PQCs) on NISQ devices remain rare. In this work, we take the first step toward executing benchmark quantum reinforcement learning problems on real devices with up to 136 qubits on the BAQIS Quafu quantum computing cloud. The experimental results demonstrate that the policy agents can successfully accomplish their objectives under modified conditions in both the training and inference phases. Moreover, we design hardware-efficient PQC architectures for the quantum model using a multi-objective evolutionary algorithm and develop a learning algorithm that is adaptable to quantum devices. We hope that Quafu-RL can serve as a guiding example of how to realize machine learning tasks by taking advantage of quantum computers on a quantum cloud platform.
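To make the idea of a PQC policy agent concrete, here is a minimal classical-simulation sketch under stated assumptions; it is not the authors' implementation. It assumes a hypothetical 2-qubit data re-uploading circuit (RY encoding of two observation features, then trainable RZ and RY rotations, then a CZ entangler, repeated per layer) whose expectation value E = ⟨Z₀⟩ sets the logits (βE, −βE) of a two-action softmax policy, as in a CartPole-style binary action space. Circuit gradients use the parameter-shift rule, and the policy update uses the REINFORCE estimator. The circuit layout, the inverse temperature `beta`, and every function name are illustrative assumptions; on a cloud device, `expectation` would be replaced by a shot-based estimate returned by the hardware.

```python
import numpy as np


def ry(theta):
    """Single-qubit RY(theta) = exp(-i * theta * Y / 2)."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]], dtype=complex)


def rz(theta):
    """Single-qubit RZ(theta) = exp(-i * theta * Z / 2)."""
    return np.diag([np.exp(-0.5j * theta), np.exp(0.5j * theta)])


CZ = np.diag([1, 1, 1, -1]).astype(complex)     # entangling gate
Z0 = np.kron(np.diag([1.0, -1.0]), np.eye(2))  # Pauli Z on qubit 0


def expectation(params, x):
    """<Z0> of a hypothetical 2-qubit re-uploading PQC; params shape (layers, 2, 2)."""
    state = np.zeros(4, dtype=complex)
    state[0] = 1.0  # start in |00>
    for layer in params:
        # Per qubit: encode feature x[q] with RY, then trainable RZ and RY.
        u = [ry(layer[q, 1]) @ rz(layer[q, 0]) @ ry(x[q]) for q in range(2)]
        state = CZ @ (np.kron(u[0], u[1]) @ state)
    return float(np.real(np.vdot(state, Z0 @ state)))


def policy(params, x, beta=1.0):
    """Two-action softmax policy with logits (beta * E, -beta * E)."""
    e = expectation(params, x)
    logits = np.array([beta * e, -beta * e])
    p = np.exp(logits - logits.max())  # numerically stable softmax
    return p / p.sum()


def param_shift_grad(params, x):
    """Exact gradient of <Z0> w.r.t. every angle via the parameter-shift rule."""
    grad = np.zeros_like(params)
    for idx in np.ndindex(params.shape):
        shift = np.zeros_like(params)
        shift[idx] = np.pi / 2
        grad[idx] = 0.5 * (expectation(params + shift, x)
                           - expectation(params - shift, x))
    return grad


def reinforce_grad(params, episode, beta=1.0, gamma=0.99):
    """REINFORCE estimate of grad J from one episode of (x, action, reward)."""
    grad = np.zeros_like(params)
    g = 0.0
    for x, a, r in reversed(episode):
        g = r + gamma * g  # discounted return G_t from step t onward
        p = policy(params, x, beta)
        sign = 1.0 if a == 0 else -1.0
        # d log pi(a|x) / dE = beta * (sign - tanh(beta * E)), and tanh(beta * E) = p0 - p1
        dlogpi_de = beta * (sign - (p[0] - p[1]))
        grad += g * dlogpi_de * param_shift_grad(params, x)
    return grad


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    params = rng.uniform(0.0, 2 * np.pi, size=(2, 2, 2))  # 2 layers, 2 qubits, 2 angles
    # Hypothetical episode: (observation features, action taken, reward received)
    episode = [(np.array([0.1, -0.3]), 0, 1.0), (np.array([0.4, 0.2]), 1, 1.0)]
    params = params + 0.05 * reinforce_grad(params, episode)  # one gradient-ascent step
    print(policy(params, np.array([0.1, -0.3])))
```

Each parameter-shift derivative costs two extra circuit evaluations per trainable angle, so on a cloud device the cost of one policy update grows with both the number of angles and the shot budget per evaluation; this is one practical reason to favor compact, hardware-efficient circuit architectures.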
Fund: This work is supported by the Beijing Academy of Quantum Information Sciences. Haifeng Yu, Meng-Jun Hu and Wei-Feng Zhuang are supported by the National Natural Science Foundation of China (Grant No. 92365206). Hong-Ze Xu acknowledges the support of the China Postdoctoral Science Foundation (Certificate Number: 2023M740272). Zheng-An Wang is supported by the National Natural Science Foundation of China (Grant No. 12247168) and China Postdoctoral Science Foundation (Certificate Number: 2022TQ0036).
Corresponding Authors:
Meng-Jun Hu
E-mail: humj@baqis.ac.cn