Quafu-RL: The cloud quantum computers based quantum reinforcement learning

doi:10.1088/1674-1056/ad3061

Abstract With the rapid advancement of quantum computing, hybrid quantum-classical machine learning has shown numerous potential applications at the current stage, and these applications are expected to be realizable in the noisy intermediate-scale quantum (NISQ) era. Quantum reinforcement learning, as an indispensable area of study, has recently demonstrated its ability to solve standard benchmark environments with formally provable theoretical advantages over its classical counterparts. However, despite the progress of quantum processors and the emergence of quantum computing clouds, implementations of quantum reinforcement learning algorithms utilizing parameterized quantum circuits (PQCs) on NISQ devices remain rare. In this work, we take the first step towards executing benchmark quantum reinforcement learning problems on real devices equipped with up to 136 qubits on the BAQIS Quafu quantum computing cloud. The experimental results demonstrate that the policy agents can successfully accomplish their objectives under modified conditions in both the training and inference phases. Moreover, we design hardware-efficient PQC architectures for the quantum model using a multi-objective evolutionary algorithm and develop a learning algorithm that is adaptable to quantum devices. We hope that Quafu-RL can serve as a guiding example of how to carry out machine learning tasks with quantum computers on a quantum cloud platform.
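The abstract mentions a multi-objective evolutionary search over PQC architectures but does not specify, in this excerpt, which algorithm or which objectives are used. Multi-objective evolutionary methods such as NSGA-II (Deb et al., 2002) rank candidates by Pareto dominance. The Python sketch here illustrates only that non-dominated filtering step, under stated assumptions: the two objectives (mean episode return, to be maximized, and two-qubit gate count, to be minimized as a rough proxy for hardware noise), the `Candidate` type, and all numbers are hypothetical illustrations, not Quafu-RL's actual configuration.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    """Hypothetical PQC architecture candidate scored on two assumed objectives."""
    name: str
    avg_return: float      # assumed objective 1: maximize (mean episode reward)
    two_qubit_gates: int   # assumed objective 2: minimize (proxy for noise on hardware)


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if `a` is no worse than `b` on both objectives and strictly better on at least one."""
    no_worse = a.avg_return >= b.avg_return and a.two_qubit_gates <= b.two_qubit_gates
    strictly_better = a.avg_return > b.avg_return or a.two_qubit_gates < b.two_qubit_gates
    return no_worse and strictly_better


def pareto_front(population: List[Candidate]) -> List[Candidate]:
    """Return the candidates that no other member dominates (the first front in NSGA-II terms).

    A candidate never dominates itself, because `dominates` requires strict improvement.
    """
    return [c for c in population if not any(dominates(other, c) for other in population)]


# Usage with made-up scores: C is dominated by A (lower return, more two-qubit gates).
pop = [
    Candidate("A", 180.0, 12),
    Candidate("B", 195.0, 20),
    Candidate("C", 170.0, 14),
    Candidate("D", 150.0, 6),
]
front = pareto_front(pop)
print([c.name for c in front])  # ['A', 'B', 'D']
```

Keeping the whole first front, rather than a single best candidate, lets a designer trade a little reward for a shallower, less noise-prone circuit. A full NSGA-II additionally sorts the remaining candidates into successive fronts and uses a crowding distance to preserve diversity.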

Keywords: quantum cloud platform, quantum reinforcement learning, evolutionary quantum architecture search

Received: 23 December 2023
Revised: 03 March 2024
Accepted manuscript online: 06 March 2024

PACS:	03.67.Lx	(Quantum computation architectures and implementations)
	03.67.Ac	(Quantum algorithms, protocols, and simulations)

Fund: This work is supported by the Beijing Academy of Quantum Information Sciences. Haifeng Yu, Meng-Jun Hu and Wei-Feng Zhuang are supported by the National Natural Science Foundation of China (Grant No. 92365206). Hong-Ze Xu acknowledges the support of the China Postdoctoral Science Foundation (Certificate Number: 2023M740272). Zheng-An Wang is supported by the National Natural Science Foundation of China (Grant No. 12247168) and China Postdoctoral Science Foundation (Certificate Number: 2022TQ0036).

Corresponding Author: Meng-Jun Hu
E-mail: humj@baqis.ac.cn

Cite this article:

Yu-Xin Jin(靳羽欣), Hong-Ze Xu(许宏泽), Zheng-An Wang(王正安), Wei-Feng Zhuang(庄伟峰), Kai-Xuan Huang(黄凯旋), Yun-Hao Shi(时运豪), Wei-Guo Ma(马卫国), Tian-Ming Li(李天铭), Chi-Tong Chen(陈驰通), Kai Xu(许凯), Yu-Long Feng(冯玉龙), Pei Liu(刘培), Mo Chen(陈墨), Shang-Shu Li(李尚书), Zhi-Peng Yang(杨智鹏), Chen Qian(钱辰), Yun-Heng Ma(马运恒), Xiao Xiao(肖骁), Peng Qian(钱鹏), Yanwu Gu(顾炎武), Xu-Dan Chai(柴绪丹), Ya-Nan Pu(普亚南), Yi-Peng Zhang(张翼鹏), Shi-Jie Wei(魏世杰), Jin-Feng Zeng(曾进峰), Hang Li(李行), Gui-Lu Long(龙桂鲁), Yirong Jin(金贻荣), Haifeng Yu(于海峰), Heng Fan(范桁), Dong E. Liu(刘东), and Meng-Jun Hu(胡孟军) Quafu-RL: The cloud quantum computers based quantum reinforcement learning 2024 Chin. Phys. B 33 050301


