Chinese Physics B ›› 2021, Vol. 30 ›› Issue (12): 120203-120203. doi: 10.1088/1674-1056/ac3229

Special topic: Interdisciplinary physics: Complex network dynamics and emerging technologies

Optimal control strategy for COVID-19 concerning both life and economy based on deep reinforcement learning

Wei Deng(邓为)1, Guoyuan Qi(齐国元)1,†, and Xinchen Yu(蔚昕晨)2   

  1 Tianjin Key Laboratory of Advanced Technology in Electrical Engineering and Energy, School of Control Science and Engineering, Tiangong University, Tianjin 300387, China;
  2 School of Mechanical Engineering, Tiangong University, Tianjin 300387, China
  • Received: 2021-08-30 Revised: 2021-10-14 Accepted: 2021-10-22 Online: 2021-11-15 Published: 2021-12-01
  • Contact: Guoyuan Qi E-mail: guoyuanqisa@qq.com
  • Supported by:
    Project supported by the National Natural Science Foundation of China (Grant No. 61873186) and the Tianjin Natural Science Foundation, China (Grant No. 17JCZDJC38300).

Abstract: The global COVID-19 situation remains severe, and a growing number of countries have experienced second or even third outbreaks. The epidemic will not be over until a vaccine is successfully developed and brought to market on a large scale. Inappropriate epidemic control strategies may have catastrophic consequences, so it is essential both to maximize epidemic suppression and to mitigate economic damage. However, studies of optimal control strategies that consider both aspects are rare, and no optimization model has been built. In this paper, the Susceptible-Infectious-Hospitalized-Recovered (SIHR) compartment model is extended to simulate the spread of the epidemic as a function of the isolation rate, and an economic model affected by epidemic isolation measures is established. The effective reproduction number and the eigenvalues at the equilibrium point are introduced as indicators of the controllability and stability of the model and are used to verify the effectiveness of the SIHR model. Based on the Deep Q Network (DQN), a deep reinforcement learning (RL) method, the blocking policy is studied to maximize economic output while keeping the number of infections under control in different stages. The epidemic control strategies given by deep RL under different learning strategies are compared for different reward coefficients. The study demonstrates that optimal policies may differ between countries depending on disease spread and the ability to withstand economic risk. The results show that the more economy-oriented the strategy, the smaller the economic loss in the short term, which can save economically fragile countries from economic crises. In the second or third outbreak stage, the earlier the government adopts the control strategy, the smaller the economic loss. We recommend using deep RL to specify a policy that can control the epidemic while making quarantine economically viable.

Key words: COVID-19, SIHR model, deep reinforcement learning, DQN, secondary outbreak, economy

PACS classification:

  • 02.70.-c (Computational techniques; simulations)
  • 05.45.-a (Nonlinear dynamics and chaos)
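
The abstract describes an SIHR compartment model driven by an isolation rate, an effective reproduction number used as a controllability indicator, and a reward that trades infections against economic output. The Python sketch below illustrates how those pieces could fit together. It is not the authors' model: the equations, the parameter values (`beta`, `delta`, `gamma`), the economic-output formula, and the reward weights (`w_econ`, `w_inf`) are all assumptions chosen for illustration, and a fixed isolation rate stands in for the policy that a DQN agent would learn.

```python
from dataclasses import dataclass


@dataclass
class State:
    s: float  # susceptible
    i: float  # infectious
    h: float  # hospitalized
    r: float  # recovered

    @property
    def n(self) -> float:
        """Total population across all four compartments."""
        return self.s + self.i + self.h + self.r


def step(state: State, u: float, beta: float = 0.3, delta: float = 0.1,
         gamma: float = 0.05, dt: float = 1.0) -> State:
    """One forward-Euler step of an assumed SIHR model.

    u is the isolation rate in [0, 1]; it scales the contact rate by (1 - u).
    beta, delta, gamma are hypothetical infection, hospitalization, and
    recovery rates, not values from the paper.
    """
    new_inf = beta * (1.0 - u) * state.s * state.i / state.n * dt
    new_hosp = delta * state.i * dt
    new_rec = gamma * state.h * dt
    return State(
        s=state.s - new_inf,
        i=state.i + new_inf - new_hosp,
        h=state.h + new_hosp - new_rec,
        r=state.r + new_rec,
    )


def effective_reproduction_number(state: State, u: float,
                                  beta: float = 0.3, delta: float = 0.1) -> float:
    """R_t for the assumed model: infections grow while this exceeds 1."""
    return beta * (1.0 - u) * state.s / (state.n * delta)


def reward(state: State, u: float, w_econ: float = 1.0, w_inf: float = 10.0) -> float:
    """Assumed reward: economic output minus a weighted infection penalty.

    Output is modeled as the non-isolated share of people who are neither
    infectious nor hospitalized (an illustrative choice).
    """
    output = (1.0 - u) * (state.s + state.r) / state.n
    return w_econ * output - w_inf * state.i / state.n


def rollout(u: float, days: int = 100):
    """Simulate a constant isolation rate; return final state, peak I, total reward."""
    state = State(s=9990.0, i=10.0, h=0.0, r=0.0)
    peak, total = state.i, 0.0
    for _ in range(days):
        total += reward(state, u)
        state = step(state, u)
        peak = max(peak, state.i)
    return state, peak, total
```

Calling `rollout(0.0)` and `rollout(0.7)` compares no isolation with strict isolation under these assumed parameters. An RL agent of the kind the abstract describes would instead choose `u` at each step from the observed state, with the ratio `w_inf / w_econ` playing the role of the reward coefficient that shifts the policy toward protecting life or protecting the economy.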