Chinese Physics B ›› 2015, Vol. 24 ›› Issue (9): 090504. DOI: 10.1088/1674-1056/24/9/090504

• GENERAL •

Off-policy integral reinforcement learning optimal tracking control for continuous-time chaotic systems

Wei Qing-Lai (魏庆来)a, Song Rui-Zhuo (宋睿卓)b, Sun Qiu-Ye (孙秋野)c, Xiao Wen-Dong (肖文栋)b   

  1. a The State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China;
    b School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing 100083, China;
    c School of Information Science and Engineering, Northeastern University, Shenyang 110004, China
  • Received: 2014-12-18  Revised: 2015-03-28  Online: 2015-09-05  Published: 2015-09-05
  • Contact: Song Rui-Zhuo  E-mail: ruizhuosong@ustb.edu.cn
  • Supported by:

    Project supported by the National Natural Science Foundation of China (Grant Nos. 61304079 and 61374105), the Beijing Natural Science Foundation, China (Grant Nos. 4132078 and 4143065), the China Postdoctoral Science Foundation (Grant No. 2013M530527), the Fundamental Research Funds for the Central Universities, China (Grant No. FRF-TP-14-119A2), and the Open Research Project from the State Key Laboratory of Management and Control for Complex Systems, China (Grant No. 20150104).

Abstract:

This paper presents an off-policy integral reinforcement learning (IRL) algorithm to obtain the optimal tracking control of unknown chaotic systems. Off-policy IRL can learn the solution of the Hamilton-Jacobi-Bellman (HJB) equation from system data generated by an arbitrary control, and it can be regarded as a direct learning method that avoids identifying the system dynamics. In this paper, the performance index function is first defined in terms of the system tracking error and the control error. An off-policy IRL algorithm is then proposed to solve the HJB equation. It is proven that the iterative control makes the tracking error system asymptotically stable and that the iterative performance index function converges. A simulation study demonstrates the effectiveness of the developed tracking control method.

Key words: adaptive dynamic programming, approximate dynamic programming, chaotic system, optimal tracking control

PACS: 05.45.Gg (Control of chaos, applications of chaos)
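
For reference, the following is a minimal sketch of a standard continuous-time IRL tracking formulation consistent with the abstract; the notation (dynamics f, g, reference x_d, steady-state control u_d, weights Q, R, reinforcement interval T) is an illustrative assumption and is not taken from the paper itself.

For a system \dot{x} = f(x) + g(x)u tracking a reference x_d, define the tracking error e = x - x_d and the control error w = u - u_d. A performance index built from these two errors is
  J(e(t)) = \int_{t}^{\infty} \big( e^{\top}(\tau) Q e(\tau) + w^{\top}(\tau) R w(\tau) \big) \, d\tau,  % Q \succeq 0, R \succ 0 are assumed weighting matrices
and its optimal value V^{*} satisfies the HJB equation
  0 = \min_{w} \big[ e^{\top} Q e + w^{\top} R w + \big( \nabla V^{*}(e) \big)^{\top} \dot{e} \big],
  w^{*}(e) = -\tfrac{1}{2} R^{-1} g^{\top}(x) \nabla V^{*}(e).
Integral reinforcement learning replaces this HJB equation with the integral Bellman equation over a reinforcement interval T,
  V(e(t)) = \int_{t}^{t+T} \big( e^{\top} Q e + w^{\top} R w \big) \, d\tau + V(e(t+T)),
which the off-policy variant solves iteratively from data generated by an arbitrary behavior control, so that f and g never need to be identified.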