Off-policy integral reinforcement learning optimal tracking control for continuous-time chaotic systems
Wei Qing-Lai (魏庆来)a, Song Rui-Zhuo (宋睿卓)b, Sun Qiu-Ye (孙秋野)c, Xiao Wen-Dong (肖文栋)b |
a The State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China;
b School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing 100083, China;
c School of Information Science and Engineering, Northeastern University, Shenyang 110004, China
Abstract This paper develops an off-policy integral reinforcement learning (IRL) algorithm to obtain the optimal tracking control of unknown chaotic systems. Off-policy IRL learns the solution of the Hamilton-Jacobi-Bellman (HJB) equation from system data generated by an arbitrary control, and it can be regarded as a direct learning method that avoids identifying the system dynamics. In this paper, the performance index function is first defined in terms of the system tracking error and the control error. An off-policy IRL algorithm is then proposed to solve the HJB equation. It is proven that the iterative control makes the tracking error system asymptotically stable and that the iterative performance index function converges. A simulation study demonstrates the effectiveness of the developed tracking control method.
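As a rough illustration of the off-policy IRL mechanism the abstract summarizes, the sketch below applies the idea to its simplest special case: data-driven policy iteration for a continuous-time linear-quadratic regulation problem. This is not the paper's chaotic tracking formulation; the plant matrices A and B, the weights Q and R, the reinforcement interval T, the probing input, and the Euler discretization are all assumptions made only for this example.

    import numpy as np

    # Assumed example plant dx/dt = A x + B u.  The learner below touches only
    # trajectory data, never A or B directly.
    A = np.array([[-1.0,  2.0],
                  [-3.0, -0.5]])   # Hurwitz, so the zero initial gain is admissible
    B = np.array([[0.0],
                  [1.0]])
    n, m = 2, 1
    Q, R = np.eye(n), np.eye(m)

    dt, T, N = 1e-3, 0.05, 80      # Euler step, reinforcement interval, number of windows
    steps = int(T / dt)

    # Phase 1: run the plant once under an arbitrary (off-policy) probing input and
    # store, per window, the quantities the integral Bellman equation needs:
    # the increment of kron(x, x), int kron(x, x) dt, and int kron(u, x) dt.
    x = np.array([1.0, -1.0])
    d_xx, I_xx, I_ux = [], [], []
    for i in range(N):
        xx0 = np.kron(x, x)
        ixx, iux = np.zeros(n * n), np.zeros(m * n)
        for j in range(steps):
            t = (i * steps + j) * dt
            u = np.array([np.sin(7.0 * t) + 0.5 * np.sin(13.0 * t)])  # probing input
            ixx += np.kron(x, x) * dt
            iux += np.kron(u, x) * dt
            x = x + dt * (A @ x + B @ u)       # forward-Euler plant step
        d_xx.append(np.kron(x, x) - xx0)
        I_xx.append(ixx)
        I_ux.append(iux)
    d_xx, I_xx, I_ux = map(np.array, (d_xx, I_xx, I_ux))

    # Phase 2: off-policy policy iteration reusing the SAME data set.  Each window
    # gives one equation of the integral Bellman identity
    #   x'Px evaluated over [t, t+T] = -int x'(Q + K'RK)x dt + 2 int (u + Kx)'R K_next x dt,
    # which is linear in the unknowns vec(P) and vec(K_next).
    K = np.zeros((m, n))                       # initial admissible gain
    for k in range(8):
        Qk = Q + K.T @ R @ K
        Theta = np.hstack([
            d_xx,
            -2.0 * (I_ux @ np.kron(R, np.eye(n))
                    + I_xx @ np.kron(K.T @ R, np.eye(n))),
        ])
        xi = -I_xx @ Qk.reshape(-1)
        w, *_ = np.linalg.lstsq(Theta, xi, rcond=None)
        P = 0.5 * (w[:n * n].reshape(n, n) + w[:n * n].reshape(n, n).T)
        K = w[n * n:].reshape(m, n)
        print(f"iteration {k}: K = {K.ravel()}, trace(P) = {np.trace(P):.4f}")

The two phases make the abstract's main point concrete: the trajectory data are generated once by an arbitrary behavior input, and every policy-iteration step reuses the same data set, so the method needs neither an identified model nor re-running the plant under each iterative policy.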
Received: 18 December 2014
Revised: 28 March 2015
PACS: 05.45.Gg (Control of chaos, applications of chaos)
Fund: Project supported by the National Natural Science Foundation of China (Grant Nos. 61304079 and 61374105), the Beijing Natural Science Foundation, China (Grant Nos. 4132078 and 4143065), the China Postdoctoral Science Foundation (Grant No. 2013M530527), the Fundamental Research Funds for the Central Universities, China (Grant No. FRF-TP-14-119A2), and the Open Research Project from State Key Laboratory of Management and Control for Complex Systems, China (Grant No. 20150104).
Corresponding author: Song Rui-Zhuo, E-mail: ruizhuosong@ustb.edu.cn
Cite this article:
Wei Qing-Lai (魏庆来), Song Rui-Zhuo (宋睿卓), Sun Qiu-Ye (孙秋野), Xiao Wen-Dong (肖文栋). Off-policy integral reinforcement learning optimal tracking control for continuous-time chaotic systems. 2015 Chin. Phys. B 24 090504