Chinese Physics B ›› 2018, Vol. 27 ›› Issue (1): 010203-010203. doi: 10.1088/1674-1056/27/1/010203

• GENERAL •

A novel stable value iteration-based approximate dynamic programming algorithm for discrete-time nonlinear systems

Yan-Hua Qu(曲延华), An-Na Wang(王安娜), Sheng Lin(林盛)   

  1. College of Information Science and Engineering, Northeastern University, Shenyang 110819, China
  • Received: 2017-07-04; Revised: 2017-10-11; Online: 2018-01-05; Published: 2018-01-05
  • Contact: Yan-Hua Qu, E-mail: quyanhuawang@sina.com

Abstract: The convergence and stability of a value-iteration-based adaptive dynamic programming (ADP) algorithm are considered for discrete-time nonlinear systems subject to a discounted quadratic performance index. Beyond achieving a good approximation structure, the iterative feedback control law must also guarantee closed-loop stability. Specifically, it is first proved that the iterative value function sequence converges precisely to the optimum. Second, the necessary and sufficient condition for the optimal value function to serve as a Lyapunov function is investigated. We prove that, in the infinite-horizon case, there exists a finite horizon length for which the iterative feedback control law provides stability, which increases the practicability of the proposed value iteration algorithm. Neural networks (NNs) are employed to approximate the value functions and the optimal feedback control laws, which allows the algorithm to be implemented without knowing the internal dynamics of the system. Finally, a simulation example is employed to demonstrate the effectiveness of the developed optimal control method.

Key words: adaptive dynamic programming (ADP), convergence, stability, discounted quadratic performance index
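The value-iteration scheme summarized in the abstract can be sketched numerically. The sketch below applies the update V_{i+1}(x) = min_u [Qx² + Ru² + γ·V_i(f(x, u))] with V_0 ≡ 0 on a state/control grid for a scalar system; the dynamics f, the weights Q and R, and the discount factor γ are illustrative assumptions, not the paper's simulation example:

```python
import numpy as np

# Illustrative sketch (not the paper's example) of the value-iteration update
#   V_{i+1}(x) = min_u [ Q x^2 + R u^2 + gamma * V_i(f(x, u)) ],  V_0 = 0,
# for a scalar discrete-time nonlinear system, with the value function
# represented on a grid instead of a neural network.

def f(x, u):
    """Assumed nonlinear dynamics, chosen only for illustration."""
    return 0.8 * np.sin(x) + u

Q, R, gamma = 1.0, 1.0, 0.95            # quadratic weights and discount factor
xs = np.linspace(-2.0, 2.0, 81)         # state grid
us = np.linspace(-1.0, 1.0, 41)         # control grid
X, U = np.meshgrid(xs, us, indexing="ij")
Xn = f(X, U)                            # successor state for every (x, u) pair

V = np.zeros_like(xs)                   # V_0 = 0
for i in range(300):
    # Stage cost plus discounted cost-to-go, interpolated on the state grid.
    cost = Q * X**2 + R * U**2 \
        + gamma * np.interp(Xn.ravel(), xs, V).reshape(Xn.shape)
    V_new = cost.min(axis=1)            # minimize over the control grid
    if np.max(np.abs(V_new - V)) < 1e-9:
        V = V_new
        break
    V = V_new

print("V(0) ≈", V[len(xs) // 2])        # optimal cost-to-go at the origin
```

Because γ < 1, the update is a contraction, so the sweep starting from V_0 ≡ 0 produces a monotonically nondecreasing sequence that converges to the optimal discounted cost, mirroring the convergence result stated in the abstract.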

PACS: 02.60.Gf (Algorithms for functional approximation), 02.30.Jr (Partial differential equations), 02.30.Yy (Control theory)