†Corresponding author. E-mail: qinglai.wei@ia.ac.cn
*Project supported by the National Natural Science Foundation of China (Grant Nos. 61034002, 61233001, 61273140, 61304086, and 61374105) and the Beijing Natural Science Foundation, China (Grant No. 4132078).
A policy iteration algorithm of adaptive dynamic programming (ADP) is developed to solve the optimal tracking control problem for a class of discrete-time chaotic systems. By system transformations, the optimal tracking problem is transformed into an optimal regulation problem. The policy iteration algorithm for discrete-time chaotic systems is first described. Then, the convergence and admissibility properties of the developed policy iteration algorithm are presented, which show that the transformed chaotic system can be stabilized under an arbitrary iterative control law and that the iterative performance index function simultaneously converges to the optimum. By implementing the policy iteration algorithm via neural networks, the developed optimal tracking control scheme for chaotic systems is verified by a simulation.
The control of chaotic systems has been a focus of control research in the past decade.[1–7] Most control methods for chaotic systems, such as impulsive control methods[2–4,6] and adaptive synchronization control methods,[8–10] consider only the stability properties of the chaotic systems. Optimality, however, is also an important performance measure for chaotic control systems. As is well known, dynamic programming is a very useful tool for solving optimal control problems.[11] However, due to the difficulties of solving the time-varying Hamilton–Jacobi–Bellman (HJB) equations, the closed-loop optimal feedback control can hardly be obtained analytically. Approximate solutions of the optimal control problem have consequently attracted much attention.[12–16] Among these approximate approaches, the adaptive dynamic programming (ADP) algorithm, proposed by Werbos,[17,18] has played an important role in seeking approximate solutions of dynamic programming problems.[19–21] Several synonyms are used for ADP, including adaptive critic designs,[22] adaptive dynamic programming,[23–27] approximate dynamic programming,[28] neural dynamic programming,[29] and reinforcement learning.[30] In Refs. [22] and [28], ADP approaches were classified into several main schemes: heuristic dynamic programming (HDP), action-dependent HDP (ADHDP), dual heuristic dynamic programming (DHP), action-dependent DHP (ADDHP), which is also called Q-learning,[31] globalized DHP (GDHP), and action-dependent GDHP (ADGDHP). Iterative methods are also used in ADP to obtain the solution of the HJB equation indirectly, and they have received increasing attention.[32–40] There are two main classes of iterative ADP algorithms,[41] based on value iteration and policy iteration, respectively.
Value iteration algorithms for the optimal control of discrete-time nonlinear systems were given in Ref. [42]. Al-Tamimi and Lewis[43] studied deterministic discrete-time affine nonlinear systems and proposed a value iteration algorithm, referred to as HDP, for finding the optimal control law. Starting from J[0](xk) ≡ 0, the value iteration algorithm is updated by
for i = 0, 1, 2, …, where xk+1 = f(xk) + g(xk)u[i](xk). In Ref. [43], it was proved that J[i](xk) is a nondecreasing and bounded sequence, and hence converges to J*(xk). Zhang et al. applied a value iteration algorithm to optimal tracking problems.[36] Song et al. successfully implemented value iteration algorithms to find the optimal tracking control law for Hénon chaotic systems.[44] In most previous value iteration ADP algorithms, the initial performance index function is chosen as zero.[36,43,44] For the iterative controls u[i](xk), i = 0, 1, …, the stability of the system cannot be guaranteed. This means that only the converged u*(xk) can be used to control the nonlinear system, and all of the intermediate iterative controls u[i](xk), i = 0, 1, …, may be invalid. Hence, the computational efficiency of the value iteration ADP method is low.
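To make the value iteration update above concrete, the following minimal Python sketch applies it to a scalar linear-quadratic stand-in, for which the iterative performance index function has the form J[i](xk) = p_i xk^2 and the update reduces to a Riccati-type recursion; the dynamics and weights (a, b, q, r) are illustrative choices for this sketch only and are not taken from the systems studied in this paper.

```python
# Value iteration on a scalar linear-quadratic stand-in:
#   x_{k+1} = a*x_k + b*u_k,   U(x, u) = q*x^2 + r*u^2.
# With J[i](x) = p_i*x^2, the update J[i+1](x) = min_u {U(x, u) + J[i](x_{k+1})}
# reduces to a scalar Riccati difference recursion on p_i.

a, b = 1.2, 1.0          # open-loop unstable, so stabilization is non-trivial
q, r = 1.0, 1.0          # utility weights

p = 0.0                  # J[0](x) = 0, the usual value iteration start
for i in range(100):
    k_gain = p * a * b / (r + p * b * b)          # u[i](x) = -k_gain * x
    p_next = q + a * a * p * r / (r + p * b * b)  # J[i+1](x) = p_next * x^2
    if abs(p_next - p) < 1e-12:
        break
    p = p_next

print(f"converged after {i} iterations: J*(x) ~ {p:.6f}*x^2, u*(x) = -{k_gain:.6f}*x")
```

Here p_i increases monotonically from zero, and the very first iterate u[0](x) = 0 does not stabilize the open-loop unstable example, mirroring the remark above that the intermediate controls of value iteration may be invalid.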
Policy iteration algorithms for the optimal control of continuous-time systems with continuous state and action spaces were given in Refs. [12] and [32]. In Ref. [12], Murray et al. proved that, for continuous-time affine nonlinear systems, each of the iterative controls obtained by the policy iteration algorithm stabilizes the system. This is the great merit of policy iteration algorithms. However, almost all of the discussions on policy iteration algorithms concern continuous-time control systems.[35] In Ref. [45], a policy iteration algorithm for discrete-time systems was presented and its stability and convergence properties were discussed. To the best of our knowledge, there is still no discussion focused on policy iteration algorithms for the optimal tracking control of discrete-time chaotic systems. This motivates our research.
In this paper, inspired by Ref. [45], we develop a policy iteration ADP algorithm for the first time to solve the optimal tracking control problem for discrete-time chaotic systems. First, by system transformations, the optimal tracking problem is transformed into an optimal regulation problem. Next, a policy iteration algorithm for discrete-time chaotic systems is described. Then, the convergence and admissibility properties of the developed policy iteration algorithm are presented, which show that the transformed chaotic system can be stabilized under an arbitrary iterative control law and that the iterative performance index function simultaneously converges to the optimum. The developed policy iteration algorithm is implemented by neural networks. To justify the effectiveness of the developed algorithm, the simulation results of the policy iteration algorithm are compared with those obtained by the traditional value iteration algorithm.
This paper is organized as follows. In Section 2, we present the problem formulation and preliminaries. In Section 3, the optimal chaotic tracking control scheme based on the policy iteration ADP algorithm is given, where the corresponding convergence and admissibility properties will also be presented. In Section 4, a simulation example is given to demonstrate the effectiveness of the developed optimal tracking control scheme. In Section 5, the conclusion is given.
Consider the following MIMO chaotic dynamic system:
where xk = [x1, k, … , xn, k]T ∈ ℝ n is the system state vector which is assumed to be available from measurement. Let uk = [u1, k, … , um, k]T∈ ℝ m be the control input and gij (i = 1, … , n, j = 1, … , m) be the constant control gain. If we denote f(xk) = [f1(xk), … , fn(xk)]T and
It should be pointed out that system (3) represents a large class of chaotic systems, such as the Hénon system[46] and the new discrete chaotic system proposed in Ref. [47]. In the optimal tracking problem, the control objective is to design an optimal control u(xk) for system (2) such that the system state xk tracks a specified desired trajectory ηk ∈ ℝn, k = 0, 1, …, where ηk satisfies
and σ is a given function. We define the tracking error as
and define the following quadratic performance index:
where Q ∈ ℝ n× n and R ∈ ℝ m× m are positive definite matrices and
be the utility function, where wk = uk − ue,k and ue,k denotes the expected control introduced for analytical purposes, which can be given as
where g− 1(η k)g(η k) = I and I ∈ ℝ m× m is the identity matrix. Combining Eqs. (2) and (3), we obtain
We will study optimal tracking control problems for system (2). The goal is to find an optimal tracking control scheme which tracks the desired trajectory η k and simultaneously minimizes the performance index function (6). The optimal performance index function is defined as
where
Then, the optimal control law can be expressed as
Hence, the HJB equation (11) can be written as
From Eqs. (4)–(13), we can see that the optimal tracking problem for chaotic system (3) is transformed into an optimal regulation problem for chaotic system (9). The objective of this paper is to construct an optimal tracking controller such that the chaotic system state xk tracks the reference signal ηk, i.e., zk → 0 as k → ∞. For convenience of analysis, the results of this paper are based on the following assumptions, after which a brief numerical sketch of the transformation is given.
Assumption 1 The system (9) is controllable and the function 𝓕 (zk, wk) is Lipschitz continuous for ∀ zk, wk.
Assumption 2 The system state zk = 0 is an equilibrium state of system (9) under the control wk = 0, i.e., 𝓕(0, 0) = 0.
Assumption 3 The feedback control wk = w(zk) satisfies w(zk) = 0 for zk = 0.
Assumption 4 The utility function U(zk, wk) is a continuous positive definite function of zk and wk.
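To make the transformation of the tracking problem into a regulation problem concrete, the following sketch builds the tracking error zk = xk − ηk, an expected control ue,k that keeps the state on the desired orbit, and the resulting error dynamics. The drift f, the gain matrix g, and the desired trajectory are made-up stand-ins, and the expressions follow the usual construction for this class of problems rather than reproducing Eqs. (5)–(9) verbatim.

```python
import numpy as np

# Illustrative stand-ins for f, g, and the desired trajectory eta_k; they are
# not the system (2) or the reference (4) of the paper.
def f(x):                       # drift term f(x_k)
    return np.array([0.9 * x[0] + 0.1 * np.sin(x[1]), 0.8 * x[1]])

g = np.eye(2)                   # constant, invertible control gain matrix

def eta(k):                     # desired trajectory eta_k
    return np.array([np.sin(0.1 * k), 0.5 * np.cos(0.1 * k)])

def expected_control(k):
    # u_{e,k} is chosen so that eta_{k+1} = f(eta_k) + g*u_{e,k}, assuming g is
    # invertible (the usual construction behind the expected control).
    return np.linalg.solve(g, eta(k + 1) - f(eta(k)))

def error_dynamics(z_k, w_k, k):
    # Transformed regulation system z_{k+1} = F(z_k, w_k) (cf. system (9)).
    x_k = z_k + eta(k)
    u_k = w_k + expected_control(k)
    return f(x_k) + g @ u_k - eta(k + 1)

# Consistency check with Assumption 2: z = 0 with w = 0 stays at the origin.
z = np.zeros(2)
for k in range(5):
    z = error_dynamics(z, np.zeros(2), k)
print("error after 5 steps with w = 0 on the desired orbit:", z)
```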
Generally, J*(zk) is unknown before the entire sequence of control errors wk ∈ ℝm is considered. If we adopt the traditional dynamic programming method to obtain the optimal performance index function, then we have to face the "curse of dimensionality". This makes the optimal control nearly impossible to obtain by solving the HJB equation directly. So, in the next section, an effective policy iteration ADP algorithm will be developed to obtain the optimal chaotic tracking control iteratively.
In this section, a new discrete-time policy iteration algorithm is developed to obtain the optimal tracking control law for chaotic system (2). The goal of the present iterative ADP algorithm is to construct an optimal control law w*(zk), k = 0, 1, …, which drives an arbitrary initial state x0 to the desired trajectory ηk and simultaneously minimizes the performance index function. Convergence and admissibility properties will be analyzed.
For optimal control problems, the developed control scheme must not only stabilize the control systems but also make the performance index function finite, i.e., the control law must be admissible.[43] We can define Ω w as the set of the admissible control laws, which is expressed as
Based on the above definition, we can introduce our policy iteration algorithm. Let i be the iteration index that increases from 0 to infinity. Let w[0](zk) ∈ Ω w be an arbitrary admissible control law. For i = 0, let J[0](zk) be the iterative performance index function constructed by w[0](zk), which satisfies the following generalized HJB (GHJB) equation:
For ∀ i = 1, 2, … , the developed policy iteration algorithm will iterate between
and
From Eqs. (15)– (17), we can see that the procedure of policy iteration is obviously different from value iteration (1). Thus, the convergence analysis for the value iteration algorithms in Refs. [36] and [43] is invalid for the policy iteration. Therefore, a new analysis will be given in this paper.
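For comparison with the value iteration sketch in the introduction, the following sketch runs the policy iteration loop, policy evaluation followed by policy improvement, on the same scalar linear-quadratic stand-in; for a linear control law w(z) = −kz, the evaluation equation (17) reduces to a scalar Lyapunov-type equation in the coefficient p of J(z) = p z^2. The numbers are illustrative only.

```python
# Policy iteration on the scalar linear-quadratic stand-in:
#   z_{k+1} = a*z_k + b*w_k,   U(z, w) = q*z^2 + r*w^2,   w(z) = -k*z.
# Policy evaluation solves  p = q + r*k^2 + p*(a - b*k)^2  for the current k,
# and policy improvement minimizes the right-hand side over w.

a, b, q, r = 1.2, 1.0, 1.0, 1.0

k = 1.0                           # w[0]: admissible, since |a - b*k| = 0.2 < 1
for i in range(20):
    closed = a - b * k
    assert abs(closed) < 1.0, "iterative control law must remain stabilizing"
    p = (q + r * k * k) / (1.0 - closed * closed)   # policy evaluation
    k_new = p * a * b / (r + p * b * b)             # policy improvement
    if abs(k_new - k) < 1e-12:
        break
    k = k_new

print(f"policy iteration: w*(z) = -{k:.6f}*z, J*(z) = {p:.6f}*z^2")
```

In this sketch every iterate remains stabilizing and the coefficient p is non-increasing, which is the behaviour established for the general case in Theorems 1 and 2 below.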
In Ref. [12], the convergence and admissibility properties of the continuous-time policy iteration algorithm were discussed, and it was shown that each of the iterative control laws stabilizes the controlled system. For discrete-time systems, however, the proofs for the continuous-time policy iteration algorithm do not hold, and a new analysis must be developed. In the following, we show that for the discrete-time policy iteration algorithm, each of the iterative control laws also stabilizes the transformed system.
Theorem 1 For i = 0, 1, … , let J[i](zk) and w[i](zk) be obtained by Eqs. (15)– (17), where w[0](zk)∈ Ω w is an arbitrary admissible control law. Then, for ∀ i = 0, 1, … , the iterative control law w[i](zk) stabilizes the chaotic system (9).
Proof According to Eq. (17), we have
Let zk = 0. According to Assumption 3, we have w[i](zk) = 0. According to Assumption 2, we also have zk+1 = 𝓕(zk, w[i](zk)) = 0, and furthermore w[i](zk+1) = 0. Then, by mathematical induction and Assumption 4, we have U(zk+j, w[i](zk+j)) = 0 for ∀ j = 0, 1, …, which shows that J[i](zk) = 0 for zk = 0.
On the other hand, according to Assumption 4, the utility function U(zk, wk) is positive definite for ∀ zk, wk. Then we can get U(zk, wk) → ∞ as zk→ ∞ , which means J[i](zk) → ∞ . Hence, J[i](zk) is positive definite for ∀ i = 0, 1, … .
According to Eqs. (15) and (17), for ∀ i = 0, 1, … ,
holds. Then for ∀ i = 0, 1, … , J[i](zk) is a Lyapunov function. Thus w[i](zk) is a stable control law. The proof is completed.
Theorem 2 For i = 0, 1, … , let J[i](zk) and w[i](zk) be obtained by Eqs. (15)– (17). For ∀ zk ∈ ℝ n, the iterative performance index function J[i](zk) is a monotonically non-increasing sequence for ∀ i ≥ 0, i.e.,
Proof For i = 0, 1, … , define a new performance index function ϒ [i+ 1](zk) as
where w[i+ 1] (zk) is obtained by Eq. (16). According to Eq. (21), for ∀ zk, we can obtain
The inequality (20) will be proven by mathematical induction. According to Theorem 1, for i = 0, 1, …, w[i+1](zk) is a stable control law. Then, zk → 0 as k → ∞. Without loss of generality, let zN = 0, where N → ∞. We can obtain
First, we let k = N – 1. According to Eq. (16), we have
According to Eqs. (17) and (24), we can obtain
So, the conclusion holds for k = N – 1. Assume that the conclusion holds for k = l + 1, l = 0, 1, … . For k = l, we have
According to Eq. (22), for ∀ zl, we have
According to inequalities (26) and (27), for i = 0, 1, … , inequality (20) holds for ∀ zk. The proof is completed.
Corollary 1 For i = 0, 1, … , let J[i](zk) and w[i](zk) be obtained by Eqs. (15)– (17), where w[0](zk) is an arbitrary admissible control law. Then for ∀ i = 0, 1, … , the iterative control law w[i](zk) is admissible.
From Theorem 2, we can see that the iterative performance index function J[i](zk) is monotonically non-increasing and lower bounded. Hence as i→ ∞ , the limit of J[i](zk) exists, i.e.,
We can then derive the following theorem.
Theorem 3 For i = 0, 1, … , let J[i](zk) and w[i](zk) be obtained by Eqs. (15)– (17). The iterative performance index function J[i](zk) then converges to the optimal performance index function J* (zk) as i → ∞ , i.e.,
which satisfies the HJB equation (11).
Proof The theorem can be proven in two steps. Let μ (zk) ∈ Ω w be an arbitrary admissible control law. Define a new performance index function P(zk), which satisfies
For i = 0, 1, … , let J[i](zk) and w[i](zk) be obtained by Eqs. (15) and (16). We will then prove
The statement (31) can be proven by mathematical induction. As μ(zk) is an admissible control law, we have zk → 0 as k → ∞. Without loss of generality, let zN = 0, where N → ∞. According to Eq. (30), we have
where zN = 0. According to Eq. (28), the iterative performance index function J[∞ ] (zk) can be expressed as
Since w[∞](zk) is an admissible control law, we obtain zN = 0 as N → ∞, which means J[∞](zN) = P(zN) = 0 as N → ∞. For k = N − 1, according to Eq. (43), we can obtain
Assume that the statement holds for k = l+ 1, l = 0, 1, … , then for k = l, we have
Hence for ∀ zk, k = 0, 1, … , the inequality (31) holds. The mathematical induction is completed.
Next, we will prove that the iterative performance index function J[i](zk) converges to the optimal performance index function J* (zk) as i → ∞ , i.e.,
which satisfies the HJB equation (11).
According to the definition of ϒ [i+ 1](zk) in Eq. (21), we have
According to Eq. (26), we obtain
for ∀ zk. Let i→ ∞ . We obtain
Thus we can obtain
Let ε > 0 be an arbitrary positive number. Since J[i](zk) is non-increasing for i ≥ 1 and
Hence, we can obtain
Since ε is arbitrary, we have
Combining Eqs. (40) and (42), we can obtain
According to the definition of J* (zk) in Eq. (40), for ∀ i = 0, 1, … , we have
Let i → ∞ , we can then obtain
On the other hand, as μ (zk)∈ Ω w is arbitrary, if we let w[i](zk) = w* (zk), then according to inequality (31), we have
According to inequalities (45) and (46), we can obtain Eq. (36). The proof is completed.
Corollary 2 Let zk ∈ ℝ n be an arbitrary state vector. If Theorem 3 holds, then the iterative control law w[i](zk) converges to the optimal control law as i → ∞ , i.e.,
In the policy iteration algorithm (15)– (17), for ∀ i = 0, 1, … , we should construct an iterative performance index function J[i](zk) to satisfy Eq. (17). In this subsection, we will give an effective method to construct the iterative performance index function.
Let Ψ(zk) be a positive semi-definite function. Introduce a new iteration index j = 0, 1, …, and define a new performance index function
For i = 1, 2, … and j = 0, 1, …, let
where
Theorem 4 Let Ψ(zk) ≥ 0 be an arbitrary positive semi-definite function. Let
where J[i](zk) satisfies Eqs. (15) and (17), respectively.
Proof According to Eq. (47), we have
We can then obtain
Let j → ∞ . We can obtain
Since w[0] (zk) is an admissible control law,
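As a numerical illustration of this construction, the sketch below evaluates a fixed admissible control law by repeatedly applying the right-hand side of Eq. (17), starting from several nonnegative initial values that play the role of Ψ(zk). The scalar linear-quadratic stand-in from the earlier sketches is reused, and the recursion shown is the generic successive-approximation form assumed for this illustration rather than the exact Eqs. (47)–(49).

```python
# Inner evaluation iteration for a fixed admissible control law w(z) = -k*z on
# the scalar stand-in: with J(z) = p*z^2, one sweep of the right-hand side of
# Eq. (17) maps p to q + r*k^2 + p*(a - b*k)^2, and the limit is independent
# of the nonnegative starting value Psi.

a, b, q, r = 1.2, 1.0, 1.0, 1.0
k = 1.0                                   # fixed admissible control law
closed = a - b * k                        # closed-loop factor, |closed| < 1

exact_p = (q + r * k * k) / (1.0 - closed * closed)   # solution of Eq. (17)

for psi in (0.0, 5.0, 50.0):              # arbitrary positive semi-definite starts
    p = psi
    for j in range(200):
        p = q + r * k * k + p * closed * closed       # one evaluation sweep
    print(f"Psi = {psi:5.1f}  ->  p after 200 sweeps = {p:.6f} (exact {exact_p:.6f})")
```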
To evaluate the performance of our policy iteration algorithm, we choose an example with quadratic utility functions for numerical experiments. Consider the following chaotic system:[4]
where h(x1) = m1x1 + (m0 − m1)(|x1 + θ3| − |x1 − θ3|)/2, θ1 = 9, θ2 = 14.28, θ3 = 1, m0 = −1/7, and m1 = 2/7. The state trajectory of the chaotic system is given in Fig. 1. According to Euler's discretization method, the continuous-time chaotic system can be represented as follows:
where Δ T = 0.1. Let the desired orbit be η k = [sin(k), 0.5cos(k), 0.6sin(k)]T and the initial state be selected as [1, − 1, 1.5]T. The utility function is defined as
The critic network and the action network are chosen as three-layer BP neural networks with structures of 3–10–1 and 3–10–3, respectively. In each iteration step, the critic network and the action network are trained for 1000 steps with a learning rate of α = 0.02, so that the neural network training error becomes less than 10^−5. The training methods of the neural networks are described in Refs. [30] and [45] and are omitted here. The performance index function is shown in Fig. 2. The system states are shown in Fig. 3. The state errors are given in Fig. 4. The iterative controls are shown in Fig. 5. In the simulation results, we let "initial" and "limiting" denote the initial iteration and the limiting iteration, respectively.
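To indicate how such a simulation can be set up, the sketch below gives the Euler discretization of the continuous-time model together with the desired orbit and initial state quoted above. The continuous-time equations are assumed here to take the standard dimensionless Chua-circuit form, which is consistent with the quoted parameters θ1, θ2, θ3, m0, and m1; the control gain matrix and the trained critic/action networks are not reproduced, so only the uncontrolled tracking error is printed.

```python
import numpy as np

theta1, theta2, theta3 = 9.0, 14.28, 1.0
m0, m1 = -1.0 / 7.0, 2.0 / 7.0
dT = 0.1                                 # Euler step, as in the example

def h(x1):
    # Piecewise-linear nonlinearity quoted in the example.
    return m1 * x1 + 0.5 * (m0 - m1) * (abs(x1 + theta3) - abs(x1 - theta3))

def f_continuous(x):
    # Assumed dimensionless Chua-circuit dynamics (not reproduced from the paper).
    x1, x2, x3 = x
    return np.array([theta1 * (x2 - x1 - h(x1)),
                     x1 - x2 + x3,
                     -theta2 * x2])

def f_discrete(x):
    # Forward Euler discretization with step dT.
    return x + dT * f_continuous(x)

def eta(k):
    # Desired orbit from the example.
    return np.array([np.sin(k), 0.5 * np.cos(k), 0.6 * np.sin(k)])

x = np.array([1.0, -1.0, 1.5])           # initial state from the example
for k in range(10):
    z = x - eta(k)                       # tracking error z_k (no control applied)
    print(f"k={k:2d}  |z_k| = {np.linalg.norm(z):.4f}")
    x = f_discrete(x)
```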
From the simulation results, we can see that the iterative performance index function J[i](zk) is monotonically non-increasing and converges to the optimal performance index function. For ∀ i = 0, 1, …, the regulation system (9) is stable under the iterative control law w[i](zk). To show the effectiveness of the developed algorithm, the results of the developed policy iteration algorithm are compared with the results of a value iteration algorithm.[36] The value iteration algorithm is implemented by Eq. (1) with the initial performance index function J[0](zk) ≡ 0. All of the parameters are set the same as those used for the policy iteration algorithm. The performance index function is shown in Fig. 6. The system states are shown in Fig. 7. The iterative controls are shown in Fig. 8.
From Fig. 6, we can see that the iterative performance index function is monotonically non-decreasing and converges to the optimum; the convergence properties of the value and policy iteration algorithms are thus inherently different. On the other hand, from Figs. 7 and 8, we can see that there exist unstable system states and control laws in the value iteration algorithm. From Figs. 3 and 5, we can see that the chaotic system (54) can track the desired trajectories with the policy iteration algorithm, which illustrates the effectiveness of the developed algorithm.
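To underline the different monotonicity of the two schemes noted above, the following sketch runs both iterations side by side on the scalar linear-quadratic stand-in used earlier: value iteration starts from zero and increases towards the optimum, while policy iteration starts from an admissible law and decreases towards the same optimum. The example is illustrative only and is unrelated to the chaotic system of this section.

```python
# Side-by-side monotonicity check on the scalar stand-in.
a, b, q, r = 1.2, 1.0, 1.0, 1.0

p_vi = 0.0                 # value iteration, J[0] = 0
k_pi = 1.0                 # policy iteration, admissible initial law w = -1.0*z
print(" i    value-iteration p_i    policy-iteration p_i")
for i in range(8):
    # Value iteration update (non-decreasing in i).
    p_vi = q + a * a * p_vi * r / (r + p_vi * b * b)
    # Policy iteration: evaluation then improvement (non-increasing in i).
    closed = a - b * k_pi
    p_pi = (q + r * k_pi * k_pi) / (1.0 - closed * closed)
    k_pi = p_pi * a * b / (r + p_pi * b * b)
    print(f"{i:2d}    {p_vi:18.6f}    {p_pi:19.6f}")
```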
We have developed an optimal tracking control method for chaotic systems based on policy iteration. By system transformations, the optimal tracking problem is transformed into an optimal regulation one. The policy iteration algorithm for the transformed chaotic systems is then presented, and its convergence and admissibility properties are analyzed, which show that the transformed chaotic system can be stabilized under every iterative control law and that the iterative performance index function simultaneously converges to the optimum. Finally, the effectiveness of the developed optimal tracking control scheme for chaotic systems is verified by a simulation.