Conjugate Gradient Approach for Discrete Time Optimal Control Problems with Model-Reality Differences

In this chapter, an efficient computational approach is proposed for solving a general class of discrete-time optimal control problems. In this approach, a simplified optimal control model, augmented with adjusted parameters, is solved iteratively. At each iteration, the differences between the real plant and the model used are calculated and, in turn, used to update the optimal solution of the model. During the computation, an equivalent optimization problem is formulated and solved with the conjugate gradient algorithm, so that the optimal solution of the modified model-based optimal control problem is obtained repeatedly. Once convergence is achieved, the iterative solution approximates the correct optimal solution of the original optimal control problem, in spite of model-reality differences. For illustration, a linear and a nonlinear example are presented to show the performance of the proposed approach. The results demonstrate its efficiency.


Introduction
Optimal control problems have long existed in engineering and the natural sciences, and applications of optimal control are well documented in the literature [1][2][3][4]. With the rapid evolution of computer technology, optimal control techniques have reached a mature level, from classical control to modern control, from proportional-integral-derivative (PID) control to feedback control, and from adaptive control to intelligent control [5][6][7][8]. Research in the optimal control field is still progressing and attracts the interest not only of engineers and applied mathematicians but also of biologists and finance specialists, who investigate and contribute to optimal control theory.
In particular, the optimal control algorithm that integrates system optimization and parameter estimation has given the control community a new insight. This algorithm is known as integrated system optimization and parameter estimation (ISOPE), and its dynamic version is called dynamic ISOPE (DISOPE). These algorithms were introduced by Roberts [9][10][11] and by Roberts and Becerra [12][13][14], respectively. The basic idea of DISOPE is to apply a model-based optimal control problem, whose structure and parameters differ from those of the original optimal control problem, to obtain the correct optimal solution of the original problem, in spite of model-reality differences. Recently, this algorithm has been extended to cover both deterministic and stochastic versions, and it is known as the integrated optimal control and parameter estimation (IOCPE) algorithm [15,16]. In addition, optimization techniques, in particular the conjugate gradient method, have been applied to solving optimal control problems [17][18][19], where an open-loop control strategy is considered [3,8].
In this chapter, the conjugate gradient approach [17,19] is employed to solve the linear model-based optimal control problem in order to obtain the optimal solution of the original optimal control problem. In our approach, a simplified model with added adjusted parameters is formulated first. Then, an expanded optimal control problem, which combines the system dynamics and the cost function of the original optimal control problem with the simplified model, is introduced. By defining the Hamiltonian function and the augmented cost function, the corresponding necessary conditions for optimality are derived. These necessary conditions fall into three sets: one set defines the modified model-based optimal control problem, one set defines the parameter estimation problem, and one set calculates the multipliers [15].
From the modified model-based optimal control problem, an equivalent optimization problem is defined, and the related gradient function is determined. With an initial control sequence, the initial gradient and the initial search direction are computed. Then, the control sequences are updated through a line search, where the gradient and the search direction satisfy the conjugacy condition [17,18]. During the iterations, the state and the costate are updated by using the control sequence obtained from the conjugate gradient approach. When convergence is achieved within a given tolerance, the iterative solution approximates the correct optimal solution of the original optimal control problem, in spite of model-reality differences. For illustration, a linear and a nonlinear example, namely a damped harmonic oscillator [7] and a continuous stirred-tank chemical reactor [8], are studied.
The chapter is organized as follows. In Section 2, the problem statement is described in detail, where the original optimal control problem and the simplified model are discussed. In Section 3, the methodology is explained further: the necessary conditions for optimality are derived, and the use of the conjugate gradient method in solving the equivalent optimization problem is described. In Section 4, examples of a damped harmonic oscillator and a continuous stirred-tank chemical reactor are studied, and the results show the efficiency of the proposed algorithm. Finally, concluding remarks are made.

Problem statement
Consider a general class of discrete-time nonlinear optimal control problems, given by

$$
\begin{aligned}
\min_{u(k)}\ & J_0(u) = \varphi(x(N), N) + \sum_{k=0}^{N-1} L(x(k), u(k), k) \\
\text{subject to}\ & x(k+1) = f(x(k), u(k), k), \quad x(0) = x_0,
\end{aligned} \tag{1}
$$

where $u(k) \in \Re^m$, $k = 0, 1, \ldots, N-1$, and $x(k) \in \Re^n$, $k = 0, 1, \ldots, N$, are the control sequences and the state sequences, respectively, while $f : \Re^n \times \Re^m \times \Re \to \Re^n$ represents the real plant, $L : \Re^n \times \Re^m \times \Re \to \Re$ is the cost under summation, and $\varphi : \Re^n \times \Re \to \Re$ is the terminal cost. Here, $J_0$ is the scalar cost function, and $x_0$ is the known initial state vector. It is assumed that all functions in (1) are continuously differentiable with respect to their respective arguments.
This problem, which is referred to as Problem (P), is regarded as the real optimal control problem. Due to its complex and nonlinear structure, solving Problem (P) requires efficient computational techniques. For this reason, a simplified model of Problem (P) is solved instead, such that the true optimal solution of Problem (P) can be approximated. This simplified model-based optimal control problem is defined as follows:

$$
\begin{aligned}
\min_{u(k)}\ & J_1(u) = \tfrac{1}{2} x(N)^T S(N) x(N) + \gamma(N) + \sum_{k=0}^{N-1} \left[ \tfrac{1}{2} \left( x(k)^T Q x(k) + u(k)^T R u(k) \right) + \gamma(k) \right] \\
\text{subject to}\ & x(k+1) = A x(k) + B u(k) + \alpha(k), \quad x(0) = x_0,
\end{aligned} \tag{2}
$$

where $\gamma(k)$, $k = 0, 1, \ldots, N$, and $\alpha(k)$, $k = 0, 1, \ldots, N-1$, are introduced as the adjusted parameters, whereas $A$ is an $n \times n$ state transition matrix, and $B$ is an $n \times m$ control coefficient matrix. In addition, $S(N)$ and $Q$ are $n \times n$ positive semi-definite matrices, and $R$ is an $m \times m$ positive definite matrix. Here, $J_1$ is the scalar cost function.
This problem is referred to as Problem (M). Because of the different structures and parameters, solving Problem (M) alone, without taking the adjusted parameters into account, would not give the optimal solution of Problem (P). Adding the adjusted parameters to Problem (M), however, makes it possible to calculate the differences between the real plant and the model used. On this basis, Problem (M) is solved iteratively to give the correct optimal solution of Problem (P), in spite of model-reality differences.
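To make the role of the adjusted parameters concrete, the following minimal sketch propagates the model state and evaluates the model cost. It assumes that Problem (M) has the state equation x(k+1) = A x(k) + B u(k) + alpha(k) and a quadratic cost shifted by gamma(k); the function names `rollout` and `model_cost` and the numerical data are hypothetical and serve only as an illustration.

```python
def mat_vec(M, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, v)) for row in M]


def quad(v, M):
    """Quadratic form v^T M v."""
    return sum(a * b for a, b in zip(v, mat_vec(M, v)))


def rollout(A, B, x0, u, alpha):
    """Propagate x(k+1) = A x(k) + B u(k) + alpha(k) for k = 0..N-1 (assumed model form)."""
    x = [list(x0)]
    for k in range(len(u)):
        Ax = mat_vec(A, x[k])
        Bu = mat_vec(B, u[k])
        x.append([a + b + c for a, b, c in zip(Ax, Bu, alpha[k])])
    return x


def model_cost(x, u, Q, R, S, gamma):
    """Quadratic model cost with the scalar shifts gamma(0..N) (assumed model form)."""
    N = len(u)
    running = sum(0.5 * (quad(x[k], Q) + quad(u[k], R)) + gamma[k] for k in range(N))
    return 0.5 * quad(x[N], S) + gamma[N] + running


# Hypothetical two-state, one-input data with N = 2.
A = [[1, 1], [0, 1]]
B = [[0], [1]]
identity = [[1, 0], [0, 1]]
u = [[1], [-1]]
x = rollout(A, B, [1, 0], u, alpha=[[0, 0], [0, 0]])
print(x, model_cost(x, u, identity, [[1]], identity, gamma=[0, 0, 0]))
```

With alpha(k) = 0 and gamma(k) = 0 the sketch reduces to a plain linear quadratic model; nonzero values shift the state equation and the cost without changing A, B, Q, R or S(N).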

System optimization with parameter estimation
Now, an expanded optimal control problem, which combines the real plant and the cost function of Problem (P) with Problem (M) and is referred to as Problem (E), is introduced by (3), where $v(k) \in \Re^m$, $k = 0, 1, \ldots, N-1$, and $z(k) \in \Re^n$, $k = 0, 1, \ldots, N$, are introduced to separate the control and state sequences in the optimization problem from the respective signals in the parameter estimation problem, and $\|\cdot\|$ denotes the usual Euclidean norm. The terms $\frac{1}{2} r_1 \|u(k) - v(k)\|^2$ and $\frac{1}{2} r_2 \|x(k) - z(k)\|^2$ with $r_1, r_2 \in \Re$ are introduced to improve convexity and to facilitate the convergence of the resulting iterative algorithm. The algorithm is designed such that the constraints $v(k) = u(k)$ and $z(k) = x(k)$ are satisfied upon termination of the iterations, assuming that convergence is achieved. Moreover, the state $z(k)$ and the control $v(k)$ are used for the computation of the parameter estimation and matching scheme, while the corresponding state $x(k)$ and control $u(k)$ are reserved for optimizing the model-based optimal control problem. In this way, system optimization and parameter estimation are separated and mutually integrated.
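For orientation, the description above is consistent with the following form of Problem (E). This is a sketch only: it assumes that Problem (M) has the linear state equation $x(k+1) = A x(k) + B u(k) + \alpha(k)$ with a quadratic cost shifted by $\gamma(k)$, as is usual in the DISOPE and IOCPE literature, and the cost label $J_2$ is an assumed name.

```latex
\begin{aligned}
\min_{u(k)}\; J_2(u) ={}& \tfrac{1}{2} x(N)^{T} S(N) x(N) + \gamma(N) \\
 & + \sum_{k=0}^{N-1} \Big[ \tfrac{1}{2}\big( x(k)^{T} Q x(k) + u(k)^{T} R u(k) \big) + \gamma(k) \\
 & \qquad + \tfrac{1}{2} r_1 \| u(k) - v(k) \|^{2} + \tfrac{1}{2} r_2 \| x(k) - z(k) \|^{2} \Big] \\
\text{subject to}\quad & x(k+1) = A x(k) + B u(k) + \alpha(k), \quad x(0) = x_0, \\
 & \tfrac{1}{2} z(N)^{T} S(N) z(N) + \gamma(N) = \varphi(z(N), N), \\
 & \tfrac{1}{2}\big( z(k)^{T} Q z(k) + v(k)^{T} R v(k) \big) + \gamma(k) = L(z(k), v(k), k), \\
 & A z(k) + B v(k) + \alpha(k) = f(z(k), v(k), k), \\
 & v(k) = u(k), \quad z(k) = x(k).
\end{aligned}
```

In this form, the three matching constraints tie the adjusted parameters to the real plant and the real cost, while the last line is the pair of constraints that holds at convergence.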

Necessary conditions for optimality
Define the Hamiltonian function for Problem (E) as in (4). Using (4), the cost function in (3) is rewritten as the augmented cost function (5), where $p(k)$, $\xi(k)$, $\lambda(k)$, $\beta(k)$, $\mu(k)$ and $\Gamma$ are the appropriate multipliers to be determined later.
Applying the calculus of variations [7,9,11,13,15] to the augmented cost function in (5), the following necessary conditions for optimality are obtained: (a) the stationary condition (6); (b) the costate equation (7); (c) the state equation (8); (d) the boundary conditions (9); (e) the adjusted parameter equations (10)-(12); and (f) the modifier equations (13)-(15).
These necessary conditions fall into three sets. The first set, (6)-(9), gives the necessary conditions for the system optimization problem. The second set, (10)-(12), defines the parameter estimation problem. The third set, (13)-(15), provides the computation of the multipliers. In fact, the conditions in (6)-(9) are the optimality conditions for the modified model-based optimal control problem, and the adjusted parameters, which are calculated from the conditions in (10)-(12), measure the differences between the real plant and the model used.
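The way the adjusted parameters measure the model-reality differences can be illustrated with a short sketch. It assumes that the adjusted parameter equations match the plant and the stage cost to the model at the point (z(k), v(k)), that is, alpha(k) = f(z, v, k) - (A z + B v) and gamma(k) = L(z, v, k) - (z'Qz + v'Rv)/2. The plant `plant`, the cost `stage_cost`, and all numbers are hypothetical.

```python
def mat_vec(M, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, v)) for row in M]


def quad(v, M):
    """Quadratic form v^T M v."""
    return sum(a * b for a, b in zip(v, mat_vec(M, v)))


def adjusted_parameters(f, L, A, B, Q, R, z, v, k):
    """Mismatch between plant and model at (z, v, k), under the assumed matching form."""
    linear = [a + b for a, b in zip(mat_vec(A, z), mat_vec(B, v))]
    alpha = [fi - li for fi, li in zip(f(z, v, k), linear)]
    gamma = L(z, v, k) - 0.5 * (quad(z, Q) + quad(v, R))
    return alpha, gamma


# Hypothetical nonlinear plant and non-quadratic stage cost.
def plant(z, v, k):
    return [z[0] + 0.1 * z[1], z[1] + 0.1 * (-z[0] ** 3 + v[0])]


def stage_cost(z, v, k):
    return 0.5 * (z[0] ** 2 + z[1] ** 2 + v[0] ** 2) + 0.01 * z[0] ** 4


# Linear model that keeps only the linear terms of the plant.
A = [[1.0, 0.1], [0.0, 1.0]]
B = [[0.0], [0.1]]
Q = [[1.0, 0.0], [0.0, 1.0]]
R = [[1.0]]

alpha, gamma = adjusted_parameters(plant, stage_cost, A, B, Q, R, [2.0, 1.0], [0.5], 0)
print(alpha, gamma)
```

In this illustration the adjusted parameters pick up exactly the terms that the linear quadratic model omits: the cubic term of the plant appears in alpha, and the quartic term of the cost appears in gamma.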

Modified model-based optimal control problem
As a consequence, the modified model-based optimal control problem, which is referred to as Problem (MM), is defined for the specified $\alpha(k)$, $\gamma(k)$, $\lambda(k)$, $\beta(k)$, $\Gamma$, $v(k)$ and $z(k)$, where the boundary conditions are given by $x_0$ and $p(N)$ with the specified multiplier $\Gamma$. Problem (MM), which is derived from Problem (E), is a modification of the optimal control problem and is also known as a modified linear quadratic regulator problem. Importantly, the set of necessary conditions in (6)-(9) for Problem (E) is exactly the set of necessary conditions satisfied by Problem (MM). In addition, because the objective function is quadratic, the conjugate gradient method [17,18], which is a numerical optimization technique, can be applied to solve Problem (MM).

Conjugate gradient algorithm
For simplicity [19], Problem (MM) is established as a nonlinear optimization problem with the initial control given by $u^{(0)} = u(k)^0$. This problem is referred to as Problem (Q). Moreover, the Hamiltonian function defined in (4) is taken as an equivalent objective function. This Hamiltonian function allows the evaluation of the gradient function, which is the stationary condition in (6), by using the iterative solution $u^{(i)} = u(k)^i$ to satisfy the state Eq. (8), which is solved forward in time, and the costate Eq. (7), which is solved backward in time.
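The forward and backward sweeps described above can be sketched for a scalar system. The sketch assumes the state equation x(k+1) = a x(k) + b u(k) + alpha(k), a plain quadratic cost with weights q, r and s, and no modifier or penalty terms, so the costate recursion p(k) = q x(k) + a p(k+1) with p(N) = s x(N) and the gradient r u(k) + b p(k+1) are simplified stand-ins for (7), (9) and (6). All function names and numbers are hypothetical.

```python
def forward_states(a, b, x0, u, alpha):
    """Solve the state equation forward in time from k = 0 to k = N."""
    x = [x0]
    for k in range(len(u)):
        x.append(a * x[k] + b * u[k] + alpha[k])
    return x


def backward_costates(a, q, s, x):
    """Solve the costate equation backward in time from k = N to k = 0."""
    N = len(x) - 1
    p = [0.0] * (N + 1)
    p[N] = s * x[N]  # terminal boundary condition
    for k in range(N - 1, -1, -1):
        p[k] = q * x[k] + a * p[k + 1]
    return p


def gradient(b, r, u, p):
    """Gradient of the cost with respect to u(k), from the stationary condition."""
    return [r * u[k] + b * p[k + 1] for k in range(len(u))]


def total_cost(q, r, s, x, u):
    """Quadratic cost used to check the gradient numerically."""
    N = len(u)
    running = sum(0.5 * (q * x[k] ** 2 + r * u[k] ** 2) for k in range(N))
    return 0.5 * s * x[N] ** 2 + running


# Hypothetical data with N = 3.
a, b, q, r, s = 0.9, 0.5, 1.0, 0.1, 2.0
x0, u, alpha = 1.0, [0.3, -0.2, 0.1], [0.0, 0.05, 0.0]

x = forward_states(a, b, x0, u, alpha)
p = backward_costates(a, q, s, x)
print(gradient(b, r, u, p))
```

One forward sweep and one backward sweep give the whole gradient vector, which is why the cost of each conjugate gradient iteration stays proportional to the horizon length N.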
Define the gradient function $g : \Re^m \to \Re^m$, which is represented by the stationary condition in (6). For an arbitrary initial control $u^{(0)} \in \Re^m$, the initial gradient and the initial search direction are calculated from (20) and (21), respectively. The line search equation (22) is applied to update the control sequence, where $a_i \in \Re$ is the step size, whose value is determined from (23). After that, the gradient and the search direction are updated for $i = 0, 1, 2, \ldots$, where $i$ is the iteration number. From the discussion above, we present the result as the following proposition.
Proposition 1. Consider Problem (Q). The control sequence $u^{(i)}$, which is defined in (22), is generated through a set of search direction vectors $d^{(i)}$ whose components are linearly independent. Moreover, the directions $d^{(i)}$ are conjugate.
The conjugate gradient algorithm is summarized below:
Conjugate gradient algorithm
Data: Choose an arbitrary initial control $u^{(0)}$ and the tolerance $\varepsilon$.
Step 0: Compute the initial gradient $g^{(0)}$ from (20) and the initial search direction $d^{(0)}$ from (21), respectively. Set $i = 0$.
Step 1: Solve the state Eq. (8) forward in time from $k = 0$ to $k = N$ with the initial condition (9) to obtain $x(k)^i$, $k = 0, 1, \ldots, N$.
Step 2: Solve the costate Eq. (7) backward in time from $k = N$ to $k = 0$ with the boundary condition (9), where $p(k)^i$ is the solution obtained.
Step 3: Calculate the value of the cost functional $J_3(u^{(i)})$ from (17).
Step 4: Solve (23) to obtain the step size $a_i$.
Step 5: Update the control $u^{(i+1)}$ from (22).
Step 6: Evaluate the gradient $g^{(i+1)}$ and the search direction $d^{(i+1)}$. If convergence is achieved within the tolerance $\varepsilon$, stop; otherwise, set $i = i + 1$ and repeat from Step 1.
Remark 1:
a. Step 0 is the preliminary step, which sets the initial search direction of the conjugate gradient algorithm from the gradient direction.
b. Steps 1, 2, and 3 are performed to solve the system optimization problem by using the corresponding control sequence $u^{(i)}$.
c. Steps 4, 5, and 6 are the computation steps that implement the conjugate direction.
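The steps above can be sketched on a small quadratic objective. The sketch assumes the objective J(u) = u'Hu/2 - c'u with a symmetric positive definite H, an exact line search for the step size, and the Fletcher-Reeves formula for the direction update; these are illustrative choices, since the chapter's own formulas (20)-(25) are not reproduced here, and the name `conjugate_gradient` and the data are hypothetical.

```python
def mat_vec(M, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, v)) for row in M]


def dot(a, b):
    """Inner product of two vectors."""
    return sum(x * y for x, y in zip(a, b))


def conjugate_gradient(H, c, u0, tol=1e-10, max_iter=100):
    """Minimize 0.5 u^T H u - c^T u; returns the minimizer and the directions used."""
    u = list(u0)
    g = [hu - ci for hu, ci in zip(mat_vec(H, u), c)]  # initial gradient H u - c
    d = [-gi for gi in g]                               # initial search direction
    directions = []
    for _ in range(max_iter):
        if dot(g, g) ** 0.5 <= tol:                     # convergence test on the gradient
            break
        Hd = mat_vec(H, d)
        step = -dot(g, d) / dot(d, Hd)                  # exact line search on a quadratic
        u = [ui + step * di for ui, di in zip(u, d)]    # control update
        g_new = [gi + step * hdi for gi, hdi in zip(g, Hd)]
        beta = dot(g_new, g_new) / dot(g, g)            # Fletcher-Reeves coefficient (assumed)
        directions.append(d)
        d = [-gn + beta * di for gn, di in zip(g_new, d)]
        g = g_new
    return u, directions


# Hypothetical two-dimensional example.
H = [[4.0, 1.0], [1.0, 3.0]]
c = [1.0, 2.0]
u_star, dirs = conjugate_gradient(H, c, [0.0, 0.0])
print(u_star, dot(dirs[0], mat_vec(H, dirs[1])))
```

For this two-dimensional quadratic, the iteration reaches the minimizer after two line searches, and the printed product of the two directions with H is zero up to rounding, which is the conjugacy property stated in Proposition 1.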

Iterative calculation procedure
Accordingly, Problem (Q) is solved by using the conjugate gradient algorithm. The solution procedure for system optimization with parameter estimation is then obtained by combining the conjugate gradient algorithm with the estimated parameters. The calculation procedure, including the principle of model-reality differences, is summarized as follows:
Iterative algorithm based on model-reality differences
Data: $A$, $B$, $Q$, $R$, $S(N)$, $N$, $x_0$, $r_1$, $r_2$, $k_v$, $k_z$, $k_p$, $f$, $L$. Note that $A$ and $B$ could be determined from the linearization of $f$ at $x_0$ or from the linear terms of $f$.
Step 0: Compute a nominal solution of Problem (M). Set $i = 0$.
Step 1: Compute the adjusted parameters $\gamma(k)^i$, $k = 0, 1, \ldots, N$, and $\alpha(k)^i$, $k = 0, 1, \ldots, N-1$, from (10)-(12). This is called the parameter estimation step.
Step 2: Compute the modifiers $\Gamma^i$, $\lambda(k)^i$ and $\beta(k)^i$, $k = 0, 1, \ldots, N-1$, from (13)-(15). Notice that this step requires taking the derivatives of $f$ and $L$ with respect to $v(k)^i$ and $z(k)^i$.
Step 3: With the specified adjusted parameters and modifiers, and with $v(k)^i$ and $z(k)^i$, solve Problem (Q) using the conjugate gradient algorithm. This is called the system optimization step.
Step 4: Test the convergence and update the optimal solution of Problem (P). In order to provide a mechanism for regulating convergence, the simple relaxation method in (27)-(29) is employed. If the updated sequences agree with the previous ones within a given tolerance, stop; else set $i = i + 1$, and repeat the procedure starting from Step 1.
Remark 2: a. In Step 0, the nominal solution could be obtained by using the standard procedure of the linear quadratic regulator approach, where the feedback gain and the Riccati equation are calculated offline.
b. In Step 3, the conjugate gradient algorithm gives an effective update of the control sequence provided that the conjugacy of the search directions is satisfied.
c. In Step 4, the simple relaxation method in (27)-(29) is used, so that the matching scheme for the parameters and the optimal solution can be established.
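A relaxation update of the kind used in Step 4 can be sketched as follows. The sketch assumes the common form v_new = v + k_v (u - v), with analogous updates for z and the costate estimate using the gains k_z and k_p from the data list; the exact equations (27)-(29) are not reproduced here. The helper names and numbers are hypothetical, and the control `u_opt` is held fixed only for illustration, whereas in the full algorithm it is recomputed in Step 3 at every iteration.

```python
def relax(previous, current, gain):
    """Return previous + gain * (current - previous), element by element."""
    return [p + gain * (c - p) for p, c in zip(previous, current)]


def max_change(a, b):
    """Largest absolute difference between two sequences."""
    return max(abs(x - y) for x, y in zip(a, b))


# Hypothetical data: u_opt stands in for the control returned by Step 3.
v = [0.0, 0.0, 0.0]
u_opt = [1.0, -2.0, 0.5]
k_v = 0.5
tolerance = 1e-6

iterations = 0
while True:
    v_next = relax(v, u_opt, k_v)
    iterations += 1
    converged = max_change(v_next, v) <= tolerance  # convergence test of Step 4
    v = v_next
    if converged:
        break

print(iterations, v)
```

A gain of 1 replaces the old sequence by the new one in a single step, while smaller gains damp the update, which is the sense in which the gains regulate convergence.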

Illustrative examples
In this section, two examples are studied. The first example concerns the optimization and control of a damped harmonic oscillator [7], and the second example concerns the optimal control of a continuous stirred-tank chemical reactor [8]. The mathematical models of these examples are discussed, and their optimal solutions are obtained by using the algorithm discussed in Section 3. The algorithm is implemented in the Octave 5.1.0 environment.

Example 1: a damped harmonic oscillator
Consider a damped harmonic oscillator [7] with the natural frequency $\omega = 0.8$, the damping ratio $\delta = 0.1$, and the initial state $x_0 = (10\ \ 10)^T$. Define the state $x = (x_1\ \ x_2)^T$, where $x_1$ is the displacement and $x_2$ is the velocity. To control this oscillator, an objective function of the form $J_0(u) = \frac{1}{2} \int_{0.0}^{9.4} \cdots \, dt$ is minimized. This problem is a continuous-time linear optimal control problem, and the equivalent discrete-time optimal control problem, which is regarded as Problem (P), is posed with the initial state $x_0 = (10\ \ 10)^T$, where the sampling time $\Delta t = 0.94$ s is used for the discretization.
Consider the model-based optimal control problem, which is regarded as Problem (M), with the initial state $x_0 = (10\ \ 10)^T$, where the adjusted parameters $\gamma(k)$, $k = 0, 1, \ldots, N$, and $\alpha(k)$, $k = 0, 1, \ldots, N-1$, are supplied to the model used. The simulation result obtained with the proposed algorithm is shown in Table 1. Notice that the minimum cost for Problem (M) is 546.05 units without adding the adjusted parameters. Once the adjusted parameters are taken into consideration, the iterative solution approximates the true optimal solution of the original optimal control problem, in spite of model-reality differences. It is highlighted that a cost reduction of 99% is achieved, giving the final cost of 128.50 units. Figures 1 and 2 show the trajectories of control and state, respectively. With this control effort, the state reaches the steady state after 4 units of time, which indicates that the oscillator has stopped moving. Figure 3 shows the changes of the costate over the first 2 units of time. The optimal solution obtained is verified by the satisfaction of the stationary condition, as shown in Figure 4. Figures 5 and 6 show the adjusted parameters after convergence is achieved, where the model-reality differences are measured during the iterative procedure.

Example 2: a continuous stirred-tank chemical reactor
For the continuous stirred-tank chemical reactor [8], the model-based optimal control problem is posed with the initial state $x_0 = (0.05\ \ 0.00)^T$, and the adjusted parameters $\gamma(k)$, $k = 0, 1, \ldots, N$, and $\alpha(k)$, $k = 0, 1, \ldots, N-1$, are added into the model. Table 2 shows the simulation result obtained by using the proposed algorithm. The minimum cost for the linear model-based optimal control problem is 5.9589 units. At the beginning of the iterative calculation procedure, the initial cost is 0.147463 units, and a cost reduction of 90% gives the final cost of 0.014167 units.
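As a quick check of the figures quoted for the second example, the relative cost reduction follows directly from the initial and final costs stated above; no values beyond those in the text are used.

```latex
\frac{0.147463 - 0.014167}{0.147463} = \frac{0.133296}{0.147463} \approx 0.904
```

That is about 90%, in agreement with the reduction reported.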
The trajectories of the final control and the final state are shown in Figures 7 and 8, respectively. With the control effort applied, the state reaches the steady state after 40 units of time. This indicates that the temperature and the concentration are maintained at their steady state, so the desired objective is achieved. Figure 9 shows the costate behavior, which decays gradually to zero at the terminal time, and Figure 10 shows the stationary condition, which verifies the existence of the optimal solution. The adjusted parameters, which are shown in Figures 11 and 12, respectively, measure the differences between the model used and the real plant. Hence, the correct optimal solution of Problem (P) is approximated successfully by solving the model in Problem (M), and the efficiency of the proposed algorithm is demonstrated.

Concluding remarks
An approach that integrates system optimization and parameter estimation was discussed in this chapter. The use of the conjugate gradient method in solving the model-based optimal control problem was examined, and the applicability of the conjugate gradient approach in combination with the principle of model-reality differences was established. Many computational approaches could be used to solve the model-based optimal control problem; however, the algorithm proposed in this chapter gives a tractable solution procedure for handling optimal control problems with different structures and parameters, especially for obtaining the optimal solution of a nonlinear optimal control problem. In conclusion, the results of both examples show that the algorithm is efficient. For future research, it is suggested to investigate the application of optimization techniques in stochastic optimization and control.