Neural Network Implementation of Nonlinear Control Using Radial Basis Functions

This research concerns the design of radial basis function neural networks to implement controllers for nonlinear systems. Nonlinear systems are of particular interest because most real-life systems are nonlinear in nature, while control schemes for such systems are not as well developed as their linear counterparts and involve a good deal of heuristics. We show the ability of radial basis function (RBF) networks to serve as a single unifying model incorporating both nonlinear and linear methodologies. We focus on the inverted pendulum on a cart, a classic problem in the control literature. The problem is to swing the pendulum from a given initial state, typically the hanging-down position, to the up position and then to keep it balanced there. In swinging the pendulum, the cart to which it is attached is moved back and forth on a track until the pendulum is in the up position. This system is a very useful model in that it is a multi-variable, highly nonlinear system belonging to a class of nonlinear systems that cannot be controlled by traditional nonlinear techniques such as feedback linearization. In training the RBF network, we explore several different control schemes to produce the training data; these schemes could also be easily extrapolated to other multi-variable nonlinear systems. We first design a neural controller for the second order system describing the pendulum dynamics only. The controller is able to drive the state variables from any permissible state of the system to zero, and to keep the system stabilized in that equilibrium state. Secondly, we show the network's ability to implement nonlinear control of the fourth order pendulum/cart system.


Introduction
The work reported in this thesis represents a subset of a rapidly growing interest in the use of neural networks as a paradigm for the control of nonlinear systems or the representation of systems in system identification problems. Recent research [1], [2], [3], [4], [5], [6] has attempted to define the role of neural networks in control theory.
A major focus of these research efforts has been to establish a mathematical formulation of these network architectures through which a general control methodology could be developed. It is worth noting that neural networks are natural candidates for nonlinear control and identification methods because these networks lend themselves easily to performing nonlinear mappings in multidimensional space. Assuming that there exists an input-output mapping that achieves the control objective, the network is trained in a supervised fashion by modifying the synaptic weights so as to minimize the difference between the desired response and the actual response produced by the input signal. In addition, neural networks are able to generalize; generalization refers to the network producing reasonable outputs for inputs not encountered during training (learning). This implies that neural networks are robust and can be easily retrained to adapt their synaptic weights to compensate for minor changes in the environmental conditions under which they are operating. In this day and age of parallel and distributed computation, these networks lend themselves easily as practical tools in both hardware and software implementations. In hardware form, neural networks have the potential to be inherently fault tolerant in the sense that their performance degrades gracefully under adverse operating conditions. For example, if a neuron or its connecting links are damaged, recall of a stored pattern is impaired in quality. However, due to the distributed nature of information in the network, the damage has to be extensive before the overall performance of the network is degraded seriously.
Consider a system described by the following equations:

    ẋ(t) = f(x(t)) + g(x(t)) u(t),    y(t) = h(x(t)),    (1.1)

where x is a vector-valued state, f(·), g(·), and h(·) are nonlinear functions, and u and y are the input and output of the system, respectively. The objective of the control problem is to determine the input u so that the system behaves in a desired fashion. There are two principal ways in which the system (plant) can be controlled: regulation and tracking. In the former, the main goal is to stabilize the plant around a fixed operating point, typically referred to as an equilibrium point.
In the latter, the aim is to make the output, y, follow an input signal asymptotically.
As stated above the goal of the control problem is to make the plant behave in a certain, deterministic way. But the nature of the plant itself provides the framework for the control mechanism. Systems are generally characterized as being either linear or nonlinear. Linear control techniques have over the past few decades been well documented and successfully implemented [7] and [8].
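To make the regulation setting concrete, the sketch below simulates a generic first-order nonlinear plant of the form described above under a simple proportional feedback law. The dynamics f, g, h, the gain, and the step size are illustrative choices, not the pendulum model treated later.

```python
import numpy as np

def f(x):          # drift term (illustrative choice)
    return -np.sin(x)

def g(x):          # input gain (illustrative choice)
    return np.ones_like(x)

def h(x):          # output map
    return x

def simulate(x0, u_of_x, dt=0.01, steps=1000):
    """Forward-Euler simulation of x' = f(x) + g(x) u, y = h(x)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        u = u_of_x(x)
        x = x + dt * (f(x) + g(x) * u)
    return h(x)

# Regulation example: a proportional law u = -k*x drives the state toward 0.
y_final = simulate(x0=[1.5], u_of_x=lambda x: -2.0 * x)
print(y_final)  # approaches 0
```

The controller here is handed to the simulator as a state-feedback function, which is exactly the role the neural network plays in the chapters that follow.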

Problem Statement
As stated previously, the objective of a controller is to produce an input signal, preferably an optimal one, that would move the states of the system in desired trajectories.
The goal of this research is to design a neural controller for the inverted pendulum system that would swing the pendulum from any given state in its allowable state space to the equilibrium state of standing upright and drive all other state variables to zero. Furthermore, the controller should be self-correcting given any perturbation.
This goal exceeds that of [4], which is to get the pendulum from the equilibrium state of hanging down to the target equilibrium state of standing erect.
The mathematical model used for the inverted pendulum system is given by Vaccaro [7]. A schematic of this system is shown in Fig. 1. The linearized state-space model of the inverted pendulum/cart system is given in (1.5).

Pendulum-cart Problem
In the paper by Suykens et al. [4], a control law is proposed using either a feed-forward or a recurrent neural network to switch a multi-variable nonlinear plant between equilibrium points and to stabilize the plant at the target equilibrium point. This network design incorporates a linear controller to stabilize the plant at the target equilibrium point. As an illustration of this control strategy, Suykens uses the inverted pendulum-cart system, in which the task is to swing the pendulum from down to up and to locally stabilize it in the up position. In this section we present only the feed-forward network design given in that paper, as well as the findings of [10] in designing and implementing this network to control the pendulum-cart system.
In order to determine the weights, Suykens suggests an optimization scheme in which optimal weights are found such that a cost function is minimized using a steepest gradient descent algorithm (e.g., constr in Matlab). This training scheme does not require that an input-output mapping be known beforehand, but instead determines the mapping during training.
Given a nonlinear system as in (1.1), the optimal control problem is to minimize a cost function over the weights of the network. The cost function is given by (1.9), where T is the final time and ζ(x(t)) = x(t)^T x(t) (quadratic control) or ζ = 0 (terminal control).
The input-output relationship of the neural network is given by

    u = a tanh(w^T tanh(V x)),   (1.10)

where a is the maximum amplitude of the control signal, w^T is the weight vector for the output layer, and V is the weight matrix for the input layer. In the linear region the tanh can be dropped because tanh x ≈ x when x is small. Therefore (1.10) in the linear region can be written as

    u ≈ a w^T V x.   (1.11)

If we let w^T be a function of V such that a w^T V equals the gain matrix of the linear controller (1.12), then the linear controller is embedded in the network in the linear region. Their results indicate that the weight matrix V is highly dependent on the system parameters l, mt, and m. For example, the authors were able to determine the optimal weights that balanced the pendulum-cart model used by Suykens, but when the pendulum half length was changed from 0.5 m to 0.55 m, or the total mass from 1.1 kg to 1.25 kg, the controller could not achieve the control objective with the same weight matrix V. Further results showed that the authors were unable to find the optimal weights to swing up and balance the pendulum-cart model given by Vaccaro (see (1.5)). To conclude this section, let us review the pros and cons of this neural control law.
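As a sketch of this architecture (not the trained controller), the saturated two-layer law of (1.10) can be written as follows; the dimensions, the random weights, and the amplitude a are illustrative stand-ins:

```python
import numpy as np

def suykens_control(x, V, w, a):
    """u = a * tanh(w^T tanh(V x)): bounded two-layer control law (eq. 1.10)."""
    return a * np.tanh(w @ np.tanh(V @ x))

rng = np.random.default_rng(0)
n_state, n_hidden = 4, 6            # illustrative sizes
V = rng.normal(scale=0.5, size=(n_hidden, n_state))
w = rng.normal(scale=0.5, size=n_hidden)
a = 10.0                            # maximum control amplitude

x = np.array([0.1, 0.0, 0.0, 0.0])
u = suykens_control(x, V, w, a)
print(abs(u) <= a)   # the tanh output layer guarantees |u| <= a

# Near the origin tanh(z) ~ z, so the law reduces to the linear feedback
# a * w^T V x of (1.11).
x_small = 1e-4 * x
u_lin = a * (w @ (V @ x_small))
print(np.isclose(suykens_control(x_small, V, w, a), u_lin, rtol=1e-3))
```

The built-in saturation is the attraction of this form: however badly the weights are chosen, the control signal can never exceed the amplitude a.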

Pros
The incorporation of the linear controller into the neural network is seamless and mathematically well-defined. In addition, the optimal control input is determined during training of the network, which means that input-output mappings are not needed for training, as they would be if backpropagation or a least squares training method had been used.

Cons
There is no guarantee a priori that the linearized region will be entered. The network may become stuck at a local minimum depending on the initial weight matrix chosen or the step size of the gradient descent algorithm. Furthermore, there is no reliable way to pick this initial weight matrix. Suykens suggests using a random matrix whose entries are normally distributed with a variance between 0 and 1, but as discussed in [10] this rule-of-thumb choice is not at all reliable.

Organization of Chapters
In chapter two, we present the mathematical model for radial basis neural networks and demonstrate their ability to generalize. Also, we show how these networks can be more efficient by using self-organizing neural networks to determine their centers.

In chapter three, we design a controller for the second order model for the pendulum system. We explore the use of the dynamic programming algorithm along with techniques from linear control theory to produce the training data for the network and we investigate two schemes for placing the centers, fixed selection of the centers and self-organized selection of the centers.
In chapter four, we present the controller for the fourth order model of the pendulum system. We utilize a new method to generate the training data for the network using the energy information of the system. Results are given using the self-organizing placement of centers scheme discussed in the previous two chapters.
Finally in chapter five, we summarize our work and propose ideas for future work.

Chapter 2
Neural Network Implementation of

Introduction
In this chapter we approach the design of a neural network as a curve-fitting or approximation problem to implement nonlinear functions in a high dimensional space.
According to this design strategy, training the network is equivalent to finding a surface in multidimensional space that provides the best fit to the training data. Correspondingly, generalization is equivalent to the use of this multidimensional surface to interpolate the test data.
On a historical note, Broomhead and Lowe [12] were the first to use radial basis functions in the design of neural networks. Other major contributions to the theory, design, and application of radial basis function networks include works by Moody and Darken [13], Poggio and Girosi [14], and Chen [11].

Radial Basis Function Networks
The construction of a radial basis function network in its most basic form involves three different layers, as shown in Fig. 2.1. The problem is to find an approximation y = F(x) of the mapping in (2.1) for any argument x in R^p. From numerical analysis we know that the most convenient way of representing an unknown nonlinear function is to present it as a linear expansion of radial basis functions,

    F(x) = Σ_{i=1}^{Na} wi φ(||x − ci||),   (2.3)

where {φ(||x − ci||), i = 1, ..., Na} is a set of Na radial basis functions, ||·|| denotes the Euclidean norm, wi are the weights of the expansion, and ci in R^p, i = 1, 2, ..., Na, are the centers of the radial basis functions.
Two commonly used radial basis functions are

Theoretical investigations and practical results suggest that the choice of radial basis function is not crucial to the performance of the RBF network [11]. Our choice of radial basis function in this thesis is the Gaussian function, which is generally expressed as

    G(x, ci) = exp(−(1/2)(x − ci)^T Σ^-1 (x − ci)),   (2.6)

where Σ^-1 is the inverse covariance matrix of the Gaussian distribution and can be expressed in terms of a norm weighting matrix Ci [16], [17] as

    Σ^-1 = Ci^T Ci.   (2.7)

Exact RBF Network
In the exact RBF network implementation of the mapping in (2.1), we set Na = N and we take the known data points xi, i = 1, 2, ..., N, to be the centers of the radial basis functions. We can therefore rewrite (2.3) with ci = xi, which can be expressed in matrix notation as Φw = y, where Φ is the N × N interpolation matrix with entries φ(||xi − xj||).
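A minimal sketch of the exact network for one-dimensional data, assuming a Gaussian basis with an illustrative spread:

```python
import numpy as np

def gaussian(r, sigma=0.5):
    """Gaussian radial basis function of the distance r."""
    return np.exp(-r**2 / (2 * sigma**2))

def exact_rbf_fit(X, y, sigma=0.5):
    """Exact RBF network: one center per data point; solve Phi w = y."""
    Phi = gaussian(np.abs(X[:, None] - X[None, :]), sigma)
    return np.linalg.solve(Phi, y)

def exact_rbf_eval(x, X, w, sigma=0.5):
    """Evaluate the fitted expansion at a new point x."""
    return gaussian(np.abs(x - X), sigma) @ w

X = np.linspace(0.0, 2 * np.pi, 10)   # training abscissae double as centers
y = np.sin(X)
w = exact_rbf_fit(X, y)

# The exact network interpolates the training data (zero training error).
print(np.allclose([exact_rbf_eval(xi, X, w) for xi in X], y))
```

The price of exact interpolation is one basis function per data point and a possibly ill-conditioned Φ, which motivates the generalized network below.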

Generalized RBF Network
In the generalized RBF network implementation of (2.1), we set Na ≤ N and the centers of the network do not necessarily coincide with the training data points. The network expansion is given in (2.3).

Assuming the network's centers are known and fixed, let us fit the training set data in (2.2) using the network in (2.3). Utilizing the same notation as in (2.11) and (2.12), we can represent the fitting problem in the regression form

    Y = ΦΘ + ε,   (2.14)

where ε = [e1 ... eN]^T is a residual error vector. Since Φ is not guaranteed to be well conditioned, or even a full rank matrix, we look for a regularized least squares solution to (2.14) that minimizes

    ||ε||_F^2 + α||Θ||_F^2,   0 < α ≤ 1,   (2.15)

where ||·||_F is the Frobenius norm and α is a scalar regularization parameter introduced to compensate for ill-conditioned problems. Solving (2.14) in this regularized sense yields Θ = (Φ^T Φ + αI)^-1 Φ^T Y.
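A minimal sketch of the regularized fit, assuming Gaussian basis functions with a single shared spread (the thesis uses per-center norm weighting matrices); the target function, center count, and parameter values are illustrative:

```python
import numpy as np

def design_matrix(X, C, sigma):
    """Phi[i, j] = exp(-||x_i - c_j||^2 / (2 sigma^2))."""
    d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2 * sigma**2))

def fit_regularized(X, Y, C, sigma, alpha=1e-3):
    """Regularized least squares: Theta = (Phi^T Phi + alpha I)^-1 Phi^T Y."""
    Phi = design_matrix(X, C, sigma)
    A = Phi.T @ Phi + alpha * np.eye(C.shape[0])
    return np.linalg.solve(A, Phi.T @ Y)

# Fit y = sin(x) with fewer centers than data points (Na < N).
X = np.linspace(0, 2 * np.pi, 50)[:, None]
Y = np.sin(X[:, 0])
C = np.linspace(0, 2 * np.pi, 12)[:, None]   # 12 centers for 50 samples
Theta = fit_regularized(X, Y, C, sigma=0.8)
Y_hat = design_matrix(X, C, 0.8) @ Theta
print(np.max(np.abs(Y_hat - Y)))             # small fit error
```

Even a tiny α stabilizes the normal equations when nearby centers make the columns of Φ nearly collinear.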

Self Organizing Feature Map Networks
The performance of an RBF network critically depends upon the chosen centers. The RBF centers should suitably sample the input domain of the network and reflect the data distribution. Furthermore, for real-time implementations of these networks it is preferred to have as few basis functions as possible (a reduction of the dimensionality of the hidden layer space), thereby reducing the computational time of the network. The question then arises of how best to select appropriate centers.
In this section we propose using the self-organizing feature-mapping (SOFM) algorithm developed by Kohonen [19], in which the topography of the input domain is learned in an unsupervised fashion and the centers of the RBF network are then taken to be the weights of the SOFM. The SOFM algorithm bears a striking resemblance to the k-means clustering algorithm, which is well documented in the pattern classification literature [5]. To begin the discussion of the SOFM algorithm, let us define an input matrix X representing the set of input vectors over time, as in (2.17). To find the best match of an input vector x with the weight vectors wj, we define the best matching criterion to be the minimum Euclidean distance between vectors:
    i(xi) = arg min_j ||xi − wj||,   j = 1, 2, ..., Q,   i = 1, 2, ..., N,   (2.19)

where i(x) is the index that identifies the neuron that best matches the input vector. This neuron is classified as the winning neuron and is the center of a topological neighborhood, denoted Λi(x)(n). An example of a neighborhood topology is illustrated in Fig. 2.2. Given this winning neuron, the idea is then to adjust it, along with its neighboring neurons, to move closer to the input vector in a Euclidean sense.
Kohonen's SOFM algorithm is summarized by the following steps:
1. Initialization. Choose random initial values for the weight vectors wj.
2. Sampling. Draw an input vector x from the input distribution.
3. Similarity matching. Find the winning neuron i(x) at time step n using the minimum Euclidean distance criterion of (2.19).
4. Updating. Adjust the weight vectors of the winning neuron and of the neurons in its neighborhood Λi(x)(n) toward x, with a learning rate and a neighborhood size that shrink over time.
5. Continuation. Continue with step 2 until no noticeable changes are observed.
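These steps can be sketched for a one-dimensional lattice as follows; the unit count, the learning-rate schedule, and the neighborhood schedule are illustrative choices:

```python
import numpy as np

def train_sofm(X, n_units, epochs=200, seed=0):
    """1-D Kohonen SOFM: the winner argmin_j ||x - w_j|| and its lattice
    neighbours are pulled toward x with a shrinking radius and rate."""
    rng = np.random.default_rng(seed)
    W = X[rng.integers(0, len(X), n_units)].astype(float)  # init from data
    idx = np.arange(n_units)
    for n in range(epochs):
        eta = 0.5 * (0.01 / 0.5) ** (n / epochs)          # decaying rate
        radius = max(n_units / 2 * (1 - n / epochs), 1)   # shrinking hood
        for x in rng.permutation(X):                      # sampling step
            win = np.argmin(np.linalg.norm(x - W, axis=1))  # matching step
            h = np.exp(-((idx - win) ** 2) / (2 * radius**2))
            W += eta * h[:, None] * (x - W)               # updating step
        # continuation: here a fixed epoch budget stands in for convergence
    return W

# Centers for an RBF network: units spread out to sample the input density.
X = np.random.default_rng(1).uniform(0, 10, size=(200, 1))
centers = train_sofm(X, n_units=8)
print(centers.min() >= 0 and centers.max() <= 10)
```

Since each update is a convex combination of a weight and a data point, the trained centers always stay inside the convex hull of the data, which is exactly the sampling property wanted of RBF centers.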

A Nonlinear Function Implementation Example
Let us consider a system of equations given by (2.20) and (2.21), where t = [0 0.1 ... 9.9 10], and let us also define another function u to be a linear combination of x1 and x2, as in (2.22). The task is to obtain a mapping x → u.
In order to achieve this mapping, we first utilize the SOFM network to determine the locations of the centers of a generalized RBF network. We batched the input vectors as in (2.17) and trained the SOFM network using the 101 data points for 1000 epochs to produce 15 center locations; the resulting placement is shown in Fig. 2.3. The spread d is taken to be the average distance between neighboring center nodes [3]. We set α = 0 because the interpolation matrix is well-conditioned.

Chapter 3
The Second Order Model

Introduction
Given the equations for the pendulum without regard to the dynamics of the cart, where u(t) is equivalent to the acceleration of the cart, we would like in this chapter to design the RBF network such that it is optimal and balances the pendulum in the upright position from any initial state. For ease of computation the orientation of the pendulum is changed so that the hanging-down position becomes x1 = π instead of x1 = 0, and the upright position becomes x1 = 0 instead of x1 = π. The training data are generated by the dynamic programming algorithm developed by Richard Bellman [20].

Dynamic Programming
The method of dynamic programming is a process by which the performance measure of a system is minimized by using a concept called the principle of optimality. This principle is stated as follows [21]: an optimal policy has the property that whatever the initial state and initial decision are, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision.
The dynamic programming algorithm is summarized as follows:
1. Quantize the state space, considering the maximum possible ranges of state values. Each state variable x is quantized as

    x = xmin + kΔx,   k = 0, 1, ..., M,   M = (xmax − xmin)/Δx.   (3.2)

This requires a priori knowledge of the state space.
2. Quantize the control effort (input), considering the maximum allowable range, umin ≤ u ≤ umax. At each stage of the backward recursion, the optimal control input for every quantized state is stored in a table or matrix.

Results from Dynamic Programming

The performance index to be minimized is a function of the quantized states and controls. The training trajectories are produced as follows:
1. Run the dynamic programming algorithm from a given initial state; once inside the linear region, switch to the linear controller.
2. Let the system run with the linear controller for a while so that the training data contain a few presentations in the linear region.
3. Repeat the two steps above starting with different initial conditions to obtain different trajectories. The objective is to have these trajectories span as much of the state space as possible.
4. Use these optimal trajectories to train the network by setting up input and target matrices and solving for the weights in a least squares sense.
5. Design and insert a linear controller for the linear region to guarantee that the states become stabilized once they enter the linear region (see appendix A for the design of the pole placement linear controller). To better approximate the linear controller, add more centers in the linear region, as shown in Fig. 3.2.

We also see from Fig. 3.3 that the pendulum remains stabilized in the upright position.
Given equation (2.16) in Chapter two, the RBF network was designed with α chosen to be 9 × 10^-7. The training data for the network are the six trajectories shown in Fig. 3.1. As discussed earlier, the dynamic programming algorithm exploits the principle of optimality, which in concise terms says: if path abc is optimal from a to c, then path bc is optimal from b to c. This concept is demonstrated in Fig. 3.6. Note that we inserted more basis functions about the linear region to ensure that the linear controller is better approximated.
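The backward recursion over quantized states and inputs can be sketched on a toy one-dimensional regulator; the dynamics, grid sizes, and cost weights below are illustrative stand-ins, not the pendulum model:

```python
import numpy as np

# Quantized dynamic programming for a toy regulator x_{k+1} = x_k + dt*u_k,
# with stage cost x_k^2 + 0.1*u_k^2 over N stages.
dt, N = 0.1, 50
xs = np.linspace(-2, 2, 81)                  # quantized state (cf. eq. 3.2)
us = np.linspace(-5, 5, 41)                  # quantized control effort
dx = xs[1] - xs[0]
J = np.zeros_like(xs)                        # terminal cost-to-go
policy = np.zeros((N, len(xs)), dtype=int)   # optimal input stored in a table

for k in range(N - 1, -1, -1):               # backward recursion (optimality)
    x_next = xs[:, None] + dt * us[None, :]  # every (state, input) pair
    i_next = np.clip(np.round((x_next - xs[0]) / dx).astype(int),
                     0, len(xs) - 1)         # snap to nearest grid node
    Q = xs[:, None] ** 2 + 0.1 * us[None, :] ** 2 + J[i_next]
    policy[k] = np.argmin(Q, axis=1)
    J = np.min(Q, axis=1)

# Roll out the tabulated policy from x0 = 1.5: the state is driven toward 0.
i = int(np.argmin(np.abs(xs - 1.5)))
for k in range(N):
    x_new = xs[i] + dt * us[policy[k][i]]
    i = int(np.clip(np.round((x_new - xs[0]) / dx), 0, len(xs) - 1))
print(abs(xs[i]))  # close to 0
```

The table `policy` is precisely the kind of state-to-input lookup that the RBF network is then trained to approximate, and the input grid makes the control constraint explicit.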

Self-Organized Selection of Centers
In the previous section we placed the centers of the radial basis functions on a lattice of fixed-interval grid points. This task was trivial given the low dimensionality of the state space. However, in dimensions of three or higher, one's ability to visualize the state space of a system becomes impaired, and placing centers on hyper-spheres is by no means trivial. We now use the SOFM algorithm discussed in the previous chapter and see how it compares with the results of fixed grid centers.
The SOFM network was trained with the same six trajectories used to train the fixed grid RBF, as shown in Fig. 3.1. We used a single-layered network with 160 neurons to produce the centers of the RBF network. The result of that training for a thousand epochs is displayed in Fig. 3.8. The covariance matrix used is constructed from d = 2.7, the average distance between centers.

Control by Segmentation
In this section we briefly explore the possibility of using the results of the previous sections to implement a control law for the fourth order pendulum-cart system. Recall the equations for the cart dynamics from (1.5). Also recall from (3.1) that u is equivalent to the acceleration of the cart. Substituting it for the derivative of x4 yields (3.8), where u is the input to the cart system. Solving for u produces (3.9), where x4 is obtained by numerical integration of u as in (3.10). Observe that the cart dynamics are represented by linear equations, whereas the pendulum dynamics are represented by nonlinear equations. Given that we have already determined the control inputs for the pendulum system using dynamic programming (which was fairly easy to do given that the plant was second order), we can now compute the input to the cart system. Figure 3.12 shows a block diagram of this control by segmentation. It would seem that we have solved the fourth order problem and can pack our bags and go home, but that would be foolhardy. There are several serious drawbacks to this control scheme. The first is that there does not seem to be a way to place a threshold limit on u, which becomes a considerable point in hardware implementation. The second is that when the acceleration of the cart goes to zero, the cart could still be moving at a constant velocity. A third drawback is the error introduced in performing the integration in (3.10), more precisely in determining the constant of integration.

Chapter 4
The Fourth Order Model

Introduction
In Chapter three, dynamic programming was used to generate the training data for the network and proved to be a powerful nonlinear control design tool, producing optimal control trajectories. However, for dimensions of three or greater, dynamic programming becomes almost impractical and suffers from what Bellman [20] called the curse of dimensionality. This means that the number of quantized state vectors, which is the product of the quantization levels of each state variable, becomes exceedingly large, requiring a great deal of memory for storage and tremendously increasing computation time. Given this drawback, we present in this chapter a new control law to produce the training data for the fourth order system (see Chapter one).

Energy Controller
In this section we develop a control law to regulate the swinging energy of the pendulum without regard to the cart dynamics. The resulting control system is such that the swing energy converges to the desired energy trajectory from almost all initial conditions. This controller design is based on a paper by Chung and Hauser [9], in which they proposed a control law that regulates the swing energy of the pendulum-on-a-cart system by maintaining a desired periodic orbit.
Given the equations for the pendulum dynamics in (1.5),

    ẋ1(t) = x2(t),
    ẋ2(t) = −A sin(x1(t)) − B cos(x1(t)) (−C x4(t) + D u(t)),   (4.1)

we would like to design a feedback control u so that the swing energy of the pendulum, defined by the kinetic and potential energy of the rod, is regulated to a desired swing energy H̄. Note that θ = x1 and ω = x2; m and l are the mass and length of the rod, respectively, and g is the acceleration due to gravity. H̄ is a time-varying function that depends on the energy of the initial states, θ(t0) and ω(t0), and on the energy at the final states, θ(t1) = π and ω(t1) = 0. H̄ is given by (4.2), where β is a design parameter. This function is used to drive the total energy H gradually from the initial energy, determined by the initial state, to the final energy when the pendulum is inverted in the linear region, from where linear control can then be employed. For example, if the initial energy is lower than at the final state, H̄ increases the total swing energy until it reaches the final energy level. On the other hand, if the initial energy is higher than at the final state, H̄ decreases the total swing energy until it reaches the final energy level.
Next we define the error function

    E(θ, ω) = H(θ, ω) − H̄,   (4.3)
and if we choose the feedback control law to be

    u = a ω cos(θ) E,   (4.4)

then in the limit as t becomes large, E(t) goes to zero [9].

Simulation Results of the Energy Controller
The energy control law by itself cannot balance and stabilize the pendulum in the upright position (linear region), so we insert the pole placement linear controller (see appendix A) when the pendulum-cart system enters the linear region (see Chapter one) to stabilize the system and to compensate for minor perturbations. Also, the energy controller has to be given a nonzero initial state, or it will remain at rest with u = 0.
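The swing-up law can be sketched on a normalized pendulum (g = l = m = 1, no cart dynamics, theta = 0 hanging down, constant target energy rather than the time-varying profile of (4.2)); the gain k, the sign convention, and the simulation parameters are illustrative rather than the thesis's A, B, C, D model:

```python
import numpy as np

def swing_up(theta0, omega0, k=1.0, dt=0.001, steps=20000):
    """Pump the swing energy H toward the upright level H_bar = 2.

    Uses u proportional to omega*cos(theta)*(H_bar - H), i.e. the law
    u = a*omega*cos(theta)*E of (4.4) with the sign folded into the gain,
    so that dH/dt = k * omega^2 * cos(theta)^2 * (H_bar - H) >= 0 below
    the target energy."""
    theta, omega = theta0, omega0
    H_bar = 2.0                                      # energy of upright state
    for _ in range(steps):
        H = 0.5 * omega**2 + (1.0 - np.cos(theta))   # swing energy
        u = k * omega * np.cos(theta) * (H_bar - H)  # energy feedback
        omega += dt * (-np.sin(theta) + u * np.cos(theta))
        theta += dt * omega
    return 0.5 * omega**2 + (1.0 - np.cos(theta))

# A nonzero initial state is required, otherwise u remains identically zero.
H_final = swing_up(theta0=0.0, omega0=0.1)
print(H_final)  # should be near the target energy H_bar = 2
```

As in the thesis, the energy law only brings the pendulum onto the right orbit; balancing at the top still requires handing over to a linear controller inside the linear region.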

RBF Network Training Simulation Results
We trained the RBF network using two trajectories: one whose initial state vector x0 = [x1(0), 0, 0, 0]^T lies in the nonlinear region, and another whose initial state vector x0 = [(π − 0.4), 0, 0, 0]^T lies in the linear region. The latter ensures that the network has enough presentations in the linear region to learn to mimic the linear controller.
We chose 100 radial basis functions whose centers are determined by the SOFM network, and a fixed inverse covariance matrix of these centers was selected to be

Summary
In this research we demonstrated the ability of radial basis function networks to implement control of nonlinear systems, given that there exist training data that achieve the design objectives. Furthermore, the RBF network controller succeeded in seamlessly meshing together controllers from nonlinear and linear control theory. We also showed how a self-organizing feature map can be used to place the centers of the RBF network, thereby making the network more efficient and possibly adaptive.
In both Chapters three and four, the control laws presented are able to balance and stabilize the pendulum-cart system from all permissible initial states. The dynamic programming algorithm discussed in chapter three is a powerful control scheme that guarantees optimal results for any nonlinear control problem. Unlike the gradient descent algorithms used in optimal control theory, dynamic programming cannot get stuck in a region of local minima, but instead produces global results. Furthermore, a nice feature of the algorithm is that we are able to put constraints on the control input as well as the state variables. The curse of dimensionality, the one major but considerable drawback of dynamic programming, limits the algorithm to lower dimensional problems because in higher dimensions computation becomes expensive in terms of computer memory and processor time. On the other hand, the energy control law presented in chapter four converges quickly to a result, but the result is not optimal and is specific to the pendulum problem.
In retrospect, the author believes that the single most important aspect of using RBF networks to implement nonlinear control systems that needs improvement is determining a strategy for selecting the spreads of the centers, i.e., the inverse covariance matrix. An accurate determination of this parameter is crucial to the network's ability to generalize and achieve the design objectives. For dimensions in which visualization of the distribution of the network's centers is impossible, such a strategy is very much needed. However, if generalization and a minimal network configuration are not issues for the application of interest, then exact RBF networks, which are not sensitive to this covariance parameter, can be used.

Further Work
As noted in the concluding remarks of the previous section, the single most important thing that needs improvement with generalized RBF networks is determining the spread of the centers; hence, it is the basis for further work. The author proposes the following strategies for obtaining the spread of the centers.
1. Define an initial inverse covariance matrix for all centers, where d is the average distance between centers and C is a diagonal norm weighting matrix used to normalize the input data in a unit hypersphere.
Next, minimize a cost function over the inverse covariance matrix,

    min L = ||Y − Ŷ||,

where Y is the desired output vector and Ŷ is the approximated output vector.
Using this scheme we obtain an optimal inverse covariance matrix that gives a general representation of the spreads of all the centers. Note that the positions of the centers remain fixed, as determined by the SOFM network.
2. Alternatively, treat wi, ci, and Σi^-1 all as free parameters and find them jointly so as to minimize the same cost function.
Finally, the author would like to see some work done in further developing the control by segmentation methodology proposed in Chapter three.
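The first strategy, with a single shared spread chosen to minimize the fit error, can be sketched as follows; the sine target, the candidate grid, and the equispaced centers (standing in for SOFM-trained ones) are all illustrative:

```python
import numpy as np

def rbf_predict(X, C, sigma, Y):
    """Fit weights by least squares for a given spread, then predict on X."""
    Phi = np.exp(-((X[:, None] - C[None, :]) ** 2) / (2 * sigma**2))
    w, *_ = np.linalg.lstsq(Phi, Y, rcond=None)
    return Phi @ w

X = np.linspace(0, 2 * np.pi, 60)
Y = np.sin(X)
C = np.linspace(0, 2 * np.pi, 10)      # fixed centers (as if from a SOFM)

# Scalar stand-in for minimizing L = ||Y - Y_hat|| over the inverse
# covariance: sweep candidate spreads and keep the best.
sigmas = np.linspace(0.1, 2.0, 20)
errors = [np.linalg.norm(Y - rbf_predict(X, C, s, Y)) for s in sigmas]
best = sigmas[int(np.argmin(errors))]
print(best, min(errors))
```

A gradient method over a full inverse covariance matrix would follow the same pattern, with the sweep replaced by descent on L.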