On-Line Adaptive Dynamic Programming for Feedback Control
Stability analysis and controller design are among the most important issues in feedback control. For linear systems, a controller can usually be obtained by solving the Riccati equation. For nonlinear control problems, however, the Riccati equation becomes the well-known Hamilton-Jacobi-Bellman (HJB) equation, which is difficult to tackle directly. Fortunately, adaptive dynamic programming (ADP) has been widely recognized as one of the core methodologies for achieving optimal, brain-like intelligent control of general stochastic processes. Extensive efforts and promising results have accumulated over the past decades, covering a large variety of problems, including system stability, convergence analysis, controller design, optimal control, and state prediction.

This dissertation investigates on-line ADP techniques for feedback control systems and provides novel methods to solve several existing problems in this field. Specifically, the improvements and original contributions of this dissertation can be summarized in terms of algorithms, architectures, and applications.

On the algorithmic side, an event-triggered ADP method is provided that samples only the informative states rather than all of the system states generated during the learning process. The control law is updated only at the sampled states, which reduces the computational cost. To guarantee that the sampled states are sufficient, a theoretical analysis is provided that derives an event threshold ensuring the stability of the system during the event-triggered learning process. That is, the system samples a state from the environment and updates the control law accordingly only when the difference between the last sampled state and the current state exceeds the threshold.
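The event-triggered sampling rule described above can be sketched as follows. This is a minimal illustration, not the dissertation's implementation: the function name, the Euclidean norm as the state-gap measure, and the `update_control` callback are all assumptions made for the example.

```python
import math


def maybe_update(x_current, x_sampled, threshold, update_control):
    """Event-triggered sampling sketch: the control law is refreshed only
    when the gap between the current state and the last sampled state
    exceeds the event threshold. All names here are illustrative."""
    # Measure the gap between the current and last sampled states
    # (Euclidean norm, chosen here for illustration).
    gap = math.sqrt(sum((a - b) ** 2 for a, b in zip(x_current, x_sampled)))
    if gap > threshold:
        # Event fires: sample the current state and update the control law.
        u = update_control(x_current)
        return x_current, u, True
    # No event: keep the previously sampled state and the old control law.
    return x_sampled, None, False
```

Between events the plant runs open-loop on the last computed control, which is what saves computation and transmission compared with updating at every time step.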
This idea is further developed for partially observable environments, where the event threshold is designed based only on the observed feedback. A neural-network-based observer is designed to recover the internal states from the partially observable outputs. Both the observer and the control law are updated aperiodically based on the sampled system outputs. In this way, the computation and transmission loads can be significantly reduced, while the simulation results show that the event-triggered ADP method still achieves competitive performance.

On the architectural side, a new framework, named goal representation adaptive dynamic programming (GrADP), is proposed and introduced in this dissertation. It is regarded as a foundation for building intelligent systems through internal reward learning, goal representation, and state-action association. Unlike the traditional ADP design with only an action network and a critic network, this approach integrates an additional goal network that builds a general internal reinforcement signal. Unlike a fixed or predefined reinforcement signal, this design adaptively updates the internal reinforcement representation over time and thus facilitates the system's learning and optimization toward its ultimate goals. This dissertation provides, for the first time, the theoretical foundation of the GrADP design. It is shown that the designed internal reinforcement signal gives the agent more information by incorporating a longer lookahead distance, and is therefore more efficient.

On the application side, this dissertation designs an ADP method for a class of Markov jump systems (MJSs) that finds the optimal control law even though the system keeps jumping among several subsystems. It is also shown that the control law obtained from the learning process quickly converges to the optimal solution, which verifies the effectiveness of the proposed method.
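The goal-representation idea above, with a goal network shaping an internal reinforcement signal for the critic, can be sketched in skeleton form. This is a hypothetical, heavily simplified illustration: the class name, the linear single-layer "networks", and the specific update rules are assumptions for the example, not the dissertation's actual GrADP implementation.

```python
import random


class GrADPSketch:
    """Skeleton of a goal-representation ADP agent (illustrative only).

    Three components mirror the architecture described above:
    a goal network producing an adaptive internal reinforcement signal,
    a critic network estimating value from that internal signal, and
    an action network producing the control. Real GrADP uses neural
    networks; linear maps stand in for them here."""

    def __init__(self, n_state, lr=0.01):
        self.wg = [random.uniform(-0.1, 0.1) for _ in range(n_state)]  # goal net
        self.wc = [random.uniform(-0.1, 0.1) for _ in range(n_state)]  # critic net
        self.wa = [random.uniform(-0.1, 0.1) for _ in range(n_state)]  # action net
        self.lr = lr

    def internal_reward(self, x):
        # Goal network: maps the state to an internal reinforcement
        # signal that is itself refined over time by learning.
        return sum(w * xi for w, xi in zip(self.wg, x))

    def value(self, x):
        # Critic network: value estimate built on the internal signal.
        return sum(w * xi for w, xi in zip(self.wc, x))

    def act(self, x):
        # Action network: control output for the current state.
        return sum(w * xi for w, xi in zip(self.wa, x))

    def train_step(self, x, x_next, r_ext, gamma=0.95):
        # The goal net is trained toward the discounted external reward,
        # so the internal signal accumulates lookahead information; the
        # critic is trained on the internal signal rather than on a
        # fixed, predefined reward (simplified TD-style updates).
        s = self.internal_reward(x)
        g_err = r_ext + gamma * self.internal_reward(x_next) - s
        td_err = s + gamma * self.value(x_next) - self.value(x)
        for i, xi in enumerate(x):
            self.wg[i] += self.lr * g_err * xi
            self.wc[i] += self.lr * td_err * xi
            self.wa[i] -= self.lr * td_err * xi  # action follows the critic (simplified)
        return g_err, td_err
```

The key design point the sketch highlights is the extra learning loop: because the goal network is itself trained, the reinforcement signal the critic sees adapts over time instead of staying fixed.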
Dissertations and Master's Theses (Campus Access).