Department of Electrical, Computer, and Biomedical Engineering Faculty Publications

Online learning control based on projected gradient temporal difference and advanced heuristic dynamic programming

Jian Fu, Wuhan University of Technology
Sujuan Wei, Wuhan University of Technology
Haibo He, University of Rhode Island
Shengyong Wang, WISDRI(WuHan) Automation Co., Ltd

Document Type

Conference Proceeding

Date of Original Version

1-1-2014

Abstract

We present a novel online learning control algorithm (OLCPA) that comprises projected gradient temporal difference for action-value function (PGTDAVF) and advanced heuristic dynamic programming with one step delay (AHDPOSD). PGTDAVF can guarantee the convergence of temporal difference (TD)-based policy learning with smooth action-value function approximators, such as neural networks. AHDPOSD is a framework designed specifically to embed PGTDAVF for online learning control. It not only coincides with the intention of temporal difference but also enables PGTDAVF to remain effective in a nonidentical policy environment, which makes the method more practical. In this way, the proposed algorithms achieve stability and practicability simultaneously. Finally, simulation of online learning control on a cart-pole benchmark demonstrates the practical control capability and efficiency of the presented method.

Publication Title

Proceedings of the International Joint Conference on Neural Networks

Citation/Publisher Attribution

Fu, Jian, Sujuan Wei, Haibo He, and Shengyong Wang. "Online learning control based on projected gradient temporal difference and advanced heuristic dynamic programming." Proceedings of the International Joint Conference on Neural Networks (2014): 3649-3656. doi: 10.1109/IJCNN.2014.6889756.

DOI

https://doi.org/10.1109/IJCNN.2014.6889756
