An online actor-critic learning approach with Levenberg-Marquardt algorithm
This paper focuses on improving the efficiency of online actor-critic designs by using the Levenberg-Marquardt (LM) algorithm rather than the traditional chain rule. Over the past decades, several generations of adaptive/approximate dynamic programming (ADP) structures have been proposed in the community and have demonstrated many successful applications. Neural networks with backpropagation have been one of the most important approaches to tuning the parameters in such ADP designs. In this paper, we study the integration of the Levenberg-Marquardt method into the regular actor-critic design to improve weight updating and learning, achieving quadratic convergence under certain conditions. Specifically, for the critic network we adopt the LM method to improve learning performance, while for the action network we use a neural network with backpropagation to provide an appropriate control action. A detailed learning algorithm is presented, followed by benchmark tests on the pendulum swing-up and balance task and the cart-pole balancing task. Simulation results and a comparative study demonstrate the effectiveness of this approach. © 2011 IEEE.
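The core idea in the abstract, replacing a gradient (chain-rule) weight update with a Levenberg-Marquardt step for the critic, can be illustrated with a minimal sketch. This is not the paper's exact design: the network size, the finite-difference Jacobian, the toy quadratic cost-to-go target, and the damping schedule are all illustrative assumptions. The LM step solves (JᵀJ + μI)Δ = Jᵀr, where J is the Jacobian of the residuals r with respect to the weights, and μ interpolates between Gauss-Newton (small μ) and gradient descent (large μ).

```python
import numpy as np

# Illustrative LM weight update for a small critic network (one tanh hidden
# layer). Sizes, targets, and damping schedule are assumptions for the sketch.
rng = np.random.default_rng(0)
n_in, n_hid = 2, 6

def unpack(w):
    i = n_hid * n_in
    return w[:i].reshape(n_hid, n_in), w[i:i + n_hid], w[i + n_hid:]

def critic(w, X):
    W1, b1, W2 = unpack(w)
    return np.tanh(X @ W1.T + b1) @ W2

def jacobian(w, X, eps=1e-6):
    # Finite-difference Jacobian of the network output w.r.t. the weights.
    J = np.empty((X.shape[0], w.size))
    for k in range(w.size):
        dw = np.zeros_like(w); dw[k] = eps
        J[:, k] = (critic(w + dw, X) - critic(w - dw, X)) / (2 * eps)
    return J

# Toy supervised target: quadratic cost-to-go J(x) = x1^2 + x2^2 (assumed).
X = rng.uniform(-1, 1, size=(50, n_in))
y = (X ** 2).sum(axis=1)

w = rng.normal(scale=0.5, size=n_hid * n_in + 2 * n_hid)
mu = 1e-2
sse0 = np.sum((critic(w, X) - y) ** 2)
for _ in range(50):
    r = critic(w, X) - y
    J = jacobian(w, X)
    # LM step: solve (J^T J + mu I) d = J^T r, then w <- w - d.
    A = J.T @ J + mu * np.eye(w.size)
    d = np.linalg.solve(A, J.T @ r)
    w_new = w - d
    if np.sum((critic(w_new, X) - y) ** 2) < np.sum(r ** 2):
        w, mu = w_new, max(mu / 10, 1e-8)  # accept: move toward Gauss-Newton
    else:
        mu = min(mu * 10, 1e8)             # reject: fall back toward gradient step
sse = np.sum((critic(w, X) - y) ** 2)
```

In the paper's setting the residual would come from the temporal-difference error of the critic rather than a fixed supervised target; the damping adjustment above is the standard LM heuristic of shrinking μ on improvement and growing it otherwise.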
Proceedings of the International Joint Conference on Neural Networks
Ni, Zhen, Haibo He, Danil V. Prokhorov, and Jian Fu. "An online actor-critic learning approach with Levenberg-Marquardt algorithm." Proceedings of the International Joint Conference on Neural Networks (2011): 2333-2340. doi:10.1109/IJCNN.2011.6033520.