A hybrid evolving and gradient strategy for approximating policy evaluation on online critic-actor learning
Document Type
Conference Proceeding
Date of Original Version
8-23-2012
Abstract
In this paper, we propose a novel strategy for approximating policy evaluation during the online critic-actor learning procedure. At the early stage, we adopt adaptive differential evolution with elites (ADEE), which excels at global search, to optimize the moving least-squares temporal difference with one step (MLSTD(0)). We then apply a gradient method to perform local search efficiently and effectively. This resolves the exploration-exploitation dilemma in the weight search for the critic neural network. Simulation results on the online learning control of a cart-pole benchmark demonstrate the efficiency of the presented method. © 2012 Springer-Verlag.
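The two-stage idea in the abstract, evolutionary global search followed by gradient-based local refinement of the critic weights, can be illustrated with a minimal sketch. The objective below is a hypothetical stand-in (a simple least-squares fit for a linear critic on toy data), not the paper's MLSTD(0) criterion, and plain DE/rand/1/bin stands in for the ADEE variant; all names and parameters here are illustrative assumptions.

```python
import random

# Hypothetical toy data: (feature vector, target value) pairs for a
# linear critic w. The paper's actual objective is the MLSTD(0)
# criterion, which is not reproduced here.
DATA = [([1.0, 0.5], 1.2), ([0.3, 2.0], 0.7), ([1.5, 1.0], 2.0)]

def loss(w):
    # Mean squared error of the linear critic on the toy data.
    return sum((sum(wi * xi for wi, xi in zip(w, x)) - y) ** 2
               for x, y in DATA) / len(DATA)

def grad(w):
    # Analytic gradient of the mean squared error above.
    g = [0.0, 0.0]
    for x, y in DATA:
        e = sum(wi * xi for wi, xi in zip(w, x)) - y
        for i in range(len(g)):
            g[i] += 2 * e * x[i] / len(DATA)
    return g

def hybrid_search(pop_size=20, de_gens=30, gd_steps=200,
                  F=0.5, CR=0.9, lr=0.05, seed=0):
    rng = random.Random(seed)
    # Stage 1: differential evolution for global exploration
    # (DE/rand/1/bin here; the paper uses the ADEE variant).
    pop = [[rng.uniform(-2, 2) for _ in range(2)] for _ in range(pop_size)]
    for _ in range(de_gens):
        for i in range(pop_size):
            a, b, c = rng.sample([p for j, p in enumerate(pop) if j != i], 3)
            trial = [a[k] + F * (b[k] - c[k]) if rng.random() < CR
                     else pop[i][k] for k in range(2)]
            if loss(trial) < loss(pop[i]):  # greedy selection
                pop[i] = trial
    w = min(pop, key=loss)
    # Stage 2: gradient descent for efficient local exploitation,
    # starting from the best individual found by DE.
    for _ in range(gd_steps):
        g = grad(w)
        w = [wi - lr * gi for wi, gi in zip(w, g)]
    return w

w = hybrid_search()
print("final loss:", loss(w))
```

The handoff point (switching from DE to gradient steps) is fixed here for simplicity; in practice it would be governed by a schedule or a convergence test on the population.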
Publication Title, e.g., Journal
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume
7367 LNCS
Issue
PART 1
Citation/Publisher Attribution
Fu, Jian, Haibo He, Huiying Li, and Qing Liu. "A hybrid evolving and gradient strategy for approximating policy evaluation on online critic-actor learning." Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 7367 LNCS, PART 1 (2012): 555-564. doi: 10.1007/978-3-642-31346-2_62.