A hybrid evolving and gradient strategy for approximating policy evaluation on online critic-actor learning

Document Type

Conference Proceeding

Date of Original Version

8-23-2012

Abstract

In this paper, we propose a novel strategy for approximating policy evaluation during the online critic-actor learning procedure. At the early stage, we adopt adaptive differential evolution with elites (ADEE), which excels at global search, to optimize the moving least-squares temporal difference with one step (MLSTD(0)). Next, we apply a gradient method to perform an efficient and effective local search. This resolves the exploration-exploitation dilemma in the weight search for the critic neural network. Simulation results on the online learning control of a cart-pole benchmark demonstrate the efficiency of the presented method. © 2012 Springer-Verlag.
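The two-stage scheme in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: a plain DE/rand/1/bin loop stands in for ADEE, and a synthetic least-squares objective stands in for the MLSTD(0) criterion over critic weights; population size, mutation factor, crossover rate, and step count are all illustrative choices. The DE stage performs global search, after which gradient descent refines the best candidate locally.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic least-squares surrogate for the MLSTD(0) criterion
# (illustrative assumption): J(w) = ||A w - b||^2.
A = rng.normal(size=(20, 5))
b = rng.normal(size=20)

def objective(w):
    r = A @ w - b
    return float(r @ r)

def grad(w):
    # Analytic gradient of J: 2 A^T (A w - b).
    return 2.0 * A.T @ (A @ w - b)

def differential_evolution(pop_size=20, dim=5, F=0.5, CR=0.9, gens=100):
    """Stage 1: plain DE/rand/1/bin (a stand-in for ADEE) for global search."""
    pop = rng.uniform(-2.0, 2.0, size=(pop_size, dim))
    fit = np.array([objective(w) for w in pop])
    for _ in range(gens):
        for i in range(pop_size):
            # Pick three distinct individuals, none equal to i.
            others = [j for j in range(pop_size) if j != i]
            a_i, b_i, c_i = rng.choice(others, size=3, replace=False)
            mutant = pop[a_i] + F * (pop[b_i] - pop[c_i])
            # Binomial crossover with at least one mutant component.
            cross = rng.random(dim) < CR
            cross[rng.integers(dim)] = True
            trial = np.where(cross, mutant, pop[i])
            f_trial = objective(trial)
            if f_trial < fit[i]:          # greedy selection
                pop[i], fit[i] = trial, f_trial
    return pop[np.argmin(fit)]

def gradient_refine(w, steps=500):
    """Stage 2: gradient descent refines the DE result locally."""
    # Step size chosen from the Lipschitz constant of grad J for stability.
    lr = 1.0 / (2.0 * np.linalg.norm(A, 2) ** 2)
    w = w.copy()
    for _ in range(steps):
        w -= lr * grad(w)
    return w

w_de = differential_evolution()   # global search stage
w_final = gradient_refine(w_de)   # local refinement stage
```

The handoff point (here a fixed generation budget) is a design choice; the paper's contribution is precisely in scheduling the switch so that early global exploration does not waste the gradient method's fast local convergence.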

Publication Title

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Volume

7367 LNCS

Issue

PART 1
