Two-time-scale online actor-critic paradigm driven by POMDP
Document Type
Conference Proceeding
Date of Original Version
6-9-2010
Abstract
In this paper, we analyze a class of actor-critic algorithms in a partially observable Markov decision process (POMDP) environment. Specifically, we focus on a two-time-scale framework in which the critic uses temporal difference learning with a neural network (NN) as a nonlinear function approximator, and the actor is updated greedily via a stochastic gradient approach. Instead of the common construction of a hidden-state estimator, we develop the idea originating from Singh, Jaakkola and Jordan (1994) into an online, action-dependent actor-critic paradigm. This framework explores the ability of the adaptive dynamic programming (ADP) approach in a POMDP environment without requiring extra architectures such as state estimators. Both the theoretical analysis and simulation studies validate that the framework performs effectively under the assumptions given in this paper. ©2010 IEEE.
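The two-time-scale structure described in the abstract can be illustrated with a minimal sketch: a critic updated by TD(0) with a fast step size, and a memoryless, action-dependent softmax actor updated by a stochastic policy gradient with a slower step size. This is not the paper's implementation; the toy aliased-observation environment, the tabular critic (standing in for the paper's NN approximator), and all names and step sizes below are illustrative assumptions.

```python
import math, random

random.seed(0)

GAMMA = 0.95
ALPHA_CRITIC = 0.10   # fast time scale (critic)
ALPHA_ACTOR = 0.01    # slow time scale (actor)
OBS_NOISE = 0.2       # probability the observation flips (partial observability)

# Critic: tabular Q over (observation, action) -- a stand-in for the NN critic.
Q = [[0.0, 0.0], [0.0, 0.0]]
# Actor: softmax policy conditioned only on the observation (memoryless).
theta = [[0.0, 0.0], [0.0, 0.0]]

def policy(obs):
    """Softmax action probabilities given the current observation."""
    z = [math.exp(t) for t in theta[obs]]
    s = sum(z)
    return [p / s for p in z]

def step():
    """One interaction step: sample, then critic (fast) and actor (slow) updates."""
    state = random.randrange(2)                       # hidden state
    obs = state if random.random() > OBS_NOISE else 1 - state
    probs = policy(obs)
    action = 0 if random.random() < probs[0] else 1
    reward = 1.0 if action == state else 0.0          # reward depends on hidden state

    # Next observation (hidden states are drawn i.i.d. in this toy environment).
    next_state = random.randrange(2)
    next_obs = next_state if random.random() > OBS_NOISE else 1 - next_state
    next_probs = policy(next_obs)
    v_next = sum(p * q for p, q in zip(next_probs, Q[next_obs]))

    # Critic: TD(0) update on the fast time scale.
    td_error = reward + GAMMA * v_next - Q[obs][action]
    Q[obs][action] += ALPHA_CRITIC * td_error

    # Actor: stochastic policy-gradient step on the slow time scale,
    # using the critic's advantage estimate Q(o, a) - V(o).
    v = sum(p * q for p, q in zip(probs, Q[obs]))
    adv = Q[obs][action] - v
    for a in range(2):
        grad = (1.0 if a == action else 0.0) - probs[a]
        theta[obs][a] += ALPHA_ACTOR * adv * grad
    return reward

for _ in range(30000):
    step()
```

After training, the actor prefers the action that matches its (noisy) observation, even though the policy never sees the hidden state directly; separating the step sizes keeps the critic's value estimates approximately converged relative to the slowly drifting policy.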
Publication Title, e.g., Journal
2010 International Conference on Networking, Sensing and Control (ICNSC 2010)
Citation/Publisher Attribution
Liu, Bo, Haibo He, and Daniel W. Repperger. "Two-time-scale online actor-critic paradigm driven by POMDP." 2010 International Conference on Networking, Sensing and Control (ICNSC 2010) (2010). doi: 10.1109/ICNSC.2010.546149.