Date of Award
2023
Degree Type
Thesis
Degree Name
Master of Science in Electrical Engineering (MSEE)
Department
Electrical, Computer, and Biomedical Engineering
First Advisor
Haibo He
Abstract
In the current reinforcement learning (RL) taxonomy, few algorithms for discrete-action environments are capable of learning stochastic policies in an off-policy manner. Learning stochastic policies brings benefits such as stable training and smoother exploration strategies. Off-policy training allows for greater sample efficiency, as experiences collected while interacting with a learning environment can be reused more than once. Stable performance and good sample efficiency are especially important when collecting experiences from a learning environment is expensive. This thesis proposes a new algorithm for discrete-action RL, called Discrete General Policy Optimization (Discrete GPO), that has both of these characteristics. The algorithm is designed following recent theoretical developments in trust region policy optimization techniques. The performance of Discrete GPO is tested in different simulated learning environments, and a comparison to other state-of-the-art methods is provided.
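The thesis itself is not reproduced on this page, but as a rough sketch of the ingredients the abstract names (a stochastic policy over discrete actions, off-policy reuse of experience via importance ratios, and a trust-region-style surrogate objective), the following minimal PyTorch example shows a generic PPO-style clipped update on a replayed batch. Every detail here (the clipped surrogate, the network sizes, clip_eps, the toy batch) is an illustrative assumption, not the thesis's actual Discrete GPO algorithm.

```python
import torch
import torch.nn as nn

# Hypothetical sizes and hyperparameters for illustration only.
obs_dim, n_actions, clip_eps = 4, 2, 0.2

# A small softmax (stochastic) policy over discrete actions.
policy = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(),
                       nn.Linear(64, n_actions))
optimizer = torch.optim.Adam(policy.parameters(), lr=3e-4)

def trust_region_update(obs, actions, advantages, old_log_probs):
    """One clipped-surrogate update on a replayed (off-policy) batch."""
    dist = torch.distributions.Categorical(logits=policy(obs))
    log_probs = dist.log_prob(actions)
    # Importance ratio between the current policy and the behavior
    # policy that originally collected the experience.
    ratio = torch.exp(log_probs - old_log_probs)
    # Clipping keeps the updated policy close to the old one,
    # a trust-region-style constraint on each update.
    surrogate = torch.min(
        ratio * advantages,
        torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages)
    loss = -surrogate.mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy batch; in practice this would be sampled from a replay buffer.
obs = torch.randn(32, obs_dim)
actions = torch.randint(0, n_actions, (32,))
advantages = torch.randn(32)
old_log_probs = torch.distributions.Categorical(
    logits=policy(obs)).log_prob(actions).detach()
trust_region_update(obs, actions, advantages, old_log_probs)
```

Because the importance ratio corrects for the mismatch between the behavior policy and the current policy, a batch like this can be replayed for several updates, which is the source of the sample-efficiency benefit the abstract describes.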
Recommended Citation
Clavette, Nicholas, "A SAMPLE EFFICIENT OFF-POLICY ACTOR-CRITIC APPROACH FOR DISCRETE ACTION ENVIRONMENTS" (2023). Open Access Master's Theses. Paper 2374.
https://digitalcommons.uri.edu/theses/2374
Terms of Use
All rights reserved under copyright.