Master of Science in Electrical Engineering (MSEE)
Electrical, Computer, and Biomedical Engineering
In the current reinforcement learning (RL) taxonomy, few algorithms for discrete action environments are capable of learning stochastic policies in an off-policy manner. Learning stochastic policies brings benefits such as stable training and smoother exploration strategies. Training an algorithm in an off-policy manner allows for greater sample efficiency, because experiences collected while interacting with a learning environment can be used more than once. Stable performance and good sample efficiency are especially important when collecting experiences from a learning environment is expensive. This thesis proposes a new algorithm for discrete action RL called Discrete General Policy Optimization (Discrete GPO) that has both of the above characteristics. The algorithm is designed following recent theoretical developments in trust region policy optimization techniques. The performance of Discrete GPO is tested in different simulated learning environments, and a comparison to other state-of-the-art methods is provided.
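The sample-efficiency argument above, that experiences collected once can be reused across many updates, is commonly realized with an experience replay buffer in off-policy methods. The following is a minimal Python sketch of that idea only; the class and method names are illustrative and are not taken from the thesis:

```python
import random
from collections import deque

class ReplayBuffer:
    """Hypothetical minimal replay buffer illustrating off-policy sample reuse."""

    def __init__(self, capacity=10000):
        # Bounded deque: oldest transitions are discarded once capacity is reached.
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        # Store one interaction with the environment as a transition tuple.
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Each call draws a fresh random minibatch, so a stored transition
        # may contribute to many gradient updates, not just one.
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))

# Collect 100 toy transitions once, then reuse them for multiple updates.
buf = ReplayBuffer()
for t in range(100):
    buf.add(t, t % 4, 1.0, t + 1, False)
batch = buf.sample(32)
```

On-policy methods must discard such data after each policy update, which is what makes the off-policy setting more sample efficient when environment interaction is expensive.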
Clavette, Nicholas, "A SAMPLE EFFICIENT OFF-POLICY ACTOR-CRITIC APPROACH FOR DISCRETE ACTION ENVIRONMENTS" (2023). Open Access Master's Theses. Paper 2374.