Date of Award


Degree Type


Degree Name

Master of Science in Electrical Engineering (MSEE)


Department

Electrical, Computer, and Biomedical Engineering

First Advisor

Haibo He


Abstract

In the current reinforcement learning (RL) taxonomy, few algorithms for discrete-action environments are capable of learning stochastic policies in an off-policy manner. Learning stochastic policies brings benefits such as stable training and smoother exploration strategies. Training an algorithm in an off-policy manner allows for greater sample efficiency, because experiences collected while interacting with a learning environment can be reused more than once. Stable performance and good sample efficiency are highly important when collecting experiences from a learning environment is expensive. This thesis proposes a new algorithm for discrete-action RL, called Discrete General Policy Optimization (Discrete GPO), that has both of these characteristics. The algorithm is designed following recent theoretical developments in trust region policy optimization techniques. The performance of Discrete GPO is tested in different simulated learning environments, and a comparison to other state-of-the-art methods is provided.
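The abstract highlights two ingredients: a stochastic policy over discrete actions and off-policy reuse of stored experiences. The thesis text is not reproduced here, so the following is only a minimal illustrative sketch of those two generic ideas, not the Discrete GPO algorithm itself; the softmax policy, the replay buffer, and the importance weight below are standard RL constructions chosen for illustration.

```python
import math
import random
from collections import deque

def softmax(logits):
    """Turn action preferences into a stochastic policy (probabilities)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def sample_action(probs):
    """Sample a discrete action index from the stochastic policy."""
    r, cum = random.random(), 0.0
    for a, p in enumerate(probs):
        cum += p
        if r < cum:
            return a
    return len(probs) - 1

# Replay buffer: transitions are stored so each experience can be reused
# across many updates -- the source of off-policy sample efficiency.
buffer = deque(maxlen=10_000)
buffer.append((0, 1, 1.0, 2))  # (state, action, reward, next_state)

# An importance weight corrects for the mismatch between the behavior
# policy that collected the data and the current policy being learned.
# Both policies here are hypothetical two-action examples.
pi_current = softmax([0.5, 1.5])   # current stochastic policy
pi_behavior = softmax([1.0, 1.0])  # behavior policy (uniform)
action = 1
weight = pi_current[action] / pi_behavior[action]
```

Because the policy is a probability distribution rather than a single greedy action, exploration arises naturally from sampling, which is one of the stability benefits the abstract attributes to stochastic policies.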


