An Improved Trust-Region Method for Off-Policy Deep Reinforcement Learning
Document Type
Conference Proceeding
Date of Original Version
1-1-2023
Abstract
Reinforcement learning (RL) is a powerful tool for training agents to interact with complex environments. In particular, trust-region methods are widely used for policy optimization in model-free RL. However, these methods suffer from high sample complexity due to their on-policy nature, which requires fresh interactions with the environment for each update. To address this issue, off-policy trust-region methods have been proposed, but they have shown limited success in high-dimensional continuous control problems compared to other off-policy deep reinforcement learning (DRL) methods. To improve the performance and sample efficiency of trust-region policy optimization, we propose an off-policy trust-region RL algorithm. Our algorithm is based on a theoretical result giving a closed-form solution to trust-region policy optimization and is effective in optimizing complex nonlinear policies. We demonstrate the superiority of our algorithm over prior trust-region DRL methods and show that it achieves excellent performance on a range of continuous control tasks in the Multi-Joint dynamics with Contact (MuJoCo) environment, comparable to state-of-the-art off-policy algorithms.
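The abstract does not state the paper's closed-form result, but in the trust-region literature the KL-constrained objective is known to admit the exponentiated-advantage solution $\pi'(a\mid s) \propto \pi(a\mid s)\,\exp(A(s,a)/\eta)$, where $\eta$ is the temperature induced by the trust-region constraint. A minimal sketch of this generic update for a discrete action space (the function name, $\eta$ value, and advantage estimates are illustrative assumptions, not taken from the paper):

```python
import numpy as np

def trust_region_update(pi, advantages, eta):
    """Generic KL-regularized closed-form policy update:
        pi'(a|s) ∝ pi(a|s) * exp(A(s,a) / eta).
    A larger eta keeps pi' closer to pi, i.e. a tighter trust region.
    Note: this is the textbook form, not necessarily the paper's exact result."""
    logits = np.log(pi) + advantages / eta
    logits -= logits.max()          # subtract max for numerical stability
    new_pi = np.exp(logits)
    return new_pi / new_pi.sum()    # renormalize to a valid distribution

# Illustrative old policy over 3 discrete actions and advantage estimates.
pi = np.array([0.5, 0.3, 0.2])
adv = np.array([1.0, 0.0, -1.0])
print(trust_region_update(pi, adv, eta=1.0))
```

Probability mass shifts toward the high-advantage action while remaining close (in KL) to the old policy; as `eta` grows, the update shrinks toward the original distribution.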
Publication Title, e.g., Journal
Proceedings of the International Joint Conference on Neural Networks
Volume
2023-June
Citation/Publisher Attribution
Li, Hepeng, Xiangnan Zhong, and Haibo He. "An Improved Trust-Region Method for Off-Policy Deep Reinforcement Learning." Proceedings of the International Joint Conference on Neural Networks 2023-June, (2023). doi: 10.1109/IJCNN54540.2023.10191837.