Parameterized Batch Reinforcement Learning for Longitudinal Control of Autonomous Land Vehicles

This paper presents a parameterized batch reinforcement learning algorithm for near-optimal longitudinal control of autonomous land vehicles (ALVs). The proposed approach uses an actor-critic architecture, where parameterized feature vectors based on kernels are learned from collected samples for approximating the value functions and policies. One difference between the parameterized batch actor-critic (PBAC) algorithm and previous actor-critic learning approaches is that the critic and actor in PBAC share the same linear features, which has been theoretically proved to be a beneficial property for the convergence of actor-critic learning approaches. In order to obtain better learning efficiency, least-squares-based batch updating rules are designed for the critic and actor, respectively. Based on the PBAC learning algorithm, a data-driven longitudinal control method is presented for ALVs to obtain near-optimal control policies which adaptively tune the fuel/brake control signals to track different speeds. A multiobjective reward function is designed so that both tracking precision and driving smoothness are considered. Extensive experiments were conducted on a real ALV platform while driving on flat, slippery, sloping, and bumpy roads. The experimental results illustrate the superiority of the PBAC-based self-learning controller over conventional longitudinal control methods such as proportional-integral (PI) control and learning-based PI control.
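As a rough illustration of two ideas named in the abstract, the sketch below shows a critic and an actor that read the same kernel-based linear features, plus a reward that trades tracking precision against driving smoothness. It is a minimal sketch under stated assumptions, not the paper's implementation: the Gaussian kernel, the two-dimensional state, the kernel centers, the weights `w_track` and `w_smooth`, and all function names are hypothetical, and the least-squares batch update rules of PBAC are not reproduced here.

```python
import math


def rbf_features(state, centers, width):
    """Gaussian kernel features: one entry per center (assumed kernel choice)."""
    feats = []
    for c in centers:
        sq_dist = sum((s - ci) ** 2 for s, ci in zip(state, c))
        feats.append(math.exp(-sq_dist / (2.0 * width ** 2)))
    return feats


def linear_output(weights, feats):
    """Linear function of the features, used for both critic and actor."""
    return sum(w * f for w, f in zip(weights, feats))


def reward(speed_error, control_change, w_track=1.0, w_smooth=0.1):
    """Hypothetical multiobjective reward: penalize tracking error and
    abrupt changes in the control signal. The weights are illustrative."""
    return -(w_track * speed_error ** 2 + w_smooth * control_change ** 2)


# Assumed state layout: (speed error, acceleration).
centers = [(-1.0, 0.0), (0.0, 0.0), (1.0, 0.0)]
state = (0.0, 0.0)

# Critic and actor share the same feature vector phi.
phi = rbf_features(state, centers, width=1.0)
critic_w = [0.0, 0.0, 0.0]      # value-function weights (placeholder values)
actor_theta = [0.5, 0.0, -0.5]  # policy weights (placeholder values)

value = linear_output(critic_w, phi)      # estimated state value
action = linear_output(actor_theta, phi)  # control signal
```

In this sketch, sharing `phi` means that only the two weight vectors differ between the critic and the actor, which is the structural property the abstract highlights. A batch method would fit those weights from collected samples rather than setting them by hand as is done here.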

Publication Title

IEEE Transactions on Systems, Man, and Cybernetics: Systems