Convergence Proof of Approximate Policy Iteration for Undiscounted Optimal Control of Discrete-Time Systems
This paper studies approximate policy iteration (API) for solving undiscounted optimal control problems. We consider a discrete-time system with a continuous state space and a finite action set. Because an approximation technique is used for the continuous state space, approximation errors arise in the calculation and disturb the convergence of the original policy iteration. We analyze and prove the convergence of API for undiscounted optimal control. We use an iterative method to implement approximate policy evaluation and demonstrate that the error between the approximate and exact value functions is bounded. Then, because the action set is finite, the greedy policy in policy improvement is generated directly. Our main theorem proves that if a sufficiently accurate approximator is used, API converges to the optimal policy. For implementation, we introduce a fuzzy approximator and verify its performance on the puddle world problem.
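The loop described in the abstract (iterative approximate policy evaluation followed by greedy improvement over a finite action set) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the 1-D toy system, the unit stage cost, the absorbing goal region, the triangular membership functions standing in for the fuzzy approximator, the choice to approximate a Q-function, and all names such as `evaluate` and `approximate_policy_iteration`. The sketch starts from a policy that reaches the goal, since in this undiscounted toy problem the evaluation of a policy that never reaches it would not settle.

```python
import numpy as np

# Hypothetical toy problem (not the paper's puddle world): state x in [0, 10],
# three actions, unit cost per step until the absorbing goal region x >= 9.
ACTIONS = np.array([-1.0, 1.0, 2.0])
CENTERS = np.arange(11.0)  # cores of the triangular membership functions


def step(x, a):
    """Deterministic dynamics and stage cost; the goal is absorbing at zero cost."""
    if x >= 9.0:
        return float(x), 0.0
    return float(np.clip(x + a, 0.0, 10.0)), 1.0


def memberships(x):
    """Triangular memberships of x; nonnegative and summing to 1 on [0, 10]."""
    width = CENTERS[1] - CENTERS[0]
    return np.maximum(0.0, 1.0 - np.abs(x - CENTERS) / width)


def q_hat(theta, x):
    """Approximate Q(x, .) for every action as a membership-weighted sum."""
    return memberships(x) @ theta


def greedy(theta, x):
    """Index of the cost-minimizing action; direct because the action set is finite."""
    return int(np.argmin(q_hat(theta, x)))


def evaluate(policy_theta, sweeps=200, tol=1e-9):
    """Iteratively evaluate the policy that is greedy with respect to policy_theta."""
    theta = np.zeros((len(CENTERS), len(ACTIONS)))
    for _ in range(sweeps):
        new = np.empty_like(theta)
        for i, x in enumerate(CENTERS):
            for j, a in enumerate(ACTIONS):
                nxt, cost = step(x, a)
                # Undiscounted backup: stage cost plus approximate cost-to-go.
                new[i, j] = cost + q_hat(theta, nxt)[greedy(policy_theta, nxt)]
        converged = np.max(np.abs(new - theta)) < tol
        theta = new
        if converged:
            break
    return theta


def approximate_policy_iteration(max_iters=20):
    # Initial parameters make "+1" greedy everywhere, a policy that reaches the goal.
    theta = np.tile([1.0, 0.0, 1.0], (len(CENTERS), 1))
    policy = [greedy(theta, x) for x in CENTERS]
    for _ in range(max_iters):
        theta = evaluate(theta)                            # policy evaluation
        new_policy = [greedy(theta, x) for x in CENTERS]   # policy improvement
        if new_policy == policy:
            break
        policy = new_policy
    return theta, policy


if __name__ == "__main__":
    theta, policy = approximate_policy_iteration()
    print("actions at the centers:", [float(ACTIONS[k]) for k in policy])
    print("approximate cost-to-go at x = 0:", float(np.min(q_hat(theta, 0.0))))
```

In this toy setting the dynamics map grid points to grid points, so the approximator introduces no error at the centers; on a problem such as puddle world the next state generally falls between centers, which is where the approximation error discussed in the abstract would enter.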
Zhu, Yuanheng, Dongbin Zhao, Haibo He, and Junhong Ji. "Convergence Proof of Approximate Policy Iteration for Undiscounted Optimal Control of Discrete-Time Systems." Cognitive Computation 7, 6 (2015): 763-771. doi:10.1007/s12559-015-9350-z.