Department of Electrical, Computer, and Biomedical Engineering Faculty Publications

Convergence Proof of Approximate Policy Iteration for Undiscounted Optimal Control of Discrete-Time Systems

Yuanheng Zhu, Institute of Automation, Chinese Academy of Sciences
Dongbin Zhao, Institute of Automation, Chinese Academy of Sciences
Haibo He, University of Rhode Island
Junhong Ji, Harbin Institute of Technology

Document Type

Article

Date of Original Version

12-1-2015

Abstract

This paper studies approximate policy iteration (API) for solving undiscounted optimal control problems. We consider a discrete-time system with a continuous state space and a finite action set. Because an approximation technique is used for the continuous state space, approximation errors arise in the calculation and disturb the convergence of the original policy iteration. We analyze and prove the convergence of API for undiscounted optimal control. We use an iterative method to implement approximate policy evaluation and demonstrate that the error between the approximate and exact value functions is bounded. Then, because the action set is finite, the greedy policy in policy improvement is generated directly. Our main theorem proves that if a sufficiently accurate approximator is used, API converges to the optimal policy. For implementation, we introduce a fuzzy approximator and verify its performance on the puddle world problem.
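The loop the abstract describes (iterative approximate policy evaluation, then greedy improvement over a finite action set) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the one-dimensional toy system on [0, 1], the unit stage cost, the three actions, and the names `step`, `value`, `evaluate`, `improve`, and `approximate_policy_iteration`. The approximator uses triangular membership functions on a uniform grid, which reduce to linear interpolation; it stands in for, and may differ from, the paper's fuzzy approximator.

```python
import numpy as np

# Hypothetical toy problem (not the paper's puddle world).
ACTIONS = np.array([-0.1, 0.05, 0.1])   # finite action set
CENTERS = np.linspace(0.0, 1.0, 11)     # centers of triangular membership functions
GOAL = 1.0                              # absorbing goal with zero cost


def is_goal(x):
    return x >= GOAL - 1e-9


def step(x, a):
    """Toy dynamics on [0, 1] with unit cost per step (undiscounted)."""
    return float(np.clip(x + a, 0.0, GOAL)), 1.0


def value(theta, x):
    """Approximate value; triangular memberships give linear interpolation."""
    return 0.0 if is_goal(x) else float(np.interp(x, CENTERS, theta))


def evaluate(policy, sweeps=200, tol=1e-8):
    """Iterative approximate policy evaluation at the centers."""
    theta = np.zeros(len(CENTERS))
    for _ in range(sweeps):
        new = np.zeros_like(theta)
        for i, x in enumerate(CENTERS):
            if is_goal(x):
                continue  # goal value stays at zero
            x_next, cost = step(x, ACTIONS[policy[i]])
            new[i] = cost + value(theta, x_next)
        if np.max(np.abs(new - theta)) < tol:
            return new
        theta = new
    return theta


def improve(theta):
    """Greedy policy: enumerate the finite action set at each center."""
    policy = np.zeros(len(CENTERS), dtype=int)
    for i, x in enumerate(CENTERS):
        q = [cost + value(theta, x_next)
             for x_next, cost in (step(x, a) for a in ACTIONS)]
        policy[i] = int(np.argmin(q))
    return policy


def approximate_policy_iteration(max_iters=20):
    # Start from an admissible policy: always take the slow step (+0.05).
    policy = np.ones(len(CENTERS), dtype=int)
    theta = evaluate(policy)
    for _ in range(max_iters):
        new_policy = improve(theta)
        if np.array_equal(new_policy, policy):
            break
        policy = new_policy
        theta = evaluate(policy)
    return policy, theta
```

Calling `approximate_policy_iteration()` returns the action index chosen at each center and the fitted values at the centers. The initial policy is chosen so that it reaches the goal from every state, since an undiscounted cost is finite only for such policies.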

Publication Title, e.g., Journal

Cognitive Computation

Volume

7

Issue

6

Citation/Publisher Attribution

Zhu, Yuanheng, Dongbin Zhao, Haibo He, and Junhong Ji. "Convergence Proof of Approximate Policy Iteration for Undiscounted Optimal Control of Discrete-Time Systems." Cognitive Computation 7, 6 (2015): 763-771. doi: 10.1007/s12559-015-9350-z.

DOI

https://doi.org/10.1007/s12559-015-9350-z
