Learning without external reward

In the traditional reinforcement learning paradigm, a reward signal defines the goal of the task. This signal is usually a hand-crafted numerical value or a predefined function that tells the agent how good or bad a specific action is. However, we believe there are situations in which the environment cannot provide such a reward signal to the agent directly. The question, therefore, is whether an agent can still learn without an external reward signal. To address it, this article develops a self-learning approach that enables the agent to adaptively develop an internal reward signal from a given ultimate goal, without requiring an explicit external reward signal from the environment. We present the self-learning idea in a broad sense, so that it can be incorporated into a wide range of existing reinforcement learning and adaptive dynamic programming algorithms and architectures. We describe the idealized forms of this method mathematically and demonstrate its effectiveness in a triple-link inverted pendulum case study.
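
For reference, the conventional setting described in the opening sentences can be written as the expected discounted return that an agent maximizes. Here $r_{t+1}$ is the external reward returned by the environment and $\gamma \in [0,1)$ is a discount factor. This is a generic textbook formulation, not the article's own notation. The second line only indicates, schematically, where an internally generated signal (written $\hat{r}$, a hypothetical symbol introduced here) would take the place of the external reward. How that internal signal is derived from the ultimate goal is specified in the article itself and is not reproduced here.

```latex
% Conventional RL: value of state s under policy \pi, driven by the external reward r
J^{\pi}(s) = \mathbb{E}_{\pi}\!\left[\sum_{k=0}^{\infty} \gamma^{k}\, r_{t+k+1} \,\middle|\, s_t = s\right]

% Self-learning (schematic): the external r is replaced by an internal signal \hat{r}
% that the agent develops itself from the ultimate goal (\hat{r} is a hypothetical symbol)
\hat{J}^{\pi}(s) = \mathbb{E}_{\pi}\!\left[\sum_{k=0}^{\infty} \gamma^{k}\, \hat{r}_{t+k+1} \,\middle|\, s_t = s\right]
```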

Publication Title

IEEE Computational Intelligence Magazine