Learning without external reward
Date of Original Version
In the traditional reinforcement learning paradigm, a reward signal defines the goal of the task. Usually, the reward signal is a "hand-crafted" numerical value or a pre-defined function that tells the agent how good or bad a specific action is. However, we believe there are situations in which the environment cannot directly provide such a reward signal to the agent. The question, therefore, is whether an agent can still learn without an external reward signal. To this end, this article develops a self-learning approach that enables the agent to adaptively develop an internal reward signal based on a given ultimate goal, without requiring an explicit external reward signal from the environment. We aim to convey the self-learning idea in a broad sense, which could be used in a wide range of existing reinforcement learning and adaptive dynamic programming algorithms and architectures. We describe the idealized forms of this method mathematically and demonstrate its effectiveness through a triple-link inverted pendulum case study.
Publication Title
IEEE Computational Intelligence Magazine
He, Haibo, and Xiangnan Zhong. "Learning without external reward." IEEE Computational Intelligence Magazine 13, no. 3 (2018): 48-54. doi: 10.1109/MCI.2018.2840727.