What is the «credit assignment problem» in reinforcement learning?

dmitrii_fediuk · May 15, 2019, 11:23pm

It refers to the fact that rewards, especially in fine grained state-action spaces, can occur terribly temporally delayed.
For example, a robot will normally perform many moves through its state-action space where immediate rewards are (almost) zero and where more relevant events are rather distant in the future.
As a consequence such reward signals will only very weakly affect all temporally distant states that have preceded it.
It is almost as if the influence of a reward gets more and more diluted over time and this can lead to bad convergence properties of the RL mechanism.
Many steps must be performed by any iterative reinforcement-learning algorithm to propagate the influence of delayed reinforcement to all states and actions that have an effect on that reinforcement.

scholarpedia.org/w/index.php?title=Reinforcement_learning&oldid=127590#.28Temporal.29_Credit_Assignment_Problem

A major challenge is that a game of backgammon can involve dozens of moves, and yet it is only at the end of the game that the reward, in the form of victory, is achieved. The reward must then be attributed appropriately to all of the moves that led to it, even though some moves will have been good ones and others less so. This is an example of a credit assignment problem.

Christopher M. Bishop - «Pattern Recognition and Machine Learning» (2006)

dmitrii_fediuk · May 15, 2019, 11:25pm