Variations
- Updating after each move can take very long to converge if the only reward comes at the end of the episode.
- Instead, we can save the whole episode's transitions and rewards and update at the end, in reverse order (see the first sketch below).
- Another technique is to store past state-action transitions and their rewards and re-train on them periodically (see the second sketch below).
- Re-training on old transitions helps when the Q values of the neighboring states have changed since the transition was first seen.
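A minimal sketch of the first idea in Python, assuming a tabular Q function stored in a dictionary, a hypothetical environment with `reset()` and `step(action)` methods, and illustrative values for the learning rate, discount factor, and epsilon-greedy exploration (none of these specifics come from the slide):

```python
import random
from collections import defaultdict

def reverse_update_episode(env, Q, actions, alpha=0.1, gamma=0.9, epsilon=0.1):
    """Play one episode, saving every transition, then apply the Q-learning
    update in reverse order so a reward received only at the end propagates
    back through the whole episode in a single pass."""
    trajectory = []                      # (state, action, reward, next_state)
    state = env.reset()
    done = False
    while not done:
        # epsilon-greedy action selection
        if random.random() < epsilon:
            action = random.choice(actions)
        else:
            action = max(actions, key=lambda a: Q[(state, a)])
        next_state, reward, done = env.step(action)   # hypothetical interface
        trajectory.append((state, action, reward, next_state))
        state = next_state

    # Walk the saved episode backwards, applying the standard update
    # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    for state, action, reward, next_state in reversed(trajectory):
        best_next = max(Q[(next_state, a)] for a in actions)
        Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])

# Example usage (assuming some `env` and action list exist):
# Q = defaultdict(float)
# for _ in range(1000):
#     reverse_update_episode(env, Q, actions=[0, 1, 2, 3])
```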
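And a minimal sketch of the second idea: storing past transitions and periodically re-training on them. The buffer size, batch size, and random sampling are illustrative assumptions, not prescribed by the slide:

```python
import random
from collections import deque

replay_buffer = deque(maxlen=10000)   # stores (state, action, reward, next_state)

def remember(state, action, reward, next_state):
    """Record a state-action transition and its reward as it happens."""
    replay_buffer.append((state, action, reward, next_state))

def replay(Q, actions, alpha=0.1, gamma=0.9, batch_size=32):
    """Re-apply the Q-learning update to a random batch of stored
    transitions. This is useful when the Q values of the states these
    transitions lead to have changed since they were first experienced."""
    if len(replay_buffer) < batch_size:
        return
    for state, action, reward, next_state in random.sample(list(replay_buffer), batch_size):
        best_next = max(Q[(next_state, a)] for a in actions)
        Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])

# Example usage: call remember(...) after every real move, and call
# replay(Q, actions) every N moves or at the end of every episode.
```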