Reinforcement Learning

Q-Learning Convergence

If the system is a deterministic MDP,
the immediate rewards are bounded (i.e., there is some $c$ s.t. $|r(s,a)| < c$ for all $s$, $a$),
the discount factor satisfies $0 \le \gamma < 1$, and
the agent visits every state-action pair infinitely often, then
Q-learning provably converges.
That is, $\hat{Q}(s,a)$ converges to $Q(s,a)$ for every $s$ and $a$ in the limit.
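
The convergence claim can be checked numerically on a toy problem. The sketch below is only an illustration, not taken from the slides: the three-state chain, its rewards, the discount $\gamma = 0.9$, the uniform random choice of state-action pairs, and the names `step`, `q_learning`, and `Q_TRUE` are all assumptions made for the example. It uses the deterministic update $\hat{Q}(s,a) \leftarrow r + \gamma \max_{a'} \hat{Q}(s',a')$ and compares the result with values of $Q$ worked out by hand for this chain.

```python
import random

GAMMA = 0.9          # assumed discount factor, 0 <= GAMMA < 1
STATES = [0, 1, 2]   # hypothetical chain; state 2 is absorbing
ACTIONS = [0, 1]     # 0 = left, 1 = right

def step(s, a):
    """Deterministic transition and bounded reward for the toy chain."""
    if s == 2:
        return 2, 0.0                      # absorbing state, no reward
    s_next = s + 1 if a == 1 else max(s - 1, 0)
    r = 1.0 if s_next == 2 else 0.0        # reward only for entering state 2
    return s_next, r

# Q values derived by hand for this chain with GAMMA = 0.9.
Q_TRUE = {
    (0, 0): 0.81, (0, 1): 0.9,
    (1, 0): 0.81, (1, 1): 1.0,
    (2, 0): 0.0,  (2, 1): 0.0,
}

def q_learning(num_updates, seed=0):
    """Run the deterministic Q-learning update on randomly chosen pairs."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    for _ in range(num_updates):
        # Uniform sampling stands in for "every pair is visited again and again".
        s = rng.choice(STATES)
        a = rng.choice(ACTIONS)
        s_next, r = step(s, a)
        q[(s, a)] = r + GAMMA * max(q[(s_next, b)] for b in ACTIONS)
    return q

if __name__ == "__main__":
    q_hat = q_learning(2000)
    max_error = max(abs(q_hat[k] - Q_TRUE[k]) for k in Q_TRUE)
    print("largest |Q_hat - Q|:", max_error)
```

Sampling pairs uniformly is just one convenient way to meet the visiting condition in a finite run; any exploration scheme that keeps returning to every state-action pair would serve the same purpose.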

José M. Vidal
