Q-Learning Convergence
- If the system is a deterministic MDP and
- the immediate rewards are bounded (i.e., there is some $c$
s.t. $|r(s,a)| < c$ for all $s, a$), and
- the agent visits every state-action pair infinitely often,
then
- Q-learning provably converges.
- That is, $\hat{Q}$ converges to the true $Q$ in the limit.
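
The convergence claim can be checked empirically on a toy problem. The sketch below is an illustration only, not taken from the slides: the 4-state chain world, the reward of 100 for reaching the goal, the discount factor `GAMMA = 0.9`, and the names `step`, `true_q`, and `q_learning` are all assumptions made for this example. It uses the deterministic update $\hat{Q}(s,a) \leftarrow r + \gamma \max_{a'} \hat{Q}(s',a')$ with purely random action selection, so that every state-action pair keeps being visited, and compares the learned table with $Q$ computed by repeated Bellman sweeps.

```python
import random

GAMMA = 0.9           # discount factor (assumed; must satisfy 0 <= GAMMA < 1)
N_STATES = 4          # hypothetical chain world: states 0..3, state 3 is the goal
GOAL = N_STATES - 1
ACTIONS = (-1, +1)    # move left or move right

def step(s, a):
    """Deterministic transition with a bounded reward."""
    s2 = min(max(s + a, 0), GOAL)              # walls clamp the move
    r = 100.0 if (s2 == GOAL and s != GOAL) else 0.0
    return s2, r

def true_q(sweeps=200):
    """Reference Q, obtained by sweeping the Bellman equation over all pairs."""
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(sweeps):
        for s in range(GOAL):                  # goal is absorbing, its Q stays 0
            for a in ACTIONS:
                s2, r = step(s, a)
                q[(s, a)] = r + GAMMA * max(q[(s2, b)] for b in ACTIONS)
    return q

def q_learning(episodes=500, seed=0):
    """Deterministic Q-learning with uniformly random exploration."""
    rng = random.Random(seed)
    q_hat = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        s = rng.randrange(GOAL)                # random non-goal start state
        while s != GOAL:
            a = rng.choice(ACTIONS)            # random choice: every pair gets visited
            s2, r = step(s, a)
            q_hat[(s, a)] = r + GAMMA * max(q_hat[(s2, b)] for b in ACTIONS)
            s = s2
    return q_hat

q, q_hat = true_q(), q_learning()
max_error = max(abs(q[k] - q_hat[k]) for k in q)
print(f"max |Q_hat - Q| = {max_error:.2e}")
```

With a few hundred episodes the reported gap should be at or very near zero. Restricting exploration (for example, always moving right) leaves some pairs unvisited, and their $\hat{Q}$ entries then stay at their initial values, which is why the visitation condition matters.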
José M. Vidal