Temporal Difference
- We can then rewrite it as a recursive function (see the sketch below):
\[
Q^{\lambda}(s_{t},a_{t}) = r_{t} + \gamma \left[ (1 - \lambda) \max_{a}\hat{Q}(s_{t+1},a) + \lambda Q^{\lambda}(s_{t+1},a_{t+1})\right]
\]
- The TD($\lambda$) algorithm (due to Sutton) uses this training rule.
- TD($\lambda$) sometimes converges faster than Q-learning.
- Tesauro's TD-Gammon backgammon player uses it.
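
A minimal sketch of the recursion above, not from the slides: it computes the $Q^{\lambda}$ target backwards over one recorded finite episode. The names `trajectory` (a list of `(state, action, reward, next_state)` tuples) and `Q_hat` (a dict-of-dicts of current Q estimates) are assumptions made for illustration.

```python
def q_lambda_targets(trajectory, Q_hat, gamma=0.9, lam=0.5):
    """Return the Q^lambda target for every step of a finite episode."""
    targets = [0.0] * len(trajectory)
    next_q_lambda = 0.0  # Q^lambda(s_{t+1}, a_{t+1}); unused at the terminal step
    for t in reversed(range(len(trajectory))):      # backwards, so the t+1 value is ready
        _state, _action, reward, next_state = trajectory[t]
        if t == len(trajectory) - 1:
            bootstrap = 0.0                         # no successor after the last step
        else:
            # Blend the one-step lookahead max_a Q_hat(s_{t+1}, a)
            # with the recursive term Q^lambda(s_{t+1}, a_{t+1}).
            bootstrap = ((1 - lam) * max(Q_hat[next_state].values())
                         + lam * next_q_lambda)
        targets[t] = reward + gamma * bootstrap     # r_t + gamma * [ ... ]
        next_q_lambda = targets[t]
    return targets
```

With $\lambda = 0$ this collapses to the ordinary one-step Q-learning target, and with $\lambda = 1$ it becomes the full observed return of the episode.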