Temporal Difference
- We can then rewrite it as a recursive function (see the sketch below):
\[
Q^{\lambda}(s_{t},a_{t}) = r_{t} + \gamma \left[ (1 - \lambda) \max_{a}\hat{Q}(s_{t+1},a) + \lambda Q^{\lambda}(s_{t+1},a_{t+1})\right]
\]
- The TD($\lambda$) algorithm (due to Sutton) uses this training rule.
- TD($\lambda$) sometimes converges faster than Q-learning.
- Tesauro's TD-Gammon backgammon player uses it.
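
A minimal sketch of the recursion above, not from the slides: it computes the $Q^{\lambda}$ target backwards over one recorded finite episode. The names `trajectory` (a list of `(state, action, reward, next_state)` tuples) and `Q_hat` (a dict-of-dicts of current Q estimates) are assumptions made for illustration.

```python
def q_lambda_targets(trajectory, Q_hat, gamma=0.9, lam=0.5):
    """Return the Q^lambda target for every step of a finite episode."""
    targets = [0.0] * len(trajectory)
    next_q_lambda = 0.0  # Q^lambda(s_{t+1}, a_{t+1}); unused at the terminal step
    for t in reversed(range(len(trajectory))):      # backwards, so the t+1 value is ready
        _state, _action, reward, next_state = trajectory[t]
        if t == len(trajectory) - 1:
            bootstrap = 0.0                         # no successor after the last step
        else:
            # Blend the one-step lookahead max_a Q_hat(s_{t+1}, a)
            # with the recursive term Q^lambda(s_{t+1}, a_{t+1}).
            bootstrap = ((1 - lam) * max(Q_hat[next_state].values())
                         + lam * next_q_lambda)
        targets[t] = reward + gamma * bootstrap     # r_t + gamma * [ ... ]
        next_q_lambda = targets[t]
    return targets
```

With $\lambda = 0$ this collapses to the ordinary one-step Q-learning target, and with $\lambda = 1$ it becomes the full observed return of the episode.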