Q-Learning Algorithm

Reinforcement Learning

1. For each $s,a$, set $\hat{Q}(s,a) = 0$.
2. Observe the current state $s$.
3. Select an action $a$ and execute it.
4. Receive the immediate reward $r$ and observe the new state $s'$.
5. Update the table entry for $\hat{Q}(s,a)$ with \[ \hat{Q}(s,a) \leftarrow r + \gamma \max_{a'}\hat{Q}(s',a') \]
6. $s \leftarrow s'$ // The new state becomes the current state.
7. Go to step 3.
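
The loop above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the five-state corridor environment `chain_step`, the uniformly random action selection, the episode restart at a goal state, and the discount factor $\gamma = 0.9$ are all hypothetical choices made for the example. Note that the update overwrites the table entry outright (there is no learning rate), so the sketch assumes rewards and transitions are deterministic.

```python
import random


def q_learning(step, n_states, n_actions, start, goal,
               gamma=0.9, episodes=200, seed=0):
    """Tabular Q-learning with the overwrite update r + gamma * max_a' Q(s', a').

    `step(s, a)` is an assumed environment function returning (s_next, r).
    """
    rng = random.Random(seed)
    # For each s, a set Q(s, a) = 0.
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s = start                          # observe the current state
        while s != goal:                   # assumption: an episode ends at the goal
            a = rng.randrange(n_actions)   # select an action (uniformly at random here)
            s_next, r = step(s, a)         # execute it, receive r, observe s'
            Q[s][a] = r + gamma * max(Q[s_next])  # update the table entry
            s = s_next                     # s <- s'
    return Q


def chain_step(s, a):
    """Hypothetical corridor with states 0..4: action 0 moves left, 1 moves right.

    Entering state 4 yields reward 1; every other move yields reward 0.
    """
    s_next = max(0, s - 1) if a == 0 else min(4, s + 1)
    r = 1.0 if s_next == 4 else 0.0
    return s_next, r


if __name__ == "__main__":
    Q = q_learning(chain_step, n_states=5, n_actions=2, start=0, goal=4)
    for s, row in enumerate(Q):
        print(s, [round(q, 3) for q in row])
```

In this corridor the learned value of moving right falls by a factor of $\gamma$ for each extra step between the state and the goal, which is the behaviour the update rule predicts for a single delayed reward.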