Q-Learning Algorithm
1. For each $s,a$ set $\hat{Q}(s,a) = 0$.
2. Observe the current state $s$.
3. Select an action $a$ and execute it.
4. Receive reward $r$ and observe the new state $s'$.
5. Update the table entry for $\hat{Q}(s,a)$ with
\[ \hat{Q}(s,a) \leftarrow r + \gamma \max_{a'}\hat{Q}(s',a') \]
6. $s \leftarrow s'$ // The new state becomes the current state.
7. Go to step 3.
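
The loop above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the five-state chain world, the `step` function, the uniformly random action selection, and the restart after reaching the goal are all assumptions made for the example. Only the table update itself comes from the algorithm as stated.

```python
import random
from collections import defaultdict

GAMMA = 0.9        # discount factor (value chosen for illustration)
GOAL = 4           # hypothetical chain world with states 0..4
ACTIONS = (-1, 1)  # move left or move right


def step(s, a):
    """Hypothetical deterministic environment: returns (reward, next state)."""
    s_next = min(max(s + a, 0), GOAL)
    r = 1.0 if s_next == GOAL else 0.0
    return r, s_next


def q_learning(n_steps=5000, seed=0):
    rng = random.Random(seed)
    Q = defaultdict(float)       # every Q(s, a) starts at 0
    s = rng.randrange(GOAL)      # observe the current state
    for _ in range(n_steps):
        a = rng.choice(ACTIONS)  # select an action (assumed: uniformly random)
        r, s_next = step(s, a)   # execute it, receive r, observe s'
        # Table update: Q(s, a) <- r + gamma * max_a' Q(s', a')
        Q[(s, a)] = r + GAMMA * max(Q[(s_next, b)] for b in ACTIONS)
        s = s_next               # the new state becomes the current state
        if s == GOAL:
            # Assumption: restart from a random non-goal state so the loop
            # keeps exploring instead of staying at the goal.
            s = rng.randrange(GOAL)
    return Q


Q = q_learning()
# Greedy policy read off the learned table.
policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(GOAL)}
```

Because the assumed environment is deterministic, the update overwrites the table entry directly, with no learning rate, exactly as in the rule above. In this toy world the greedy policy should move right in every non-goal state.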
José M. Vidal