Q-Learning Motivation
- So we want to learn $V^{\pi^{*}} \equiv V^{*}$.
- The agent could just do a lookahead search to choose the best action for each state:
\[ \pi^{*}(s) = \arg \max_{a} [r(s,a) + \gamma V^{*}(\delta(s,a))] \]
Easy, right?
- Yes, but only if we know the transition function $\delta$ and the reward function $r$.
- Most often, we don't.
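To make the lookahead concrete, here is a minimal sketch assuming a hypothetical three-state deterministic chain world in which $\delta$, $r$, and $V^{*}$ are all known. The state numbering, the reward of 100 for entering the goal, and $\gamma = 0.9$ are illustrative assumptions, not values from the slides.

```python
# One-step lookahead policy for a toy deterministic world.
# Everything below (states, rewards, GAMMA, V_STAR) is an illustrative assumption.

GAMMA = 0.9
ACTIONS = ["left", "right"]
GOAL = 2  # absorbing goal state; the states are 0, 1, 2 in a row


def delta(s, a):
    """Known transition function: the state reached by taking action a in state s."""
    if s == GOAL:
        return GOAL
    return max(s - 1, 0) if a == "left" else s + 1


def r(s, a):
    """Known reward function: 100 for stepping into the goal, 0 otherwise."""
    return 100.0 if s != GOAL and delta(s, a) == GOAL else 0.0


# Optimal values for this toy world, worked out by hand:
# V*(1) = 100, V*(0) = 0 + 0.9 * 100 = 90, V*(goal) = 0.
V_STAR = {0: 90.0, 1: 100.0, 2: 0.0}


def greedy_action(s):
    """pi*(s) = argmax_a [ r(s, a) + GAMMA * V*(delta(s, a)) ]."""
    return max(ACTIONS, key=lambda a: r(s, a) + GAMMA * V_STAR[delta(s, a)])


policy = {s: greedy_action(s) for s in (0, 1)}
print(policy)
```

Note that `greedy_action` calls both `delta` and `r` directly. An agent that cannot evaluate these functions cannot run this lookahead, even if it somehow had $V^{*}$.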
José M. Vidal