Q-Learning Motivation
- We want to learn $V^{\pi^{*}} \equiv V^{*}$, the value function of the optimal policy.
- Given $V^{*}$, the agent could do a one-step lookahead search to choose the best action in each state:
\[ \pi^{*}(s) = \arg \max_{a} [r(s,a) + \gamma V^{*}(\delta(s,a))] \]
easy, right?
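The lookahead rule above can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's own code: the three-state chain, the tables `DELTA`, `REWARD`, and `V_STAR`, the value of `GAMMA`, and the function name `greedy_action` are all hypothetical. It also assumes the agent can query $r(s,a)$ and $\delta(s,a)$ directly and that the world is deterministic.

```python
GAMMA = 0.9  # discount factor (assumed value)

# Hypothetical deterministic world: s0 -> s1 -> goal.
# DELTA plays the role of delta(s, a); REWARD plays the role of r(s, a).
DELTA = {
    ("s0", "stay"): "s0",
    ("s0", "right"): "s1",
    ("s1", "left"): "s0",
    ("s1", "right"): "goal",
}
REWARD = {
    ("s0", "stay"): 0.0,
    ("s0", "right"): 0.0,
    ("s1", "left"): 0.0,
    ("s1", "right"): 100.0,
}

# V* for this toy world, worked out by hand:
# V*(s1) = 100 + 0.9 * 0 = 100, V*(s0) = 0 + 0.9 * 100 = 90.
V_STAR = {"s0": 90.0, "s1": 100.0, "goal": 0.0}


def greedy_action(state, v, delta, reward, gamma=GAMMA):
    """Return argmax_a [ r(s, a) + gamma * V(delta(s, a)) ] for one state."""
    # Actions available in this state are the ones listed in the delta table.
    actions = [a for (s, a) in delta if s == state]
    return max(
        actions,
        key=lambda a: reward[(state, a)] + gamma * v[delta[(state, a)]],
    )


if __name__ == "__main__":
    for s in ("s0", "s1"):
        print(s, greedy_action(s, V_STAR, DELTA, REWARD))
```

Note that the function reads both `reward` and `delta` for every candidate action, so the sketch only works when the agent has those two functions available.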
José M. Vidal