Q Function
- Define a new function very similar to $V^*$:
\[ Q(s,a) \equiv r(s,a) + \gamma V^{*}(\delta(s,a)) \]
If the agent learns $Q$, it can choose the optimal action even without knowing
$\delta$, because
\[ \pi^{*}(s) = \arg \max_{a} [r(s,a) + \gamma V^{*}(\delta(s,a))] \]
is equivalent to
\[ \pi^{*}(s) = \arg \max_{a} Q(s,a) \]
- Our agent will learn $Q$.
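The equivalence above can be checked on a toy problem. The sketch below assumes a hypothetical four-state corridor (states 0 to 3, with state 3 an absorbing goal), actions `left`/`right`, a reward of 100 for entering the goal, and $\gamma = 0.9$. None of these come from the slides, and names such as `delta`, `r`, `V_STAR` and `policy_from_q` are illustrative. It computes $V^*$ by value iteration, builds $Q(s,a) = r(s,a) + \gamma V^*(\delta(s,a))$ from the definition, and shows that the greedy choice from $Q$ matches the model-based $\arg\max$ while using only table lookups.

```python
# Hypothetical deterministic world: states 0..3 on a line, state 3 is an absorbing goal.
STATES = range(4)
ACTIONS = ("left", "right")
GOAL = 3
GAMMA = 0.9


def delta(s, a):
    """Deterministic transition function delta(s, a) (assumed for illustration)."""
    if s == GOAL:
        return GOAL  # absorbing: every action stays at the goal
    return max(s - 1, 0) if a == "left" else min(s + 1, GOAL)


def r(s, a):
    """Immediate reward: 100 for entering the goal, 0 otherwise (assumed)."""
    return 100.0 if s != GOAL and delta(s, a) == GOAL else 0.0


def optimal_values(tol=1e-9):
    """Value iteration: V*(s) = max_a [r(s,a) + gamma * V*(delta(s,a))]."""
    v = {s: 0.0 for s in STATES}
    while True:
        new_v = {
            s: max(r(s, a) + GAMMA * v[delta(s, a)] for a in ACTIONS)
            for s in STATES
        }
        if max(abs(new_v[s] - v[s]) for s in STATES) < tol:
            return new_v
        v = new_v


V_STAR = optimal_values()

# Q(s,a) = r(s,a) + gamma * V*(delta(s,a)), following the definition on this slide.
Q = {(s, a): r(s, a) + GAMMA * V_STAR[delta(s, a)] for s in STATES for a in ACTIONS}


def policy_with_model(s):
    """pi*(s) = argmax_a [r(s,a) + gamma * V*(delta(s,a))]: needs r and delta."""
    return max(ACTIONS, key=lambda a: r(s, a) + GAMMA * V_STAR[delta(s, a)])


def policy_from_q(s):
    """pi*(s) = argmax_a Q(s,a): only Q lookups, delta is never called."""
    return max(ACTIONS, key=lambda a: Q[(s, a)])


if __name__ == "__main__":
    for s in STATES:
        print(s, policy_from_q(s), policy_with_model(s))
```

In this corridor $V^*$ comes out as 81, 90, 100, 0 for states 0 to 3, so both policies choose `right` in states 0, 1 and 2. Once the $Q$ table is available, `policy_from_q` never needs `delta`, which is the point of the slide.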
José M. Vidal