Q Function

Reinforcement Learning

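Define a new function closely related to $V^*$:
\[ Q(s,a) \equiv r(s,a) + \gamma V^{*}(\delta(s,a)) \]
That is, $Q(s,a)$ is the immediate reward for taking action $a$ in state $s$ plus the discounted value of acting optimally from the resulting state $\delta(s,a)$.
If the agent learns $Q$, it can choose the optimal action even without knowing $r$ or $\delta$, because
\[ \pi^{*}(s) = \arg \max_{a} [r(s,a) + \gamma V^{*}(\delta(s,a))] \]
is the same as
\[ \pi^{*}(s) = \arg \max_{a} Q(s,a) \]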
Our agent will learn $Q$.
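
To make the equivalence concrete, here is a minimal sketch on a made-up three-state deterministic world. The states, actions, reward table `REWARD`, transition table `DELTA`, and the discount `GAMMA = 0.9` are all assumptions chosen for illustration and do not come from these notes. The sketch computes $V^*$ by repeated backups, builds $Q$ from its definition, and then compares the policy that needs $r$ and $\delta$ with the policy that only looks up $Q$.

```python
GAMMA = 0.9                      # discount factor (assumed value)
STATES = [0, 1, 2]               # state 2 is an absorbing goal (assumed layout)
ACTIONS = ["left", "right"]

# delta(s, a): deterministic successor state (hypothetical world)
DELTA = {
    (0, "left"): 0, (0, "right"): 1,
    (1, "left"): 0, (1, "right"): 2,
    (2, "left"): 2, (2, "right"): 2,
}

# r(s, a): immediate reward; only the move from state 1 into the goal pays off
REWARD = {key: 0.0 for key in DELTA}
REWARD[(1, "right")] = 100.0


def optimal_values(sweeps=100):
    """Approximate V* by repeatedly applying the backup max_a [r + gamma * V(delta)]."""
    v = {s: 0.0 for s in STATES}
    for _ in range(sweeps):
        v = {
            s: max(REWARD[s, a] + GAMMA * v[DELTA[s, a]] for a in ACTIONS)
            for s in STATES
        }
    return v


V_STAR = optimal_values()

# Q(s, a) = r(s, a) + gamma * V*(delta(s, a)), straight from the definition
Q = {
    (s, a): REWARD[s, a] + GAMMA * V_STAR[DELTA[s, a]]
    for s in STATES
    for a in ACTIONS
}


def policy_from_model(s):
    """Greedy action computed from r, delta and V*: requires knowing the model."""
    return max(ACTIONS, key=lambda a: REWARD[s, a] + GAMMA * V_STAR[DELTA[s, a]])


def policy_from_q(s):
    """Greedy action computed from the Q table alone: no r or delta needed."""
    return max(ACTIONS, key=lambda a: Q[s, a])


if __name__ == "__main__":
    for s in STATES:
        print(s, policy_from_model(s), policy_from_q(s))
```

In this toy world both functions pick the same action in every state. Only `policy_from_q` could be run by an agent that has no access to `REWARD` or `DELTA`, which is the reason for learning $Q$ rather than $V^*$.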