Reinforcement Learning Problem
- The agent starts in a state $s_0$, takes action $a_0$, receives
a reward $r_0$, and ends up in state $s_1$; the process then repeats:
\[ s_0 \rightarrow^{r_0} s_1 \rightarrow^{r_1} s_2 \rightarrow^{r_2}
s_3 \cdots \]
- We define the problem by:
$S$: a finite set of states
$A$: a finite set of actions
$r_t = r(s_t,a_t)$: the reward function
$s_{t+1} = \delta(s_t,a_t)$: the state-transition function
- We also assume that $s_{t+1}$ and $r_t$ depend
only on the current state and action, not on earlier history
(the Markov assumption).
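
The definitions above can be made concrete with a small sketch. The four-state corridor, the action names, and the helper `rollout` are hypothetical choices made only for illustration; they are not part of the original slides. The functions `delta` and `r` play the roles of $\delta(s_t,a_t)$ and $r(s_t,a_t)$, and both take only the current state and action as arguments, which is how the Markov assumption shows up in code.

```python
from typing import List, Tuple

State = int
Action = str

# Hypothetical example: a corridor with states 0..3, where state 3 is the goal.
S: List[State] = [0, 1, 2, 3]
A: List[Action] = ["left", "right"]
GOAL: State = 3


def delta(s: State, a: Action) -> State:
    """Deterministic transition function: s_{t+1} = delta(s_t, a_t)."""
    if s == GOAL:
        return GOAL  # assumption: the goal state is absorbing
    step = 1 if a == "right" else -1
    # Clamp to the corridor so that walking into a wall leaves the state unchanged.
    return min(max(s + step, S[0]), S[-1])


def r(s: State, a: Action) -> float:
    """Reward function: r_t = r(s_t, a_t). Pays 1 for the move that enters the goal."""
    return 1.0 if s != GOAL and delta(s, a) == GOAL else 0.0


def rollout(s0: State, actions: List[Action]) -> List[Tuple[State, Action, float, State]]:
    """Follow a fixed action sequence and record (s_t, a_t, r_t, s_{t+1}) tuples."""
    trajectory = []
    s = s0
    for a in actions:
        reward = r(s, a)
        s_next = delta(s, a)
        trajectory.append((s, a, reward, s_next))
        s = s_next
    return trajectory


if __name__ == "__main__":
    for s, a, reward, s_next in rollout(0, ["right", "right", "right"]):
        print(f"s={s} a={a} r={reward} s'={s_next}")
```

Each recorded tuple corresponds to one arrow in the chain $s_0 \rightarrow^{r_0} s_1 \rightarrow^{r_1} s_2 \cdots$. Because `delta` and `r` never look at earlier steps, replaying the same action from the same state always gives the same result.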
José M. Vidal