Reinforcement Learning Problem
- The agent is in state $s_0$, takes action $a_0$,
receives reward $r_0$, and ends up in state $s_1$:
\[ s_0 \rightarrow^{r_0} s_1 \rightarrow^{r_1} s_2 \rightarrow^{r_2}
s_3 \cdots \]
- We define the problem by:
$S$, a finite set of states
$A$, a finite set of actions
$r_t = r(s_t,a_t)$, the reward function
$s_{t+1} = \delta(s_t,a_t)$, the state transition function
- We also assume that $s_{t+1}$ and $r_t$ depend
only on the current state and action.
- That is, the system is a Markov Decision Process.
- Note that $\delta$ and $r$ might be
nondeterministic.
- Also, the agent might not know $\delta$ and $r$.
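
The definitions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two-state problem, the state and action names, and the helpers `step`, `rollout`, and `always_move` are all hypothetical and not part of the original slides. Both $\delta$ and $r$ are taken to be deterministic here. The agent sees only the rewards and next states returned by `step`, which mirrors the case where it does not know $\delta$ and $r$.

```python
# Hypothetical two-state, two-action problem (all names are illustrative).
S = ["s_left", "s_right"]   # finite set of states
A = ["stay", "move"]        # finite set of actions

# delta(s, a): deterministic transition function (an assumption of this sketch).
DELTA = {
    ("s_left", "stay"): "s_left",
    ("s_left", "move"): "s_right",
    ("s_right", "stay"): "s_right",
    ("s_right", "move"): "s_left",
}

# r(s, a): deterministic reward function (an assumption of this sketch).
REWARD = {
    ("s_left", "stay"): 0.0,
    ("s_left", "move"): 1.0,
    ("s_right", "stay"): 0.0,
    ("s_right", "move"): -1.0,
}


def step(s, a):
    """Return (r_t, s_{t+1}); both depend only on the current state and action."""
    return REWARD[(s, a)], DELTA[(s, a)]


def rollout(s0, policy, n_steps):
    """Follow `policy` from s0 and record (s_t, a_t, r_t, s_{t+1}) tuples."""
    trajectory = []
    s = s0
    for _ in range(n_steps):
        a = policy(s)
        r, s_next = step(s, a)
        trajectory.append((s, a, r, s_next))
        s = s_next
    return trajectory


def always_move(s):
    """A trivial policy that ignores the state and always picks 'move'."""
    return "move"


trajectory = rollout("s_left", always_move, 3)
```

Each tuple in `trajectory` corresponds to one arrow in the chain $s_0 \rightarrow^{r_0} s_1 \rightarrow^{r_1} s_2 \cdots$. Making $\delta$ or $r$ nondeterministic would amount to having `step` sample from a distribution that still depends only on `(s, a)`.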
José M. Vidal