Reinforcement Learning Problem

The agent starts in state $s_0$, takes action $a_0$, receives a reward $r_0$, and ends up in state $s_1$; from there it takes action $a_1$, and so on: \[ s_0 \xrightarrow[r_0]{a_0} s_1 \xrightarrow[r_1]{a_1} s_2 \xrightarrow[r_2]{a_2} s_3 \cdots \]
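Formally, the problem is defined by:
$S$: a finite set of states
$A$: a finite set of actions
$r_t = r(s_t,a_t)$: the reward received for taking action $a_t$ in state $s_t$
$s_{t+1} = \delta(s_t,a_t)$: the state reached by taking action $a_t$ in state $s_t$, where $\delta$ is the state-transition function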
We also assume that $s_{t+1}$ and $r_t$ depend only on the current state $s_t$ and action $a_t$, not on earlier states or actions.
That is, the system is a Markov decision process (MDP).
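To make the definitions concrete, here is a minimal sketch of a deterministic MDP in Python. Everything in it is a made-up assumption for illustration: three states in a row, two actions ("left", "right"), a reward of 1 only for moving right from s1, and hypothetical helper names `delta`, `reward`, and `rollout`. The rollout generates exactly the chain $s_0 \to s_1 \to s_2 \cdots$ shown above, and each step looks only at the current $(s_t, a_t)$, which is the Markov assumption.

```python
from typing import Callable, Dict, List, Tuple

State = str
Action = str

# Hypothetical toy MDP: three states in a row; the agent moves left or right.
S: List[State] = ["s0", "s1", "s2"]
A: List[Action] = ["left", "right"]

# delta(s, a) stored as a lookup table (deterministic transitions).
DELTA: Dict[Tuple[State, Action], State] = {
    ("s0", "left"): "s0", ("s0", "right"): "s1",
    ("s1", "left"): "s0", ("s1", "right"): "s2",
    ("s2", "left"): "s1", ("s2", "right"): "s2",
}


def delta(s: State, a: Action) -> State:
    """State-transition function: next state from (s, a) only."""
    return DELTA[(s, a)]


def reward(s: State, a: Action) -> float:
    """Reward function r(s, a); an arbitrary choice for this toy example."""
    return 1.0 if (s, a) == ("s1", "right") else 0.0


def rollout(s0: State, policy: Callable[[State], Action], steps: int):
    """Return the trajectory [(s_t, a_t, r_t), ...] and the final state."""
    trajectory = []
    s = s0
    for _ in range(steps):
        a = policy(s)
        r = reward(s, a)
        trajectory.append((s, a, r))
        s = delta(s, a)  # depends only on the current (s, a): Markov property
    return trajectory, s


def always_right(s: State) -> Action:
    return "right"


traj, final = rollout("s0", always_right, 3)
print(traj, final)
# prints: [('s0', 'right', 0.0), ('s1', 'right', 1.0), ('s2', 'right', 0.0)] s2
```

Storing $\delta$ as a table works here because $S$ and $A$ are finite, as the definition requires.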
Note that $\delta$ and $r$ might be nondeterministic.
Moreover, the agent might not know $\delta$ and $r$.
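When $\delta$ and $r$ are nondeterministic and unknown to the agent, one way to picture the situation is an environment that hides them behind a `step` method, so the agent only ever sees sampled outcomes. The sketch below assumes made-up transition probabilities, a Gaussian noise term on the reward, and a hypothetical class name `StochasticEnv`; none of these come from the notes. The agent can still estimate the expected behaviour from repeated samples.

```python
import random
from typing import Dict, List, Tuple

State = str
Action = str


class StochasticEnv:
    """Hypothetical environment with nondeterministic delta and r.

    The agent interacts only through step(); the transition table and
    reward means are private, modelling an agent that does not know them.
    """

    def __init__(self, seed: int = 0) -> None:
        self._rng = random.Random(seed)
        # _P[(s, a)] = [(next_state, probability), ...]; made-up numbers.
        self._P: Dict[Tuple[State, Action], List[Tuple[State, float]]] = {
            ("s0", "right"): [("s1", 0.8), ("s0", 0.2)],
            ("s0", "left"): [("s0", 1.0)],
            ("s1", "right"): [("s2", 0.8), ("s1", 0.2)],
            ("s1", "left"): [("s0", 1.0)],
            ("s2", "right"): [("s2", 1.0)],
            ("s2", "left"): [("s1", 1.0)],
        }

    def step(self, s: State, a: Action) -> Tuple[State, float]:
        """Sample (s_{t+1}, r_t) given only the current state and action."""
        next_states, probs = zip(*self._P[(s, a)])
        s_next = self._rng.choices(next_states, weights=probs, k=1)[0]
        # Noisy reward: mean 1.0 for ("s1", "right"), 0.0 otherwise.
        mean = 1.0 if (s, a) == ("s1", "right") else 0.0
        r = mean + self._rng.gauss(0.0, 0.1)
        return s_next, r


# The agent can only estimate delta and r from samples.
env = StochasticEnv(seed=42)
counts: Dict[State, int] = {}
total_reward = 0.0
n = 1000
for _ in range(n):
    s_next, r = env.step("s1", "right")
    counts[s_next] = counts.get(s_next, 0) + 1
    total_reward += r
print(counts, total_reward / n)  # roughly 800 to s2, mean reward near 1.0
```

Because $s_{t+1}$ and $r_t$ are now random, the same $(s_t, a_t)$ can lead to different outcomes, which is why quantities such as rewards are usually treated as expectations in this setting.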