Training Examples

$V(b)$: the true target function
$\hat{V}(b)$ : the learned function
Calculating $\hat{V}$ will require training examples. An example looks like: \[ \langle \langle bp = 2, rp = 4, bk = 1, rk=0, bt=0, rt=0\rangle, +100\rangle \]
$V_{train}(b)$ is the training value for $b$, for example: \[ V_{train}(\langle bp = 2, rp = 4, bk = 1, rk=0, bt=0, rt=0\rangle) = 100 \]
Its easy to assign values to final board states. For intermediate boards we can use \[ V_{train}(b) \leftarrow \hat{V}(Successor(b)) \]
$Successor(b)$ gives the next board state following $b$ for which it is again the program's turn to move.