The performance system plays a game using the current
learned function.
The critic takes a trace of this game and
produces a series of training examples; in our case, each
example pairs a board state with its training value $V_{train}$.
The generalizer takes the training examples and
produces a new estimate of the target function; in our
example it is the LMS algorithm.
The experiment generator takes as input the
current learned function and outputs a new problem. In our
example we always output the same problem: the start board
state.
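
To make the data flow between the four modules concrete, here is a minimal sketch of one possible training loop. It is not the checkers program itself: the game is swapped for a toy Nim variant (take 1 or 2 tokens, whoever takes the last token wins), and the two features, the random opponent, the learning rate `eta`, the reward values of +100 and -100, and all function names are assumptions made for illustration. The critic assumes the usual rule that the training value of a non-final state is the current estimate for the next state in the trace.

```python
import random

START = 7  # hypothetical start state: 7 tokens on the table

def features(n):
    # Hypothetical features of a state where the learner has left n tokens.
    return [1.0, 1.0 if n % 3 == 0 else 0.0]

def v_hat(w, n):
    # Learned function: a linear combination of the features.
    return sum(wi * xi for wi, xi in zip(w, features(n)))

def experiment_generator(w):
    # Always proposes the same problem: the start state.
    return START

def performance_system(w, n, rng):
    # Plays one game against a random opponent using the learned function.
    # Returns the trace of states the learner left behind and the outcome.
    trace = []
    while True:
        move = max((m for m in (1, 2) if m <= n),
                   key=lambda m: v_hat(w, n - m))
        n -= move
        trace.append(n)
        if n == 0:
            return trace, 100.0   # learner took the last token: win
        n -= rng.choice([m for m in (1, 2) if m <= n])
        if n == 0:
            return trace, -100.0  # opponent took the last token: loss

def critic(w, trace, outcome):
    # Turns a game trace into (state, V_train) training examples.
    examples = []
    for i, n in enumerate(trace):
        is_last = i == len(trace) - 1
        target = outcome if is_last else v_hat(w, trace[i + 1])
        examples.append((n, target))
    return examples

def generalizer(w, examples, eta=0.01):
    # LMS update: nudge each weight in proportion to the error and its feature.
    w = list(w)
    for n, target in examples:
        error = target - v_hat(w, n)
        w = [wi + eta * error * xi for wi, xi in zip(w, features(n))]
    return w

def train(games=200, seed=0):
    rng = random.Random(seed)
    w = [0.0, 0.0]
    for _ in range(games):
        problem = experiment_generator(w)
        trace, outcome = performance_system(w, problem, rng)
        w = generalizer(w, critic(w, trace, outcome))
    return w
```

Calling `train()` runs the cycle experiment generator, performance system, critic, generalizer once per game and returns the final weights. Because each module only communicates through its inputs and outputs, any one of them (for example, a smarter experiment generator) could be replaced without touching the others.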