Final Design

The performance system uses the learned function and plays a game.
The critic takes a trace of this game and produces a series of training examples. In our case, $V_{train}$.
The generalizer takes the training examples and produces a new estimate of the target function. LMS in our example.
The experiment generator takes as input the current learned function and outputs a new problem. In our example we always output the same problem: the start board state.