Direct information (or training examples)
consists of individual checkerboard states and their correct
moves.
Indirect information consists of move
sequences and their final outcomes (win or loss).
When using indirect information we are faced with the
credit assignment problem: determining how much
credit each move should receive for the final outcome.
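One simple way to make the credit assignment problem concrete is to spread the final outcome over the moves of a game, giving later moves more weight than earlier ones. The sketch below is only an illustrative heuristic, not a method prescribed by these notes: the function name `assign_credit` and the `decay` parameter are assumptions introduced for the example.

```python
def assign_credit(num_moves, outcome, decay=0.9):
    """Return one credit value per move, in playing order.

    Assumption (not from the notes): the last move receives the full
    outcome and each earlier move receives `decay` times the credit
    of the move that follows it.
    """
    return [outcome * decay ** (num_moves - 1 - i) for i in range(num_moves)]


# A won game (outcome = 1.0) of four moves, halving credit per step back.
credits = assign_credit(4, outcome=1.0, decay=0.5)
print(credits)  # [0.125, 0.25, 0.5, 1.0]
```

With `decay=1.0` every move receives equal credit; smaller values concentrate the credit on the moves closest to the end of the game.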
Decide whether the learner chooses its own examples or
whether a teacher presents them to it.
Make the distribution of training examples representative
of the examples on which the performance metric will be evaluated.
In the checkers example we could choose $E$ to be games
that the learner plays against itself.
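To illustrate what self-play experience might look like as data, the sketch below generates indirect training information (move sequences plus final outcomes) by letting one policy play both sides. A full checkers engine is out of scope here, so a toy take-away game stands in for checkers; the game, `self_play_game`, and `random_policy` are all hypothetical names introduced only for this example.

```python
import random


def random_policy(counters, rng):
    """Hypothetical policy: remove 1 or 2 counters at random."""
    return rng.randint(1, min(2, counters))


def self_play_game(policy, start=7, rng=None):
    """Play one game of a toy take-away game against itself.

    Players 0 and 1 alternate removing 1 or 2 counters; whoever takes
    the last counter wins. Returns (moves, winner), where each move is
    (player, counters_before, taken).
    """
    if start < 1:
        raise ValueError("start must be at least 1")
    rng = rng or random.Random()
    counters, player, moves = start, 0, []
    while True:
        taken = policy(counters, rng)
        moves.append((player, counters, taken))
        counters -= taken
        if counters == 0:
            return moves, player  # the player who moved last wins
        player = 1 - player


# Collect a small batch of self-play games as the training experience.
rng = random.Random(0)
experience = [self_play_game(random_policy, rng=rng) for _ in range(5)]
for moves, winner in experience:
    print(len(moves), "moves, winner:", winner)
```

Each element of `experience` is a move sequence with its final outcome, so it is indirect information and still leaves the credit assignment problem to be solved.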