Generalization from Examples

Often there are way too many states to build a table,
or the state is not completely visible.
We can fix this by replacing $\hat{Q}$ with a neural net or other generalizer.
1. For example, encode $s,a$ as the network inputs and train it to output the target values of $\hat{Q}$ given by the training rules.
2. Or, train a separate network for each action, using the state as input and $\hat{Q}$ as output.
3. Or, train one network with the state as input but with one $\hat{Q}$ output for each action.
TD-Gammon used neural nets and Backpropagation.