Generalization from Examples
- Often there are way too many states to build a
table,
- or the state is not completely visible.
- We can fix this by replacing $\hat{Q}$ with a neural net
or other generalizer.
- For example, encode $s,a$ as the
network inputs and train it to output the target
values of $\hat{Q}$ given by the training rules.
- Or, train a separate network for each action, using
the state as input and $\hat{Q}$ as output.
- Or, train one network with the state as input but
with one $\hat{Q}$ output for each action.
- TD-Gammon used neural nets and Backpropagation.
José M. Vidal
.
22 of 22