Introduction
- So far, our machine learning problems have assumed:
- that we can present the learner with correct examples,
- and that learning goes first, then we test.
- Imagine a robot learning to dock with the batter
charger.
- It gets a reward (charge) when successful, but must
determine how to encourage actions that led there (temporal
credit assignment problem).
- The reward might be probabilistic.
- It can explore the domain, finding a better route? (exploration vs. exploitation problem)
- The state might only be partially observable.
- It might have to learn several related task with the
same environment and using the same sensors.
José M. Vidal
.
2 of 22