Reinforcement Learning

This talk is based on

1 Introduction

1.1 TD-Gammon

1.2 Reinforcement Learning Problem

1.3 The Learning Task

1.4 Value function

1.5 Example

2 Q-Learning Motivation

2.1 Q Function

2.2 Learning Q

2.3 Q-Learning Algorithm

  1. For each $s,a$, initialize the table entry $\hat{Q}(s,a) = 0$.
  2. Observe the current state $s$.
  3. Select an action $a$ and execute it.
  4. Receive the immediate reward $r$ and observe the new state $s'$.
  5. Update the table entry for $\hat{Q}(s,a)$ with \[ \hat{Q}(s,a) \leftarrow r + \gamma \max_{a'}\hat{Q}(s',a') \]
  6. Set $s \leftarrow s'$.
  7. Go to step 3.
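
The steps above can be sketched in Python. This is only a minimal sketch under stated assumptions, not the talk's own implementation: the four-state chain world, the `step` function, the reward of 100 for entering the goal, the value $\gamma = 0.9$, the purely random action choice, and the split into episodes that end at an absorbing goal state are all hypothetical choices made for illustration.

```python
import random

GAMMA = 0.9           # discount factor (assumed value)
N_STATES = 4          # hypothetical chain world: states 0, 1, 2, 3
GOAL = 3              # entering this state yields the only reward
ACTIONS = (-1, +1)    # move left or move right


def step(s, a):
    """Hypothetical deterministic environment: return (reward, next state)."""
    s_next = min(max(s + a, 0), N_STATES - 1)  # walls at both ends
    r = 100 if s_next == GOAL else 0
    return r, s_next


def q_learning(episodes=200, seed=0):
    rng = random.Random(seed)
    # Step 1: initialize every table entry to zero.
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        s = rng.randrange(GOAL)        # Step 2: observe the current state.
        while s != GOAL:               # assumption: the goal is absorbing
            a = rng.choice(ACTIONS)    # Step 3: select an action (here at random).
            r, s_next = step(s, a)     # Step 4: receive r and observe s'.
            # Step 5: deterministic Q-learning update of the table entry.
            q[(s, a)] = r + GAMMA * max(q[(s_next, b)] for b in ACTIONS)
            s = s_next                 # Step 6: s <- s'.
    return q


if __name__ == "__main__":
    q_hat = q_learning()
    for s in range(GOAL):
        print(s, [round(q_hat[(s, a)], 1) for a in ACTIONS])
```

In this toy world the learned entries for moving right should settle at 100, 90, and 81 for states 2, 1, and 0, each one being $\gamma$ times the best value of its successor, which is what the update rule in step 5 propagates backward from the goal.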

2.4 Q-Learning Example

2.5 Q-Learning Convergence

2.6 How to Choose an Action

2.7 Variations

2.8 Nondeterministic Rewards and Actions

2.8.1 Nondeterministic Q-Learning

3 Temporal Difference Learning

3.1 Temporal Difference

4 Generalization from Examples

URLs

  1. Machine Learning book at Amazon, http://www.amazon.com/exec/obidos/ASIN/0070428077/multiagentcom/
  2. Slides by Tom Mitchell on Machine Learning, http://www-2.cs.cmu.edu/~tom/mlbook-chapter-slides.html
  3. Sutton and Barto: Reinforcement Learning, http://www-anw.cs.umass.edu/~rich/book/the-book.html
  4. Temporal Difference Learning and TD-Gammon by Gerald Tesauro, http://www.research.ibm.com/massive/tdl.html
  5. n-armed bandit, http://www-anw.cs.umass.edu/~rich/book/2/node2.html
  6. Learning to Predict by the Methods of Temporal Difference, by Sutton, ftp://ftp.cs.umass.edu/pub/anw/pub/sutton/sutton-88.ps.gz
  7. Sutton's Homepage, http://www-anw.cs.umass.edu/~rich/sutton.html

This talk is available at http://jmvidal.cse.sc.edu/talks/reinforcementlearning/
Copyright © 2009 José M. Vidal. All rights reserved.

31 May 2003, 08:44PM