A demonstration of the value iteration algorithm as applied to a 2D world where a robot can move North, South, East, or West.
We build a graph where the _node_s represent the state of the underlying MDP and the directed links represent actions that can be taken on each state. Each link has a transitions variable which holds the set of nodes that can be reached when taken that action, along with the probabilities.
The thickness of each edge/action is proportional to its current utility.
prob-action-works is the probability that the action (North, South, East, West) will actually take the robot to that square. With 1 - prob-action-works the robot will end up either at its current spot or at one of the other reachable nodes that is a distance of < 2 from the intended destination, with equal probability.
The plot shows the maximum change in utility over all nodes. As expected, this value decreases monotonically.
Setup and Go.
Jose M Vidal