Value Iteration
Title: Value Iteration
Author: William Wiles
Description:
Small demo of the Value Iteration Algorithm
HOW IT WORKS
The MDP is made with a series of trees. First a tree is made from the initial state to the rest of the states and then based on the number of actions chosen trees are made from other states chosen at random. As a special note, it is noticable that the preliminary tree that connects the initial state to the rest of the states is a very good candidate for containing the solution path for the iteration algorithm. However, since the links are setup independent of when the probabilities are assigned, this is not absolute.
I ran into much difficulty in picking a way to build the MDP since the link turtles are fairly constrained. This is the third revision and I admit it is not perfect, an action from a state can only implicitly leave the agent in the same state; i.e. when the links are built the calling state will not have itself as a choice, and therefore it is not as random as it ought be. The way an action can cause the agent to remain in the same state occurs when the summation of all probabilities corrisponding to that action at that state not equaling 1. The remainder is the probabilty of that action causing the agent to remain in the same state. Another issue that I've lived with by virtue of complexity is that the build-MDP function likes complete graphs. If there is a high number of action choices relative to the number of states (somewhere around 1:1) then you will very likely get a complete or near complete graph. Had I addeded support for links to not be associated to just one action, i.e. a link holds an "action list" then this might not be the case. Experimenting with each link holding a list of actions proved extremely complex when summing individual probilities for actions from a state, and proved to possibly jeopordize the manner for remaining in the same state.
The value iteration algorithm is implemented almost exactly as mentioned in Chapter 1 pg. 13, instead of looping through all states, each state is called to perform the utility update on its own. Since the MDPs are not particularly "deep" with the maximum set at 10 states, a solution is found relatively quickly.
HOW TO USE IT
Setup will build the MDP to solve and assign rewards to each state. The initial state
will be indicated in blue and it will have the smallest reward. After setting up, you
can either step through the value-iteration algorithm one step at a time or run it till
completion.
Notice that the reward which is initially displayed in the state will change to the current utility of the state. The error function as well as the current delta is displayed in the lower left, and the current policy will be updated in each iteration of the algorithm. This graph is NOT a histogram. The states run accross the x-axis and the value associated with each state is the number of the action that the policy has chosen for that state.
In the upper right of the world there are smiley faces, these display the colors corrisponding to the action numbers.
While value-iteration is running, the color of the state will corrispond (similarly in hue) to the action chosen by the policy. The intial state will be slightly brighter, while other states will be slightly darker. This serves as only as a visual aid to the Value Action graph which is more accurate in showing the policy for each state.
THINGS TO NOTICE
Sometimes during setup, links will flash back and forth between colors, this is due to
a problem with the number of states vs the number of links assigned, with a dependancy
on the maximum probability allowed to a link. Sometimes setup will become confused.
Halting and restarting usually resolves the problem as it is rare that it occurs.
NETLOGO FEATURES
The turtle parallelism made writing the iteration portions quite easy, but required much revision on my part. Much different than the standard function-based programming scheme of Java and C.
CREDITS AND REFERENCES
I referenced the distributed breakout algorithm (DBgc.nlogo), http://jmvidal.cse.sc.edu/netlogomas/DBgc.html for strategy on designing graphs with links.
And for API info, http://ccl.northwestern.edu/netlogo/