For this problem set you will implement a NetLogo program that displays a randomly generated MDP and then solves it using the value iteration algorithm from Figure 1.2 in the textbook.
Your program should have sliders that let the user choose the number of states, the number of actions, and a measure of how likely an action is likely to fail and lead to another state. That is, at one extreme we have error-free (deterministic) actions always leads from one state to another. Thus, the model you generate will have a probability of 1 associated with every edge. At the other extreme each action can lead us to every state in the graph, with equal probability. Try to come up with a set of sliders that let the user choose from a wide variety of different MDP topologies.
Implementing and displaying your MDP will be made much easier if you use the new NetLogo links primitives for drawing directed graphs.
Once you can generate an MDP you will then implement the value iteration algorithm to solve the generated MDP. Your final result should also be shown graphically, perhaps by changing the color of the chosen action edges. Note also that the value iteration algorithm can be easily parallelized by asking each node to do its calculation in parallel.
As will all the problem sets, you will hand them in using our department's dropbox.