For the algorithm to work correctly, the time spent in metalevel thinking must be small compared to the time it takes to do an actual node expansion and propagation of a new solution (i.e. a PE(t,l)). Otherwise, the agent is probably better off choosing nodes to expand at random without spending time considering which one might be better.
A node expansion involves the creation of a new matrix M. If we assume that there are k agents and each one has n possible actions, then the number of elements in the matrix will be $n^k$. Each one of these elements represents the utility of the situation that results after all the agents take their actions. In order to calculate this utility, it is necessary to simulate the new state that results from all the agents taking their respective actions (i.e. as dictated by the element's position in the matrix) and then calculate the utility, to some particular agent, of this new situation (see Figure 5). The calculation of this utility can be an arbitrarily complex function of the state, and must be performed for each of the elements in the matrix.
Figure 5: A very simplified version of the Pursuit Problem with only 3
agents on a 4 by 4 grid. Given the old situation, each one of the
elements of the matrix corresponds to the payoff in one of the many
possible next situations. There is one for each combination of moves
the agents might take (e.g. North, East, Idle). Sometimes, the moves of
the predators might interfere with each other. The predator calculating
the matrix has to resolve these conflicts when determining the new
situation, before calculating its utility.
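The node expansion described above can be sketched as follows. This is only an illustration under stated assumptions, not the implementation used in the paper: the names `simulate`, `utility`, and `expand_node`, the three-move action set, the conflict rule (agents that target the same cell stay put), and the distance-based payoff are all hypothetical choices made to keep the example small.

```python
from itertools import product

# Hypothetical toy setup: each move is a (dx, dy) offset on a 4 by 4 grid.
MOVES = {"North": (0, -1), "East": (1, 0), "Idle": (0, 0)}
GRID = 4


def simulate(positions, joint_action):
    """Return the positions that result after every agent takes its action.

    Assumed conflict rule: an agent whose target cell is also targeted by
    another agent stays where it is. This is a simplification that ignores
    chained conflicts.
    """
    proposed = []
    for (x, y), action in zip(positions, joint_action):
        dx, dy = MOVES[action]
        nx, ny = x + dx, y + dy
        if not (0 <= nx < GRID and 0 <= ny < GRID):
            nx, ny = x, y  # moves off the grid are ignored
        proposed.append((nx, ny))
    return tuple(
        old if proposed.count(new) > 1 else new
        for old, new in zip(positions, proposed)
    )


def utility(positions):
    """Assumed payoff to agent 0: negated Manhattan distance to the last agent."""
    (px, py), (qx, qy) = positions[0], positions[-1]
    return -(abs(px - qx) + abs(py - qy))


def expand_node(positions):
    """Build the payoff matrix: one entry per joint action, n**k in total."""
    actions = list(MOVES)
    return {
        joint: utility(simulate(positions, joint))
        for joint in product(actions, repeat=len(positions))
    }
```

With k = 3 agents and n = 3 moves, `expand_node` performs 27 simulations and 27 utility evaluations, which is where the cost of an expansion comes from.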
The next step is the propagation of the expected strategy. If we replace one of the ZK leaf strategies by some other strategy, then the time to propagate this change all the way to the root of the tree depends both on the distance between the root and leaf, and the time to propagate a strategy past a node. Because of the way strategies are calculated, it is almost certain that any strategy that is the result of evaluating a node will be a pure strategy. If all the children of a node are pure strategies, then the strategy the node evaluates to can be found by calculating the maximum of n numbers. In other words, the k-dimensional matrix collapses into a one-dimensional vector. If c of the children have mixed strategies, we would need to add $n^{c+1}$ numbers and find the maximum partial sum. In the worst case, c=k-1, so we need to add $n^k$ numbers.
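A minimal sketch of propagating strategies past a single node, assuming the payoff matrix is stored as a dictionary keyed by joint action (own action first) and each child strategy is a dictionary from action to probability. The function name `evaluate_node` and this data layout are hypothetical, chosen only for illustration.

```python
from itertools import product


def evaluate_node(matrix, own_actions, child_strategies):
    """Return the best own action and the expected payoff of every own action.

    matrix: dict mapping (own_action, a_1, ..., a_{k-1}) to a utility.
    child_strategies: one dict per other agent, mapping action to probability.
    A pure strategy is a dict that puts probability 1 on a single action.
    """
    # Only actions played with nonzero probability contribute to the sums.
    supports = [
        [(a, p) for a, p in strategy.items() if p > 0]
        for strategy in child_strategies
    ]
    expected = {}
    for own in own_actions:
        total = 0.0
        for combo in product(*supports):
            prob = 1.0
            for _, p in combo:
                prob *= p
            others = tuple(a for a, _ in combo)
            total += prob * matrix[(own,) + others]
        expected[own] = total
    best = max(expected, key=expected.get)
    return best, expected
```

When every child is pure, each support holds one action, so the inner loop runs once per own action and the node reduces to a maximum over n numbers. With c fully mixed children, the inner loop sums n^c terms for each of the n own actions.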
We can conclude that propagation of a solution will require a good number of additions and max functions, and these must be performed for each leaf we wish to consider. However, these operations are very simple since, in most instances, the propagation past a node will consist of one max operation. This time is small, especially when compared to the time needed for the simulation and utility calculation of different possible situations. A more detailed analysis can only be performed on an application-specific basis, and it would have to take into account actual computational times.