
Notation and Formalisms

Our implementation of limited rationality and our notation closely parallel Russell and Wefald's work [14], although with some major differences, as will be pointed out later. We define a partially expanded situation as a subset of a situation in which only some of the nodes have been expanded. That is, if t is a partially expanded situation of s, then it must contain the root node of s, and all of t's nodes must be directly or indirectly connected to this root by other nodes in t. The situation t might have fewer nodes than s. At those places where the tree was pruned, t inserts the zero knowledge (ZK) strategy, as shown in Figure 2.

Figure 2: These are graphical representations of (a) a situation s (the agent in s expects to get the payoffs given by S1 and believes that its opponents will get payoffs based on S2 and S3, respectively), and (b) a partially expanded situation t that corresponds to s.

We also define our basic computational action to correspond to the expansion of a leaf node l in a partially expanded situation t, plus the propagation of the new strategy up the tree. The expansion of leaf node l corresponds to the creation of the matrix M and its placement in l, along with f(x), W, and pointers to the children. The children are set to the ZK strategy because we have already spent some time generating M and do not wish to spend more time calculating a child's matrix, which would be a whole other ``step'' or computational action. We do not even want to spend time calculating each child's expected strategy, since we wish to keep the computation fairly small. The strategy that results from propagating the ZK strategies past M is then propagated up the tree of t, so as to generate the new strategy that t evaluates to. The whole process, therefore, is composed of two stages: first we Expand l, and then we Propagate the new strategy up t. We will call this process Propagate(Expand(t,l)), or PE(t,l) for short. The procedure PE(t,l) is our basic computational action.
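The two-stage shape of PE(t,l) can be sketched as follows. This is a minimal, single-agent simplification: the Node class, its payoff lists, and the max-propagation are hypothetical stand-ins for the paper's matrices and strategies; only the structure (Expand installs M and ZK children, Propagate pushes the new evaluation up to the root) comes from the text.

```python
ZK_VALUE = 0.0  # the zero knowledge (ZK) strategy evaluates to a default payoff

class Node:
    def __init__(self, payoffs=None, parent=None):
        self.payoffs = payoffs or []  # stand-in for the matrix M at this node
        self.children = []
        self.parent = parent
        self.value = ZK_VALUE         # what this node currently evaluates to

def expand(node, payoffs):
    """Expand a leaf: install its matrix M and ZK placeholders for its children."""
    node.payoffs = payoffs
    node.children = [Node(parent=node) for _ in payoffs]  # children start at ZK
    return node

def propagate(node):
    """Propagate the new evaluation from the expanded node up to the root."""
    while node is not None:
        if node.children:
            node.value = max(p + c.value for p, c in zip(node.payoffs, node.children))
        node = node.parent

def PE(leaf, payoffs):
    """The basic computational action: Propagate(Expand(t, l))."""
    propagate(expand(leaf, payoffs))

root = Node(payoffs=[1.0, 2.0])
root.children = [Node(parent=root), Node(parent=root)]
leaf = root.children[0]
PE(leaf, [5.0, 0.5])
print(root.value)  # the root's evaluation now reflects the expanded leaf
```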

We define the time cost for doing PE(t,l) as the time it takes to generate the new matrix M, plus the time to propagate a solution up the tree, plus the time costs incurred so far. This time cost is:

TC(t,l) = g(c|M| + d * depth(l) + t.time)

where c and d are constants of proportionality, |M| is the size of the new matrix, depth(l) is the depth of l in t, and t.time is the time spent so far. The function g(x) gives us a measure of the ``urgency'' of the task. That is, if there is a deadline at time T before which the agent must act, then g(x) should approach infinity as x approaches T. We now define some more notation to handle all the strategies associated with a particular partially expanded situation.
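Before moving on, the role of the urgency function can be illustrated with toy numbers. The particular g, the constants c and d, and the deadline T below are all assumptions, chosen only so that the cost grows without bound as the deadline nears, as the text requires.

```python
T = 10.0        # hypothetical deadline by which the agent must act
c, d = 0.5, 0.1  # hypothetical constants of proportionality

def g(x):
    """Urgency: approaches infinity as elapsed time x approaches the deadline T."""
    return 1.0 / (T - x) if x < T else float("inf")

def time_cost(matrix_size, depth, time_so_far):
    """TC: time to build M (~ c*|M|), to propagate up (~ d*depth), plus time spent."""
    return g(time_so_far + c * matrix_size + d * depth)

# The same expansion costs far more when little time remains before the deadline.
print(time_cost(4, 3, 2.0) < time_cost(4, 3, 7.0))
```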

t
Partially expanded situation; all of the values below are associated with it. Corresponds to the fully expanded situation s.

t.strategy
The strategy the agent currently favors in the partial situation t.

t.others
The strategies the agent in t believes the n opponents will favor.

t.leaves
These are the leaf nodes of the partially expanded situation t that can still be expanded (i.e. those that are not also leaf nodes in s).

t.strategy_l
The strategy an agent expects to play given that, instead of actually expanding leaf l (where l is in t.leaves), he simply uses its expected value, which he got from f(x), and propagates this strategy up the tree.

t.others_l
The strategies an agent expects the other agents to play given that, instead of actually expanding leaf l, he simply uses its expected value.

U(t)
This is the payoff an agent gets in his current situation. To calculate it, use the root matrix and let the agent's strategy be t.strategy while the opponents' strategies are t.others.
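The bookkeeping above might be grouped into a single record per situation. The field names below are hypothetical stand-ins (only t.leaves appears verbatim in the text), mirroring the dotted notation of the list:

```python
from dataclasses import dataclass, field

@dataclass
class Situation:
    """A partially expanded situation t and the values associated with it."""
    strategy: str = "ZK"                          # t.strategy: strategy currently favored
    others: list = field(default_factory=list)    # t.others: strategies of the n opponents
    leaves: list = field(default_factory=list)    # t.leaves: still-expandable leaf nodes
    expected: dict = field(default_factory=dict)  # leaf l -> strategy expected from f(x)

# The agent expects payoffs from S1 and models two opponents, as in Figure 2.
t = Situation(strategy="S1", others=["S2", "S3"], leaves=["l1", "l2"])
print(len(t.leaves))
```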

Given all this notation, we can finally calculate the utility of a partially expanded situation t. Let t' be situation t after expanding some leaf node l in it, i.e. t' = PE(t,l). The utility U(t') is the payoff the agent will get, given the current evaluation of t'. The expected utility E(U(t')) is the utility that an agent in t expects to get if he expands l so as to then be in t'.

Note that the utility of a situation depends on the shape of the tree below it, since both the agent's strategy and the opponents' strategies are calculated by using the whole tree. There are no utility values associated with the nodes by themselves.

Figure 3: Here we can see how the algorithm proceeds when expanding a situation. We start with t in (a). For each leaf we calculate the expected gain. (b) shows how this is done for leaf l. If we do decide to expand leaf l we will end up with a new t', as seen in (c).

The driving force for the agent's reasoning is to decide what its best course of action is. Thus, expanding a situation is only useful if, by doing so, the agent chooses a different action than it would have before doing the expansion. The degree to which it is better off with the different action rather than the original one (given what it now knows) represents the gain due to the expansion. More specifically, the Gain of situation t', G(t'), is the amount of payoff an agent gains if it previously was in situation t and, after expanding leaf l in t.leaves, is now in situation t'. Since t' = PE(t,l), we could also view this as G(PE(t,l)). The Expected Gain, E(G(t')), is similarly E(G(PE(t,l))), which is approximated by G(Propagate(E(Expand(t,l)))) in our algorithm. We can justify this approximation because Propagate() simply propagates the strategy at the leaf up to the root, and it usually maps similar strategies at the leaves to similar strategies at the root. So, if the expected strategy at the leaf, as returned by E(Expand(t,l)), is close to the real one, then the strategy that gets propagated up to the root will also be close to the real one. The formal definitions of gain and expected gain are:

G(t') = U(t') - U(t', t.strategy) - TC(t,l)

E(G(t')) = E(U(t')) - E(U(t', t.strategy)) - TC(t,l)

where U(t', t.strategy) is the payoff at the root of t' when the agent plays its old strategy t.strategy against the opponents' strategies in t'.

Notice that E(G(t')) reaches a minimum when t'.strategy = t.strategy, since t'.strategy is chosen so as to maximize the payoff given the opponents' strategies in t'. In this case, G(t') = -TC(t,l). This means that the gain will be negative if the possible expansion of l does not lead to a different and better strategy.
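A toy numeric sketch of the gain computation, with a hypothetical payoff table standing in for the root matrix of t'. Because the newly favored strategy maximizes payoff against the believed opponent strategies, the gain bottoms out at -TC(t,l) when the expansion leaves the favored strategy unchanged.

```python
def payoff(strategy, opponents):
    """Toy root-matrix evaluation: payoff of playing `strategy` against `opponents`."""
    table = {("a", "x"): 3.0, ("b", "x"): 5.0}  # hypothetical payoffs
    return table[(strategy, opponents)]

def gain(new_strategy, old_strategy, opponents, tc):
    """G(t'): payoff of the newly favored strategy, minus payoff of the old one, minus TC."""
    return payoff(new_strategy, opponents) - payoff(old_strategy, opponents) - tc

print(gain("b", "a", "x", 0.4))  # strategy changed: the expansion was worth its time cost
print(gain("a", "a", "x", 0.4))  # strategy unchanged: the gain is exactly -TC
```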


Jose M. Vidal
jmvidal@umich.edu
Sun Mar 10 12:52:06 EST 1996