Our implementation of limited rationality and our notation closely
parallel Russell and Wefald's work [14], although
with some major differences, as will be pointed out later.
We define a *partially expanded situation* as a subset of a
situation where only some of the nodes have been
expanded. That is, if *t* is a partially expanded situation of *s*,
then it must contain the root node of *s*, and all of *t*'s nodes must
be directly or indirectly connected to this root by other nodes in
*t*. The situation *t* might have fewer nodes than *s*. At those
places where the tree was pruned, *t* inserts the zero knowledge
(ZK) strategy, as shown in Figure 2.

**Figure 2:** These are graphical representations of (a) a situation *s*
(the agent in *s* expects to get the payoffs given by S1 and believes
that its opponents will get payoffs based on S2 and S3, respectively),
and (b) a partially expanded situation *t* that corresponds to *s*.
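To make the containment condition concrete, here is a minimal sketch (in Python, with hypothetical names; the paper itself gives no code) that checks whether a subset of *s*'s nodes forms a valid partially expanded situation: it must contain the root, and every other node must be connected to the root through nodes that are also in the subset.

```python
def is_partial_expansion(t_nodes, root, parent):
    """Check that t_nodes (a subset of s's nodes, given by a parent
    map) contains s's root and that every node in it is connected to
    that root through other nodes of t."""
    if root not in t_nodes:
        return False
    for n in t_nodes:
        # walk up toward the root; every ancestor must also be in t
        while n != root:
            n = parent[n]
            if n not in t_nodes:
                return False
    return True
```

For example, if *s* has root `r` with children `a`, `b`, and `a` has child `c`, then `{r, a, c}` is a valid partial expansion but `{r, c}` is not, since `c` is disconnected from the root once `a` is missing.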

We also define our basic computational action to correspond to the
expansion of a leaf node *l* in a partially expanded situation *t*,
plus the propagation of the new strategy up the tree. The expansion of
leaf node *l* corresponds to the creation of the matrix *M*, and its
placement in *l* along with *f*(*x*), *W*, and pointers to the children.
The children are set to the ZK strategy because we have already spent
some time generating *M* and we do not wish to spend more time
calculating a child's matrix, which would be a whole other "step" or
computational action. We do not even want to spend time calculating
each child's expected strategy since we wish to keep the computation
fairly small. The strategy that results from propagating the ZK
strategies past *M* is then propagated up the tree of *t*, so as to
generate the new strategy that *t* evaluates to. The whole process,
therefore, is composed of two stages. First we Expand *l* and then we
Propagate the new strategy up *t*. We will call this process
Propagate(Expand(*t*,*l*)), or *PE*(*t*,*l*) for short. The procedure
*PE*(*t*,*l*) is our basic computational action.
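As an illustration, the two-stage action can be sketched as follows. This is a much-simplified, hypothetical rendering: strategies are collapsed to single payoff estimates, and "solving" a node's matrix is reduced to taking the best child value, whereas the real algorithm propagates full strategies past the payoff matrices.

```python
class Node:
    def __init__(self, estimate, parent=None):
        self.parent = parent
        self.children = []      # empty list => unexpanded leaf
        self.value = estimate   # current strategy value at this node

def expand(leaf, child_estimates):
    """Stage 1: 'build the matrix' at the leaf by attaching children
    whose values are the ZK-strategy estimates; no child matrices or
    deeper computation, to keep the step small."""
    leaf.children = [Node(e, parent=leaf) for e in child_estimates]
    return leaf

def propagate(node):
    """Stage 2: re-evaluate each ancestor up to the root, yielding the
    new strategy value that the situation evaluates to."""
    while node is not None:
        if node.children:
            node.value = max(c.value for c in node.children)
        top, node = node, node.parent
    return top.value

def PE(leaf, child_estimates):
    """The basic computational action: Propagate(Expand(t, l))."""
    return propagate(expand(leaf, child_estimates))
```

Expanding a leaf with ZK estimates `[3, 7]` under a one-child root updates the leaf's value to 7 and propagates that value up to the root.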

We define the time cost for doing *PE*(*t*,*l*) as the time it takes to
generate the new matrix *M*, plus the time to propagate a solution up
the tree, plus the time costs incurred so far. This time cost is:

where *c* and *d* are constants of proportionality. The function
*g*(*x*) gives us a measure of the "urgency" of the task. That is, if
there is a deadline at time *T* before which the agent must act, then
the function *g*(*x*) should approach infinity as *x* approaches
*T*. We now define some more notation to handle all the strategies
associated with a particular partially expanded situation.
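Any function that grows without bound as the deadline nears satisfies this urgency requirement; for instance, one hypothetical choice (not prescribed by the paper) is:

```python
def urgency(x, T):
    """A possible g(x): grows to infinity as the current time x
    approaches the deadline T. The exact form is illustrative."""
    assert x < T, "the agent must act before the deadline"
    return 1.0 / (T - x)
```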

- *t* - Partially expanded situation; all of the values below are
associated with it. It corresponds to the fully expanded situation *s*.

- The strategy the agent currently favors in the partial situation *t*.

- The strategies the agent in *t* believes the *n* opponents will favor.

- *t*.leaves - The leaf nodes of the partially expanded situation *t*
that can still be expanded (i.e., those that are not also leaf nodes
in *s*).

- The strategy an agent expects to play given that, instead of actually
expanding leaf *l* (where *l* is in *t*.leaves), he simply uses its
expected value, which he got from *f*(*x*), and propagates this
strategy up the tree.

- The strategies an agent expects the other agents to play given that,
instead of actually expanding leaf *l*, he simply uses its expected
value.

- The payoff an agent gets in his current situation. To calculate it,
use the root matrix and let the agent's strategy be the one he
currently favors while the opponents' strategies are the ones he
believes they favor.

Given all this notation, we can finally calculate the utility of a
partially expanded situation *t*. Let *t*' be situation *t* after
expanding some leaf node *l* in it, i.e. *t*' = *PE*(*t*,*l*). The
utility is the payoff the agent will get, given the current evaluation
of *t*'. The expected utility of *t*' is the utility that an agent in
*t* *expects* to get if he expands *l* so as to then be in *t*'.

Note that the utility of a situation depends on the shape of the tree below it, since both the agent's strategy and the opponents' strategies are calculated using the whole tree. There are no utility values associated with the nodes by themselves.

Unfortunately, we cannot use the utilities of situations in the traditional way, which is to expand leaves so as to maximize expected utility. Such traditional reasoning in a recursive modeling system would mean that the system would not introspect (examine its recursive models) more deeply if it thought that doing so might lead it to discover that it will not in fact receive as high a payoff as it thought it would before doing the introspection. Ignoring more deeply nested models because they reduce the estimated payoff is a bad strategy, since ignoring them does not change the fact that other agents might take the actions that lead to lower payoffs.

This reasoning is better understood with the help of an example. We contrast our problem with the case of using minimax for playing a board game, as studied in [14]. There, if a player realizes that taking move A will lead to a set of possible outcomes, none of which have higher utility than he gets when taking move B, then the player should not consider move A anymore. Within our framework, however, this result does not apply. Imagine a predator who is very close to surrounding the prey. It might initially decide that the others will be taking moves such that, if he moves North, the prey will be captured. This state has a very high utility and is very desirable to the predator. However, the predator also realizes that if he thinks some more about the predator to his left, he will probably change his mind about what that predator will do, and will then prefer to go West. The resulting situation will not be a capture and will, therefore, have a smaller utility. The question then is: should the predator stop thinking about the one to its left, just because doing so will lead him to think his prospects for the future are worse? We do not believe so.
The predator should continue to think more deeply about others as long as doing so makes him change his mind about which move to take, even when it leads him to the realization that things are worse than he initially believed. Thus, we need a different notion of what an agent can "gain" by examining more deeply nested knowledge.

**Figure 3:** Here we can see how the algorithm proceeds when expanding a
situation. We start with *t* in (a). For each leaf we calculate the
expected gain. (b) shows how this is done for leaf *l*. If we
do decide to expand leaf *l* we will end up with a new *t*', as seen
in (c).

The driving force for the agent's reasoning is to decide what its best
course of action is. Thus, expanding a situation is only useful if,
by doing so, the agent chooses a different action than it would have
before doing the expansion. And the degree to which it is better off
with the different action rather than the original one (given what it
now knows) represents the *gain* due to the expansion. More
specifically, the Gain of situation *t*', *G*(*t*'), is the amount of
payoff an agent gains if it previously was in situation *t* and, after
expanding leaf *l* in *t*.leaves, is now in situation *t*'. Since *t*' = *PE*(*t*,*l*), we could also view this as *G*(*PE*(*t*,*l*)).
The Expected Gain, *E*(*G*(*t*')), is similarly *E*(*G*(*PE*(*t*,*l*))), which is
approximated by G(Propagate(E(Expand(*t*,*l*)))) in our algorithm. We
can justify this approximation because Propagate() simply propagates
the strategy at the leaf up to the root, and it usually maps similar
strategies at the leaves to similar strategies at the root. So, if the
expected strategy at the leaf, as returned by E(Expand(*t*,*l*)), is
close to the real one, then the strategy that gets propagated up to
the root will also be close to the real one. The formal
definitions of gain and expected gain are:

Notice that *E*(*G*(*t*')) reaches a minimum when the newly favored
strategy is the same as the old one, since the agent's strategy is
chosen so as to maximize the payoff given the opponents' strategies. In
this case, *G*(*t*') = -*TC*(*t*,*l*). This means that the gain will be
negative if the possible expansion *l* does not lead to a different and
better strategy.
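As an illustration of how an agent might use these quantities, the following hypothetical sketch computes the gain of an expansion and picks the leaf with the highest expected gain, declining to expand when every expansion is expected to cost more in time than it wins (all numbers and names are illustrative, not from the paper):

```python
def gain(payoff_new_strategy, payoff_old_strategy, time_cost):
    """G(t'): how much better the newly favored strategy does in t'
    than the previously favored one, minus the time cost TC(t, l).
    If the strategy is unchanged, the gain is just -TC(t, l)."""
    return payoff_new_strategy - payoff_old_strategy - time_cost

def best_expansion(expected_gains):
    """Given a map from leaf to expected gain E(G(PE(t, l))), return
    the leaf worth expanding, or None if no expansion is expected to
    yield a positive gain."""
    leaf, g = max(expected_gains.items(), key=lambda kv: kv[1])
    return leaf if g > 0 else None
```

For instance, `gain(10, 7, 1)` is 2, while an expansion that leaves the favored strategy unchanged, `gain(7, 7, 1)`, yields -1, i.e. the pure time cost.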


jmvidal@umich.edu

Sun Mar 10 12:52:06 EST 1996