A rational agent that can be affected by other agents will benefit if it can predict, with some degree of reliability, the likely actions of those agents. With such predictions, the rational agent can choose its own actions more judiciously and maximize its expected payoff in an interaction (or over a set of interactions). To obtain such predictions, an agent could rely on communication for explicit disclosure by other agents, on superficial patterns in the actions taken by others, or on deeper models of other agents' state. While each of these approaches has its merits, each also involves some cost or risk: effective communication requires careful decision-making about what to say, when to say it, and whether to trust what is heard [2, 9]; relying on learned patterns of action [15] risks jumping to incorrect expectations when the environment varies; and using deeper models of other agents can be more accurate but is extremely time-consuming [7].
In this paper, we concentrate on coordinated decision-making using deeper, nested models of agents. Because these models allow an agent to fully represent and use all of its available knowledge about itself and others, the quality of its decision-making can be high. As noted above, however, the costs of reasoning with recursive models can be prohibitive, since the models grow exponentially in size as the number of nesting levels increases. Our goal, therefore, has been to devise an algorithm that prunes those portions of the recursive structure that can be safely ignored (that is, portions that do not alter the agent's choice of action). We have done this by incorporating limited-rationality capabilities into the Recursive Modeling Method (RMM). Our new algorithm allows an agent to decrease its deliberation time without sacrificing too much quality (expected payoff) in its choice of action. Our contribution is to show how a principled approach to multi-agent reasoning such as RMM can be made more practical, and to demonstrate the practical mechanisms in a canonical multi-agent decision-making problem.
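To make the exponential growth concrete, consider the following back-of-the-envelope sketch (our illustration only; the function name and the choice of four agents and six levels are assumptions, not figures taken from this paper). If each agent models every other agent at every level of nesting, the hierarchy branches by the number of other agents at each level, and the total number of nested payoff matrices grows geometrically with the depth.

```python
def nodes_in_hierarchy(num_agents: int, depth: int) -> int:
    """Count the models in a fully expanded nesting hierarchy.

    At every level, each agent models the (num_agents - 1) other
    agents, so the hierarchy branches (num_agents - 1) ways per level.
    """
    branching = num_agents - 1
    # Geometric series: 1 + b + b^2 + ... + b^depth
    return sum(branching ** level for level in range(depth + 1))


# Hypothetical illustration: four agents modeled to six levels of
# nesting already require 1 + 3 + 9 + ... + 729 = 1093 nested models.
print(nodes_in_hierarchy(4, 6))  # -> 1093
```

This geometric blow-up is why pruning subtrees that cannot alter the agent's choice of action pays off so quickly.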
We start with a short description of RMM and of the Pursuit task, on which we tested our algorithm. Afterward we present situations as our way of representing RMM hierarchies, and we define their expected gain. These concepts form the basis of our algorithm, presented in the Algorithm section, which expands only those nodes with the highest expected gain. The Implementation section presents results from an implementation of our algorithm, from which we draw conclusions and directions for further work, discussed in the Conclusion.
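As a preview of that mechanism, the sketch below shows one way such a gain-driven, best-first expansion loop can be organized. This is a hedged illustration under our own assumptions: `expected_gain(node)` and `expand(node)` are placeholders for the quantities and operations the paper defines over RMM situations in later sections, not the actual RMM procedures.

```python
import heapq

def expand_best_first(root, expected_gain, expand, budget):
    """Repeatedly expand the frontier node whose expansion promises the
    highest expected gain, stopping when the expansion budget runs out
    or when no expansion could still change the agent's choice of action.

    `expected_gain` and `expand` are hypothetical callables standing in
    for the definitions developed in the Situations and Algorithm sections.
    """
    counter = 0  # tie-breaker so nodes with equal gain never compare directly
    frontier = [(-expected_gain(root), counter, root)]  # max-heap via negation
    for _ in range(budget):
        if not frontier:
            break
        neg_gain, _, node = heapq.heappop(frontier)
        if -neg_gain <= 0:
            break  # no remaining node can alter the chosen action
        for child in expand(node):  # grow one more level of nested models
            counter += 1
            heapq.heappush(frontier, (-expected_gain(child), counter, child))
```

Because the loop can stop after any iteration, an agent can cut its deliberation short and act on the hierarchy expanded so far, which is the time-versus-quality trade-off described above.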