A rational agent that can be impacted by other agents will benefit if it can predict, with some degree of reliability, the likely actions of other agents. With such predictions, the rational agent can choose its own actions more judiciously, and can maximize its expected payoff in an interaction (or over a set of interactions). To get such predictions, an agent could rely on communication for explicit disclosure by other agents, or on superficial patterns of actions taken by others, or on deeper models of other agents' state. While each of these approaches has its merits, each also involves some cost or risk: effective communication requires careful decision-making about what to say, when to say it, and whether to trust what is heard; relying on learned patterns of action risks jumping to incorrect expectations when environmental variations occur; and using deeper models of other agents can be more accurate but extremely time-consuming.
In this paper, we concentrate on coordinated decision-making using deeper, nested models of agents. Because these models allow an agent to fully represent and use all of its available knowledge about itself and others, the quality of its decision-making can be high. As noted above, however, the costs of reasoning based on recursive models can be prohibitive, since the models grow exponentially in size as the number of nesting levels increases. Our goal, therefore, has been to devise an algorithm that can prune portions of the recursive structure that can be safely ignored (portions that do not alter an agent's choice of action). We have done this by incorporating limited-rationality capabilities into the Recursive Modeling Method (RMM). Our new algorithm allows an agent to decrease its deliberation time without sacrificing too much quality (expected payoff) in its choice of action. Our contribution, therefore, is to show how a principled approach to multi-agent reasoning such as RMM can be made more practical, and to demonstrate the practical mechanisms in a canonical multi-agent decision-making problem.
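The selective-expansion idea can be illustrated with a minimal sketch. The code below is not the RMM algorithm itself; it is a generic best-first expansion of a model hierarchy in which each node carries an estimated expected gain, and only the node with the highest gain is expanded at each step. The `Node` class, the two-children branching, the `gain_decay` factor, and the `min_gain` cutoff are all illustrative assumptions, standing in for the situation-specific gain estimates defined later in the paper.

```python
import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class Node:
    neg_gain: float                      # negated so heapq pops the highest gain first
    depth: int = field(compare=False)
    label: str = field(compare=False)

def expand_best_first(root_gain, max_expansions, gain_decay=0.5, min_gain=0.05):
    """Expand only the nodes whose estimated expected gain is highest.

    Stops when the best remaining gain drops below min_gain (further
    expansion cannot change the agent's choice enough to matter) or when
    the deliberation budget (max_expansions) is exhausted.
    """
    frontier = [Node(-root_gain, 0, "root")]
    expanded = []
    while frontier and len(expanded) < max_expansions:
        node = heapq.heappop(frontier)
        gain = -node.neg_gain
        if gain < min_gain:              # prune: the rest of the hierarchy is safely ignored
            break
        expanded.append((node.label, gain))
        # Children model the other agent one nesting level deeper; their
        # expected gain shrinks with depth (gain_decay is an assumption
        # made for this sketch, not a result from the paper).
        for i in range(2):
            heapq.heappush(
                frontier,
                Node(-(gain * gain_decay), node.depth + 1, f"{node.label}.{i}"))
    return expanded
```

With a root gain of 1.0 and a budget of five expansions, the sketch expands the root, then both depth-1 nodes (gain 0.5), then two depth-2 nodes (gain 0.25), never touching the exponentially many deeper nodes; this is the sense in which selective expansion keeps deliberation time bounded.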
We start by providing a short description of RMM and of the Pursuit task, on which we tested our algorithm. Afterward, we present situations as our way of representing RMM hierarchies, and define their expected gain. These concepts form the basis of our algorithm, shown in the Algorithm section, which expands only those nodes with the highest expected gain. The Implementation section presents the results of an implementation of our algorithm, from which we derive some conclusions and directions for further work, discussed in the Conclusion.