We assume that the agents inhabit a discrete world with a finite
number of states, denoted by the set $W$. The agents have common
knowledge of the fact that every agent can see the state of the world
and the actions taken by every other agent. There are $n$ agents,
forming the set $N = \{1, \ldots, n\}$. If we let $s_{ij}$ stand for
``agent $i$ can see the action taken by agent $j$'', and $o_{ij}$ for
``agents $i$ and $j$ both see the same world'', then the notation in
[2] lets us express these ideas more succinctly in terms of common
knowledge among the agents in $N$, using the two following statements:
$C_N(s_{ij})$ and $C_N(o_{ij})$, for all $i, j \in N$.
We group the actions of all agents in the set $A = A_1 \times A_2
\times \cdots \times A_n$, where $A_i$ is the set of actions that can
be taken by agent $i$ and $a_i \in A_i$ is one particular action. We
will sometimes assume that all agents have the same set of actions,
i.e. $A_i = A_j$ for all $i, j \in N$. All agents take actions at
discrete time intervals; these actions are considered simultaneous and
are seen by all.
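As a concrete, purely illustrative sketch, this model can be captured in a few lines of Python; the names `World`, `transition`, and `step` are our own, not part of the formalism:

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class World:
    """A discrete world with a finite state set W. The agents act
    simultaneously, and the joint action drives the state transition."""
    states: List[str]                                   # the finite set W
    transition: Callable[[str, Tuple[int, ...]], str]   # (w, joint action) -> w'
    state: str = ""

    def step(self, joint_action: Tuple[int, ...]) -> str:
        # All agents act at once; the resulting state is seen by everyone.
        assert self.state in self.states
        self.state = self.transition(self.state, joint_action)
        return self.state
```

Full observability is implicit here: every agent is handed the same `state` and the same `joint_action` at each step.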
Looking down on such a system, we see that there is an oracle mapping
$M_i(w)$ for each agent $i$, which returns the best action that agent
$i$ can take in state $w$. However, as we shall see, this function
might be constantly changing, making the agents' learning task that
much more difficult. We will also refer to a similar function
$M_i(w, a_{-i})$, which returns the best action in state $w$ if all
the other agents take the actions specified by $a_{-i}$, the vector of
actions of all agents except $i$. It is assumed that the function $M$
is ``myopic''; that is, it does not take into account the possibility
of future encounters, but simply returns the action that maximizes the
immediate payoff given the current situation. Some of the limitations
imposed by this assumption are relaxed by the fact that agents learn
from past experience.
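The myopic character of the oracle $M$ can be illustrated with a small sketch: assuming a known immediate-payoff function (the `payoff` argument below is hypothetical, not part of the model), the oracle reduces to an argmax over the agent's own actions:

```python
def myopic_best_response(payoff, w, my_actions, others_actions):
    """Illustrative sketch of the myopic oracle: return the action that
    maximizes the immediate payoff in state w, given that the other
    agents take others_actions. Future encounters are ignored."""
    return max(my_actions, key=lambda a: payoff(w, a, others_actions))
```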
The MAS model we have just described is general enough to encompass a
wide variety of domains. Its two main restrictions are its
discreteness and the requirement that the world and the agents'
actions be completely observable by all agents. It does not, however,
say
anything about the agents and their structure. We propose to describe
the possible agents at the knowledge level and characterize them as
0,1,2...-level modelers. The modeling levels refer to the types of
knowledge that these agents keep.
A 0-level agent is not capable of recognizing the fact that
there are other agents in the world. The only way it ``knows'' about
the actions of others is if their actions lead to changes in the world
w, or in the reward it receives. At the knowledge level, we can say
that a 0-level agent $i$ knows a mapping from states $w$ to actions
$a_i$. This fact is denoted by $K_i(f_i(w))$, where $f_i : W \to A_i$.
We will later refer to this mapping as the function $g_i(w)$. The goal
of the agent is to have $g_i(w) = M_i(w)$. This knowledge can either
be given to the agent (i.e., pre-programmed) or it can be learned. We
will talk about the complexity of learning in a later Section.
The reader will note that 0-level agents only look at the current
world state w when deciding which action to take. It is
possible that this information is not enough for making a correct
decision. In these cases the 0-level agents
are handicapped because of their simple modeling capabilities.
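As an illustration, a 0-level agent can be sketched as a simple value learner over (state, action) pairs. The running-average update below is one arbitrary choice of learning technique, not one prescribed by the model:

```python
class ZeroLevelAgent:
    """Learns a policy g_i: W -> A_i from its own (state, action, reward)
    experience alone. Other agents are never represented; their effects
    appear only as variation in the observed rewards."""
    def __init__(self, actions):
        self.actions = list(actions)
        self.value = {}   # (state, action) -> running-average reward

    def act(self, w):
        # Greedy policy: unseen (w, a) pairs default to an estimate of 0.
        return max(self.actions, key=lambda a: self.value.get((w, a), 0.0))

    def learn(self, w, a, reward, alpha=0.5):
        # Running average of the rewards seen after taking a in w.
        old = self.value.get((w, a), 0.0)
        self.value[(w, a)] = old + alpha * (reward - old)
```

Note that if the other agents change their behavior, the rewards this agent sees for the same (w, a) pair change with them, which is exactly the moving-target difficulty described above.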
Table: The type of knowledge the different agent
levels are trying to acquire. They can acquire this knowledge
using any learning technique.

  Level 0:  $K_i(f_i(w))$
  Level 1:  $K_i(f_i(w))$, $K_iK_j(f_{ij}(w))$
  Level 2:  $K_i(f_i(w))$, $K_iK_j(f_{ij}(w))$, $K_iK_jK_k(f_{ijk}(w))$
A 1-level agent $i$ recognizes the fact that there are other
agents in the world and that they take actions, but it does not know
anything more about them. Given these facts, the 1-level agent's
strategy is to predict the other agents' actions based on their past
behavior and any other knowledge it has, and to use these predictions
when trying to determine its best action. Essentially, it assumes that
the other agents pick their actions using a mapping from $w$ to $a$.
At the knowledge level, we say that it knows $K_i(f_i(w))$, where
$f_i : W \to A_i$, and $K_iK_j(f_{ij}(w))$ for all other agents $j$,
where $f_{ij} : W \to A_j$. Again, we can say that the agent's actions
are given by the function $g_i(w)$.
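The 1-level strategy can be sketched as frequency-based prediction of each $f_{ij}$ followed by a best response. The frequency count is again only one possible learning technique, and `payoff` is a hypothetical immediate-payoff function:

```python
from collections import Counter, defaultdict

class OneLevelAgent:
    """Estimates each f_ij: W -> A_j from observed action frequencies,
    then best-responds to the predicted actions of the others."""
    def __init__(self, my_actions, payoff):
        self.my_actions = list(my_actions)
        self.payoff = payoff              # payoff(w, a_i, predictions) -> float
        self.seen = defaultdict(Counter)  # (j, w) -> Counter of j's actions

    def observe(self, w, others_actions):
        # Record what each agent j did in state w.
        for j, aj in others_actions.items():
            self.seen[(j, w)][aj] += 1

    def predict(self, w, others_ids, default):
        # Most frequent past action of j in state w; `default` if unseen.
        return {j: (self.seen[(j, w)].most_common(1)[0][0]
                    if self.seen[(j, w)] else default)
                for j in others_ids}

    def act(self, w, others_ids, default):
        pred = self.predict(w, others_ids, default)
        return max(self.my_actions, key=lambda a: self.payoff(w, a, pred))
```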
A 2-level agent $i$ also recognizes the other agents in the
world, but in addition has some information about their decision
processes and previous observations. That is, a 2-level agent has
insight into the other agents' internal procedures for picking an
action. This intentional model of others allows the agent to dismiss
``useless'' information when picking its next action. At the knowledge
level, we say that a 2-level agent knows $K_i(f_i(w))$,
$K_iK_j(f_{ij}(w))$, and $K_iK_jK_k(f_{ijk}(w))$. A simple way a
1-level agent can become 2-level is by assuming that ``others are like
him'' and modeling the others using the same learning algorithms and
observations that it itself used when it was a 1-level agent.
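The ``others are like him'' idea can be sketched by giving the 2-level agent one copy of its own former learner per other agent and feeding each copy that agent's observed experience. For illustration only, the sketch assumes the other agents' rewards are also observable:

```python
from collections import defaultdict

class InnerLearner:
    """Stand-in for the learning algorithm the agent itself used at
    level 1: a running-average value table over (state, action) pairs."""
    def __init__(self):
        self.value = defaultdict(float)
    def learn(self, w, a, reward, alpha=0.5):
        self.value[(w, a)] += alpha * (reward - self.value[(w, a)])
    def act(self, w, actions):
        return max(actions, key=lambda a: self.value[(w, a)])

class TwoLevelAgent:
    """Predicts agent j's action by simulating j with a copy of the
    agent's own learner, then best-responds to those predictions."""
    def __init__(self, my_actions, their_actions, payoff):
        self.my_actions = list(my_actions)
        self.their_actions = list(their_actions)
        self.payoff = payoff                 # payoff(w, a_i, predictions) -> float
        self.models = defaultdict(InnerLearner)

    def observe(self, w, others_actions, others_rewards):
        # Feed model j exactly the (w, a_j, r_j) experience agent j had.
        for j, aj in others_actions.items():
            self.models[j].learn(w, aj, others_rewards[j])

    def act(self, w, others_ids):
        pred = {j: self.models[j].act(w, self.their_actions)
                for j in others_ids}
        return max(self.my_actions, key=lambda a: self.payoff(w, a, pred))
```

Because the inner model is the same algorithm the agent ran itself, no new machinery is needed to move up a level; the same construction repeats for deeper levels.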
We can keep defining $n$-level agents with deeper models in a
similar way. An $n$-level agent $i$ would have knowledge of the type
$K_i(f_i(w))$, $K_iK_j(f_{ij}(w))$, \ldots,
$K_iK_j \cdots K_m(f_{ij \cdots m}(w))$, where the number of $K$'s in
the deepest term is $n+1$.
Jose M. Vidal
jmvidal@umich.edu
Thu Apr 24 15:00:31 EDT 1997