We assume that the agents inhabit a discrete world with a finite
number of states, denoted by the set $W$. The agents have common
knowledge of the fact that every agent can see the state of the world
and the actions taken by every other agent. There are $n$ agents,
forming the set $N = \{1, \ldots, n\}$. If we let $s_{ij}$ stand for
``agent $i$ can see the action taken by agent $j$'', and $o_{ij}$ for
``agents $i$ and $j$ both see the same world'', then the notation in
[2] lets us express these ideas more succinctly in terms of common
knowledge among the agents in $N$, using the two following statements:
$C_N(s_{ij})$ and $C_N(o_{ij})$, for all $i, j \in N$.
We group the actions of all agents in the set $A = A_1 \times A_2
\times \cdots \times A_n$, where $A_i$ is the set of actions that can
be taken by agent $i$ and $a_i \in A_i$ is one particular action. We
will sometimes assume that all agents have the same set of actions,
i.e. $A_i = A_j$ for all $i, j \in N$. All agents take actions at
discrete time intervals; these actions are considered simultaneous and
are seen by all.
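As a concrete, purely illustrative sketch, this model can be captured in a few lines of Python; the names `World`, `transition`, and `step` are our own, not part of the formalism:

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class World:
    """A discrete world with a finite state set W. The agents act
    simultaneously, and the joint action drives the state transition."""
    states: List[str]                                   # the finite set W
    transition: Callable[[str, Tuple[int, ...]], str]   # (w, joint action) -> w'
    state: str = ""

    def step(self, joint_action: Tuple[int, ...]) -> str:
        # All agents act at once; the resulting state is seen by everyone.
        assert self.state in self.states
        self.state = self.transition(self.state, joint_action)
        return self.state
```

Full observability is implicit here: every agent is handed the same `state` and the same `joint_action` at each step.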
Looking down on such a system, we see that there is an oracle mapping
$M_i(w)$ for each agent $i$, which returns the best action that agent
$i$ can take in state $w$. However, as we shall see, this function
might be constantly changing, making the agents' learning task that
much more difficult. We will also refer to a similar function
$M_i(w, a_{-i})$, which returns the best action in state $w$ if all
the other agents take the actions specified by $a_{-i}$, the vector of
actions of all agents except $i$. It is assumed that the function $M$
is ``myopic''; that is, it does not take into account the possibility
of future encounters, but simply returns the action that maximizes the
immediate payoff given the current situation. Some of the limitations
imposed by this assumption are relaxed by the fact that agents learn
from past experience.
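The myopic character of the oracle $M$ can be illustrated with a small sketch: assuming a known immediate-payoff function (the `payoff` argument below is hypothetical, not part of the model), the oracle reduces to an argmax over the agent's own actions:

```python
def myopic_best_response(payoff, w, my_actions, others_actions):
    """Illustrative sketch of the myopic oracle: return the action that
    maximizes the immediate payoff in state w, given that the other
    agents take others_actions. Future encounters are ignored."""
    return max(my_actions, key=lambda a: payoff(w, a, others_actions))
```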
The MAS model we have just described is general enough to encompass a
wide variety of domains. Its two main restrictions are its
discreteness and the requirement that the world and the agents'
actions be completely observable by all agents. It does not, however,
say
anything about the agents and their structure. We propose to describe
the possible agents at the knowledge level and characterize them as
0,1,2...-level modelers. The modeling levels refer to the types of
knowledge that these agents keep.
A 0-level agent is not capable of recognizing the fact that
there are other agents in the world. The only way it ``knows'' about
the actions of others is if their actions lead to changes in the world
w, or in the reward it receives. At the knowledge level, we can say
that a 0-level agent $i$ knows a mapping from states $w$ to actions
$a_i$. This fact is denoted by $K_i(f_i(w))$, where $f_i : W \to A_i$.
We will later refer to this mapping as the function $g_i(w)$. The goal
of the agent is to have $g_i(w) = M_i(w)$. This knowledge can either
be given to the agent (i.e., pre-programmed) or it can be learned. We
will talk about the complexity of learning in a later Section.
The reader will note that 0-level agents only look at the current
world state w when deciding which action to take. It is
possible that this information is not enough for making a correct
decision. In these cases the 0-level agents
are handicapped because of their simple modeling capabilities.
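As an illustration, a 0-level agent can be sketched as a simple value learner over (state, action) pairs. The running-average update below is one arbitrary choice of learning technique, not one prescribed by the model:

```python
class ZeroLevelAgent:
    """Learns a policy g_i: W -> A_i from its own (state, action, reward)
    experience alone. Other agents are never represented; their effects
    appear only as variation in the observed rewards."""
    def __init__(self, actions):
        self.actions = list(actions)
        self.value = {}   # (state, action) -> running-average reward

    def act(self, w):
        # Greedy policy: unseen (w, a) pairs default to an estimate of 0.
        return max(self.actions, key=lambda a: self.value.get((w, a), 0.0))

    def learn(self, w, a, reward, alpha=0.5):
        # Running average of the rewards seen after taking a in w.
        old = self.value.get((w, a), 0.0)
        self.value[(w, a)] = old + alpha * (reward - old)
```

Note that if the other agents change their behavior, the rewards this agent sees for the same (w, a) pair change with them, which is exactly the moving-target difficulty described above.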
Table: The type of knowledge the different agent
levels are trying to acquire. They can acquire this knowledge
using any learning technique.

  Level 0:  $K_i(f_i(w))$
  Level 1:  $K_i(f_i(w))$, $K_iK_j(f_{ij}(w))$
  Level 2:  $K_i(f_i(w))$, $K_iK_j(f_{ij}(w))$, $K_iK_jK_k(f_{ijk}(w))$
A 1-level agent $i$ recognizes the fact that there are other
agents in the world and that they take actions, but it does not know
anything more about them. Given these facts, the 1-level agent's
strategy is to predict the other agents' actions based on their past
behavior and any other knowledge it has, and to use these predictions
when trying to determine its best action. Essentially, it assumes that
the other agents pick their actions using a mapping from $w$ to $a$.
At the knowledge level, we say that it knows $K_i(f_i(w))$, where
$f_i : W \to A_i$, and $K_iK_j(f_{ij}(w))$ for all other agents $j$,
where $f_{ij} : W \to A_j$. Again, we can say that the agent's actions
are given by the function $g_i(w)$.
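The 1-level strategy can be sketched as frequency-based prediction of each $f_{ij}$ followed by a best response. The frequency count is again only one possible learning technique, and `payoff` is a hypothetical immediate-payoff function:

```python
from collections import Counter, defaultdict

class OneLevelAgent:
    """Estimates each f_ij: W -> A_j from observed action frequencies,
    then best-responds to the predicted actions of the others."""
    def __init__(self, my_actions, payoff):
        self.my_actions = list(my_actions)
        self.payoff = payoff              # payoff(w, a_i, predictions) -> float
        self.seen = defaultdict(Counter)  # (j, w) -> Counter of j's actions

    def observe(self, w, others_actions):
        # Record what each agent j did in state w.
        for j, aj in others_actions.items():
            self.seen[(j, w)][aj] += 1

    def predict(self, w, others_ids, default):
        # Most frequent past action of j in state w; `default` if unseen.
        return {j: (self.seen[(j, w)].most_common(1)[0][0]
                    if self.seen[(j, w)] else default)
                for j in others_ids}

    def act(self, w, others_ids, default):
        pred = self.predict(w, others_ids, default)
        return max(self.my_actions, key=lambda a: self.payoff(w, a, pred))
```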
A 2-level agent $i$ also recognizes the other agents in the
world, but in addition has some information about their decision
processes and previous observations. That is, a 2-level agent has
insight into the other agents' internal procedures for picking an
action. This intentional model of others allows the agent to dismiss
``useless'' information when picking its next action. At the knowledge
level, we say that a 2-level agent knows $K_i(f_i(w))$,
$K_iK_j(f_{ij}(w))$, and $K_iK_jK_k(f_{ijk}(w))$. A simple way a
1-level agent can become 2-level is by assuming that ``others are like
him'' and modeling the others using the same learning algorithms and
observations that it itself used when it was a 1-level agent.
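The ``others are like him'' idea can be sketched by giving the 2-level agent one copy of its own former learner per other agent and feeding each copy that agent's observed experience. For illustration only, the sketch assumes the other agents' rewards are also observable:

```python
from collections import defaultdict

class InnerLearner:
    """Stand-in for the learning algorithm the agent itself used at
    level 1: a running-average value table over (state, action) pairs."""
    def __init__(self):
        self.value = defaultdict(float)
    def learn(self, w, a, reward, alpha=0.5):
        self.value[(w, a)] += alpha * (reward - self.value[(w, a)])
    def act(self, w, actions):
        return max(actions, key=lambda a: self.value[(w, a)])

class TwoLevelAgent:
    """Predicts agent j's action by simulating j with a copy of the
    agent's own learner, then best-responds to those predictions."""
    def __init__(self, my_actions, their_actions, payoff):
        self.my_actions = list(my_actions)
        self.their_actions = list(their_actions)
        self.payoff = payoff                 # payoff(w, a_i, predictions) -> float
        self.models = defaultdict(InnerLearner)

    def observe(self, w, others_actions, others_rewards):
        # Feed model j exactly the (w, a_j, r_j) experience agent j had.
        for j, aj in others_actions.items():
            self.models[j].learn(w, aj, others_rewards[j])

    def act(self, w, others_ids):
        pred = {j: self.models[j].act(w, self.their_actions)
                for j in others_ids}
        return max(self.my_actions, key=lambda a: self.payoff(w, a, pred))
```

Because the inner model is the same algorithm the agent ran itself, no new machinery is needed to move up a level; the same construction repeats for deeper levels.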
We can keep defining $n$-level agents with deeper models in a
similar way. An $n$-level agent $i$ would have knowledge of the type
$K_i(f_i(w))$, $K_iK_j(f_{ij}(w))$, \ldots,
$K_iK_j \cdots K_m(f_{ij \cdots m}(w))$, where the number of $K$'s in
the deepest term is $n+1$.
Jose M. Vidal
jmvidal@umich.edu
Thu Apr 24 15:00:31 EDT 1997