
<BODY LANG="EN" bgcolor="#ffffff" text="#000000"> <P> <H1><A NAME="SECTION00040000000000000000">Sample Learning Complexity</A></H1> <P> <A NAME="seclearning"> </A> <P> Let us say that a 0-level agent does not have perfect knowledge (i.e. its <I>K</I><SUB><I>i</I></SUB>(<I>f</I><SUB><I>i</I></SUB>(<I>w</I>)) does not match the oracle function <I>M</I>(<I>w</I>)). Then some or all of its <IMG WIDTH=36 HEIGHT=12 ALIGN=MIDDLE ALT="tex2html_wrap_inline1191" SRC="img31.gif"> mappings must be wrong and need to be learned. If the agent is using some form of supervised learning (i.e. a teacher tells it which action to take each time), then it is trying to learn one of |<I>A</I><SUB><I>i</I></SUB>|<SUP>|<I>W</I>|</SUP> possible 0-level models. If instead it is using some form of reinforcement learning, where it gets a reward (positive or negative) after every action, then it is trying to learn one of <IMG WIDTH=49 HEIGHT=22 ALIGN=MIDDLE ALT="tex2html_wrap_inline1195" SRC="img32.gif"> possible models, where <I>R</I> is the set of rewards it gets (<IMG WIDTH=36 HEIGHT=18 ALIGN=MIDDLE ALT="tex2html_wrap_inline1199" SRC="img33.gif">). That is, if the agent is taught which actions are better, it only needs to learn the mapping from each state <I>w</I> to an action <I>a</I>. If, instead, it gets a reward for each action in each state, it needs to learn the mapping from state-action pairs (<I>w</I>,<I>a</I>) to their rewards in order to determine which actions lead to the highest reward. <P> If, on the other hand, a 1-level agent is wrong, then the problem could lie either in its <IMG WIDTH=73 HEIGHT=18 ALIGN=MIDDLE ALT="tex2html_wrap_inline1123" SRC="img20.gif"> or in its <I>K</I><SUB><I>i</I></SUB><I>K</I><SUB><I>j</I></SUB>(<I>f</I><SUB><I>ij</I></SUB>(<I>w</I>)). An interesting case is the one in which we assume that the agent already has the former knowledge. 
This can happen in MASs where the designer knows what the agent should do given what all the other agents will do. So, assuming that <IMG WIDTH=73 HEIGHT=18 ALIGN=MIDDLE ALT="tex2html_wrap_inline1123" SRC="img20.gif"> is always correct, <I>K</I><SUB><I>i</I></SUB><I>K</I><SUB><I>j</I></SUB>(<I>f</I><SUB><I>ij</I></SUB>(<I>w</I>)) is the only source of the discrepancy. Since agents can observe each other's actions in all states, we can assume that they learn this knowledge using some form of supervised learning (i.e. the observed agent is the teacher because it "tells" the others what it does in each <I>w</I>). Therefore, in learning this knowledge an agent will be picking from a set of |<I>A</I><SUB><I>j</I></SUB>|<SUP>|<I>W</I>|</SUP> possible models. <P> It should be intuitive (assuming <IMG WIDTH=75 HEIGHT=18 ALIGN=MIDDLE ALT="tex2html_wrap_inline1069" SRC="img15.gif">) that the learning problem the 1-level agent faces is of the same magnitude as that of the 0-level agent using supervised learning, but smaller than that of the 0-level agent using reinforcement learning. However, we can make this more formal by noticing that we can use the size of the hypothesis (or concept) space to determine the <EM>sample complexity</EM> of the learning problem. This gives us a rough idea of the number of examples that a PAC-learning algorithm would have to see before reaching an acceptable hypothesis (i.e. model). <P> We first define the error, at any given time, of agent <I>i</I>'s action function <I>g</I><SUB><I>i</I></SUB>(<I>w</I>) as: <BR><A NAME="error"> </A><IMG WIDTH=382 HEIGHT=12 ALIGN=MIDDLE ALT="displaymath1468" SRC="img34.gif"><BR> <P> where <I>D</I> is the distribution from which world states <I>w</I> are drawn, and <I>g</I><SUB><I>i</I></SUB>(<I>w</I>) returns the action that agent <I>i</I> will take in state <I>w</I>, given its current knowledge (i.e. all the <IMG WIDTH=26 HEIGHT=18 ALIGN=MIDDLE ALT="tex2html_wrap_inline1235" SRC="img35.gif"> models). 
We also let <IMG WIDTH=7 HEIGHT=13 ALIGN=MIDDLE ALT="tex2html_wrap_inline1237" SRC="img36.gif"> be the upper bound we wish to set on the probability that <I>i</I> has a bad model, i.e. one with <IMG WIDTH=65 HEIGHT=18 ALIGN=MIDDLE ALT="tex2html_wrap_inline1241" SRC="img37.gif">. The sample complexity is then bounded from above by <I>m</I>, whose standard definition from computational learning theory is: <P> <BR><A NAME="samplecomp"> </A><IMG WIDTH=288 HEIGHT=28 ALIGN=MIDDLE ALT="displaymath1470" SRC="img38.gif"><BR> <P> where |<I>H</I>| is the size of the hypothesis (i.e. model) space. Plugging values into these equations for one particularly interesting case yields the following result. <P> <BR><A NAME="th1level"> </A><IMG WIDTH=411 HEIGHT=83 ALIGN=BOTTOM ALT="theorem212" SRC="img39.gif"><BR> <b>Proof</b> We saw before that |<I>H</I>|=|<I>A</I>|<SUP>|<I>W</I>|</SUP> for the 1-level agent, and <IMG WIDTH=79 HEIGHT=22 ALIGN=MIDDLE ALT="tex2html_wrap_inline1255" SRC="img40.gif"> for the 0-level agent with reinforcement-based learning. Since the bound in Equation <A HREF="node4.html#samplecomp" target="contents">2</A> increases with |<I>H</I>|, comparing these two hypothesis-space sizes shows that the 1-level agent's sample complexity will be less than that of the 0-level reinforcement agent as long as |<I>R</I>| > |<I>A</I>|<SUP>1/|<I>A</I>|</SUP>. This is always true because <IMG WIDTH=36 HEIGHT=18 ALIGN=MIDDLE ALT="tex2html_wrap_inline1199" SRC="img33.gif">, while |<I>A</I>|<SUP>1/|<I>A</I>|</SUP> &le; 3<SUP>1/3</SUP> &lt; 2 for every integer |<I>A</I>| > 0. <P> <P><A NAME="573"> </A><BR><A NAME="comptable"> </A><IMG WIDTH=395 HEIGHT=115 ALIGN=BOTTOM ALT="table224" SRC="img41.gif"><BR><BR> <STRONG>Table:</STRONG> Size of the hypothesis spaces |<I>H</I>| for learning the different sets of knowledge, depending on whether the agent uses supervised or reinforcement learning. 
<I>A</I><SUB><I>i</I></SUB> is the set of actions and <I>R</I><SUB><I>i</I></SUB> is the set of rewards for agent <I>i</I>, <I>n</I> is the number of agents, and <I>W</I> is the set of possible world states.<BR> <P> <P> This theorem tells us that, in these cases, the 1-level agent will, on average, have better models than the 0-level agent after seeing the same number of examples. In fact, we can calculate the size of the hypothesis space |<I>H</I>| for all the different types of knowledge, as seen in Table <A HREF="node4.html#comptable" target="contents">2</A>. This table, along with Equation <A HREF="node4.html#samplecomp" target="contents">2</A>, can be used to determine the sample complexity of learning the different types of knowledge for any agent that uses any form of supervised or reinforcement learning. In this way, we can compare two agents to determine which one will have the more accurate models, on average. Note that some of these complexities are independent of the number of agents (<I>n</I>). This is because we assume that all actions are seen by all agents, so an agent can build <IMG WIDTH=33 HEIGHT=6 ALIGN=BOTTOM ALT="tex2html_wrap_inline1315" SRC="img42.gif"> models of all other agents in parallel, and every other agent can do the same. However, the actual computational costs will increase linearly with the number of agents, since the agent will need to maintain a separate model for each other agent. The sample complexities rely on the assumption that, between consecutive actions, there is enough time for the agent to update its models. <P> A designer of an agent for a MAS can consult Table <A HREF="node4.html#comptable" target="contents">2</A> to determine how long his agent will take to learn accurate models, given different combinations of implemented versus learned knowledge, and supervised versus reinforcement learning algorithms. 
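<P> As an illustration only, the comparison a designer makes with Table 2 and Equation 2 can be sketched in Python. This is a minimal sketch under several assumptions: Equation 2 is taken to be the standard bound for a consistent learner over a finite hypothesis space, <I>m</I> &ge; (1/&epsilon;)(ln|<I>H</I>| + ln(1/&delta;)); the reinforcement-learning space is taken to have the form |<I>R</I>|<SUP>|<I>A</I>||<I>W</I>|</SUP>, which is the form the condition in the proof of Theorem 2 implies; and the problem sizes, PAC parameters, and the names <TT>sample_bound</TT>, <TT>ln_h_supervised</TT>, and <TT>ln_h_reinforcement</TT> are hypothetical. The code works with ln|<I>H</I>| directly because |<I>H</I>| itself grows exponentially in |<I>W</I>|.

```python
import math


def sample_bound(ln_h: float, epsilon: float, delta: float) -> int:
    """Assumed form of Equation 2: m >= (1/epsilon) * (ln|H| + ln(1/delta))."""
    return math.ceil((ln_h + math.log(1.0 / delta)) / epsilon)


def ln_h_supervised(n_actions: int, n_states: int) -> float:
    """ln|H| for |H| = |A|^|W|: one action to learn per world state."""
    return n_states * math.log(n_actions)


def ln_h_reinforcement(n_rewards: int, n_actions: int, n_states: int) -> float:
    """ln|H| for |H| = |R|^(|A||W|): one reward to learn per state-action pair."""
    return n_actions * n_states * math.log(n_rewards)


# Hypothetical problem size and PAC parameters, chosen only for illustration.
A, R, W = 4, 2, 100          # |A_i| = |A_j|, |R|, |W|
eps, delta = 0.1, 0.05

# 1-level agent learning K_i K_j(f_ij(w)) by supervised learning: |A_j|^|W|
m_1level = sample_bound(ln_h_supervised(A, W), eps, delta)
# 0-level agent learning rewards by reinforcement learning: |R|^(|A_i||W|)
m_0level_rl = sample_bound(ln_h_reinforcement(R, A, W), eps, delta)

print(m_1level, m_0level_rl)      # 1417 2803
print(R > A ** (1 / A))           # True: the condition from Theorem 2 holds
```

<P> With these made-up numbers the bound for the 1-level agent is roughly half that of the 0-level reinforcement agent. Because ln|<I>H</I>| enters the bound linearly and both spaces grow with |<I>W</I>|, the absolute gap between the two bounds widens as the number of world states grows.
<P>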
However, we can further refine this table by noticing that if a designer has, for example, <IMG WIDTH=73 HEIGHT=18 ALIGN=MIDDLE ALT="tex2html_wrap_inline1123" SRC="img20.gif"> knowledge, he can also apply this knowledge when building a 0-level agent. Doing so reduces the size of the 0-level agent's hypothesis space. <P> The reduction can be accomplished by examining the <IMG WIDTH=73 HEIGHT=18 ALIGN=MIDDLE ALT="tex2html_wrap_inline1123" SRC="img20.gif"> knowledge, determining which <IMG WIDTH=36 HEIGHT=12 ALIGN=MIDDLE ALT="tex2html_wrap_inline1191" SRC="img31.gif"> pairings are impossible or already fixed, and eliminating these from the hypothesis space of the 0-level modeler. That is, for all <IMG WIDTH=35 HEIGHT=18 ALIGN=MIDDLE ALT="tex2html_wrap_inline1323" SRC="img43.gif"> and <IMG WIDTH=36 HEIGHT=19 ALIGN=MIDDLE ALT="tex2html_wrap_inline1067" SRC="img14.gif">, eliminate from the table of all possible mappings all the <IMG WIDTH=36 HEIGHT=12 ALIGN=MIDDLE ALT="tex2html_wrap_inline1191" SRC="img31.gif"> mappings for which: <P> <OL> <LI> There does not exist an <IMG WIDTH=51 HEIGHT=23 ALIGN=MIDDLE ALT="tex2html_wrap_inline1329" SRC="img44.gif"> such that <IMG WIDTH=73 HEIGHT=18 ALIGN=MIDDLE ALT="tex2html_wrap_inline1123" SRC="img20.gif"> and <IMG WIDTH=78 HEIGHT=18 ALIGN=MIDDLE ALT="tex2html_wrap_inline1125" SRC="img21.gif">, i.e. the action <I>a</I><SUB><I>i</I></SUB> is never taken in state <I>w</I>, regardless of what the others do. <LI> For all <IMG WIDTH=51 HEIGHT=23 ALIGN=MIDDLE ALT="tex2html_wrap_inline1329" SRC="img44.gif"> it is true that <IMG WIDTH=73 HEIGHT=18 ALIGN=MIDDLE ALT="tex2html_wrap_inline1123" SRC="img20.gif"> and <IMG WIDTH=78 HEIGHT=18 ALIGN=MIDDLE ALT="tex2html_wrap_inline1125" SRC="img21.gif">, i.e. the agent takes the same action <I>a</I><SUB><I>i</I></SUB> in <I>w</I> no matter what the others do, so this mapping is already known. 
</OL> <P> After applying these rules, we are left with a new table <IMG WIDTH=66 HEIGHT=19 ALIGN=MIDDLE ALT="tex2html_wrap_inline1349" SRC="img45.gif"> with <IMG WIDTH=55 HEIGHT=18 ALIGN=MIDDLE ALT="tex2html_wrap_inline1351" SRC="img46.gif">, where each <IMG WIDTH=37 HEIGHT=17 ALIGN=MIDDLE ALT="tex2html_wrap_inline1353" SRC="img47.gif"> has a set <I>A</I><SUP><I>w</I></SUP><SUB><I>i</I></SUB> associated with it. If the new 0-level modeler uses supervised learning, the size of its hypothesis space will be <IMG WIDTH=62 HEIGHT=19 ALIGN=MIDDLE ALT="tex2html_wrap_inline1357" SRC="img48.gif">; if it uses reinforcement learning, the size will be |<I>R</I><SUB><I>i</I></SUB>|<SUP>|<I>T</I><SUB><I>i</I></SUB>|</SUP>. Table <A HREF="node4.html#comp1table" target="contents">3</A>(a) summarizes the sizes of the hypothesis spaces for learning the different types of knowledge, given that the designer uses the <IMG WIDTH=73 HEIGHT=18 ALIGN=MIDDLE ALT="tex2html_wrap_inline1123" SRC="img20.gif"> knowledge to reduce the hypothesis spaces of other types of knowledge. <P> <P><A NAME="563"> </A><BR><A NAME="comp1table"> </A><IMG WIDTH=414 HEIGHT=252 ALIGN=BOTTOM ALT="table316" SRC="img49.gif"><BR><BR> <STRONG>Table:</STRONG> Size of the hypothesis spaces |<I>H</I>| for learning the different types of knowledge. The (a) columns assume the designer already has <IMG WIDTH=73 HEIGHT=18 ALIGN=MIDDLE ALT="tex2html_wrap_inline1123" SRC="img20.gif"> knowledge, while in (b) he also has <I>K</I><SUB><I>i</I></SUB><I>K</I><SUB><I>j</I></SUB>(<I>f</I><SUB><I>ij</I></SUB>(<I>w</I>)). If the knowledge is already known, then |<I>H</I>|=0.<BR> <P> <P> Similarly, if the designer also has the knowledge <IMG WIDTH=94 HEIGHT=19 ALIGN=MIDDLE ALT="tex2html_wrap_inline1139" SRC="img23.gif">, he can create a reduced table <I>T</I><SUB><I>j</I></SUB> for each of the other agents. The new hypothesis spaces will then be given by Table <A HREF="node4.html#comp1table" target="contents">3</A>(b). 
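<P> The two elimination rules can be sketched in Python as follows. This is a minimal sketch, not the author's implementation. It assumes that <I>T</I><SUB><I>i</I></SUB> is indexed by world state, with each surviving <I>w</I> mapped to its set <I>A</I><SUP><I>w</I></SUP><SUB><I>i</I></SUB>, and it models the designer's <IMG WIDTH=73 HEIGHT=18 ALIGN=MIDDLE ALT="tex2html_wrap_inline1123" SRC="img20.gif"> knowledge as a hypothetical function <TT>best_response(w, others)</TT> that returns the action agent <I>i</I> takes in <I>w</I> when the others play the joint action <TT>others</TT>. The toy bidding rule and all names are made up for illustration, and only the supervised hypothesis-space size is computed.

```python
import math
from itertools import product


def reduce_table(states, other_action_sets, best_response):
    """Return the reduced table T_i as a dict {w: A_i^w}.

    best_response(w, others) is a hypothetical stand-in for the designer's
    known knowledge: the action agent i takes in state w when the other
    agents play the joint action `others`.
    """
    table = {}
    for w in states:
        # Rule 1: keep only actions taken for SOME joint action of the
        # others; every other w -> a_i mapping is eliminated.
        taken = {best_response(w, others) for others in product(*other_action_sets)}
        # Rule 2: if the same action is taken whatever the others do, the
        # mapping for w is already known, so w is dropped from the table.
        if len(taken) > 1:
            table[w] = taken
    return table


def h_supervised_reduced(table):
    """|H| = product over w in T_i of |A_i^w|."""
    return math.prod(len(actions) for actions in table.values())


# Hypothetical toy bidding problem (not from the text): agent i bids in
# {0, 1, 2, 3}, a single other agent bids in {1, 2, 3}, and in state w
# agent i undercuts the other bid by one but never bids below w.
states = [0, 1, 2, 3]
my_bids = [0, 1, 2, 3]
other_bids = [1, 2, 3]


def undercut(w, others):
    return max(w, others[0] - 1)


T_i = reduce_table(states, [other_bids], undercut)
print({w: sorted(acts) for w, acts in T_i.items()})   # {0: [0, 1, 2], 1: [1, 2]}
print(h_supervised_reduced(T_i))                      # 6
print(len(my_bids) ** len(states))                    # 256 = unreduced |A_i|^|W|
```

<P> In this toy case rule 2 removes states 2 and 3 entirely and rule 1 trims the bids in the remaining states, so the supervised hypothesis space shrinks from 256 to 6 models; plugging the smaller |<I>H</I>| into Equation 2 lowers the sample bound accordingly.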
<P> For instance, the designer of an agent for our example market economy MAS can quickly realize that he knows what price his agent should bid given the bids of all the others and the probabilities that the buyer will pick each bid. That is, the designer has <IMG WIDTH=73 HEIGHT=18 ALIGN=MIDDLE ALT="tex2html_wrap_inline1123" SRC="img20.gif"> knowledge. He can also determine that, in a market economy, he cannot implement a 0-level supervised learning agent because, even after the fact, a 0-level agent cannot determine what it should have bid. Therefore, using Theorem <A HREF="node4.html#th1level" target="contents">2</A>, the designer will choose to implement a 1-level supervised learning agent rather than a 0-level reinforcement learning agent. More complicated situations can be handled in a similar way using Table <A HREF="node4.html#comp1table" target="contents">3</A>. <P> <HR> <P><ADDRESS> <I>Jose M. Vidal <BR> jmvidal@umich.edu <BR> Thu Apr 24 15:00:31 EDT 1997</I> </ADDRESS> </BODY>