Lets say that a 0-level agent does not have perfect knowledge (i.e.
its Ki(fi(w)) does not match the oracle M(w) function), then we know that
some or all of it's mappings must be wrong and need to
be learned. If the agent is using some form of supervised learning
(i.e. where a teacher tells it which action to take each time), then
it is trying to learn one of |Ai||W| possible 0-level models.
If instead it is using some form of reinforcement learning, where it
gets a reward (positive or negative) after every action, then it is
trying to learn one of possible models, where
R is the set of rewards it gets (). This means that, if
the agent is being taught which actions are better, then it just needs
to learn the mapping from state w to action a. While, if it gets a
reward for each action in each state, then it needs to learn the
mapping from state-action (w,a) pairs to their rewards in order to
determine which actions lead to the highest reward.
If, on the other hand, a 1-level agent is wrong, then the problem
could be either in its , or in its KiKj(fij(w)). An interesting case
is where we assume that the former knowledge is already known by the
agent. This can happen in MASs where the designer knows what the agent
should do given what all the other agents will do. So, assuming that
is always correct, we have KiKj(fij(w)) as the only source of the
discrepancy. Since agents can observe each other's actions in all
states, we can assume that they learn this knowledge using some form
of supervised learning (i.e. the observed agent is the teacher because
it ``tells'' others what it does in each w). Therefore, in learning
this knowledge an agent will be picking from a set of |Aj||W|
It should be intuitive (assuming )
that the learning problem that the 1-level agent has is the same
magnitude as the one the 0-level agent using supervised learning has,
but smaller than the reinforcement 0-level agent's problem. However,
we can make this a bit more formal by noticing that we can use the
size of the hypothesis (or concept) space to determine the sample
complexity of the learning problem. This give us a rough idea of
the number of examples that a PAC-learning algorithm would have to see
before reaching an acceptable hypothesis (i.e. model).
We first define the error, at any given time, of agent i's action
function gi(w), as:
where D is the distribution from which world states w are drawn,
and gi(w) returns the action that agent i will take in state
w, given its current knowledge (i.e. all the models).
We also let be the upper bound we wish to set on the
probability that i has a bad model, i.e. one with
. The sample complexity is bounded
from above by m, whose standard definition from computational
learning theory is:
where |H| is the size of the hypothesis (i.e. model) space. Given
these equations, we can plug in values for one particularly
interesting case, and we get an interesting result.
Proof We saw before that |H|=|A||W| for the 1-level
agent, and for the 0-level with
reinforcement-based learning. Using Equation 2 we can
determine that the 1-level agent's sample complexity will be less than
the 0-level reinforcement agent as long as |R| > |A|1/|A|, which
is always true because and |A| > 0.
Table: Size of the hypothesis spaces |H| for
learning the different sets of knowledge, depending on whether the
agent uses supervised or reinforcement learning. Ai is the
set of actions and Ri is the set of rewards for agent i,
n is the number of agents, and W the set of possible world
This theorem tells us that, in these cases, the 1-level will have
better models, on average, than the 0-level agent. In fact, we can
calculate the size of the hypothesis space |H| for all the different
types of knowledge, as seen in Table 2. This table,
along with Equation 2, can be used to determine the
sample complexity of learning the different types of knowledge for any
agent that uses any form of supervised or reinforcement learning. In
this way, we can compare two agents to determine which one will have
the more accurate models, on average. Please note that some of these
complexities are independent of the number of agents (n). We can do
this because we assume that all actions are seen by all agents so an
agent can build models of all other agents in parallel, and
assume everyone else can do the same. However, the actual
computational costs will increase linearly with each agent, since the
agent will need to maintain a separate model for each other agent. The
sample complexities rely on the assumption that, between each action,
there is enough time for the agent to update its models.
A designer of an agent for a MAS can consult Table 2 to
determine how long his agent will take to learn accurate models, given
different combinations of implemented versus learned knowledge, and
supervised versus reinforcement learning algorithms. However, we can
further refine this table by noticing that if a designer has, for
example, knowledge he can actually apply
this knowledge when building a 0-level agent. The use of this
knowledge will result in a reduction in the size of the hypothesis
space for the 0-level agent.
The reduction can be accomplished by looking at the knowledge
and determining which pairings are impossible and
eliminating these from the hypothesis space of the 0-level modeler.
That is, for all and , eliminate from the
table of all possible mappings all the mappings for
- There does not exist an such
that and , i.e. the action
ai is never taken in state w, regardless of what the others
- For all it is true that
and , i.e. the agent takes the same
action ai in w no matter what the others do.
After their application, we are left with a new table with , and each has a set
Awi associated with it. We can then determine that, if the new
0-level modeler uses supervised learning, the size of its hypothesis
space will be . While, if it uses
reinforcement learning, its hypothesis size will be
|Ri||Ti|. Table 3(a) summarizes the size
of the hypothesis spaces for learning the different types of knowledge
given that the designer uses the knowledge to reduce the
hypothesis spaces of other types of knowledge.
Table: Size of the hypothesis spaces |H| for
learning the different types of knowledge. The (a) columns assume
the designer already has knowledge, while in (b) he also
has KiKj(fij(w)). If knowledge is known then |H|=0.
Similarly, if the designer also has the knowledge , he creates a
reduced table Tj for all other agents. The new hypothesis spaces
will then be given by Table 3(b).
For example, a designer for our example market economy MAS can
quickly realize that he knows what price his agent should bid given
the bids of all others and the probabilities that the buyer will pick
each bid. That is, the designer has knowledge. He also can
determine that in a market economy he can not implement a 0-level
supervised learning agent because, even after the fact, it is
impossible for a 0-level to determine what it should have bid.
Therefore, using Theorem 2, the designer will choose to
implement a 1-level supervised learning agent and not a 0-level
reinforcement learning agent. More complicated situations would be
dealt with in a similar way using Table 3.
Jose M. Vidal
Thu Apr 24 15:00:31 EDT 1997