Decentralized learning means that:
- The agents share the same learning process.
- There are many agents learning.
- One agent learns and tells the others what it learned.
- The system learns but not the individual agents.
- Each agent learns a different part of the problem and keeps it to itself.
The credit assignment problem is made especially difficult in a MAS because:
- We need to determine which agent(s) were responsible for the reward.
- The dynamics of the system can be complicated.
- The agents might not reach a consensus.
- There is often not enough credit to assign to all the agents.
- Selfish agents desire more credit than they should get.
Which one of the following choices is not one of the assumptions made by the reinforcement learning problem formulation:
- The agent's reward function changes over time.
- There is a finite set of states of the world.
- Time is discrete.
- The agent's movement from state to state is described by a Markov process.
- If the agent takes the same action from the same state at several different times it might end up in a different state each time.
In reinforcement learning a policy for an agent describes
- The action the agent will take in each state.
- The action the agent should take in each state in order to maximize its immediate reward.
- The action the agent should take in each state in order to maximize its time-discounted reward.
- The best set of possible actions.
- The order in which the agent trades-off exploitation and exploration.
The Q-learning update formula is:
- Q(s,a) = (1-b)Q(s,a) + b(R + g*max_{a' in A} Q(s',a'))
- Q(s,a) = b*Q(s,a) + R
- Q(s,a) = (1-b)Q(s,a) + b*g*max_{a' in A} Q(s',a')
- Q(s,a) = b(R + g*max_{a' in A} Q(s',a'))
- Q(s,a) = (1-b)Q(s,a) + b(R + g*Q(s,a) - g*min_{a' in A} Q(s',a'))
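The first option is the standard Q-learning update. As a minimal Python sketch (the dict-based table and the function name are illustrative; b and g match the learning rate and discount factor in the formula):

```python
# Sketch of the standard Q-learning update,
#   Q(s,a) = (1-b)Q(s,a) + b(R + g*max_{a' in A} Q(s',a')),
# with the Q-table stored as a dict keyed by (state, action).

def q_update(Q, s, a, R, s_next, actions, b=0.1, g=0.9):
    # Best estimated value achievable from the next state.
    best_next = max(Q.get((s_next, a2), 0.0) for a2 in actions)
    old = Q.get((s, a), 0.0)
    # Blend the old estimate with the new sample, weighted by b.
    Q[(s, a)] = (1 - b) * old + b * (R + g * best_next)
    return Q[(s, a)]
```

Note how b weights new versus old experience, which is exactly the role the learning-rate question below asks about.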
The explore versus exploit dilemma happens when an agent
- Tries to decide whether to take the action it thinks is best or to try one that it thinks will be worse.
- Tries to decide whether to take advantage of other agents or go wandering about.
- Is undecided about which action is best for itself and which one is better for the system as a whole.
- Cannot determine the long-term impact of its actions.
- Is uncertain as to which state it will end up in after taking a particular action.
In Q-learning the learning rate parameter
- Determines how to weight new versus older experiences.
- There is no such parameter.
- Determines how well the Q-learning works.
- Can be used to modify the number of times an agent needs to see something before it learns it.
- Is only useful in conjunction with the exploration rate parameter.
Classifiers in a classifier system are divided into
- Input, Output, and Strength parts.
- State and action pairs.
- Regions of shared expertise.
- Coalitions.
- Pre and post-action ensembles.
The bucket brigade algorithm used in classifier systems works by
- Having the winning classifier give some of its strength to the one that fired before.
- Having the classifiers form a chain of command where one calls the next one.
- Distributing the reward evenly between all the classifiers involved in the choice of a successful action.
- Applying the crossover operator to classifiers that activate each other.
- Having each classifier pass a piece of strength ("bucket") to the next one in the activation line.
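The strength transfer described in the first option can be sketched directly; the bid fraction and the function name below are illustrative assumptions, not part of any particular classifier-system implementation:

```python
# Sketch of one bucket-brigade step: the classifier that just fired
# pays a fraction of its strength (its "bid") to the classifier that
# fired before it, so reward earned at the end of a chain gradually
# propagates back to the classifiers that set it up.

def bucket_brigade_step(strengths, winner, predecessor,
                        bid_fraction=0.1, reward=0.0):
    bid = bid_fraction * strengths[winner]
    strengths[winner] -= bid
    if predecessor is not None:
        strengths[predecessor] += bid   # pass the "bucket" backward
    strengths[winner] += reward         # external reward goes to the winner
    return strengths
```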
A classifier system uses a genetic algorithm in order to:
- Kill off bad classifiers and breed the good ones with each other.
- Extend the abilities of a classifier so that it can learn to change its behavior.
- Simulate the process of simulated annealing.
- Determine the evolutionary strength of the classifier system.
- It does not use genetic algorithms.
A 1-level learning agent models other agents as 0-level. This means that:
- It looks at their past actions and assumes that their future actions will be completely determined by them.
- It learns the expected reward for each of its actions.
- It builds a model of how the other agents make their decisions and uses this model to simulate their decision process.
- It uses a genetic algorithm to find the best model of the other agents.
- It uses a Q-learning algorithm to determine which of the other agents' actions will lead to greater rewards for the agent.
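The first option describes a 0-level model: the other agent's future actions are predicted purely from the empirical frequency of its past actions, with no model of its internal decision process. A minimal sketch, with illustrative names:

```python
from collections import Counter

# Sketch of a 0-level model of another agent: estimate the probability
# of each of its actions from observed history, ignoring why it acts.

def zero_level_model(observed_actions):
    counts = Counter(observed_actions)
    total = sum(counts.values())
    return {a: c / total for a, c in counts.items()}
```

A 1-level agent would then choose its own action by best-responding to this predicted distribution.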
Experiments using 0, 1, and 2-level buyer and seller agents in a market simulation lead to the conclusion that
- Higher-level agents do better until other agents start to also become higher level.
- The higher the level the better the agent will perform.
- The lower the level the better the agent will perform.
- It does not matter what level the agent is at; a market equilibrium is always reached.
- Learning is not useful in a market system.
The nodes in Usenet avoid sending messages that other nodes have already seen by
- Each message keeps a list of the nodes it has visited.
- Asking the other node if it has seen the message before sending it.
- Accessing a central database.
- Looking at the message's unique ID.
- Doing a DNS lookup.
If all the root DNS servers went down, what would happen?
- DNS lookups to names that are not cached in the local host would fail.
- The Internet would crash: any contact between two computers would be impossible.
- Nothing.
- Users would only be able to access computers within their local domain.
- The local nodes would take over all the responsibilities of the root nodes.
The tragedy of the commons is that
- Shared resources will be depleted by rational agents.
- Selfish agents cannot share.
- Goods can never be fairly allocated among rational agents.
- Fields of grass will always be abused by sheep.
- A dictator will always arise to take over any shared good.
The TCP protocol implements explicit cooperation by:
- Reducing the frequency of re-sends when there is congestion.
- Limiting the bandwidth it uses to be a percentage of the available bandwidth.
- Letting others go first when sending a message.
- Providing an error-free connection between hosts.
- Finding the best route for a packet.
Which one of the following choices is not an existing limitation of the Gnutella protocol?
- New nodes can only join in after being approved by a vote.
- Searches do not scale well.
- A node can only talk to its nearby neighbors.
- Nodes can provide false results to a search request.
- There is no way to know which node the data came from.
In Gnutella, the Time To Live (TTL) parameter is used to
- Prevent queries from living forever in the system.
- Prevent freeloader nodes from living forever in the system.
- Route queries to the most appropriate nodes.
- Avoid sending a query to a node that has already seen it.
- Bring down the system if necessary.
Which one of the following is not a true statement about the Freenet search protocol:
- The search string provided by the user is matched against the title of the available documents in each node.
- Nodes keep a copy of the data they have forwarded.
- Searches are done depth-first.
- It uses its own special port number.
- Documents are given a globally unique identifier.
Which one of the following methods is not a reasonable way of subverting the Gnutella network?
- Trying to bring down every node on the network.
- Flooding the system with a lot of queries.
- Returning false results to the queries.
- Modifying passing queries by increasing their TTL.
- Creating a worm that attacks vulnerabilities in the client programs.
As software programs become larger and our need to control their
complexity increases, the prevailing trend in handling
complexity is (as explained in the "Go To the Ant" paper):
- Towards increased localization of responsibilities.
- The use of agent architectures.
- Towards more adaptation.
- Good initial design.
- The creation of interpreted programming languages.
Ants are able to sort the larvae, eggs, and food in their nest by:
- Remembering what they have seen recently.
- Using pheromones to signal others.
- Following the directions from the queen.
- Marking an area of the nest as the larvae area, another as the eggs area, etc.
- Letting the workers fight for the food.
Termites are able to build their nests by:
- Depositing waste in the places that have higher pheromone intensity.
- Depositing waste in the places marked by the queen.
- Moving about randomly until they find a dropping then depositing waste on it.
- Finding the area nearest to a food source.
- Choosing stochastically when to make a dropping based on the number of other termites they have seen recently.
The fact that wasps achieve task differentiation is especially surprising since
- They are genetically identical.
- They do it without interacting with each other.
- They make use of pheromones.
- They are able to elect a leader.
- Once it is achieved the process used to achieve it can be stopped.
The basic flocking behavior of birds, fish, and RoboCup players, can be achieved by using:
- Attraction and repulsion forces.
- Communication.
- Pheromones.
- Task differentiation.
- Task allocation.
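Basic flocking really does emerge from attraction and repulsion alone (the first option). A one-dimensional sketch, where the distance threshold and step size are illustrative assumptions:

```python
# 1-D sketch of flocking via attraction/repulsion: each agent moves
# toward neighbors that are far away and away from neighbors that are
# too close, so the group coheres without ever colliding.

def flock_step(positions, too_close=1.0, step=0.1):
    new = []
    for i, x in enumerate(positions):
        force = 0.0
        for j, y in enumerate(positions):
            if i == j:
                continue
            d = y - x
            if abs(d) < too_close:
                force -= step * d / (abs(d) + 1e-9)  # repulsion: back away
            else:
                force += step * d / abs(d)           # attraction: close in
        new.append(x + force)
    return new
```

Iterating this step draws distant agents together while keeping a minimum spacing, with no communication, pheromones, or task structure.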
If you are implementing a MAS that does some function F, you
should (according to the "Go To the Ant" paper):
- Determine which agent behaviors will result in the emergent behavior that implements F.
- Divide F into smaller pieces and give each piece to an agent.
- Implement F in one of the agents.
- Provide each agent with the ability to do F.
- Find some other function G which is similar to F but can be distributed among the agents.
Keeping agents small in scope (local sensing and action) is a good technique because:
- It avoids excessive broadcasting of messages.
- It makes agents more powerful.
- It reduces the amount of information an agent must remember.
- It increases the responsibilities of the agent.
- It promotes cooperation.
Which one of the following is not necessarily a way of achieving agent diversity?
- Using different agent architectures.
- Placing agents at different physical locations.
- Using repulsion and attraction forces.
- Giving them different responsibilities.
- Giving them different behaviors.
The architectures could implement the same behavior.
Task differentiation in wasps is achieved by
- Having each wasp maintain Force and Foraging Threshold parameters.
- Having each wasp maintain Demand and Force parameters.
- Having wasps fight, the winner gets the Foraging job and the loser the Nursing job.
- Letting their genetic makeup determine their behavior.
- Using pheromones.
The following are all good reasons to keep your agents small, except which one?
- Small agents can communicate with many other agents.
- Small programs are easier to write.
- Big programs attract functionality.
- Big agents could become a central point of failure.
- Small agents are more likely to have specialized functionality.
The El Farol Bar problem is one where a number of agents try to
determine which night to attend the bar. The agents decide which night of the week to attend:
- At the beginning of the week.
- Based on communications with other agents.
- Using genetic algorithms.
- By flipping a coin.
- By forming subgroups.
The COIN framework applies only to groups of agents:
- That use reinforcement learning.
- That use classifier systems.
- That communicate with each other.
- That use shared plans.
- That are trying to solve a task allocation problem.
In the COIN framework, macrolearning refers to
- The modification of the agents' utility functions.
- The use of reinforcement learning.
- The use of meta-level knowledge to solve the learning problem.
- Learning about the learning problem itself.
- The modification of each agent's learning rate parameter.
According to the COIN framework, a constraint-aligned system is one:
- In which any change to the state of the agents in subworld w at time 0 will have no effect on the state of agents outside of w at times later than 0.
- One where a change at time 0 to the agents in w results in an increased value for gw(S) if and only if it results in an increased value for G(S).
- In which, for each subworld w, all agents in it have the same subworld utility functions.
- One where a change at time 0 to the agents in w results in a decreased value for gw(S) if and only if it results in a decreased value for G(S).
- One where a change at time 0 to the agents in w results in a decreased value for gw(S) if it also results in a decreased value for G(S).
The wonderful life utility defined by COIN can be stated as:
- The utility the whole system received minus the utility the whole system would have received had the agent never existed.
- The utility the whole system would have received had the agent never existed.
- The utility the agent would have received had the other agents never existed.
- The best possible utility the agent could have received.
- The utility the whole system would have received had the agent never existed minus the utility the agent received.
The COIN framework says that we should use the wonderful life utility
- As the reward function for the agents.
- As a way to determine the dynamics of the system.
- As a way to determine the ways in which selfish reinforcement agents will behave.
- As a measure of the global utility.
- As the learning function.
There is no such thing as a "learning function".
Let the utility of N agents attending the El Farol Bar on any
night be U(N), let S(N) be the number of agents that attended on
night N, and let n be the night that agent i attended. What is
the wonderful life reward for i?
- U(S(n)) - U(S(n) - 1)
- U(S(n-1))
- U(S(n)) - S(U(n))
- U(n) - U(S(n))
- U(S(n))
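The first option is the wonderful life reward: the utility of the night agent i attended minus what that utility would have been had i never existed (one fewer attendee). A sketch, where the example utility function U is an illustrative assumption that peaks at a comfortable capacity:

```python
# Sketch of the wonderful life reward for the El Farol problem:
#   WLU_i = U(S(n)) - U(S(n) - 1)
# where S(n) is the attendance on the night n that agent i chose.

def wonderful_life_reward(U, attendance, night):
    return U(attendance[night]) - U(attendance[night] - 1)

# Illustrative utility: rises with attendance up to a capacity of 6,
# then falls off as the bar gets too crowded.
def U(N, capacity=6):
    return N if N <= capacity else 2 * capacity - N
```

With this U, an agent attending an under-capacity night gets a positive reward, while an agent attending an over-crowded night gets a negative one, which is what makes WLU well aligned with the global utility.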
In the COIN experiments it was clear that the wonderful life utility performed the best. Which one came in second?
- Global reward
- Uniform Division reward
- Skewed reward.
- Exponential decay reward.
- Aristocratic utility reward.
In the El Farol Bar problem, in some cases (depending on which reward function was used) the optimal allocation was never reached. This was because:
- The agents kept choosing actions which frustrated each other.
- The problem was not constraint-aligned.
- The problem was not well-factored.
- There was not enough time.
- Agents were making stochastic decisions.
In the COIN experiments with the leader-follower problem the
performance was greatly improved by using macrolearning. In this
case macrolearning had the effect of:
- Placing many leaders and their followers in the same subworld.
- Increasing the utility that the followers received.
- Clamping down the wonderful life utility to a different value.
- Using the aristocratic utility.
- Changing the learning algorithm used by all the agents.