During the past two semesters I have taught a graduate-level course on multiagent systems. The course uses the RoboCup platform to give the students hands-on experience with the design and implementation of a sophisticated multiagent system. As part of the course, we read several papers that describe the implementation of successful RoboCup teams. I had therefore already read a few of the papers from the author of this book, but I still lacked a complete and cohesive picture of his RoboCup team. It was thus with great interest that I read this book, hoping to get a better understanding of the various technologies used to build the agents and of the way in which these technologies are fused together to form a coherent team of agents. I was not disappointed.
The jacket of this volume informs us that the book makes four main contributions: a description of an architecture that allows agents to take on roles to form teams dynamically, a layered learning approach to building systems, a team-partitioned opaque-transition reinforcement-learning algorithm, and a fully functioning RoboCup team that incorporates these techniques. The book summarizes Stone's work on the CMUnited RoboCup teams, which won the 1998 and 1999 competitions in the simulated league. The simulated league is composed of software agents that communicate with a soccerserver program. The soccerserver maintains the state of the world. The agents' communications with the server simulate the interface between an agent and its sensors and effectors. The book focuses on the CMUnited-98 and CMUnited-99 simulated league teams, although there is some discussion of a robot team.
The book is composed of ten chapters and four appendices. The appendices give implementation details about the CMUnited team. These appendices are especially valuable for those wishing to implement their own teams. Chapter one is the introduction. It motivates the development of CMUnited by pointing to the lack of implemented multiagent systems that operate in domains that are real-time, noisy, collaborative, and adversarial. Chapter two explains how the soccerserver works and provides low-level details of the physics it implements. The chapter concludes with a quick overview of the CMUnited real robots team.
Chapter three presents the team architecture, one of the main contributions of the book. The team architecture consists of an agent architecture plus some added support for teamwork. Each agent has: a world state that maintains the agent's beliefs about the current state of the world; an internal state; a locker-room agreement that defines the teamwork structure the agent uses; a set of internal behaviors that update the agent's internal state based on the current state, the world state, and the locker-room agreement; and a set of external behaviors that determine the action the agent will take given the internal state and the world state. The action taken by the external behaviors is also fed into a predictor, which updates the world state to simulate the predicted effects of the action.
Behaviors are defined as condition/action pairs, where the action can be either an atomic action or another behavior. The condition part of a behavior is evaluated over the input values given to that behavior, which vary depending on whether it is an internal or external behavior, as explained above. Each behavior is implemented as a function containing a long series of if-then rules, some of which call on other behaviors as part of their action. As such, behaviors form an invocation hierarchy. There is one top-level internal behavior and one top-level external behavior.
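The condition/action structure described above can be sketched in a few lines of Python. This is only an illustration of the idea, not code from the book: the `Behavior` class, the string encoding of atomic actions, the `"no_op"` fallback, and the example rules (`has_ball`, `goal_distance`) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple, Union

# Assumption: an atomic action is encoded as a plain string.
Action = str
Condition = Callable[[dict], bool]


@dataclass
class Behavior:
    """An ordered list of (condition, result) rules.

    A result is either an atomic action or another Behavior,
    which is what produces the invocation hierarchy.
    """
    name: str
    rules: List[Tuple[Condition, Union[Action, "Behavior"]]]

    def run(self, inputs: dict) -> Action:
        # Fire the first rule whose condition holds, like an if-then chain.
        for condition, result in self.rules:
            if condition(inputs):
                if isinstance(result, Behavior):
                    return result.run(inputs)  # descend into the sub-behavior
                return result
        return "no_op"  # hypothetical fallback when no rule matches


# Hypothetical two-level hierarchy for an external behavior.
handle_ball = Behavior("handle_ball", [
    (lambda s: s["goal_distance"] < 20, "shoot"),
    (lambda s: True, "dribble"),
])
top_level_external = Behavior("top_level_external", [
    (lambda s: s["has_ball"], handle_ball),
    (lambda s: True, "go_to_position"),
])
```

Calling `top_level_external.run(...)` with a dictionary of inputs walks the rules top-down until it reaches an atomic action, which mirrors the single top-level entry point the book describes.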
The teamwork structure consists of a set of agent roles, communication protocols, formations, and multi-step multiagent plans for execution in specific situations. A formation is an assignment of roles to agents. The agents use communications to inform each other of the role that each is taking. Since messages can be lost, it is possible that agents will have inconsistent beliefs about which roles others are playing or even which formation they are using. This situation is handled by the creation of robust behaviors that do not depend upon having correct up-to-date knowledge of other agents' roles. Examples of such behaviors are given for the robotic soccer domain, but the book does not provide a general approach for designing robust behaviors that fail gracefully when messages are lost.
The architecture's flow of control starts with the arrival of an input from the soccerserver at the end of a simulator cycle. This input is used to update the world state. The top-level internal behavior is executed and it updates the internal state. Afterwards, a long set of if-then rules is executed whose purpose is to set the player-mode. The player-mode is one of the variables examined by the condition part of the external behaviors. The top-level external behavior is then executed, which generates the agent's action for this cycle.
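As a reading aid, the cycle just described can be sketched as follows. This is my own simplified reconstruction under stated assumptions, not the CMUnited implementation: the class and method names, the dictionary-based states, and the toy rules are hypothetical, and storing the player-mode as a separate attribute is a guess, since the book does not make its relationship to the internal state clear.

```python
class Agent:
    """Toy agent that follows the per-cycle control flow described above."""

    def __init__(self, locker_room_agreement):
        self.world_state = {}       # beliefs about the world
        self.internal_state = {}    # the agent's own bookkeeping
        self.agreement = locker_room_agreement
        self.player_mode = None     # assumption: kept apart from internal state

    def update_world_state(self, sensor_input):
        self.world_state.update(sensor_input)

    def internal_behavior(self):
        # Hypothetical rule: note whether the ball is within kicking range.
        distance = self.world_state.get("ball_distance", float("inf"))
        self.internal_state["ball_close"] = distance < 2.0
        self.internal_state["role"] = self.agreement["role"]

    def set_player_mode(self):
        # Stand-in for the long set of if-then rules that picks the mode.
        if self.internal_state["ball_close"]:
            self.player_mode = "handle_ball"
        else:
            self.player_mode = "position"

    def external_behavior(self):
        if self.player_mode == "handle_ball":
            return "kick"
        return "move_to_" + self.internal_state["role"]

    def predict(self, action):
        # Predictor: fold the expected effect of the action into the world state.
        if action == "kick":
            self.world_state["ball_distance"] = 10.0  # hypothetical effect

    def cycle(self, sensor_input):
        self.update_world_state(sensor_input)  # 1. input from the soccerserver
        self.internal_behavior()               # 2. top-level internal behavior
        self.set_player_mode()                 # 3. choose the player-mode
        action = self.external_behavior()      # 4. top-level external behavior
        self.predict(action)                   # 5. simulate the action's effect
        return action


agent = Agent({"role": "midfielder"})
first_action = agent.cycle({"ball_distance": 1.0})
second_action = agent.cycle({"ball_distance": 30.0})
```

Each call to `cycle` corresponds to one simulator cycle: one sensor update in, one action out.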
The book does not make it easy for the reader to understand this flow of control or how the roles and behaviors interact with each other. Since roles, behaviors, states, and player-modes are all explained in separate sections, the reader often has to go back and forth between chapters in order to determine how these variables affect the agent's ultimate behavior. After some careful reading I was able to get a clear understanding of the major aspects of the system, but some of the details remain unclear to me. For example, I still do not understand the relationship between a player-mode and an internal state. I would have also appreciated a more detailed diagram of the whole architecture, as well as a control-flow diagram indicating when roles, modes, and states are changed and when actions are taken.
Chapter four describes the layered learning methodology. This methodology calls for the use of machine learning techniques to learn lower-level behaviors, which are then used as building blocks for learning higher-level behaviors. This chapter formalizes this approach and provides some general principles for its application. A specific example of how to use layered learning is given in the succeeding chapters.
Chapter five describes how the agents learned a ball-interception skill using neural networks. This skill forms one of the lower layers in the layered learning hierarchy of the CMUnited team. Chapter six describes how the pass-evaluation skill was learned using decision trees. This skill tells a player the likelihood that a pass to a given teammate will be successful given the current state of the world. The skill forms a higher layer since the agents that learned it had already learned the ball-interception skill. That is, the players first learn to intercept the ball, then they learn the probability that a given pass will succeed given that all the players use the learned ball-interception skill.
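The dependency between the two layers can be made concrete with a toy sketch. Note that it deliberately swaps in much simpler learners than the book uses: a learned distance threshold stands in for the neural network, and a frequency table stands in for the decision tree. All function names, features, and data values are hypothetical. The only point is that the second learner is trained on features computed with the first learned skill.

```python
def learn_intercept_range(examples):
    """Layer 1 (stand-in for the neural network).

    examples: (ball_distance, intercepted) pairs.
    Learns the largest distance at which an interception succeeded.
    """
    successes = [distance for distance, ok in examples if ok]
    return max(successes) if successes else 0.0


def make_interceptor(max_range):
    """Wrap the learned range as a reusable skill."""
    return lambda distance: distance <= max_range


def learn_pass_evaluator(can_intercept, passes):
    """Layer 2 (stand-in for the decision tree).

    passes: (receiver_distance, opponent_distance, succeeded) triples.
    The feature is computed with the layer-1 skill, so this layer can
    only be trained after layer 1 has been learned.
    """
    counts = {}
    for receiver_dist, opponent_dist, succeeded in passes:
        key = (can_intercept(receiver_dist), can_intercept(opponent_dist))
        wins, total = counts.get(key, (0, 0))
        counts[key] = (wins + int(succeeded), total + 1)
    table = {key: wins / total for key, (wins, total) in counts.items()}

    def evaluate(receiver_dist, opponent_dist):
        key = (can_intercept(receiver_dist), can_intercept(opponent_dist))
        return table.get(key, 0.5)  # assumption: 0.5 for unseen situations

    return evaluate


# Hypothetical training data for each layer.
intercept_data = [(1.0, True), (3.0, True), (5.0, False)]
pass_data = [(2.0, 8.0, True), (2.5, 9.0, True), (1.0, 7.0, False), (2.0, 1.0, False)]

can_intercept = make_interceptor(learn_intercept_range(intercept_data))
pass_success = learn_pass_evaluator(can_intercept, pass_data)
```

Here `pass_success(receiver_distance, opponent_distance)` returns an estimated success probability that implicitly assumes every player intercepts the ball using the layer-1 skill.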
Layered learning is a useful technique for domains with large state spaces that cannot be learned directly, but where the designer has some deeper knowledge about the problem domain that can reduce the space into a set of more manageable lower-level skills. The technique, therefore, presumes that the designer is able to determine which lower-level skills are appropriate for a particular domain, a sometimes difficult task. For example, the author identifies ball-interception as the most essential low-level skill for his RoboCup agents. Another designer less familiar with the domain might have chosen dribbling or kicking as the most essential skill in robotic soccer. It is not known how much harder it would be to build a winning RoboCup team based on the wrong low-level skill, or how to determine that the chosen skill is the wrong one.
A third learned layer is presented in Chapter seven. This layer is learned using the author's team-partitioned, opaque-transition reinforcement-learning (TPOT-RL) method. TPOT-RL extends reinforcement learning by: partitioning the value (reward) function among the team members, using action-dependent features to create a small feature space, and giving immediate rewards to states that would otherwise get a long-term discounted reward propagated back from a goal state. The TPOT-RL algorithm is formalized and used for learning pass selection: deciding which of the teammates, if any, should receive a pass.
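To give a feel for the three ingredients listed above, here is a minimal sketch of a per-agent value table. It is not the book's formalization of TPOT-RL: the class name, the feature and action labels, the reward values, and the learning rate are all hypothetical. Each agent would own one such table (the team partitioning), values are indexed by a small action-dependent feature, and the update moves the value toward an observed reward without bootstrapping from a successor state.

```python
from collections import defaultdict


class PassSelector:
    """One agent's share of the team's value function."""

    def __init__(self, learning_rate=0.1):
        self.learning_rate = learning_rate
        # Value estimate for each (action-dependent feature, action) pair.
        self.values = defaultdict(float)

    def choose(self, candidates):
        """Pick the action with the highest current value estimate.

        candidates: dict mapping each available action to the feature
        value computed for that action in the current state.
        """
        return max(candidates, key=lambda a: self.values[(candidates[a], a)])

    def update(self, feature, action, reward):
        """Move the estimate toward the observed reward.

        No successor-state value appears here, since the agent does not
        observe the transitions that follow its own action.
        """
        key = (feature, action)
        self.values[key] += self.learning_rate * (reward - self.values[key])


# Hypothetical experience for a single player.
selector = PassSelector(learning_rate=0.5)
selector.update("open", "pass_to_7", 1.0)
selector.update("open", "pass_to_7", 1.0)
selector.update("blocked", "pass_to_9", -1.0)
choice = selector.choose({"pass_to_7": "open", "pass_to_9": "blocked"})
```

In a full team, each player would hold its own `PassSelector`, so no single table has to cover the whole state space.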
The last three chapters cover, in turn, the competition results, related work, and the conclusion. The competition results chapter summarizes CMUnited's outstanding performance in the various RoboCup competitions and makes the point that the quality of game-play, for all teams, continues to increase with each successive tournament. The related work chapter effectively places this work within the context of ongoing multiagent and machine learning research. Finally, the conclusion summarizes the contributions of the book and examines possible future directions for this research.
In summary, the book presents a thorough overview of the design and implementation of the CMUnited team. It is an excellent case study on the process of building a complex, almost real-time, multiagent system that needs to handle realistically noisy input. The author carries us through the long process of building such a system, introducing, as he goes along, the technologies he had to use or invent in order to solve each new problem. The layered learning paradigm and the TPOT-RL learning algorithm are two new tools that agent builders would be wise to add to their toolbox. The book also explains why each particular technique was chosen over other possible candidates, at times showing the experiments that were carried out to determine which of the competing technologies works best in the RoboCup domain.
Given this emphasis on implementation, the book's title is misleading. It is its subtitle, "A Winning Approach to Robotic Soccer", that best captures the essence of the book. The book only spends a few pages explaining the theory behind layered learning at an abstract level. Most of the material on this subject is presented only with examples of how it was used in the CMUnited team, and a reader hoping for a full text on the techniques and ideas behind layered learning will be disappointed. On the other hand, a reader who is looking for a detailed description of the implementation of a complex multiagent system, which happens to use machine learning as a central component, will be amply rewarded.
Jose M. Vidal
Last modified: Mon Sep 3 14:16:13 EDT 2001