IJCAI-99 Workshop on
Agents Learning About, From and With other Agents
2 August 1999, Stockholm, Sweden
Coordination of the activities of multiple agents, whether selfish or cooperative, is essential for the viability of any system in which multiple agents must coexist. Learning and adaptation are invaluable mechanisms by which agents can evolve coordination strategies that meet the demands of the environments and the requirements of individual agents.
Researchers in machine learning and adaptive systems have been addressing issues concerned with learning and adapting from past experience, observation, failures, etc. Whereas most of this research has focused on techniques for acquisition and effective use of problem solving knowledge from the viewpoint of a single autonomous agent, a few recent investigations have opened the possibility of application of some of these techniques in multiagent settings. Most of these recent results, however, use existing learning techniques to show that individual agents can respond to the uncertainties inherent in the environment and/or uncertainties imposed by the behavior of other agents.
The goal of this workshop is to focus on research that will address unique requirements for agents learning and adapting to work alongside other agents. Recognizing the applicability and limitations of current machine learning research as applied to multiagent problems as well as developing new learning and adaptation mechanisms particularly targeted to this class of problems will be of particular relevance to this workshop.
We focus on three different ways in which machine learning can be used within a Multi-Agent System. An agent can learn about other agents in order to compete and/or cooperate with them. An agent can learn from other agents, taking advantage of their experiences and incorporating these into its knowledge base. Finally, an agent can learn with (alongside) other agents, sharing with, interfering with, or helping them as it learns.
We would particularly welcome new insights into these problems from other related disciplines and thus would like to emphasize the inter-disciplinary nature of the workshop. Among others, papers of the following kind are welcome:
- Benefits of adaptive/learning agents over agents with fixed behavior in multiagent problems.
- Evaluation of the effectiveness of individual learning strategies (e.g., case-based, explanation-based, inductive), or multistrategy combinations, in the context of multiagent problems.
- Characterization of learning and adaptation methods in terms of the modeling power, communication abilities, knowledge requirements, and processing abilities of individual agents.
- Developing learning and adaptation strategies, or reward structures, for environments with cooperative agents, selfish agents, or partially cooperative agents (who will cooperate only if individual goals are not sacrificed), and for environments that can contain a mixture of these types of agents.
- Analyzing and constructing algorithms that guarantee convergence and stability of group behavior.
- Analyzing effects of knowledge acquisition mechanisms on responsiveness of agents or groups to addition/deletion of other agents from the environment.
- Study of adaptive behavior in team games, where one group of cooperative agents is pitted against another group of cooperative agents.
- Inter-disciplinary research on multi-agent learning and adaptation (including, but not limited to, research in organizational theory, psychology, sociology, and economics).
- Co-evolving multiple agents with similar/opposing interests.
- Investigation of teacher-student relationships among agents.
The workshop is open to all members of the AI community but the number of participants will be limited. Participants will be selected by the committee based on the quality of their submitted papers. Those wishing to attend without submitting a paper are welcome to send a one-page abstract stating their interests as they relate to the workshop. All participants must register for the main IJCAI conference.
|Submission deadline:||15 March 1999|
|Notification of acceptance:||19 April 1999|
|Deadline for requests for participation:||30 June 1999|
|Camera ready copy and author registration due:||17 May 1999|
|Workshop:||2 August 1999|
- 8:30-9:00 Esma Aïmeur and Sarita Bassil. A mixed initiative for teaching and learning HTML in intelligent tutoring systems.
- 9:00-9:30 Tucker Balch. Reward and diversity in multirobot foraging.
- 9:30-10:00 Michael Schillo and Petra Funk. Learning from and about other agents in terms of social metaphors.
- 10:00-10:30 Break
- 10:30-11:00 Manisha Mundhe and Sandip Sen. Evaluating concurrent reinforcement learners.
- 11:00-11:30 Bob Price and Craig Boutilier. Implicit imitation in multiagent reinforcement learning.
- 11:30-12:00 Michael Rovatsos and Jürgen Lind. Learning cooperation in repeated games.
- 12:00-12:30 Anish Biswas, Sandip Debnath, and Sandip Sen. Believing others: Pros and cons.
- 12:30-2:00 Lunch
- 2:00-2:30 Peter Stone and Manuela Veloso. Layered learning.
- 2:30-3:00 Dicky Suryadi and Piotr J. Gmytrasiewicz. Learning models of other agents using influence diagrams.
- 3:00-3:30 Keiki Takadama, Takao Terano, Katsunori Shimohara, Koichi Hori, and Shinichi Nakasuka. Can multiagents learn in organization? Analyzing organizational-learning oriented classifier systems.
- 3:30-4:00 Coffee break
- 4:00-4:30 Gerald Tesauro. Pricing in agent economies using neural networks and multi-agent Q-learning.
- 5:30 IJCAI Opening Ceremony
Department of Mathematical & Computer Sciences
University of Tulsa,
600 South College Avenue,
Tulsa, OK 74104-3189.
José M. Vidal
Electrical and Computer Engineering
Swearingen Engineering Center
University of South Carolina
Columbia, SC 29208-0001
11 references, last updated Mon May 24 10:13:45 1999
Our goal with respect to a student learning process is threefold: we want him to learn from his mistakes, we want him to learn from others' mistakes, and we do not want him to repeat his own mistakes. For this purpose, a learning strategy called Double Test Learning (DTL) involving three agents, two simulated pedagogical agents (the tutor and the classmate) and one real agent (the learner), was elaborated recently. The DTL strategy has been implemented in an intelligent tutoring system called HITS designed to teach HTML. In our implementation of the DTL strategy, the classmate receives the same training as the learner and both have the same level of knowledge. Once the training is completed, the tutor tests the classmate (Post-Test 1). When the classmate finishes Post-Test 1, a Revision phase is granted to the learner, where he can view the notes he took on his agenda during Post-Test 1. The tutor then turns to the human learner and Post-Test 2 is started. During this phase, the learner only has access to his memory and the knowledge that he recently acquired through the classmate's answers. The most important point to emphasize is that the learner benefits from the classmate's mistakes.
This research seeks to quantify the impact of the choice of reward function on behavioral diversity in learning robot teams. The methodology developed for this work has been applied to multirobot foraging, soccer and cooperative movement. This paper focuses specifically on results in multirobot foraging. In these experiments three types of reward are used with Q-learning to train a multirobot team to forage: a local performance-based reward, a global performance-based reward, and a heuristic strategy referred to as shaped reinforcement. Local strategies provide each agent a specific reward according to its own behavior, while global rewards provide all the agents on the team the same reward simultaneously. Shaped reinforcement provides a heuristic reward for an agent's action given its situation. The experiments indicate that local performance-based rewards and shaped reinforcement generate statistically similar results: they both provide the best performance and the least diversity. Finally, learned policies are demonstrated on a team of Nomadic Technologies' Nomad150 robots.
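The distinction between local and global performance-based rewards can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the reward signal (items delivered per robot in one time step), the function names, and the learning-rate and discount values. The update shown is the standard tabular Q-learning rule.

```python
from typing import Dict, List


def local_rewards(deliveries: List[int]) -> List[float]:
    """Local scheme: each robot is rewarded only for its own deliveries."""
    return [float(d) for d in deliveries]


def global_rewards(deliveries: List[int]) -> List[float]:
    """Global scheme: every robot receives the same team-level reward."""
    team_total = float(sum(deliveries))
    return [team_total] * len(deliveries)


def q_update(q: Dict[int, List[float]], state: int, action: int,
             reward: float, next_state: int,
             alpha: float = 0.1, gamma: float = 0.9) -> None:
    """One standard tabular Q-learning update, applied in place.

    q maps a state to a list of action values; alpha and gamma are
    illustrative values, not the paper's settings.
    """
    best_next = max(q[next_state])
    q[state][action] += alpha * (reward + gamma * best_next - q[state][action])


if __name__ == "__main__":
    deliveries = [1, 0, 2]  # hypothetical: items delivered by three robots
    print(local_rewards(deliveries))
    print(global_rewards(deliveries))
```

Under the local scheme a robot that delivered nothing receives no reward, while under the global scheme it shares the team's total, which is one way the choice of reward function could shape how similar the learned policies become.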
In open environments there is no central control over agent behaviors. On the contrary, agents in such systems can be assumed to be primarily driven by self interests. Under the assumption that agents remain in the system for significant time periods, or that the agent composition changes only slowly, we have previously presented a prescriptive strategy for promoting and sustaining cooperation among self-interested agents. The adaptive, probabilistic policy we have prescribed promotes reciprocative cooperation shown to improve both individual and group performance in the long run. In the short run, however, selfish agents could exploit reciprocative agents. In this paper, we evaluate the hypothesis that the exploitative tendencies of selfish agents can be effectively curbed if reciprocative agents share their "opinions" of other agents. Since the true nature of agents is not known a priori and is learned from experience, believing others can also pose other hazards. We provide a learned trust-based evaluation function that is shown to resist both individual and concerted deception on the part of selfish agents.
Assumptions underlying the convergence proofs of reinforcement learning (RL) algorithms like Q-learning are violated when multiple interacting agents adapt their strategies on-line as a result of learning. Empirical investigations in several domains, however, have produced encouraging results. We systematically evaluate the convergence behavior of concurrent reinforcement learning agents using game matrices of varying complexity as studied by Claus and Boutilier [Claus and Boutilier, 1998]. Variants of simple RL algorithms are evaluated for convergence under relative prominence of global optima, feedback noise, scale up of game matrix size, and game matrix characteristics. Our results show surprising departures from those observed by Claus and Boutilier, particularly for larger problem sizes. We present an analysis and explanation of the experimental results that provides insight into the nature of convergence of these concurrent learners. We identify the variants of the algorithms studied as recursive modeling agents. This allows us to suggest more effective learning agents with deeper levels of nesting of beliefs about other agents. We also discuss the effect of greedy and non-greedy strategies on the modeling agents. Our results show that the greedy strategy turns out to be better for higher level modeling agents.
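To make the setting concrete, the following Python sketch runs two concurrent learners on a repeated matrix game. It is only an illustration under stated assumptions: the 2x2 common-payoff matrix, the stateless Q-learning rule, the epsilon-greedy exploration, and all parameter values are hypothetical choices, not the variants or matrices evaluated in the paper.

```python
import random
from typing import List, Tuple

# Hypothetical common-payoff game: both agents receive PAYOFF[a][b].
PAYOFF = [[10.0, 0.0],
          [0.0, 5.0]]


def greedy(q: List[float]) -> int:
    """Index of the highest Q-value; ties go to the lowest index."""
    return max(range(len(q)), key=lambda a: q[a])


def choose(q: List[float], epsilon: float, rng: random.Random) -> int:
    """Epsilon-greedy action selection."""
    if rng.random() < epsilon:
        return rng.randrange(len(q))
    return greedy(q)


def run(rounds: int = 2000, alpha: float = 0.1, epsilon: float = 0.1,
        seed: int = 0) -> Tuple[List[float], List[float]]:
    """Let two independent learners adapt at the same time.

    Neither agent observes the other's action; each only sees the
    shared reward, so from its point of view the environment is
    non-stationary while the other agent is still learning.
    """
    rng = random.Random(seed)
    q_a = [0.0, 0.0]
    q_b = [0.0, 0.0]
    for _ in range(rounds):
        a = choose(q_a, epsilon, rng)
        b = choose(q_b, epsilon, rng)
        r = PAYOFF[a][b]
        q_a[a] += alpha * (r - q_a[a])  # stateless Q-learning update
        q_b[b] += alpha * (r - q_b[b])
    return q_a, q_b


if __name__ == "__main__":
    q_a, q_b = run()
    print("agent A:", q_a, "greedy action:", greedy(q_a))
    print("agent B:", q_b, "greedy action:", greedy(q_b))
```

Because each agent's reward depends on the other's current policy, the value an agent estimates for an action keeps shifting as its partner explores, which is exactly the condition under which the single-agent convergence assumptions no longer hold.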
Imitation is actively studied as an effective means of learning in multi-agent environments. It allows an agent to learn how to act optimally by observing the actions of cooperative teachers or more experienced agents. We propose a straightforward imitation mechanism called "model extraction" that can be integrated easily into standard model-based reinforcement learning algorithms. Roughly, by observing a mentor with similar capabilities, an agent can extract information about its own capabilities in unvisited parts of the state space. The extracted information can accelerate learning dramatically. We illustrate the benefits of model extraction by integrating it with prioritized sweeping, and demonstrating improved performance and convergence through observation of single and multiple mentors. Though we make some stringent assumptions regarding observability, possible interactions and common abilities, we briefly comment on extensions to the model that relax these.
In the field of multi-agent systems, the study of coordination, cooperation and collaboration assumes a prominent position. Most of the research concerned with these issues concentrates on explicit negotiation between agents, on the investigation of settings in which global system goals have to be balanced with agents' individual goals or on the exploitation of real-world knowledge to determine efficient coordination strategies. We present a social learning and reasoning component as part of a layered learning agent architecture for iterated multi-player games, which is capable of implementing cooperative behavior in societies of purely "selfish" agents. This can be accomplished by learning about other agents' preferences, and by finding out how valuable other agents' actions are for the agent's own success. We claim that this is possible without any a priori knowledge of the underlying payoff matrices and without explicit communication between agents, and first experiments yield promising results.
We present work that has been conducted in a sociological and psychological context. Our aim is to establish a mechanism that enables agents to cope with environments that contain both selfish and co-operative entities, where the mixture and the behavior of these entities are previously unknown to all agents. We achieve this by enabling agents to evaluate trust in others, based upon the observations they gather. Using trust, they are able to request observations from others and make use of this possibly manipulated data, thereby enlarging their data base on the behavior of others. Our approach results in significantly faster and better behavior adaptation. We demonstrate the improvement in performance of agents using trust compared to the performance of others that just use their own observations.
This paper presents "layered learning," a hierarchical machine learning paradigm. Layered learning applies to tasks for which learning a direct mapping from inputs to outputs is in principle intractable with existing learning algorithms. Given a hierarchical task decomposition, layered learning seamlessly integrates separate learning at each layer. The learning of each subtask layer directly facilitates the learning of the next higher subtask layer by determining at least one of three of its components: (i) the set of training examples; (ii) the input representation; and/or (iii) the output representation. We introduce layered learning in its domain-independent general form. We then present a full implementation in a complex domain, namely simulated robotic soccer.
We adopt decision theory as a descriptive paradigm to model rational agents. We use influence diagrams as a modeling representation of agents, which is used to interact with them and to predict their behavior. In this paper, we provide a framework that an agent can use to learn the models of other agents in a multi-agent system (MAS) based on their observed behavior. Since the correct model is usually not known with certainty, our agents maintain a number of possible models and assign a probability to each of them being correct. When none of the available models is likely to be correct, we modify one of them to better account for the observed behaviors. The modification refines the parameters of the influence diagram used to model the other agent's capabilities, preferences, or beliefs. The modified model is then allowed to compete with the other models, and the probability assigned to it being correct can be arrived at based on how well it predicts the behaviors of the other agent already observed.
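One way to picture maintaining probabilities over several candidate models is a Bayesian update after each observed action. The Python sketch below assumes this reading; the abstract does not state the exact update rule, and the model names, the likelihood values, and the function `update_model_probs` are hypothetical. In the paper the likelihoods would come from the influence-diagram models, which are not reproduced here.

```python
from typing import Dict


def update_model_probs(priors: Dict[str, float],
                       likelihoods: Dict[str, float]) -> Dict[str, float]:
    """Reweight candidate models after observing one action.

    priors[m]      : current probability that model m is correct
    likelihoods[m] : probability that model m assigns to the action
                     that was actually observed
    Bayes' rule: posterior(m) is proportional to priors[m] * likelihoods[m].
    """
    unnormalized = {m: priors[m] * likelihoods[m] for m in priors}
    total = sum(unnormalized.values())
    if total == 0.0:
        # No candidate explains the observation at all; in the setting
        # described above this is the point where a model would need
        # to be modified.
        raise ValueError("no candidate model explains the observation")
    return {m: p / total for m, p in unnormalized.items()}


if __name__ == "__main__":
    priors = {"model_A": 0.5, "model_B": 0.5}       # hypothetical models
    likelihoods = {"model_A": 0.9, "model_B": 0.1}  # hypothetical values
    print(update_model_probs(priors, likelihoods))
```

Repeating the update over a sequence of observations concentrates probability on the models that predict the other agent's behavior well, and a newly modified model can be scored the same way against the observations already collected.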
The Organizational-learning oriented Classifier System (OCS) is an extension of Learning Classifier Systems (LCSs) to multiagent environments that introduces the concept of organizational learning (OL) from organization and management science. Unlike the conventional multiagent systems in the literature, which utilize specific and elaborate techniques, OCS integrates four mechanisms from multi-strategic standpoints. This paper investigates the performance of OCS from the viewpoint of OL and then compares it with conventional LCSs. Intensive experiments on a complex scalable domain have revealed that (1) the integration of four learning mechanisms in OL is effective in terms of solution quality and computational cost; (2) OCS finds good solutions at less computational cost in comparison with conventional LCSs.
This paper investigates how adaptive software agents may utilize reinforcement learning algorithms such as Q-learning to make economic decisions such as setting prices in a competitive marketplace. For a single adaptive agent facing fixed-strategy opponents, ordinary Q-learning is guaranteed to find the optimal policy. However, for a population of agents each trying to adapt in the presence of other adaptive agents, the problem becomes non-stationary and history dependent, and it is not known whether any global convergence will be obtained, and if so, whether such solutions will be optimal. This paper studies simultaneous Q-learning by two competing seller agents in three moderately realistic economic models. This is the simplest case in which interesting multi-agent phenomena can occur, and the state space is small enough so that lookup tables can be used to represent the Q-functions. Despite the lack of theoretical guarantees, simultaneous convergence to self-consistent optimal solutions is obtained in each model, at least for small values of the discount parameter. In some cases, such convergence is also found even at large discount parameters. Furthermore, the Q-derived policies increase profitability and damp out or eliminate cyclic price "wars" compared to simpler policies based on zero lookahead or short-term lookahead. The use of function approximators (neural nets) instead of lookup tables is also investigated; preliminary findings indicate that reasonably good policies can be obtained even though the absolute accuracy of the function approximation may be poor.