Vidal's library
Title: Hierarchical MultiAgent Reinforcement Learning
Author: Rajbala Makar, Sridhar Mahadevan, and Mohammad Ghavamzadeh
Book Title: Proceedings of the Fifth International Conference on Autonomous Agents
Pages: 246--253
Year: 2001
Abstract: In this paper we investigate the use of hierarchical reinforcement learning to speed up the acquisition of cooperative multi-agent tasks. We extend the MAXQ framework to the multi-agent case. Each agent uses the same MAXQ hierarchy to decompose a task into sub-tasks. Learning is decentralized, with each agent learning three interrelated skills: how to perform subtasks, which order to do them in, and how to coordinate with other agents. Coordination skills among agents are learned by using joint actions at the highest level(s) of the hierarchy. The Q nodes at the highest level(s) of the hierarchy are configured to represent the joint task-action space among multiple agents. In this approach, each agent only knows what other agents are doing at the level of sub-tasks, and is unaware of lower level (primitive) actions. This hierarchical approach allows agents to learn coordination faster by sharing information at the level of sub-tasks, rather than attempting to learn coordination taking into account primitive joint state-action values. We apply this hierarchical multi-agent reinforcement learning algorithm to a complex AGV scheduling task and compare its performance and speed with other learning approaches, including flat multi-agent, single agent using MAXQ, selfish multiple agents using MAXQ (where each agent acts independently without communicating with the other agents), as well as several well-known AGV heuristics like "first come first serve", "highest queue first" and "nearest station first". We also compare the tradeoffs in learning speed vs. performance of modeling joint action values at multiple levels in the MAXQ hierarchy.
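The sketch below (not the authors' code) illustrates the core idea the abstract describes: each agent keeps ordinary Q-values for its own lower-level subtasks, while its top-level Q node is defined over the joint choice of high-level subtasks across agents. All names (CooperativeHRLAgent, select_subtask, update_top_level) are hypothetical, and the full MAXQ completion-function decomposition is omitted; this only shows where joint actions enter the learning update.

import random
from collections import defaultdict


class CooperativeHRLAgent:
    def __init__(self, subtasks, alpha=0.1, gamma=0.95, epsilon=0.1):
        self.subtasks = subtasks          # high-level subtasks this agent can execute
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        # Top level: Q(state, own subtask, other agents' subtasks) -- joint task-action values.
        self.q_joint = defaultdict(float)
        # Lower levels: Q(state, primitive action), learned independently per agent (not shown).
        self.q_local = defaultdict(float)

    def select_subtask(self, state, others_subtasks):
        """Epsilon-greedy over own subtasks, conditioned on what the other
        agents are currently doing at the subtask level."""
        if random.random() < self.epsilon:
            return random.choice(self.subtasks)
        return max(self.subtasks,
                   key=lambda a: self.q_joint[(state, a, others_subtasks)])

    def update_top_level(self, state, own, others, reward, next_state, next_others):
        """One-step update on the joint task-action values; `others` is a
        tuple of the other agents' chosen subtasks."""
        best_next = max(self.q_joint[(next_state, a, next_others)]
                        for a in self.subtasks)
        key = (state, own, others)
        self.q_joint[key] += self.alpha * (
            reward + self.gamma * best_next - self.q_joint[key])

Because the joint table is indexed by subtasks rather than primitive actions, its size grows with the (small) number of high-level subtasks, which is why coordinating only at the top of the hierarchy can be learned faster than coordinating over primitive joint state-action values.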

Cited by 1  -  Google Scholar

@InProceedings{makar01a,
  author =	 {Rajbala Makar and Sridhar Mahadevan and Mohammad
                  Ghavamzadeh},
  title =	 {Hierarchical MultiAgent Reinforcement Learning},
  booktitle =	 {Proceedings of the Fifth International Conference on
                  Autonomous Agents},
  pages =	 {246--253},
  year =	 2001,
  abstract =	 {In this paper we investigate the use of hierarchical
                  reinforcement learning to speed up the acquisition
                  of cooperative multi-agent tasks. We extend the
                  MAXQ framework to the multi-agent case. Each agent
                  uses the same MAXQ hierarchy to decompose a task
                  into sub-tasks. Learning is decentralized, with each
                  agent learning three interrelated skills: how to
                  perform subtasks, which order to do them in, and how
                  to coordinate with other agents. Coordination skills
                  among agents are learned by using joint actions at
                  the highest level(s) of the hierarchy. The Q nodes
                  at the highest level(s) of the hierarchy are
                  configured to represent the joint task-action space
                  among multiple agents. In this approach, each agent
                  only knows what other agents are doing at the level
                  of sub-tasks, and is unaware of lower level
                  (primitive) actions. This hierarchical approach
                  allows agents to learn coordination faster by
                  sharing information at the level of sub-tasks,
                  rather than attempting to learn coordination taking
                  into account primitive joint state-action values. We
                  apply this hierarchical multi-agent reinforcement
                  learning algorithm to a complex AGV scheduling task
                  and compare its performance and speed with other
                  learning approaches, including flat multi-agent,
                  single agent using MAXQ, selfish multiple agents
                  using MAXQ (where each agent acts independently
                  without communicating with the other agents), as
                  well as several well-known AGV heuristics like
                  "first come first serve", "highest queue first" and
                  "nearest station first". We also compare the
                  tradeoffs in learning speed vs. performance of
                  modeling joint action values at multiple levels in
                  the MAXQ hierarchy.},
  keywords =     {multiagent reinforcement learning},
  url = 	 {http://jmvidal.cse.sc.edu/library/makar01a.pdf},
  cluster = 	 {2242243449876899562}
}
Last modified: Wed Mar 9 10:15:16 EST 2011