Title: AWESOME: A General Multiagent Learning Algorithm that Converges in Self-Play and Learns a Best Response Against Stationary Opponents
Author: Vincent Conitzer and Tuomas Sandholm
Journal: Machine Learning
Year: 2006
DOI: 10.1007/s10994-006-0143-1
Abstract: Two minimal requirements for a satisfactory multiagent learning algorithm are that it (1) learns to play optimally against stationary opponents and (2) converges to a Nash equilibrium in self-play. The previous algorithm that has come closest, WoLF-IGA, has been proven to have these two properties in 2-player 2-action (repeated) games, assuming that the opponent's mixed strategy is observable. Another algorithm, ReDVaLeR (introduced after the algorithm described in this paper), achieves the two properties in games with arbitrary numbers of actions and players, but still requires that the opponents' mixed strategies be observable. In this paper we present AWESOME, the first algorithm that is guaranteed to have the two properties in games with arbitrary numbers of actions and players. It is still the only algorithm that does so while relying only on observing the other players' actual actions (not their mixed strategies). It also learns to play optimally against opponents that eventually become stationary. The basic idea behind AWESOME (Adapt When Everybody is Stationary, Otherwise Move to Equilibrium) is to try to adapt to the others' strategies when they appear stationary, but otherwise to retreat to a precomputed equilibrium strategy. We provide experimental results that suggest that AWESOME converges fast in practice. The techniques used to prove the properties of AWESOME are fundamentally different from those used for previous algorithms, and may help in analyzing future multiagent learning algorithms as well.
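
The acronym's expansion is essentially the whole control loop, and a small sketch may help make it concrete. The Python below is a simplified illustration for two-player games, not the paper's actual algorithm: the fixed epoch length and the thresholds eps_e and eps_s are assumptions made here for readability, whereas the paper's convergence guarantees rely on a carefully chosen schedule of growing epoch lengths and shrinking thresholds.

import numpy as np

class AwesomeSketch:
    """Toy two-player sketch of AWESOME's control loop (illustrative only)."""

    def __init__(self, payoffs, my_eq, opp_eq,
                 eps_e=0.05, eps_s=0.05, epoch_len=200):
        self.payoffs = np.asarray(payoffs, dtype=float)  # payoffs[i, j]: my payoff for actions (i, j)
        self.my_eq = np.asarray(my_eq, dtype=float)      # my strategy in a precomputed Nash equilibrium
        self.opp_eq = np.asarray(opp_eq, dtype=float)    # opponent's strategy in that same equilibrium
        self.eps_e = eps_e          # assumed tolerance: "opponent is playing the equilibrium"
        self.eps_s = eps_s          # assumed tolerance: "opponent looks stationary"
        self.epoch_len = epoch_len  # assumed fixed epoch length (the paper grows epochs over time)
        self._restart()

    def _restart(self):
        # Retreat to the precomputed equilibrium and re-accept both hypotheses.
        self.playing_eq = True
        self.current = self.my_eq.copy()
        self.counts = np.zeros_like(self.opp_eq)
        self.last_freq = None

    def act(self, rng):
        # Sample an action from the currently adopted mixed strategy.
        return rng.choice(len(self.current), p=self.current)

    def observe(self, opp_action):
        # AWESOME only sees the opponent's realized actions, never its mixed strategy.
        self.counts[opp_action] += 1
        if self.counts.sum() >= self.epoch_len:
            self._end_epoch()

    def _end_epoch(self):
        freq = self.counts / self.counts.sum()
        if self.playing_eq and np.abs(freq - self.opp_eq).max() > self.eps_e:
            # Empirical play is far from the equilibrium: reject that
            # hypothesis and start adapting instead.
            self.playing_eq = False
        if not self.playing_eq:
            if self.last_freq is not None and np.abs(freq - self.last_freq).max() > self.eps_s:
                # Opponent no longer looks stationary: retreat to equilibrium.
                self._restart()
                return
            # Opponent looks stationary: play a pure best response to its
            # empirical action distribution.
            best = int(np.argmax(self.payoffs @ freq))
            self.current = np.zeros_like(self.my_eq)
            self.current[best] = 1.0
        self.last_freq = freq
        self.counts[:] = 0.0

For instance, with matching-pennies payoffs and my_eq = opp_eq = [0.5, 0.5], two copies of this sketch tend to keep playing the equilibrium against each other, while against an opponent whose empirical frequencies settle away from (0.5, 0.5) the sketch switches to the pure best response, mirroring the two properties the abstract describes.
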
@Article{conitzer06a,
author = {Vincent Conitzer and Tuomas Sandholm},
title = {{AWESOME}: A General Multiagent Learning Algorithm
that Converges in Self-Play and Learns a Best
Response Against Stationary Opponents},
journal = {Machine Learning},
year = 2006,
abstract = {Two minimal requirements for a satisfactory
multiagent learning algorithm are that it 1. learns
to play optimally against stationary opponents and
2. converges to a Nash equilibrium in self-play. The
previous algorithm that has come closest, WoLF-IGA,
has been proven to have these two properties in
2-player 2-action (repeated) games, assuming that
the opponent's mixed strategy is observable. Another
algorithm, ReDVaLeR (which was introduced after the
algorithm described in this paper), achieves the two
properties in games with arbitrary numbers of
actions and players, but still requires that the
opponents' mixed strategies are observable. In this
paper we present AWESOME, the first algorithm that
is guaranteed to have the two properties in games
with arbitrary numbers of actions and players. It is
still the only algorithm that does so while only
relying on observing the other players' actual
actions (not their mixed strategies). It also learns
to play optimally against opponents that eventually
become stationary. The basic idea behind AWESOME
(Adapt When Everybody is Stationary, Otherwise Move
to Equilibrium) is to try to adapt to the others'
strategies when they appear stationary, but
otherwise to retreat to a precomputed equilibrium
strategy. We provide experimental results that
suggest that AWESOME converges fast in practice. The
techniques used to prove the properties of AWESOME
are fundamentally different from those used for
previous algorithms, and may help in analyzing
future multiagent learning algorithms as well.},
doi = {10.1007/s10994-006-0143-1},
url = {http://jmvidal.cse.sc.edu/library/conitzer06a.pdf}
}