Vidal's library
Title: | Convergence to Pareto Optimality in General Sum Games via Learning Opponent's Preference |
Author: | Dipyaman Banerjee and Sandip Sen |
Book Title: | Working Notes of the AAMAS Workshop on Adaptation and Learning in Autonomous Agents and Multiagent Systems |
Year: | 2006 |
Abstract: | We consider the learning problem faced by two self-interested agents playing any general-sum game repeatedly where the opponent's payoff is unknown. The concept of Nash Equilibrium in repeated games provides us with an individually rational solution for playing such games and can be achieved by playing the Nash Equilibrium strategy for the single-shot game in every iteration. However, such a strategy can sometimes lead to a Pareto-dominated outcome for the repeated game. Our goal is to design learning strategies that converge to a Pareto-efficient outcome that also produces a Nash Equilibrium payoff for repeated two-player n-action general-sum games. We present a learning algorithm, POSNEL, which learns the opponent's preference structure and produces, under self-play, Nash equilibrium payoffs in the limit in all such games. We also show that such learning will generate Pareto-optimal payoffs in a large majority of games. We derive a probability bound for convergence to Nash Equilibrium payoff and experimentally demonstrate convergence to Pareto optimality for all structurally distinct 2-player 2-action conflict games. We also compare our algorithm with existing algorithms such as WoLF-IGA and JAL and show that POSNEL, on average, outperforms both algorithms. |
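The abstract's motivating observation, that playing the single-shot Nash equilibrium strategy in every iteration can leave both players on a Pareto-dominated outcome, is easy to see in a concrete 2-player 2-action conflict game. The Python sketch below is an illustration only and is not taken from the paper; it uses standard textbook Prisoner's Dilemma payoffs, locates the pure-strategy Nash outcome, and reports that it is Pareto-dominated by mutual cooperation.

from itertools import product

# payoffs[(row_action, col_action)] = (row_payoff, col_payoff)
# Standard Prisoner's Dilemma values, chosen here purely for illustration.
payoffs = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),  # mutual defection: the single-shot Nash outcome
}
actions = ["C", "D"]

def is_nash(cell):
    """Pure-strategy Nash equilibrium: neither player gains by deviating alone."""
    r, c = cell
    row_ok = all(payoffs[(r2, c)][0] <= payoffs[cell][0] for r2 in actions)
    col_ok = all(payoffs[(r, c2)][1] <= payoffs[cell][1] for c2 in actions)
    return row_ok and col_ok

def pareto_dominates(a, b):
    """Outcome a Pareto-dominates b: both do at least as well, one strictly better."""
    pa, pb = payoffs[a], payoffs[b]
    return all(x >= y for x, y in zip(pa, pb)) and any(x > y for x, y in zip(pa, pb))

for cell in product(actions, repeat=2):
    if is_nash(cell):
        dominators = [o for o in payoffs if pareto_dominates(o, cell)]
        print(f"Nash outcome {cell} with payoffs {payoffs[cell]}; "
              f"Pareto-dominated by {dominators or 'nothing'}")

# Output: Nash outcome ('D', 'D') with payoffs (1, 1); Pareto-dominated by [('C', 'C')]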
@InProceedings{banerjee06a,
author = {Dipyaman Banerjee and Sandip Sen},
title = {Convergence to Pareto Optimality in General Sum
Games via Learning Opponent's Preference},
booktitle = {Working Notes of the {AAMAS} Workshop on Adaptation
and Learning in Autonomous Agents and Multiagent
Systems},
year = 2006,
abstract = {We consider the learning problem faced by two
self-interested agents playing any general-sum game
repeatedly where the opponent's payoff is unknown. The
concept of Nash Equilibrium in repeated games
provides us with an individually rational solution for
playing such games and can be achieved by playing
the Nash Equilibrium strategy for the single-shot
game in every iteration. However, such a strategy
can sometimes lead to a Pareto-dominated outcome for
the repeated game. Our goal is to design learning
strategies that converge to a Pareto-efficient outcome
that also produces a Nash Equilibrium payoff for
repeated two-player n-action general-sum games. We
present a learning algorithm, POSNEL, which learns
the opponent's preference structure and produces, under
self-play, Nash equilibrium payoffs in the limit in
all such games. We also show that such learning will
generate Pareto-optimal payoffs in a large majority
of games. We derive a probability bound for
convergence to Nash Equilibrium payoff and
experimentally demonstrate convergence to Pareto
optimality for all structurally distinct 2-player
2-action conflict games. We also compare our algorithm
with existing algorithms such as WoLF-IGA and JAL
and show that POSNEL, on average, outperforms both
algorithms.},
url = {http://jmvidal.cse.sc.edu/library/banerjee06a.pdf},
keywords = {multiagent learning game-theory}
}
Last modified: Wed Mar 9 10:16:35 EST 2011