Reaching Pareto Optimality in Prisoner's Dilemma Using Conditional Joint Action Learning

Vidal's library

Title:	Reaching Pareto Optimality in Prisoner's Dilemma Using Conditional Joint Action Learning
Author:	Dipyaman Banerjee and Sandip Sen
Book Title:	Working Notes of the AAAI Workshop on Multiagent Learning
Year:	2005
Abstract:	We consider a repeated Prisoner's Dilemma game where two independent learning agents play against each other. We assume that the players can observe each others' action but are oblivious to the payoff received by the other player. Multiagent learning literature has provided mechanisms that allow agents to converge to Nash Equilibrium. In this paper we define a special class of learner called a conditional joint action learner (CJAL) who attempts to learn the conditional probability of an action taken by the other given its own action and uses it to decide its next course of action. We prove that when played against itself, if the payoff structure of Prisoner's Dilemma game satisfies certain conditions, using a limited exploration technique these agents can actually learn to converge to the Pareto optimal solution that dominates the Nash Equilibrium, while maintaining individual rationality. We analytically derive the conditions for which such a phenomenon can occur and have shown experimental results to support our claim.
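
The learner described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the learner keeps counts of the other player's action for each of its own actions, estimates the conditional probability from those counts, and picks the action with the highest estimated expected payoff after an optional random exploration phase. The class name `ConditionalLearner`, the `explore_rounds` parameter, the uniform prior for untried actions, and the payoff values (5, 3, 1, 0) are all assumptions made for illustration; the abstract does not specify them.

```python
import random

ACTIONS = ("C", "D")  # cooperate, defect

# Assumed payoffs for the learner, indexed [own action][other's action].
# The values 3, 0, 5, 1 are a common textbook choice, not taken from the paper.
PAYOFF = {"C": {"C": 3.0, "D": 0.0}, "D": {"C": 5.0, "D": 1.0}}


class ConditionalLearner:
    """Hypothetical sketch of a learner that conditions on its own action."""

    def __init__(self, explore_rounds=0, rng=None):
        # counts[own][other]: how often `other` was observed when we played `own`.
        self.counts = {a: {b: 0 for b in ACTIONS} for a in ACTIONS}
        self.explore_rounds = explore_rounds  # assumed limited exploration phase
        self.rounds = 0
        self.rng = rng or random.Random(0)

    def cond_prob(self, own, other):
        """Estimate P(other's action | own action) from observed counts."""
        total = sum(self.counts[own].values())
        if total == 0:
            return 1.0 / len(ACTIONS)  # assumed uniform prior before any data
        return self.counts[own][other] / total

    def expected_utility(self, own):
        """Expected payoff of `own` under the conditional estimates."""
        return sum(self.cond_prob(own, b) * PAYOFF[own][b] for b in ACTIONS)

    def choose(self):
        """Explore at random for the first rounds, then act greedily."""
        if self.rounds < self.explore_rounds:
            return self.rng.choice(ACTIONS)
        return max(ACTIONS, key=self.expected_utility)

    def observe(self, own, other):
        """Record one round: our action and the other player's action."""
        self.counts[own][other] += 1
        self.rounds += 1
```

Under these assumptions, a learner with no observations prefers to defect, since the uniform prior gives defection the higher expected payoff. A learner whose cooperation has always been met with cooperation and whose defection has always been met with defection prefers to cooperate. That matches the abstract's intuition about reaching the Pareto optimal outcome, but it does not reproduce the paper's analysis or its conditions on the payoff structure.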

@InProceedings{banerjee05a,
  author =	 {Dipyaman Banerjee and Sandip Sen},
  title =	 {Reaching Pareto Optimality in Prisoner's Dilemma
                  Using Conditional Joint Action Learning},
  booktitle =	 {Working Notes of the {AAAI} Workshop on Multiagent
                  Learning},
  year =	 2005,
  abstract =	 {We consider a repeated Prisoner's Dilemma game
                  where two independent learning agents play against
                  each other. We assume that the players can observe
                  each others' action but are oblivious to the payoff
                  received by the other player. Multiagent learning
                  literature has provided mechanisms that allow agents
                  to converge to Nash Equilibrium. In this paper we
                  define a special class of learner called a
                  conditional joint action learner (CJAL) who attempts
                  to learn the conditional probability of an action
                  taken by the other given its own action and uses it
                  to decide its next course of action. We prove that
                  when played against itself, if the payoff structure
                  of Prisoner's Dilemma game satisfies certain
                  conditions, using a limited exploration technique
                  these agents can actually learn to converge to the
                  Pareto optimal solution that dominates the Nash
                  Equilibrium, while maintaining individual
                  rationality. We analytically derive the conditions
                  for which such a phenomenon can occur and have shown
                  experimental results to support our claim.},
  url = 	 {http://jmvidal.cse.sc.edu/library/banerjee05a.pdf},
  keywords = 	 {multiagent learning game-theory}
}

Last modified: Wed Mar 9 10:16:29 EST 2011