Prisoner's dilemma

From Wikipedia

The prisoner's dilemma is a non-zero-sum game that demonstrates a conflict between rational individual behavior and the benefits of cooperation in certain situations. The classical prisoner's dilemma is as follows:

Two suspects are arrested by the police. The police have insufficient evidence for a conviction, and having separated them, visit each of them and offer the same deal: "If you confess and your accomplice remains silent, he gets the full 10-year sentence and you go free. If you both stay silent, all we can do is give you both 6 months for a minor charge. If you both confess, you each get 5 years."

Each prisoner individually reasons like this: "Either my accomplice confessed or he did not. If he did, and I remain silent, I get 10 years, while if I confess I only get 5. If he remained silent, then by confessing I go free, while by remaining silent I get 6 months. In either case, it is better for me if I confess." Since each of them reasons the same way, both confess, and each gets 5 years. But although each followed what seemed to be a rational argument to achieve the best result, if they had instead both remained silent, they would have served only 6 months.
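The prisoner's reasoning can be checked mechanically. The following is a minimal sketch in Python that encodes the sentences from the deal and looks for a move that is the best reply to everything the accomplice might do. The names YEARS, best_reply and dominant_move are illustrative choices, not part of any standard library.

```python
# Sentences in years, indexed by (my_move, accomplice_move). Lower is better.
# The numbers are the ones offered in the deal: 10 years, 6 months, 5 years, or freedom.
SILENT, CONFESS = "silent", "confess"
MOVES = (SILENT, CONFESS)
YEARS = {
    (SILENT, SILENT): 0.5,
    (SILENT, CONFESS): 10,
    (CONFESS, SILENT): 0,
    (CONFESS, CONFESS): 5,
}


def best_reply(accomplice_move):
    """Return the move that minimizes my sentence for a fixed accomplice move."""
    return min(MOVES, key=lambda mine: YEARS[(mine, accomplice_move)])


def dominant_move():
    """Return the move that is the best reply to every accomplice move, or None."""
    replies = {best_reply(other) for other in MOVES}
    return replies.pop() if len(replies) == 1 else None


if __name__ == "__main__":
    print("Dominant move:", dominant_move())
    print("Years each if both follow it:", YEARS[(CONFESS, CONFESS)])
    print("Years each if both stay silent:", YEARS[(SILENT, SILENT)])
```

Under these numbers confessing is the best reply in both cases, yet the resulting pair of sentences is worse for both prisoners than mutual silence.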

This illustrates that if the two had been able to communicate and cooperate, they would have been able to achieve a result better than what each could achieve alone. There are many situations in the social sciences that have this same property, so understanding these situations and how to resolve them is important. In particular, the dilemma demonstrates that naïve reasoning in one's own best interest does not always lead to the best result.

In his book The Evolution of Cooperation (1984), Robert Axelrod explored a variant scenario he called the "Iterated Prisoner's Dilemma", where the participants have to choose their strategies again and again, and have memory of their previous encounters. The example he used was two people meeting and exchanging closed bags, with the understanding that one of them contains money, and the other contains an item being bought. Either player can choose to honor the deal by putting into his bag what he agreed, or he can defect by handing over an empty bag. This mirrors the prisoner's dilemma: for each participant individually, it is better to defect, though both would be better off if both cooperated. Axelrod discovered that when these encounters were repeated over a long period of time with many players, each with different strategies, "greedy" strategies tended to do very poorly in the long run while more "altruistic" strategies did better, as judged purely by self-interest. He used this to show a possible mechanism to explain what had previously been a difficult hole in Darwinian theory: how can apparently altruistic behavior evolve from the purely selfish mechanisms of natural selection?

The best deterministic strategy was found to be "Tit for Tat". It is very simple: cooperate on the first iteration of the game, and after that do what your opponent did on the previous move. A slightly better strategy is "Tit for Tat with forgiveness": when your opponent defects, on the next move you cooperate anyway with a small probability (around 1%-5%). This allows for occasional recovery from getting trapped in a cycle of defections. The exact probability depends on the lineup of opponents. "Tit for Tat with forgiveness" is best when miscommunication is introduced into the game, that is, when your move is sometimes incorrectly reported to your opponent: you cooperate but your opponent hears that you defected.
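The two strategies and the miscommunication rule can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a reconstruction of Axelrod's tournament code: the point values in PAYOFF are assumed for the example (the text above does not give any), and the function names, the noise parameter and the seed parameter are hypothetical.

```python
import random

C, D = "C", "D"  # cooperate, defect
# Points per round for (my_move, their_move); higher is better.
# ASSUMPTION: illustrative values, not taken from the article.
PAYOFF = {(C, C): 3, (C, D): 0, (D, C): 5, (D, D): 1}


def tit_for_tat(seen, rng):
    """Cooperate first, then copy the opponent's last reported move."""
    return seen[-1] if seen else C


def make_forgiving_tit_for_tat(forgiveness=0.05):
    """Build Tit for Tat that answers a defection with cooperation
    with probability `forgiveness`."""
    def strategy(seen, rng):
        if seen and seen[-1] == D and rng.random() >= forgiveness:
            return D
        return C
    return strategy


def always_defect(seen, rng):
    """A simple 'greedy' opponent used for comparison."""
    return D


def play(strategy_a, strategy_b, rounds=200, noise=0.0, seed=0):
    """Play an iterated game and return (score_a, score_b).

    With probability `noise`, a cooperation is reported to the other
    player as a defection; scores always use the real moves.
    """
    rng = random.Random(seed)
    seen_by_a, seen_by_b = [], []  # what each player believes the other did
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = strategy_a(seen_by_a, rng)
        move_b = strategy_b(seen_by_b, rng)
        score_a += PAYOFF[(move_a, move_b)]
        score_b += PAYOFF[(move_b, move_a)]
        seen_by_a.append(D if move_b == C and rng.random() < noise else move_b)
        seen_by_b.append(D if move_a == C and rng.random() < noise else move_a)
    return score_a, score_b


if __name__ == "__main__":
    forgiving = make_forgiving_tit_for_tat(0.05)
    print("TFT vs TFT, no noise:   ", play(tit_for_tat, tit_for_tat))
    print("TFT vs TFT, 5% noise:   ", play(tit_for_tat, tit_for_tat, noise=0.05))
    print("Forgiving pair, 5% noise:", play(forgiving, forgiving, noise=0.05))
```

Running the script with different values of noise, forgiveness and seed gives a rough feel for how a single misreported move can lock two plain Tit for Tat players into retaliation, and how occasional forgiveness can break the cycle. The outcome of any single noisy run depends on the random seed.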

Another important type of non-zero-sum game is called Chicken, named after the car racing game. Two cars drive towards each other for an apparent head-on collision. Each player can swerve to avoid the crash (cooperate) or keep going (defect). The main difference between Chicken and the Prisoner's Dilemma is that in Chicken, both defecting is the worst possible outcome (so mutual defection is unstable, since either player would rather swerve), whereas in the Prisoner's Dilemma the worst possible outcome is cooperating while the other person defects (so both defecting is a stable equilibrium). In both games, "both cooperate" is unstable, because each player can gain by defecting alone. A typical payoff matrix for Chicken is:

If both players cooperate, they each get +5. If you cooperate and the other player defects, you get +1 and they get +10. If both defect, they each get -20.
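The stability claims for the two games can be verified by brute force. The sketch below, in Python, lists the outcomes from which neither player gains by changing his move alone (the pure-strategy Nash equilibria). The Chicken payoffs are the ones given above. The Prisoner's Dilemma payoffs are the classical sentences written as negative years, with staying silent counted as cooperating; that encoding, and the function name pure_nash_equilibria, are assumptions made for this example.

```python
from itertools import product

C, D = "C", "D"  # cooperate, defect
MOVES = (C, D)

# Payoff to "me" for (my_move, their_move); higher is better.
CHICKEN = {(C, C): 5, (C, D): 1, (D, C): 10, (D, D): -20}
# ASSUMPTION: the classical sentences as negative years (C = stay silent, D = confess).
PRISONERS_DILEMMA = {(C, C): -0.5, (C, D): -10, (D, C): 0, (D, D): -5}


def pure_nash_equilibria(payoff):
    """Return the move pairs of a symmetric two-player game where
    neither player can do better by switching moves unilaterally."""
    stable = []
    for a, b in product(MOVES, repeat=2):
        a_content = all(payoff[(a, b)] >= payoff[(alt, b)] for alt in MOVES)
        b_content = all(payoff[(b, a)] >= payoff[(alt, a)] for alt in MOVES)
        if a_content and b_content:
            stable.append((a, b))
    return stable


if __name__ == "__main__":
    print("Chicken:           ", pure_nash_equilibria(CHICKEN))
    print("Prisoner's Dilemma:", pure_nash_equilibria(PRISONERS_DILEMMA))
```

With these payoffs, the only stable outcomes in Chicken are the two in which exactly one player swerves, while in the Prisoner's Dilemma the only stable outcome is mutual defection.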

See also: John Nash, Nash equilibrium
