Prisoner's Dilemma

created by elsewere
(idea) by artfuldodger (5 y) (print)   (I like it!) 1 C! Thu Mar 16 2000 at 7:51:43
A classic example of game theory.

Two criminal partners are captured and interrogated seperately. If one confesses and the other stays silent, the confesser will get off scot free and the tight-lipped one will get a heavy sentence. If neither confess, they each get a light sentence. If both confess, they both get a heavy sentence.

What would you do?

The strategies involved in this game, when analysed mathematically, are complex and fascinating.

(idea) by Jennifer (1.1 y) (print)   (I like it!) 1 C! Wed Nov 08 2000 at 1:22:35
Assuming both prisoners are rational: Suppose if one prisoner confesses and the other stays silent, the confessor will be sentenced to 0 months in prison, and the other will get 9 months in prison. If they both confess, they both serve 6 months. If they both stay silent, they both serve one month.

If one suspect is going to confess, the other would prefer to confess and stay in jail for 6 months rather than stay silent and be in jail for 9 months. If one suspect is going to stay silent, then the other would prefer to confess, and be released immediately rather than be silent and stay in jail for one month.

For some, this goes against common sense, as the solution which has the most greater good is when both stay silent, as the total jail time would be minimized, while the situation that occurs when both confess has the most total jail time.

This dilemma helps illustrate the practical applications of game theory to problems like the arms race or oligopolistic competition.
(idea) by Zaratustra (4.9 y) (print)   (I like it!) Thu Nov 30 2000 at 3:34:52
In the late 1970's, political scientist Robert Axelrod held a computer tournament designed to investigate the prisoner's dilemma situation. Contestants in the tournament submitted computer programs that would compete in an iterated prisoner's dilemma game of approximately two hundred rounds. Each contestant's program played five iterated games against each of the other programs submitted, and after all games had been played the scores were tallied.

The winner of the contest was tit for tat: Clam up on the first round. On the next rounds, always do what the other prisoner did on the previous round. How's that for trust and cooperation.

A later tentative of Perl Journal promoting a contest on a three-way Prisoner's Dilemma was met with a somewhat of a failure: The winner program was aided by 'sucker' programs, that simply confessed all the time while it clammed up.

Book: Axelrod, The Evolution of Cooperation.

Link: http://itknowledge.com/tpj/contest-tpd.html

(idea) by thax (2.7 y) (print)   (I like it!) 1 C! Tue Dec 12 2000 at 0:13:20
Play it safe solution to the Prisoner's Dilemma.

Suppose A and B are prisoners in the situation described above, and that neither is confident of their ability to predict whether the other will confess or not based on her previous decisions (because A and B are unprofessional, so they did not get their stories straight beforehand). Thus, assume that each prisoner confesses at random (with a weighted coin, say) such that s/he confesses a certain percentage of the time (this is A's "strategy" over a large number of "games"). We wish to find a strategy for A that will ensure him a moderate expected value regardless of B's decisions.

First, some definitions:

  • Let a := the outcome if A confesses and B does not. ( 0 in the above writeup by Jennifer).
  • Let b := the outcome if A does not confess and B does. ( -9 above).
  • Let c := the outcome if both confess. ( -6 above).
  • Let d := the outcome if both do not confess. ( -1 above).
  • Note that a,b and c are negative because I like to think of positive as good (who knows why :).
  • Let p := the probability that A confesses (we want to find this... ).
  • Let q := the probability that B confesses (...such that this doesn't matter).

Then, the function E:[0,1]X[0,1] -> R defined by

E(p,q) := p(1 - q) a + q(1 - p) b + pq c + (1 - p)(1 - q) d
        = (a - d) p + (b - d) q + (c - b + d - a) pq
        = (a - d) p + ((b - d) + (c - b + d - a) p) q + d

returns A's expected outcome.

Now we observe that if

((b - d) + (c - b + d - a) p) = 0
, then the q term above vanishes, which is exactly what we wanted. So
         (d - b)
p = ----------------- 
    (c - a) + (d - b)

            1
  = ----------------- .
    (c - a)     
    ------- +    1 
    (d - b)

Thus (c-a)/(d-b) >= 0, since otherwise, p would not be in [0,1], so we would have no play safe solution. (We don't worry that d=b here, because if so, p=0 is a valid solution)
Thus we have a solution iff ((c>=a) and (d>b)) or ((c<=a) and (d<b)).
Note: An alternative approach that leads to the same result is to take the partial derivative of E with respect to q, set it equal to zero and solve for p. This approach works because E is linear in q, so that dE/dq is constant with respect to q.

Some practical discussion:
Practically speaking, we can strengthen this condition:
d<b implies that A gets a heavier sentence if both do not confess (so there is not as much proof that they did it) than if A does not confess and B turns him in; thus, the second "or" term above cannot be satisfied, so we must have:
(c>=a) and (d>b)
We have just seen that d>b. Now we can assume that if A confesses, his/her sentence will be no less if B does not confess than if B does so c<=a (it could be more, because A's sentence may be shortened since s/he was more cooperative than B). So the only practical case with a solution is where A has nothing to lose by confessing. In this case, p=1, as our intuition tells us. The expected outcome is then:
E(1,q) = (a - d) 1 + ((b - d) + (c - a + d - b) p) q + d = a - d + (0)q + d = a
Of course, this case is pretty trivial, since c=a removes the "dilemma". But isn't it nice to know that the math supports it?

(idea) by neil (11.7 mon) (print)   (I like it!) 1 C! Fri Mar 30 2001 at 23:01:55

An interesting thought experiment in group versus individual rationality. Two co-conspirators are arrested and questioned separately (no communication allowed!) about a crime they have committed. Each has the choice to snitch on the other (defect), or to keep quiet (cooperate; this is from the point of view of the prisoners, not the authorities). No matter what your fellow prisoner does, it is to your advantage to defect; in game-theoretical terms, this is a dominant strategy for both players. However, this behaviour (always defecting) leads to worse payoffs for everyone than if everyone had cooperated. An example is:

  • If you snitch (defect) and your comrade keeps mum (cooperates), e gets five years and you get off with a slap on the wrist.
  • If your comrade snitches and you keep mum, you get five years and e gets off scot-free.
  • If you both keep quiet, you each get one year.
  • If you both snitch, you each get three years.

Supposing your fellow prisoner keeps quiet, snitching gets you zero years in prison as opposed to one. Supposing your fellow prisoner is going to rat on you, ratting on em gets you three years as opposed to five. Thus, assuming you have no way of collaborating, it is in your best interest to defect; it will always improve your outcome, no matter what your opponent choses. However, if both players follow the obvious strategy, the results (three years each in prison) will be worse than if each player had remained silent. In game-theoretical terms, this happens because the prisoner's dilemma game is at a Nash equilibrium when both players defect. However, the single Nash equilibrium is not a pareto optimal outcome. That is, it is possible to provide a solution that decreases or leaves fixed everyone's jail time (that is, always cooperating). This is what makes it a `dilemma'.

The tragedy of the commons, a related game, there are two Nash equilibria (always defect and `minimally effective cooperation'), the latter of which is pareto optimal. The problem here is that, in real-world situations, it is usually impossible to immediately tell when the point of `minimally effective cooperation' has been reached.

Also related is Wolf's dilemma, which involves any number of people. Each person may choose to press or not to press a button. If no one pushes the button, everyone gets $1000 (insert your own amount here). If anyone presses the button, then everyone who presses the button will get $10, and everyone else will get $0. Transitional Man discusses a form of the Prisoner's dilemma which uses a similar payoff matrix. Unlike the original (Flood, Dresher, and Tucker) prisoner's dilemma, there is not necessarily a dominant strategy here---the optimal strategy, according to Nash equilibria, is often a mixed strategy, with weights depending on the payoff values. This still does not yield a pareto optimal outcome, so the dilemma still exists.

Finally, there are single-player games related to the prisoner's dilemma. One example is: you are connected to a pain-causing electrical circuit and are given a control to vary the current; the amount of pain is directly proportional to the current. You begin at zero, and every day have the option to either increase the current by 1 microamp (an indetectable amount) and receive $1000; or leave the current where it is and receive $0. On any single day, it makes sense to accept the $1000---the increase in pain is unnoticeable. However, in the long run, continuing to do this will eventually lead to an unbearable amount of pain---which many people would trade their entire fortune to eliminate. This is similar to the tragedy of the commons, with the participants being yourself at different points of time. Minimally effective cooperation (pareto optimal and a Nash equilibrium) here is to find the highest current level that is indefinitely bearable, receive the money (defect) until you reach that level, and keep the current constant (cooperate) after that. As with the tragedy of the commons, the problem is in realising and stopping when you reach that level.

For a site with mathematical and historical information on the prisoner's dilemma and variants such as the tragedy of the commons, see http://plato.stanford.edu/entries/prisoner-dilemma/ .

(idea) by Transitional Man (3.1 hr) (print)   (I like it!) 2 C!s Wed Jan 09 2002 at 23:14:59
The Prisoner's Dilemma is often used in the teaching of Political Science to illustrate one reason why so many outcomes of real political systems are bad or flawed.

To correct Neil, a true Prisoner's Dilemma demands that an optimal solution requires all parties to co-operate. As an analogy, consider a murder where the only witnesses are other criminals. If all crooks keep silent they go free. But if even one of them talks, the others get the chair while the confessor gets to cop a plea, which lets him off with a light sentence.

Now, if the crooks could be sure that the other criminals would remain silent, they would co-operate and all would go free. The old Mafia omerta went a long way toward guaranteeing it in the old days, but serves no longer. And the cops know this, so they separate the prisoners and question them separately. In isolation, each criminal has to wonder if Lefty will sell them out. He has time to think about the consequences, which his interrogators will cheerfully remind him of. He is afraid, he does not trust. It is hard to trust under such circumstances.

This models many real life political situations, particularly in international relations. This is because we cannot read minds. Nations spend oodles of money trying to gain intelligence on what another country is doing, and most importantly, what they are thinking. Success is limited, even with agents, who may be doubles. And that is with nations we trust, even if the trust is imperfect. Only in a Tom Clancy novel is intelligence information so cut and dried.

Now consider two adversaries, seething with justifiable anger : Israel and the Palestinians. While both sides have their extremists in powerful positions, the rank and file in both communities wants peace, even if they disagree on the details. In order to gain peace, and the economic and poltical advantages it brings, they must co-operate. If Israel leaves its borders open, then it is more vulneralbe to Hamas suicide bombers. If Israel closes its borders -- which it often does-- then many Palestinians cannot work. That leads to additional poverty and resentment, feeding the hatred and helping Hamas recruit suicide bombers. If Israel could trust the Palestinian Authority to stop the suicide bombers, they could open their borders. That would build co-operation, and thus improve long term chances for peace. But they cannot, so defect.

From the Palestinian perspective, matters look similar. If they remain quiet, Israeli right-wingers will ignore them, and accelerate Jewish settlement construction. This process leads to many Palestinians being booted out of their homes, building resentment, and creating new obstacles to peace, that must be negotiated away. But if Palestinians choose violence, that strengthens the hand of Israeli extremists, and makes them less willing to negotiate. In essence, Palestinians do not trust Israel to negotiate in good faith. But the only action they see certain to catch Israeli attention leads to tit for tat that reduces the desire to make peace on both sides. Either way, both sides are hung on twin horns of the same dilemma. They must co-operate to gain peace. They cannot trust enough to sustain that co-operation.

Both sides are caught in the Prisoner's Dilemma. They are prisoners of their own history and mistrust. It is no comfort to know they are not the only people hung on this dilemma.

(idea) by LincolnQ (1.9 y) (print)   (I like it!) Fri Aug 22 2003 at 2:50:14

Zaratustra above mentioned the Prisoner's Dilemma computer contests held -- after the rounds, the highest score wins. But in game theory, this doesn't work out too well. The reason for this is that the principles of game theory are based on your payoffs being yours alone. You shouldn't even validly be able to compare yours to someone else's.

Exploiting this failure, Tit for Tat wins overall on a large scale. Why? Two reasons. One is because it never lets the other guy get _ahead_ by more than k (which is my variable for "the difference between the payoffs of the two prisoners when one confesses and the other defects"). The other reason is that it acquires its enormous point advantage from programs similar to it which are likely to cooperate.

In any case, all this is invalid when you realize that, theoretically, your only goal is to get the highest possible score, not to "beat" your opponent. You can have a matrix like this and it is still a "Prisoners' Dilemma:"

Non-comparative PD matrix
(Payoffs listed in Row, Column form)

           C         D
Coop     (3, 999)  (1, 9999)
Defect   (4, 9)    (2, 99)

(If you haven't seen payoffs before, this is called a payoff matrix. One player chooses one of the rows, the other player chooses a column, and you intersect them. The row player gets "paid" the first ordinate in the pair, and the column gets paid the second.)

Anyway, in this game, one player gets paid a lot more than the other, but that should mean nothing to either player. They're playing for abstract goodness, not comparative goodness. Even if you use a more "standard" PD matrix, you shouldn't compare them. I'll use ordinal payoffs to represent that: 1 is the worst outcome and 4 is the best, no matter what their actual value to you is.

Standard PD game matrix with ordinal payoffs

           C         D
Coop     (3, 3)    (1, 4)
Defect   (4, 1)    (2, 2)

This is a human nature thing -- we like to compete and we judge ourselves by what others do. One way to think of it is that the payoff is not an amount of money--but an "abstract happiness quotient". How happy are you with this outcome? Once "beating the other guy" is part of the "happiness quotient," the game is changed fundamentally.

Unfortunately, nobody has found a good solution to this. I don't believe there IS one, so we'll have to make do with what we have. It IS interesting, but doesn't fit with the original terms of the contract, so to speak.

See also chicken, w/u by caseyhb.

(idea) by Wntrmute (11.4 hr) (print)   (I like it!) 4 C!s Sun Oct 24 2004 at 19:15:04

The first Iterated Prisoner's Dilemma competition was held under the direction of a political scientist, Robert Axelrod, and, as the above writeups describe, was won by the tit for tat approach developed by Anatol Rapoport. That system defended its title against a potentially stronger field in Axelrod's follow-up event, with 62 participants compared to an initial field of fourteen, then held its own in the following years, as it seemed to have an unassailable combination of co-operating where possible, but punishing where not. Yet in a twentieth-anniversary competition, this time run by computer scientist Graham Kendall, and featuring 223 entries, tit for tat couldn't claim even one of the top three places. All of those positions went instead to programs developed at Southampton University, which were able to recognise each other and then assumed master/slave roles that maximised the payoff to the master program- at the expense of the slaves, which sunk rapidly to the bottom of the performance table through their heroic sacrifice.

Some see this as simply cheating, others as ingenious "meta-gaming"- beating the rules rather than the opponents. Certainly it has limitations and could fail spectacularly in rival configurations of the Dilemma game; but with a bit of thought it becomes apparent that tit for tat could also be accused of gaming the system, and that the true point of running Prisoner's Dilemma games is to find insights into other tasks. The Southampton group was motivated by an interest in the development of co-operative behaviour within multi-agent systems, and tweaks to their approach should shed light on the extent to which self-sacrificing agents can help the system as a whole- or perhaps just a favoured group within it.

The remarkable feature of tit for tat was that it would, over the course of a competition, defeat the "always defect" approach, even though it would always lose to it in a one-on-one situation! Always defecting is known as an evolutionary stable strategy, meaning that in a population of such programs, any deviation away from the approach (by co-operating at some time) can only disadvantage you and further the ends of the others. Tit for tat is indeed guaranteed to lose points in the first round against a consistent defector, but from round two onwards against such a foe, it assumes the strategy itself, so the damage is minimised. In the initial 1984 competition, each program competed against all others in a round-robin fashion, with the twist that at some point it would play against a copy of itself. With sufficiently long iterations, this allows tit for tat to win even as the sole member of an otherwise constantly defecting group: since when it plays itself, it settles into a mutually beneficial pattern of co-operation that brings more points per round than mutual defection- and hence counteracts the few points lost in the opening play against defectors. Should other programs themselves be tit for tat, or any other strain which throws out the occasional run of co-operation, then the effect is magnified still further. So whilst the defectors never give away points, they miss opportunities to gain them- opportunities which tit for tat seized to climb the scoreboard to success. As LincolnQ observes above, "your only goal is to get the highest possible score, not to "beat" your opponent", and this tit for tat does admirably.

How then did the Southampton team out-play this decades old strategy? Tit for tat lets itself set up a virtuous circle of co-operation to score more points than the more vicious approach of mutual defection. But the Southampton programs made use of what makes the dilemma a dilemma in the first place- you can score even more than you would for co-operation, if only you could get the other player to co-operate whilst you defect. The 2004 rules allowed for more than one program to be submitted, so you could submit just a such a helper- the suicidal "always co-operate" method- to exploit, except all the other always-defect programs would get to feed upon it too. The solution was to have the programs use pre-determined sequences of co-operation and defection to act as a signature. On encountering an appropriate 5-10 turn sequence, the programs would assume their roles, with the slaves switching to constant co-operation to offer up bonus points to the always-defecting masters. If, however, the trigger sequence wasn't received from the opponent, the always-defect approach was activated, to minimise points available to the rival (which may have picked up some by defecting whilst the Southampton code had to co-operate as part of its opening dance). In this way, the masters would rack up more points than tit-for-tat, which would often be stuck with the paltry rewards of mutual defection.

The most interesting aspects of this approach are the issue of just what density of colluders is required in the population- it turned out that there were probably too many in the 2004 event; and whether it would be possible for other programs to spoof or manipulate the trigger sequences needed. In evolutionary games where code is mixed or programs are eliminated, it's possible for the slaves to be removed from the game, stranding the masters with the potential of being exploited during their now-useless setup phase, and then performing no better than always-defect after those steps. Nor can the fate of those slaves be entirely ignored, as their sacrifice could reduce the average performance per colluding program to below that of tit for tat, depending on the ratios present in the population.

The second round of Kendall's competition is scheduled for April 2005, and perhaps yet another level of refinement will emerge then, alongside answers to some of these issues. But even if not, this latest twist shows that even after two decades, the Prisoner's Dilemma remains both beautiful in its simplicity and intriguing in i