A classic example of game theory.

Two criminal partners are captured and interrogated seperately. If one confesses and the other stays silent, the confesser will get off scot free and the tight-lipped one will get a heavy sentence. If neither confess, they each get a light sentence. If both confess, they both get a heavy sentence.

What would you do?

The strategies involved in this game, when analysed mathematically, are complex and fascinating.

Assuming both prisoners are rational: Suppose if one prisoner confesses and the other stays silent, the confessor will be sentenced to 0 months in prison, and the other will get 9 months in prison. If they both confess, they both serve 6 months. If they both stay silent, they both serve one month.

If one suspect is going to confess, the other would prefer to confess and stay in jail for 6 months rather than stay silent and be in jail for 9 months. If one suspect is going to stay silent, then the other would prefer to confess, and be released immediately rather than be silent and stay in jail for one month.

For some, this goes against common sense, as the solution which has the most greater good is when both stay silent, as the total jail time would be minimized, while the situation that occurs when both confess has the most total jail time.

This dilemma helps illustrate the practical applications of game theory to problems like the arms race or oligopolistic competition.
In the late 1970's, political scientist Robert Axelrod held a computer tournament designed to investigate the prisoner's dilemma situation. Contestants in the tournament submitted computer programs that would compete in an iterated prisoner's dilemma game of approximately two hundred rounds. Each contestant's program played five iterated games against each of the other programs submitted, and after all games had been played the scores were tallied.

The winner of the contest was tit for tat: Clam up on the first round. On the next rounds, always do what the other prisoner did on the previous round. How's that for trust and cooperation.

A later tentative of Perl Journal promoting a contest on a three-way Prisoner's Dilemma was met with a somewhat of a failure: The winner program was aided by 'sucker' programs, that simply confessed all the time while it clammed up.

Book: Axelrod, The Evolution of Cooperation.

Link: http://itknowledge.com/tpj/contest-tpd.html

Play it safe solution to the Prisoner's Dilemma.

Suppose A and B are prisoners in the situation described above, and that neither is confident of their ability to predict whether the other will confess or not based on her previous decisions (because A and B are unprofessional, so they did not get their stories straight beforehand). Thus, assume that each prisoner confesses at random (with a weighted coin, say) such that s/he confesses a certain percentage of the time (this is A's "strategy" over a large number of "games"). We wish to find a strategy for A that will ensure him a moderate expected value regardless of B's decisions.

First, some definitions:

  • Let a := the outcome if A confesses and B does not. ( 0 in the above writeup by Jennifer).
  • Let b := the outcome if A does not confess and B does. ( -9 above).
  • Let c := the outcome if both confess. ( -6 above).
  • Let d := the outcome if both do not confess. ( -1 above).
  • Note that a,b and c are negative because I like to think of positive as good (who knows why :).
  • Let p := the probability that A confesses (we want to find this... ).
  • Let q := the probability that B confesses (...such that this doesn't matter).

Then, the function E:[0,1]X[0,1] -> R defined by

E(p,q) := p(1 - q) a + q(1 - p) b + pq c + (1 - p)(1 - q) d
        = (a - d) p + (b - d) q + (c - b + d - a) pq
        = (a - d) p + ((b - d) + (c - b + d - a) p) q + d

returns A's expected outcome.

Now we observe that if

((b - d) + (c - b + d - a) p) = 0
, then the q term above vanishes, which is exactly what we wanted. So
         (d - b)
p = ----------------- 
    (c - a) + (d - b)

            1
  = ----------------- .
    (c - a)     
    ------- +    1 
    (d - b)

Thus (c-a)/(d-b) >= 0, since otherwise, p would not be in [0,1], so we would have no play safe solution. (We don't worry that d=b here, because if so, p=0 is a valid solution)
Thus we have a solution iff ((c>=a) and (d>b)) or ((c<=a) and (d<b)).
Note: An alternative approach that leads to the same result is to take the partial derivative of E with respect to q, set it equal to zero and solve for p. This approach works because E is linear in q, so that dE/dq is constant with respect to q.

Some practical discussion:
Practically speaking, we can strengthen this condition:
d<b implies that A gets a heavier sentence if both do not confess (so there is not as much proof that they did it) than if A does not confess and B turns him in; thus, the second "or" term above cannot be satisfied, so we must have:
(c>=a) and (d>b)
We have just seen that d>b. Now we can assume that if A confesses, his/her sentence will be no less if B does not confess than if B does so c<=a (it could be more, because A's sentence may be shortened since s/he was more cooperative than B). So the only practical case with a solution is where A has nothing to lose by confessing. In this case, p=1, as our intuition tells us. The expected outcome is then:
E(1,q) = (a - d) 1 + ((b - d) + (c - a + d - b) p) q + d = a - d + (0)q + d = a
Of course, this case is pretty trivial, since c=a removes the "dilemma". But isn't it nice to know that the math supports it?

An interesting thought experiment in group versus individual rationality. Two co-conspirators are arrested and questioned separately (no communication allowed!) about a crime they have committed. Each has the choice to snitch on the other (defect), or to keep quiet (cooperate; this is from the point of view of the prisoners, not the authorities). No matter what your fellow prisoner does, it is to your advantage to defect; in game-theoretical terms, this is a dominant strategy for both players. However, this behaviour (always defecting) leads to worse payoffs for everyone than if everyone had cooperated. An example is:

  • If you snitch (defect) and your comrade keeps mum (cooperates), e gets five years and you get off with a slap on the wrist.
  • If your comrade snitches and you keep mum, you get five years and e gets off scot-free.
  • If you both keep quiet, you each get one year.
  • If you both snitch, you each get three years.

Supposing your fellow prisoner keeps quiet, snitching gets you zero years in prison as opposed to one. Supposing your fellow prisoner is going to rat on you, ratting on em gets you three years as opposed to five. Thus, assuming you have no way of collaborating, it is in your best interest to defect; it will always improve your outcome, no matter what your opponent choses. However, if both players follow the obvious strategy, the results (three years each in prison) will be worse than if each player had remained silent. In game-theoretical terms, this happens because the prisoner's dilemma game is at a Nash equilibrium when both players defect. However, the single Nash equilibrium is not a pareto optimal outcome. That is, it is possible to provide a solution that decreases or leaves fixed everyone's jail time (that is, always cooperating). This is what makes it a `dilemma'.

The tragedy of the commons, a related game, there are two Nash equilibria (always defect and `minimally effective cooperation'), the latter of which is pareto optimal. The problem here is that, in real-world situations, it is usually impossible to immediately tell when the point of `minimally effective cooperation' has been reached.

Also related is Wolf's dilemma, which involves any number of people. Each person may choose to press or not to press a button. If no one pushes the button, everyone gets $1000 (insert your own amount here). If anyone presses the button, then everyone who presses the button will get $10, and everyone else will get $0. Transitional Man discusses a form of the Prisoner's dilemma which uses a similar payoff matrix. Unlike the original (Flood, Dresher, and Tucker) prisoner's dilemma, there is not necessarily a dominant strategy here---the optimal strategy, according to Nash equilibria, is often a mixed strategy, with weights depending on the payoff values. This still does not yield a pareto optimal outcome, so the dilemma still exists.

Finally, there are single-player games related to the prisoner's dilemma. One example is: you are connected to a pain-causing electrical circuit and are given a control to vary the current; the amount of pain is directly proportional to the current. You begin at zero, and every day have the option to either increase the current by 1 microamp (an indetectable amount) and receive $1000; or leave the current where it is and receive $0. On any single day, it makes sense to accept the $1000---the increase in pain is unnoticeable. However, in the long run, continuing to do this will eventually lead to an unbearable amount of pain---which many people would trade their entire fortune to eliminate. This is similar to the tragedy of the commons, with the participants being yourself at different points of time. Minimally effective cooperation (pareto optimal and a Nash equilibrium) here is to find the highest current level that is indefinitely bearable, receive the money (defect) until you reach that level, and keep the current constant (cooperate) after that. As with the tragedy of the commons, the problem is in realising and stopping when you reach that level.

For a site with mathematical and historical information on the prisoner's dilemma and variants such as the tragedy of the commons, see http://plato.stanford.edu/entries/prisoner-dilemma/ .

The Prisoner's Dilemma is often used in the teaching of Political Science to illustrate one reason why so many outcomes of real political systems are bad or flawed.

To correct Neil, a true Prisoner's Dilemma demands that an optimal solution requires all parties to co-operate. As an analogy, consider a murder where the only witnesses are other criminals. If all crooks keep silent they go free. But if even one of them talks, the others get the chair while the confessor gets to cop a plea, which lets him off with a light sentence.

Now, if the crooks could be sure that the other criminals would remain silent, they would co-operate and all would go free. The old Mafia omerta went a long way toward guaranteeing it in the old days, but serves no longer. And the cops know this, so they separate the prisoners and question them separately. In isolation, each criminal has to wonder if Lefty will sell them out. He has time to think about the consequences, which his interrogators will cheerfully remind him of. He is afraid, he does not trust. It is hard to trust under such circumstances.

This models many real life political situations, particularly in international relations. This is because we cannot read minds. Nations spend oodles of money trying to gain intelligence on what another country is doing, and most importantly, what they are thinking. Success is limited, even with agents, who may be doubles. And that is with nations we trust, even if the trust is imperfect. Only in a Tom Clancy novel is intelligence information so cut and dried.

Now consider two adversaries, seething with justifiable anger : Israel and the Palestinians. While both sides have their extremists in powerful positions, the rank and file in both communities wants peace, even if they disagree on the details. In order to gain peace, and the economic and poltical advantages it brings, they must co-operate. If Israel leaves its borders open, then it is more vulneralbe to Hamas suicide bombers. If Israel closes its borders -- which it often does-- then many Palestinians cannot work. That leads to additional poverty and resentment, feeding the hatred and helping Hamas recruit suicide bombers. If Israel could trust the Palestinian Authority to stop the suicide bombers, they could open their borders. That would build co-operation, and thus improve long term chances for peace. But they cannot, so defect.

From the Palestinian perspective, matters look similar. If they remain quiet, Israeli right-wingers will ignore them, and accelerate Jewish settlement construction. This process leads to many Palestinians being booted out of their homes, building resentment, and creating new obstacles to peace, that must be negotiated away. But if Palestinians choose violence, that strengthens the hand of Israeli extremists, and makes them less willing to negotiate. In essence, Palestinians do not trust Israel to negotiate in good faith. But the only action they see certain to catch Israeli attention leads to tit for tat that reduces the desire to make peace on both sides. Either way, both sides are hung on twin horns of the same dilemma. They must co-operate to gain peace. They cannot trust enough to sustain that co-operation.

Both sides are caught in the Prisoner's Dilemma. They are prisoners of their own history and mistrust. It is no comfort to know they are not the only people hung on this dilemma.

Zaratustra above mentioned the Prisoner's Dilemma computer contests held -- after the rounds, the highest score wins. But in game theory, this doesn't work out too well. The reason for this is that the principles of game theory are based on your payoffs being yours alone. You shouldn't even validly be able to compare yours to someone else's.

Exploiting this failure, Tit for Tat wins overall on a large scale. Why? Two reasons. One is because it never lets the other guy get _ahead_ by more than k (which is my variable for "the difference between the payoffs of the two prisoners when one confesses and the other defects"). The other reason is that it acquires its enormous point advantage from programs similar to it which are likely to cooperate.

In any case, all this is invalid when you realize that, theoretically, your only goal is to get the highest possible score, not to "beat" your opponent. You can have a matrix like this and it is still a "Prisoners' Dilemma:"

Non-comparative PD matrix
(Payoffs listed in Row, Column form)

           C         D
Coop     (3, 999)  (1, 9999)
Defect   (4, 9)    (2, 99)

(If you haven't seen payoffs before, this is called a payoff matrix. One player chooses one of the rows, the other player chooses a column, and you intersect them. The row player gets "paid" the first ordinate in the pair, and the column gets paid the second.)

Anyway, in this game, one player gets paid a lot more than the other, but that should mean nothing to either player. They're playing for abstract goodness, not comparative goodness. Even if you use a more "standard" PD matrix, you shouldn't compare them. I'll use ordinal payoffs to represent that: 1 is the worst outcome and 4 is the best, no matter what their actual value to you is.

Standard PD game matrix with ordinal payoffs

           C         D
Coop     (3, 3)    (1, 4)
Defect   (4, 1)    (2, 2)

This is a human nature thing -- we like to compete and we judge ourselves by what others do. One way to think of it is that the payoff is not an amount of money--but an "abstract happiness quotient". How happy are you with this outcome? Once "beating the other guy" is part of the "happiness quotient," the game is changed fundamentally.

Unfortunately, nobody has found a good solution to this. I don't believe there IS one, so we'll have to make do with what we have. It IS interesting, but doesn't fit with the original terms of the contract, so to speak.

See also chicken, w/u by caseyhb.

The first Iterated Prisoner's Dilemma competition was held under the direction of a political scientist, Robert Axelrod, and, as the above writeups describe, was won by the tit for tat approach developed by Anatol Rapoport. That system defended its title against a potentially stronger field in Axelrod's follow-up event, with 62 participants compared to an initial field of fourteen, then held its own in the following years, as it seemed to have an unassailable combination of co-operating where possible, but punishing where not. Yet in a twentieth-anniversary competition, this time run by computer scientist Graham Kendall, and featuring 223 entries, tit for tat couldn't claim even one of the top three places. All of those positions went instead to programs developed at Southampton University, which were able to recognise each other and then assumed master/slave roles that maximised the payoff to the master program- at the expense of the slaves, which sunk rapidly to the bottom of the performance table through their heroic sacrifice.

Some see this as simply cheating, others as ingenious "meta-gaming"- beating the rules rather than the opponents. Certainly it has limitations and could fail spectacularly in rival configurations of the Dilemma game; but with a bit of thought it becomes apparent that tit for tat could also be accused of gaming the system, and that the true point of running Prisoner's Dilemma games is to find insights into other tasks. The Southampton group was motivated by an interest in the development of co-operative behaviour within multi-agent systems, and tweaks to their approach should shed light on the extent to which self-sacrificing agents can help the system as a whole- or perhaps just a favoured group within it.

The remarkable feature of tit for tat was that it would, over the course of a competition, defeat the "always defect" approach, even though it would always lose to it in a one-on-one situation! Always defecting is known as an evolutionary stable strategy, meaning that in a population of such programs, any deviation away from the approach (by co-operating at some time) can only disadvantage you and further the ends of the others. Tit for tat is indeed guaranteed to lose points in the first round against a consistent defector, but from round two onwards against such a foe, it assumes the strategy itself, so the damage is minimised. In the initial 1984 competition, each program competed against all others in a round-robin fashion, with the twist that at some point it would play against a copy of itself. With sufficiently long iterations, this allows tit for tat to win even as the sole member of an otherwise constantly defecting group: since when it plays itself, it settles into a mutually beneficial pattern of co-operation that brings more points per round than mutual defection- and hence counteracts the few points lost in the opening play against defectors. Should other programs themselves be tit for tat, or any other strain which throws out the occasional run of co-operation, then the effect is magnified still further. So whilst the defectors never give away points, they miss opportunities to gain them- opportunities which tit for tat seized to climb the scoreboard to success. As LincolnQ observes above, "your only goal is to get the highest possible score, not to "beat" your opponent", and this tit for tat does admirably.

How then did the Southampton team out-play this decades old strategy? Tit for tat lets itself set up a virtuous circle of co-operation to score more points than the more vicious approach of mutual defection. But the Southampton programs made use of what makes the dilemma a dilemma in the first place- you can score even more than you would for co-operation, if only you could get the other player to co-operate whilst you defect. The 2004 rules allowed for more than one program to be submitted, so you could submit just a such a helper- the suicidal "always co-operate" method- to exploit, except all the other always-defect programs would get to feed upon it too. The solution was to have the programs use pre-determined sequences of co-operation and defection to act as a signature. On encountering an appropriate 5-10 turn sequence, the programs would assume their roles, with the slaves switching to constant co-operation to offer up bonus points to the always-defecting masters. If, however, the trigger sequence wasn't received from the opponent, the always-defect approach was activated, to minimise points available to the rival (which may have picked up some by defecting whilst the Southampton code had to co-operate as part of its opening dance). In this way, the masters would rack up more points than tit-for-tat, which would often be stuck with the paltry rewards of mutual defection.

The most interesting aspects of this approach are the issue of just what density of colluders is required in the population- it turned out that there were probably too many in the 2004 event; and whether it would be possible for other programs to spoof or manipulate the trigger sequences needed. In evolutionary games where code is mixed or programs are eliminated, it's possible for the slaves to be removed from the game, stranding the masters with the potential of being exploited during their now-useless setup phase, and then performing no better than always-defect after those steps. Nor can the fate of those slaves be entirely ignored, as their sacrifice could reduce the average performance per colluding program to below that of tit for tat, depending on the ratios present in the population.

The second round of Kendall's competition is scheduled for April 2005, and perhaps yet another level of refinement will emerge then, alongside answers to some of these issues. But even if not, this latest twist shows that even after two decades, the Prisoner's Dilemma remains both beautiful in its simplicity and intriguing in its implications.


References
  • I first heard about the Southampton result through a Slashdot article, and the discussion generated there doubtless had some influence on this writeup:
    http://slashdot.org/article.pl?sid=04/10/14/134202&tid=185
  • Wired article as linked by Slashdot:
    http://www.wired.com/news/culture/0,1284,65317,00.html
  • Press-release from the University:
    http://www.soton.ac.uk/Press/PressReleases/Name,4551,en.php
  • Background reading- chapter 1 of Five Golden Rules: Great Theories of 20th-Century Mathematics- and why they matter by John L. Casti.

Cooperation in the Prisoner's dilemma

As a game, the Prisoner's dilemma is deceptively simple- there are just two players, each choosing, once, to either cooperate or defect. But the question of whether cooperation can be ensured (for mutual benefit) is a fascinating one. The scenario is as follows:

Two prisoners are independently interrogated by the police for a crime they are guilty of, which they may individually either confess to or deny. The authorities lack firm evidence to convict either prisoner of this crime so they can only receive a minor jail sentence for previous misdemeanours if neither confesses. If both prisoners confess to the crime, then there is no doubt over their guilt and each receive the standard sentence. However, as an incentive to confess, the prisoners are told that if they confess whilst the other denies, then they will walk free for their honesty whilst the other is made an example of and receives a still harsher sentence.

In Bimatrix form, one version is

                                         Defect       Co-operate
                                    +--------------+--------------+
                                    |              |              |
                     Defect         |   1  , 1     |   4  ,  0    |
                                    |              |              |
                                    +--------------+--------------+
                                    |              |              |
                     Co-operate     |   0  , 4     |   3 ,  3     |
                                    |              |              |
                                    +--------------+--------------+

Where row one denotes a confession of guilt by Player 1 (in the standard terminology of the prisoner’s dilemma, to defect) and row two denotes a denial of guilt by Player 1 (referred to as the strategy of co-operation, in the sense that they co-operate with the other prisoner rather than the authorities). Similarly, the first column represents defection by Player 2, and the second cooperation by player 2.

The Prisoner’s dilemma is at the heart of many problems in social science. For instance, it can be recast in terms of economics as follows. Two companies may between them control the supply of a particular good, and are able to supply at either a high or low level. Were both to restrict supply by producing at a low level (mutual co-operation), then the price can be kept artificially high. There is insufficient demand in the market to justify high supply (mutual defection) by both companies, and thus if this occurs each will receive a smaller profit. However, restriction of supply by a single company will maintain the higher price point. Hence if one defects (high supply at high price) and the other co-operates (high price, but for a low supply) then the defecting company gains a significant market-share (and profit) advantage over the co-operating one, and so each company is motivated to defect.

In dominated strategy, it was shown that a non-cooperative analysis of the dilemma implies that the rational play is mutual defection, since to strive for a greater payoff by mutual cooperation incurs the risk of receiving no payoff at all. Nonetheless, there is considerable experimental evidence for people choosing to co-operate more than this game-theoretic analysis would predict. A recent experiment of this kind1 with university students revealed that defection rates of non-economics majors was under 40%. Whilst economics majors defected 60% of the time in the standard game, when given the opportunity to make (non-binding) deals with the other participants before play, both categories dropped to a defection rate of around 30%. Thus the question arises as to why it is possible for co-operative behaviour to emerge from a system where it is not a priori assumed by the players. If one seeks to use game theory to explain behaviour of individuals, these discrepancies between theory and practice must be resolved. Several interpretations are possible.

One way to resolve this problem is to argue that the mutual defection strategy is the correct response to the incorrect problem- that is, the payoff matrix presented is inaccurate in capturing the utility to each player; appropriate modifications will then yield a problem where mutual cooperation is the Nash equilibrium, thus explaining experimental observations.

Typically, such arguments relate to the notion of utility and introduce an additional cost to defection when the other player co-operates. Mechanisms ranging from guilt, inequality aversion or simple fear of repercussions from external authorities (be it the state or other criminals) have been advanced to explain this psychological difference between the objective payoff and subjective payoff to the defecting player.

A more interesting modification is to preserve the payoff matrix, but consider the relative merits of co-operation. The aversion to co-operation is due to the possibility of receiving a sucker payoff of 0 when the other player defects and you co-operate. Obviously for this to occur, one must play against an opponent who opts to defect. Earlier, rational behaviour was considered in terms of average expected outcome over a large number of plays. Since your payoff increases by 2 for mutual co-operation but only decreases by 1 when suckered by a defector, if you expect the probability of another player co-operating to be high enough, it may seem preferable to also co-operate, if only for purely selfish reasons.

We may attempt to formalise this as follows. We introduce a meta-game, whereby a player can assign themselves to one of two categories: category D, members of which always defect; or category C, where the strategy is to defect against players from category D, but to co-operate with others from category C. We can then consider whether it is rational to opt for category C or D. If the proportion of the population in category C is p, then a player from category D expects a payoff of 1 whereas one from category C expects 3×p + 1×(1 - p) = 1 + 2p ≥ 1; category C is preferable. This is unsurprising- if one can perfectly identify other co-operators, it pays to co-operate with them. But this analysis has performed a sleight-of-hand: if one can entirely trust other players to co-operate, then there was never any dilemma, since a co-operative game emerges. Without that trust, there is a category superior to C: consider a category C', whose members pretend to be of category C, then defect anyway. This ensures a return of 1 against other defectors, or other players from C', but gains a payoff of 4 rather than 3 from members of category C; who now must factor in the potential of a sucker payoff. In line with the assumptions of non-cooperative game theory, the ability to label oneself as being of a particular category is of no use without a guarantee of honesty.

Certainly then in the one-shot game, there is no avoiding the rationality of defection. An explanation for observed behaviour therefore must require further factors, such as the desire to establish trust for future benefit over a series of games. For this, we consider the iterated prisoner's dilemma.


1R. Frank, T. Gilovich and D. Regan (1993) ‘Does studying economics inhibit cooperation?’ Journal of Economic Perspectives Vol 7 Issue 2 (spring).


Part of A survey of game theory- see project homenode for details and links to the print version.

Log in or registerto write something here or to contact authors.