Tuesday, December 13, 2005

Game Theory, and why Hobbes was wrong

Hobbes’ Thesis
The philosopher Hobbes has been shocking people for centuries with his beautifully-reasoned arguments that Man requires a Supreme Leader – to whom all one’s rights and individuality must be delegated.

Failure to do this will, says Hobbes, leave humans in the ‘State of Nature’ – acted upon only by instinctive forces. Such a life Hobbes famously describes as ‘nasty, brutish and short’. He equates it to a state of war, in which every man is essentially – prefiguring Darwin – in conflict with every other individual for resources like food, shelter, partners and other basic needs.

The solution to this, as proposed by Hobbes, is a King. Monarchs were, of course, the natural form of supreme authority at the time of writing, when a ferocious and often bloody debate was raging over the relative supremacy of Parliament and Monarchy; Hobbes believed that the latter was the proper way forward.

The King was an individual to whom the populace collectively surrendered all their rights and authority, and who then wielded that collective power on their behalf. In the process he acted to remove the conflict between individuals, resulting in a state of Peace – in contrast to the uncontrolled state of war that, thought Hobbes, Nature made inevitable.

It’s an extraordinary thesis, and hugely compelling. I’m extremely impressed by Hobbes’ ability to cut through sentiment and emotions and see society’s raw dynamics so clearly. I think he’s wrong, though. Humans in the natural state are not inevitably brutal to each other.

He couldn’t have known this, as the key information has only recently become clear.

Game Theory and Trust
The Prisoner’s Dilemma is a classic piece of Game Theory. It’s a thought experiment that pits two hypothetical people against each other, both faced with a situation in which the actions of the other – which they can’t control – will affect their future.

They may each choose one of two possible avenues, but it’s the choice of the other that determines whether they have chosen correctly, and the outcome of that choice.

This scenario is normally presented thus:

Two suspects, A and B, are arrested by the police. The police have insufficient evidence for a conviction, and having separated the prisoners, visit each of them and offer the same deal: if one testifies for the prosecution (defects) against the other and the other remains silent, the silent accomplice receives the full 10-year sentence and the betrayer goes free. If both stay silent (cooperate), the police can only give both prisoners 1 year for a minor charge. If both betray each other (defect), they receive a 5-year sentence each.
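The payoff structure just described can be captured in a few lines of Python; the sentence lengths come straight from the scenario, but the function and table names are illustrative:

```python
# Payoff table for the scenario above, in years of jail served.
# 'C' = co-operate (stay silent), 'D' = defect (testify).
JAIL_YEARS = {
    ('C', 'C'): (1, 1),    # both silent: minor charge only
    ('C', 'D'): (10, 0),   # A silent, B defects: A serves the full sentence
    ('D', 'C'): (0, 10),   # the mirror image
    ('D', 'D'): (5, 5),    # mutual betrayal
}

def payoff(choice_a, choice_b):
    """Return (years for A, years for B) given each suspect's choice."""
    return JAIL_YEARS[(choice_a, choice_b)]
```

Note the dilemma lurking in the numbers: whatever the other suspect does, defecting always shortens your own sentence – yet mutual defection (5 years each) is far worse for both than mutual silence (1 year each).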

This may seem at first both complicated and spurious, but in fact it represents in microcosm the sort of choices that we face every day. We constantly have to assess cost/risk/benefit in almost everything we do: each time we purchase a chocolate bar from a corner store we trust that the storekeeper is selling genuine merchandise and not ripping us off. The storekeeper in turn trusts that our money is not counterfeit.

In more complex transactions, especially high-value purchases, we routinely take greater risks: we trust that the car we buy will not fall apart once it leaves the showroom. The seller trusts that our ability to pay the large sums is genuine. We hand over money in the expectation of future goods. We hand over goods in the expectation of future payment. We constantly trust strangers not to rob, injure or cheat us, even when they could get away with it. Why do we take these risks?

The Iterated Prisoner’s Dilemma
A single Prisoner’s Dilemma ‘game’ is not particularly enlightening. It really only allows one to contemplate the problem of working with others, and the issue of trust. Things become more interesting when repeated games are played, and previous behaviour can be used to estimate the likely future actions of one’s partner in crime: if they have consistently defected in previous games, they’re likely to do it again.

With repeated iterations of the game, it becomes possible to develop a strategy for winning (i.e. minimising one’s jail time). A strategy is essentially an algorithm – a set of rules that provide responses to specific events.

An algorithm for the Iterated Prisoner’s Dilemma (IPD) really has only one input – the actions of the other player over the previous games. The sequence of the other player’s actions can be analysed in various ways in order to decide the algorithm’s single output: whether to Defect (turn the player’s partner in) or Co-operate (remain silent).
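In code, that interface is simply a function from the partner’s move history to a single decision. A sketch in Python, with two illustrative strategies (the names are conventional shorthand, not taken from any specific tournament):

```python
# A strategy sees only the partner's past moves and returns 'C' or 'D'.

def always_defect(opponent_moves):
    # The minimum of thinking: betray regardless of history.
    return 'D'

def grudger(opponent_moves):
    # Co-operate until the partner defects once, then defect forever.
    return 'D' if 'D' in opponent_moves else 'C'
```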

Game theorists spent a long time developing and testing algorithms for the IPD – most famously in Robert Axelrod’s computer tournaments of the early 1980s – and pitting them against others: simple ones that respond with the minimum of thinking; algorithms with complex statistical analyses of the accumulated data; heuristic algorithms using neural nets to learn from their mistakes and successes.

And after a great number of experiments, a clear winner emerged: a strategy that consistently outperformed all the others, no matter which opponent it was partnered with – from complex algorithms to random responders. And what was it that this spectacular world-beater did, that swept all before it?

In plain language, the algorithm – called ‘Tit-For-Tat’, and submitted to Axelrod’s tournaments by the game theorist Anatol Rapoport – was ‘Do unto your partner whatever they last did to you’ (co-operating on the first move, when there is no history to go on).

That’s it. No analysis, no heuristics, no statistics – just handing back whatever is dished out to you. And Tit-For-Tat beats everything else.
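A minimal sketch of Tit-For-Tat in play, reusing the jail-year payoffs from the scenario and scoring lower-is-better; the rival strategies and the 20-round match length are illustrative choices, not from any particular tournament:

```python
# Jail years for each (A's move, B's move) pair; lower totals are better.
YEARS = {('C', 'C'): (1, 1), ('C', 'D'): (10, 0),
         ('D', 'C'): (0, 10), ('D', 'D'): (5, 5)}

def tit_for_tat(opponent_moves):
    # Co-operate first, then mirror the partner's previous move.
    return opponent_moves[-1] if opponent_moves else 'C'

def always_cooperate(opponent_moves):
    return 'C'

def always_defect(opponent_moves):
    return 'D'

def match(strategy_a, strategy_b, rounds=20):
    """Play an iterated match; return total jail years for each player."""
    history_a, history_b = [], []
    total_a = total_b = 0
    for _ in range(rounds):
        a = strategy_a(history_b)   # each player sees only the other's past
        b = strategy_b(history_a)
        years_a, years_b = YEARS[(a, b)]
        total_a += years_a
        total_b += years_b
        history_a.append(a)
        history_b.append(b)
    return total_a, total_b
```

An instructive detail: Tit-For-Tat never beats its individual partner – against Always-Defect it actually serves slightly more time, because it loses the one round in which it was betrayed first. It wins tournaments on its aggregate score across many different partners, by co-operating profitably with everyone willing to co-operate back.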

The implications of this for the real world are hard to be certain of. It appears to indicate that over-analysing is not all that wise: everything useful in a partner’s past performance is already captured in their single most recent action, no matter how cleverly you sift the accumulated data.

Changing the Context
Tit-For-Tat is the champion when pitted against any other algorithm. So what happens when everyone is playing TFT? Interesting things, as it turns out.

If the background environment consists entirely of Tit-For-Tat opponents/partners – and the occasional move can go wrong by accident – another algorithm scores even better. Once again, it’s not one of the statistical analysts. This algorithm has the name ‘Forgive Once’. Its operation is very similar to TFT, with one exception: as its name suggests, it will forgive (by ignoring it) a single defection by its partner. Apart from this, it will feed back what’s done to it, just like TFT. The forgiveness matters because two TFT players who suffer a single accidental defection lock into an endless echo of alternating retaliation; forgiving one slip breaks the cycle and restores co-operation.

And the trend continues: in an environment consisting entirely of Forgive Once partners, a new algorithm called ‘Forgive Twice’ turns out to be the winner. Work out what that one does. And it doesn’t stop there.
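The echo effect can be demonstrated directly. The sketch below injects a single accidental defection into an otherwise co-operative 10-round match; reading ‘Forgive Once’ as overlooking the partner’s first-ever defection is one plausible interpretation of the description, not a canonical definition:

```python
def tit_for_tat(opponent_moves):
    return opponent_moves[-1] if opponent_moves else 'C'

def forgive_once(opponent_moves):
    # Like TFT, but overlook the partner's first-ever defection.
    if opponent_moves and opponent_moves[-1] == 'D' and opponent_moves.count('D') <= 1:
        return 'C'
    return tit_for_tat(opponent_moves)

def play(strategy_a, strategy_b, rounds=10, slip_round=2):
    """Iterated match in which A's move at slip_round is flipped to 'D' by accident."""
    history_a, history_b = [], []
    for r in range(rounds):
        a = strategy_a(history_b)
        b = strategy_b(history_a)
        if r == slip_round:
            a = 'D'                # the accidental defection
        history_a.append(a)
        history_b.append(b)
    return history_a, history_b

# Two TFT players echo the slip back and forth for the rest of the match;
# two Forgive Once players absorb it and return to full co-operation.
ha, hb = play(tit_for_tat, tit_for_tat)
tft_defections = sum(m == 'D' for m in ha + hb)
ha, hb = play(forgive_once, forgive_once)
fo_defections = sum(m == 'D' for m in ha + hb)
```

After the slip, the TFT pair defect in every remaining round (alternately), while the Forgive Once pair register only the single accident.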

What inferences can we draw from this?

  • Although the mechanisms involved in the Prisoner’s Dilemma are simple and numerical, they map well onto real-world situations.

  • People don’t usually make arithmetic calculations when assessing risk/benefit. They go with ‘what feels right’. But those feelings are the result of instinct and experience, and tend to conform accurately to the calculated optimum.

  • Iterated PD games themselves map well onto everyday real-world experiences, in which an individual must constantly engage in cost/risk/benefit transactions with others. So ubiquitous are these mini-transactions that we tend not to notice them.

  • Many such transactions don’t involve the transfer of material goods, and the interactions take place using non-vocal communication. Being highly verbal animals, we tend not to notice the rich non-verbal interchanges that we carry out continuously with others.

  • Some measure of how important such interchanges are can be gleaned from the difficulties that arise when the communication channels are interrupted or blocked: as in cars, where tempers often flare simply because the appropriate permission-seeking and -granting behaviour is not present.

If the IPD is accepted as a valid model for human behaviour, the fundamental first lesson is that cooperation has a survival advantage.

In the early days of humanity, punishment for infractions of any kind tended to be utterly disproportionate to the actual damage done: agonising death was routinely handed out as a penalty for minor crimes. This could be described as Massive Retaliation.

A breakthrough occurred in the legal Code of Hammurabi, which introduced lex talionis – the principle of retribution referred to as ‘an eye for an eye’, which states that a punishment should exact upon the perpetrator the same level of injury experienced by the victim. This principle is still in use today in some formal judicial processes, and in some less formal social interactions.

Lex Talionis bears a strong functional resemblance to Tit-For-Tat. It is a more efficient system than Massive Retaliation – more satisfying for punishee and neutral observer; less so emotionally for the victim, but undeniably fairer – and so it became widely adopted.

As we have seen, though, in an environment where all participants are playing Tit-For-Tat, a still more efficient algorithm – and therefore modus vivendi – becomes possible.

The Old Testament of the Christian Bible is in part a narrative of the rise of Lex Talionis from the previous state of massive retaliation. This natural progression is at the core of Hobbes’ error: he was unaware of the natural processes that make fairer systems of justice arise, overtaking the bestial ‘State of Nature’ he envisaged.

The Biblical narrative continues in the New Testament. In many ways it was an idea whose time had come; Christianity happened to be – in the ancient Middle Eastern world at any rate – in the right place to put the idea forward, and has been unfairly credited with inventing it. The idea was the Forgive Once algorithm.

It was expressed as ‘turn the other cheek’, and it’s worth noting that, despite this USP, Christianity remained a small and relatively unsuccessful desert sect for centuries until Emperor Constantine and his mum, Helena Augusta, suddenly decided to renounce the far more interesting Roman pantheon. The rest is history.

It’s possible that the ‘Forgive Once’ aspect of the new religion may have appealed to a Constantine wearied by the excesses of war. At any rate, it came with the territory, and the fact that the new algorithm is more efficient made its success inevitable.

Our present-day society operates on a variety of ‘Forgive n’ algorithms. Formal transactions tend to be based on Tit-For-Tat (which can be considered to be ‘Forgive Zero’). More ad hoc transactions – especially between friends, and between regular transactors – will involve multiple ‘forgivenesses’. These may be offerings in the expectation of future profit, as in the case of presents to good customers; or effectively symmetrical, as in the case of a temporary loan. Regularly, acts of pure altruism take place, in which there is not only no benefit to the initiator – he must actually lose out: each time someone allows a car from a side-street into the queue, they perform a small sacrifice with no possibility of direct recompense. Why do they do it?

The short answer is that they do it because in the long run such behaviour carries survival benefits, and so selection pressure favours it. The overall trend toward increasing trust continues as each new level is reached: Forgive(1) is overtaken by Forgive(2) and so on, until the population is dominated by increasingly generous and trusting people.
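The whole progression – TFT as ‘Forgive Zero’, then Forgive Once, Forgive Twice and onward – can be sketched as one parameterised strategy. The implementation below, which overlooks the partner’s first n defections, is one plausible reading of the descriptions above:

```python
def forgive(n):
    """Build a Forgive(n) strategy: overlook the partner's first n defections,
    otherwise mirror their last move (Tit-For-Tat is the n = 0 case)."""
    def strategy(opponent_moves):
        if not opponent_moves:
            return 'C'                       # always open by co-operating
        if opponent_moves[-1] == 'D' and opponent_moves.count('D') <= n:
            return 'C'                       # still within the forgiveness budget
        return opponent_moves[-1]            # mirror, just like TFT
    return strategy

tit_for_tat   = forgive(0)    # 'Forgive Zero'
forgive_once  = forgive(1)
forgive_twice = forgive(2)
```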

(Note that this treads perilously close to Group Selection, but avoids the problem: TFT is immediately beneficial to the individual, and tends to spread. F(1) has benefits in the TFT environment that results, and so on.)

If the point hasn’t already been banged home enough – which it probably has: Hobbes’ Natural State turns out to be a lot less nasty and brutish than he anticipated. In the absence of an all-powerful monarch or other supreme authority, Man is nevertheless able to adopt a fair, trusting and mutually-beneficial modus vivendi for peaceful interaction with his fellows. This type of morality is neither God-given nor imposed – we get it for nothing because of its survival advantage, and the IPD shows how.

Flies in the Ointment
Of course, in such a fair and trust-based environment, the occasional rogue individual is able to make considerable headway at others’ expense. His rampage is usually limited, however: along with the tendency to trust, humans have an acute ability to detect cheating and falsehood, and good mechanisms for branding the Cheater. Society can tolerate only a limited number of truly amoral people. The hardest to catch is the true psychopath, who has no moral scruples and is willing to go to any length to disguise his true nature (in contrast, the average crook is often hampered by his own guilty conscience).

A significant social problem is that, ironically, laws tend to be framed in the context of the normal, moral person, and may be weak in dealing with truly 'evil' people except when their transgressions are suitably profound – as with multiple murderers and so on. In this situation, the psychopath may be able to live comfortably within the law, despite causing great harm to others.

A mechanism for detecting such people, and removing them from society - even deleting them from it entirely - would benefit all of society, much more than punishment of those who merely break the letter of the law.
