Paul the octopus

Hi everybody,

you might have heard of an octopus who correctly predicted the winners of some matches in the Football World Cup.

I think it was 7 matches he guessed right.

I was discussing this with a friend last Sunday, and I was rather amazed when he told me there was nothing special in the octopus' achievement: he could have just guessed them right by chance.

And his explanation was that if you toss a coin 7 times, any sequence of heads/tails has the same probability of happening: we only pay special attention to the one with all tails or all heads.

I later examined the problem in detail. As you all know, there are 128 possible sequences, and in fact it's true that any individual sequence (where order matters) has exactly 1/128 chances to happen.

On the other hand, if you believe the octopus acted completely at random, you should explain how he picked the right sequence out of a set of 128 possible sequences, most of which contain a mix of wrong/right. He only had about 0.8% chances to pick this one, as opposed to 99.2% chances of failing at least one prediction, and still he did pick it!

So now I'm left with this fundamental doubt about probability.

Say your experiment is 'toss a coin 7 times'. If you repeat the experiment N times, on average you expect to find 7 heads (or 7 tails) N/128 times. But if you only do the experiment once, can you reach any conclusion about the coin being fair (or in our case, the octopus being psychic) based on the likelihood of the sequence you observe? Or do you always need to repeat the experiment at least a few times?
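Both numbers here can be checked directly. The following is a quick Python sketch (assuming a fair coin, nothing Paul-specific): it enumerates all 128 sequences to confirm the 1/128 figure, then repeats the 7-toss experiment many times to see that an all-heads run shows up roughly N/128 times.

```python
import random
from itertools import product

# Exact count: of the 2**7 = 128 equally likely sequences of 7 fair-coin
# tosses, exactly one is all heads.
sequences = list(product("HT", repeat=7))
assert len(sequences) == 128
p_all_heads = sum(s == tuple("HHHHHHH") for s in sequences) / len(sequences)
print(p_all_heads)  # 1/128 = 0.0078125

# Simulation: repeat the 7-toss experiment N times and count how often
# all 7 tosses come up heads; we expect roughly N / 128 such runs.
random.seed(1)
N = 100_000
hits = sum(all(random.random() < 0.5 for _ in range(7)) for _ in range(N))
print(hits, "observed vs.", N / 128, "expected")
```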

Is there any statistical test one can use to determine that?

I'm sure this is a very simple problem for mathematicians, but I do find probability concepts puzzling at times.

Thanks

L.

Say your experiment is 'toss a coin 7 times'. If you repeat the experiment N times, on average you expect to find 7 heads (or 7 tails) N/128 times. But if you only do the experiment once, can you reach any conclusion about the coin being fair (or in our case, the octopus being psychic) based on the likelihood of the sequence you observe? Or do you always need to repeat the experiment at least a few times?

Is there any statistical test one can use to determine that?

That sort of testing is, well, most of what statistical inference is, actually. Likelihood functions (amongst other things, but likelihood is easy to work with) help form confidence intervals and hypothesis tests, so that for a given amount of data you can say, with some given level of confidence, that the coin is fair, that the octopus is acting at random, or otherwise.
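One simple test of this kind is an exact binomial test. Here is a minimal sketch (the function name is my own, and it assumes each prediction is an independent 50/50 guess under the null): it computes the one-sided p-value, i.e. the probability of doing at least this well by pure guessing.

```python
from math import comb

def binom_pvalue(k, n, p=0.5):
    """One-sided exact binomial p-value: probability of seeing k or more
    successes in n trials if each guess is an independent 50/50 coin flip."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Paul's record: 7 correct predictions out of 7.
pval = binom_pvalue(7, 7)
print(pval)  # 1/128 = 0.0078125, well below the usual 0.05 threshold
```

With 7 out of 7, the only sequence at least as extreme is the perfect one, so the p-value is exactly 1/128.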

Things with a probability of 1/128 do happen - but somehow I suspect the octopus was trained in some way.

A repetition of the experiment (i.e. a repetition of a set of seven throws) is a typical case of multiple hypothesis testing. As you surmised correctly, if you repeat it often enough, an all-correct run is bound to happen by chance (even if the coin is not biased), depending on the accepted p (with p giving the type I error probability, i.e. the probability of wrongly rejecting the null hypothesis that the coin is unbiased).

The correct way (excluding adjustments for now) would be to increase N, i.e. the number of throws.

Now to test that you could e.g. use the chi-square test, with the null hypothesis that the coin is unbiased. The critical value, beyond which you can reject the null hypothesis at a given p, depends on the chosen p and the degrees of freedom.

Quick example: let us say we use n = 7 (7 throws).

The expected (e) outcome is 3.5 head (h) and 3.5 tail (t) according to our null hypothesis. The observed (o) outcome is 7 h and 0 t.

Then $\chi^{2}=\frac{\left(o_{h}-e_{h}\right)^{2}}{e_{h}}+\frac{\left(o_{t}-e_{t}\right)^{2}}{e_{t}}=\frac{(7-3.5)^{2}}{3.5}+\frac{(0-3.5)^{2}}{3.5}=7.$ However, the low sample size may force us to use the Yates correction, reducing the $\chi^{2}$ to 5.14.

With one degree of freedom, the chi-square distribution table tells us that we can reject the null with p < 0.01 (uncorrected value) or p < 0.025 (Yates-corrected value).

So if you did 400 sets of these throws, you would expect a result at least this extreme to occur by chance roughly four times (at p = 0.01) or ten times (at p = 0.025).

The problem with the chi-square test is that it requires a sufficiently large sample size. Here, a power analysis can be conducted to calculate the sample size required for a given statistical power.
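The worked example above is easy to reproduce. This is a small Python sketch (my own helper function, following the same formulas): it computes the Pearson chi-square statistic for the 7-0 outcome, with and without the Yates continuity correction, and compares them to the one-degree-of-freedom critical values quoted above.

```python
def chi_square(observed, expected, yates=False):
    """Pearson chi-square statistic for observed vs. expected counts;
    with yates=True, apply the Yates continuity correction |o - e| - 0.5."""
    total = 0.0
    for o, e in zip(observed, expected):
        d = abs(o - e)
        if yates:
            d = max(d - 0.5, 0.0)
        total += d * d / e
    return total

# 7 throws, null hypothesis of a fair coin: expect 3.5 heads and 3.5 tails.
obs, exp = [7, 0], [3.5, 3.5]
print(chi_square(obs, exp))              # 7.0
print(chi_square(obs, exp, yates=True))  # 2 * 3.0**2 / 3.5 = 5.142857...

# Critical values, 1 degree of freedom: 6.63 (p = 0.01), 5.02 (p = 0.025).
```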

Thank you both for your replies.

I seem to understand that there is quite strong evidence that the octopus was not acting randomly.

I also found this on Wikipedia, which seems to be very close to my example:

http://en.wikipedia.org/wiki/Statistical_hypothesis_testing#Example_2_-_Clairvoyant_Card_Game

This octopus remains a mystery to me.

There are a lot of alternative explanations. It may have reacted to the onlookers, may have been trained to grab from the container with the German flag but decided against it once or twice, may have confused the flags, or may simply have liked the color better. Considering the intelligence of these animals, I would not be surprised if the reactions of the watchers played a big role, though.

Dotted round the world there are probably more than 128 pets (not all of them octopodes) which are being asked to predict the outcome of the matches.

We didn't hear about the ones who got it wrong.
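This selection effect is easy to quantify. A sketch (my own numbers for illustration, assuming each pet makes 7 independent 50/50 guesses): the chance that at least one of m such "pet oracles" gets a perfect record.

```python
def p_at_least_one_perfect(m, n_matches=7):
    """Probability that at least one of m independent pets, each guessing
    n_matches matches at random (50/50), gets every match right."""
    p_single = 0.5 ** n_matches       # 1/128 per pet
    return 1 - (1 - p_single) ** m    # complement of "every pet fails"

print(p_at_least_one_perfect(128))  # ~0.63 with 128 pets worldwide
print(p_at_least_one_perfect(500))  # ~0.98 with 500 pets
```

So with just 128 guessing pets around the world, a flawless record somewhere is more likely than not - and only the winner makes the news.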

Dotted round the world there are probably more than 128 pets (not all of them octopodes) which are being asked to predict the outcome of the matches.

We didn't hear about the ones who got it wrong.

We need a People for the Ethical Treatment of Gambling Pets organization.

There are a lot of alternative explanations. It may have reacted to the onlookers, may have been trained to grab from the container with the German flag but decided against it once or twice, may have confused the flags, or may simply have liked the color better. Considering the intelligence of these animals, I would not be surprised if the reactions of the watchers played a big role, though.

Hi CharonY,

Even if we assume the octopus was 'steered' or influenced in some way to choose the right box, this still leaves an important question: who knew the outcomes of the matches before they had been played?

Unless this octopus is actually a ruthless businessman, who fixed the matches in advance to secure himself a career in show-business.

Not necessarily. Right now we are calculating the likelihood in terms of how many game outcomes he predicted correctly.

However, the real decision is most certainly removed from the game itself. What the octopus did was choose between two containers, and that choice happened to match the game outcome.

Let us assume the flag in the background had an influence. Paul chose the German flag five times and other flags twice. If we group the answers according to that, we get 5-2 instead of 7-0. Here it appears that there is a bias towards the German flag. Thus an interpretation could be that he was supposed to choose the German flag, but messed up (or was not interested) in two sets.
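The 5-2 regrouping really does change the picture. A quick check (same exact binomial calculation as earlier in the thread, assuming each choice were a fair 50/50 pick under the null): the probability of a split at least as lopsided as 5-2 towards one flag is not small at all.

```python
from math import comb

def p_at_least(k, n, p=0.5):
    """Probability of k or more same-flag choices in n trials
    if each choice were an independent 50/50 pick."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

print(p_at_least(7, 7))  # 7-0 split: 1/128 = 0.0078125
print(p_at_least(5, 7))  # 5-2 split or more extreme: 29/128 = 0.2265625
```

Under this grouping, a 5-2 preference happens by chance almost a quarter of the time, so the flag-bias reading is far less surprising than the 7-0 match record.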

OK, I got it now.

I hadn't considered that the two containers were not identical, because they had flags attached to them. This automatically invalidates the assumption of equal probability of the two outcomes, thus ruling out any comparison with coin flipping.

If someone had good reasons to believe that Germany was likely to win most matches, and, as you suggested, had trained Paul to go for the German flag, the game was already biased from the start.

So no miracle here. That's a relief.

Also: would this have hit the media if it hadn't happened? How many other people were trying to achieve the same predictions but did not make it into the news?

Fair enough. But Paul wasn't reported on after he had predicted all of the games successfully, he was reported on after a few, and continued to be a success. He still had to make three or four correct predictions while the press was watching.

Fair enough. But Paul wasn't reported on after he had predicted all of the games successfully, he was reported on after a few, and continued to be a success. He still had to make three or four correct predictions while the press was watching.

The odds aren't all that bad on that, even assuming maximum entropy (equal probabilities).

For four correct predictions (assuming an equal distribution of right and wrong answers as the null), it barely passes an alpha of 5%. For three it would fall below that.
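This can be checked with the same uncorrected chi-square approach used earlier in the thread (a sketch of my own, with the standard 3.84 critical value for one degree of freedom at alpha = 5%):

```python
def chi_square_streak(n_correct):
    """Uncorrected chi-square for a perfect streak: n_correct right answers
    out of n_correct predictions, against a 50/50 null (expect n/2 each)."""
    e = n_correct / 2
    return (n_correct - e) ** 2 / e + (0 - e) ** 2 / e

CRITICAL_5PCT = 3.84  # chi-square critical value, 1 degree of freedom

print(chi_square_streak(4), chi_square_streak(4) > CRITICAL_5PCT)  # 4.0 True
print(chi_square_streak(3), chi_square_streak(3) > CRITICAL_5PCT)  # 3.0 False
```

A streak of four gives a statistic of 4.0, just over the 3.84 threshold; a streak of three gives 3.0 and is not significant at that level.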
