
Misuse? of statistics (was Navigation Ability)


Enthalpy


Hello you all!

The free video game Sea Hero Quest challenges the player's navigation ability
wikipedia
Many people played, so their performance enables statistics, the trick that lets a scam look scientific.

One study found correlations with countries' GDP per capita and gender gap
biorxiv.org or hal.archives-ouvertes.fr or (restricted) sciencedirect.com
but I see only a correlation there. Strong counter-examples in the paper's map tell that the relation is weak: South Africa, Russia and Malaysia fare better than the richer Italy and Spain. So could there be better relationships?

Do countries of seafarers obtain better results? Not really. The Philippines, Indonesia and Portugal are missing from the top list, while the Swiss rarely navigate (except to win the America's Cup). I wish there were results from Pacific islands.

Far better: in the countries of rank 1 and 2 of 5, Vikings had settled, some over Britain. It would only need a second people in the Pacific to explain the scores of Korea, Taiwan and Malaysia.

Why shouldn't you propose your own correlation? There must be zillions that fit as nicely. How often people dance, how much iodine they ingest, whether their ancestors bred with Neanderthals or Denisovans, how expensive healthcare is, whether babies are breastfed, how good public transport is, how much alcohol people drink, how much magnetite ore the country has, how often people travel by plane, how high tides are, the proportion of astronomers, how much latex is consumed per capita...

I'm sure you'll find something. Have fun!
Marc Schaefer, aka Enthalpy


  • 8 months later...

Statistics strike again...

The human ACE-1 enzyme is somehow linked with the evolution of Covid-19 and it has a known polymorphism, the D-allele.

Researchers took data from 25 European countries and compared the cases of Covid-19 per million inhabitants in each country with the frequency of the D-allele. The raw graph is reproduced there
ncbi.nlm.nih.gov (supposedly copyrighted)

If I had obtained such a graph, I'd have said "no relationship" and switched to other thoughts. Admire the random variation by nearly 100 at identical allele frequency while the best fit tries to justify a variation by 10.

But the authors computed correlations (Spearman r=-0.510, p=0.01) to claim that the D-allele is a "co-factor". Papers of that kind give me mixed feelings about peer-review and research journals.

==========

This is a typical case where an obscure statistical method helps get published. What could be done differently here?

For instance, choose 100 polymorphisms whose frequency is known in this set of countries. Randomly, not even with a known relationship with Covid-19. Check how many among these 100 give a correlation as good (Spearman r=-0.510, p=0.01 if you like) as ACE-1.
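A minimal sketch of that check, on synthetic data only. No real polymorphism table is at hand here, so the 100 "polymorphisms" are plain random numbers and the 25 case counts are made up; `count_chance_hits` and all other names are illustrative assumptions, not anything from the paper.

```python
import random

def ranks(values):
    """1-based ranks; tied values share their average rank."""
    ordered = sorted(values)
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def count_chance_hits(cases, n_predictors=100, threshold=0.510, seed=0):
    """Count unrelated random predictors that reach |r| >= threshold."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_predictors):
        fake_frequency = [rng.random() for _ in cases]  # unrelated by construction
        if abs(spearman(fake_frequency, cases)) >= threshold:
            hits += 1
    return hits

# Made-up case counts for 25 hypothetical countries.
rng = random.Random(1)
cases = [rng.lognormvariate(6, 1) for _ in range(25)]
print(count_chance_hits(cases), "of 100 unrelated predictors correlate as well")
```

With a threshold that corresponds to p=0.01, about one random predictor in a hundred should pass on average. The interesting question is how many of the real polymorphisms pass.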

Not bad either: keep both data sets, but randomly swap the D-allele frequencies among the countries. Some swaps will give a far better correlation than the real data. But how many swaps are better, how many are worse?
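This swapping is usually called a permutation test. Here is a hedged sketch with invented numbers: the allele frequencies and case counts below are random, and `permutation_p_value` is a name chosen for this example.

```python
import random

def ranks(values):
    """1-based ranks; tied values share their average rank."""
    ordered = sorted(values)
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def spearman(x, y):
    return pearson(ranks(x), ranks(y))

def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Share of random reassignments of x that correlate with y
    at least as strongly (in absolute value) as the real pairing."""
    rng = random.Random(seed)
    observed = abs(spearman(x, y))
    shuffled = list(x)
    at_least_as_strong = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # swap the frequencies among the countries
        if abs(spearman(shuffled, y)) >= observed:
            at_least_as_strong += 1
    # The +1 counts the real pairing itself, so the result is never exactly 0.
    return (at_least_as_strong + 1) / (n_perm + 1)

# Invented data for 25 hypothetical countries, unrelated by construction.
rng = random.Random(1)
allele_freq = [rng.uniform(0.3, 0.7) for _ in range(25)]
cases = [rng.lognormvariate(6, 1) for _ in range(25)]
print(permutation_p_value(allele_freq, cases))
```

If many swaps do as well as the real pairing, the real pairing has nothing special.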

As well, after an apparent correlation is found on one set of countries, check on a second set of countries if it holds. If possible, a second set not known to the researchers.

I also wish, and this isn't even statistics, that the cases of Covid-19 per million inhabitants were measured at a uniform delay after the first 100 cases were observed. Before and after the increase settles, if possible. This would reduce a big noise source.
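A possible way to do that alignment, sketched under assumptions: each country supplies a list of cumulative case counts, one per day, oldest first. The function name, the 100-case threshold parameter and the sample numbers are all hypothetical.

```python
def cases_per_million_at_delay(cumulative_cases, population, delay_days, threshold=100):
    """Cases per million inhabitants, `delay_days` after the day the
    cumulative count first reached `threshold`. Returns None if the
    threshold was never reached or the series is too short."""
    for day, total in enumerate(cumulative_cases):
        if total >= threshold:
            target = day + delay_days
            if target >= len(cumulative_cases):
                return None
            return cumulative_cases[target] * 1_000_000 / population
    return None

# Two hypothetical countries whose epidemics started on different dates.
country_a = [10, 50, 120, 300, 900, 2000]
country_b = [0, 0, 0, 30, 110, 280, 850]
print(cases_per_million_at_delay(country_a, 1_000_000, 2))
print(cases_per_million_at_delay(country_b, 1_000_000, 2))
```

Both countries are then read at the same stage of their own outbreak rather than on the same calendar date.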

==========

As we're seeking correlations, let's have a look at the distribution of Covid cases on the map of Germany, there
n-tv.de seek "Inzidenz nach Kreisen"

The distribution is extremely unequal, so what correlations could we find?

The mainly Catholic regions are hit far harder than the mainly Protestant ones. That correlation works far better than the ACE-1 polymorphism. We must urgently replace hosts with regular bread, and ban holy water, as these certainly spread the virus. Alas, double-checking on Catholic Belgium versus Protestant Netherlands ruins that explanation. But if the three countries are aggregated, as Germany has more inhabitants, I'm confident that some statistical method would validate the religion correlation. Add Italy and Spain to the study, plus Norway and Denmark, if this helps.

Next one: the hard-hit regions use dialects. The ones commonly speaking standard German are spared. This is even confirmed by Switzerland and Austria. So to combat Covid, police should not enforce the confinement, nor possibly the use of masks in the future, but the use of good German - provided that policemen themselves, err, well.

No, dialects are politically too sensitive in Germany. But what about vineyards? The hard-hit regions are the ones where vineyards grow. Again a perfect correlation. Don't look to Chinese markets for exotic animals: the pneumonia was obviously endemic to grapes (no more ludicrous than fish transmitting a pneumonia to humans). What, we can't forbid wine?

OK then, the hard-hit German regions belonged to the Western Roman empire. Again a perfect correlation. And it holds elsewhere: Switzerland and Austria too are hit hard, Italy and Spain and France even worse, Denmark and Norway far less. So to combat Covid, we could just rename all cities and streets whose name derives from Latin. Easy!

Or maybe the regions hit harder are richer, so more people could afford skiing in Tirol, and they brought the virus back. It explains the European West-East gradient too. But that one isn't fun.

Why shouldn't you seek your own correlations? There must be zillions just as good. Have fun!


5 hours ago, Enthalpy said:

Next one: the hard-hit regions use dialects. The ones commonly speaking standard German are spared. This is even confirmed by Switzerland and Austria. So to combat Covid, police should not enforce the confinement, nor possibly the use of masks in the future, but the use of good German

I can't tell if this is serious or not.

I have read both your posts a couple of times and have no idea what point you are trying to make. Are you simply trying to illustrate that well known saying "correlation is not causation" ?


(Renaming, since there has been a response)

 

Quote

But the authors computed correlations (Spearman r=-0.510, p=0.01) to claim that the D-allele is a "co-factor". Papers of that kind give me mixed feelings about peer-review and research journals.

I recall a presentation some years ago on a problem (in the life sciences, in this example, but potentially elsewhere): you'd draw some samples from your test subjects, and because it was so hard to get the experiment set up and approved, you would end up running all sorts of tests on them. Not being in the field, I can't recall what the tests were, but apparently you would test for dozens of different effects. The problem is that you were looking for a p-value < 0.05 and, statistically speaking, if you run enough tests (>20), a false positive is expected to pop up.

So you have the same problem here. If you start looking for correlations, you will eventually find them, without them being causal. (One of my favorites is the correlation between buying certain types of cars and voting for a particular party being mistaken for causation, i.e. the situation where one might claim that buying a Ford pickup truck causes you to vote Republican.)

This is one reason why you don't rely on one study, and also why you need to find a causative agent that you can independently test.
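The arithmetic behind the "enough tests" remark can be sketched quickly. It assumes that the tests are independent, that each uses the 0.05 threshold, and that none of the tested effects is real; the function name is made up for this example.

```python
def prob_at_least_one_false_positive(n_tests, alpha=0.05):
    """Chance that at least one of n_tests independent tests passes
    the alpha threshold although no effect exists at all."""
    return 1 - (1 - alpha) ** n_tests

for n in (1, 5, 20, 50):
    expected = n * 0.05  # average number of false positives
    print(n, round(prob_at_least_one_false_positive(n), 3), expected)
```

Under these assumptions, 20 tests give on average one false positive, and the chance of getting at least one is about 64%.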

 

On 8/9/2019 at 11:03 AM, Enthalpy said:

 

One study found correlations with countries' GDP per capita and gender gap
biorxiv.org or hal.archives-ouvertes.fr or (restricted) sciencedirect.com
but I see only a correlation there.  

So they say there's correlation, and you see a correlation, but word this as if you disagree? 

 


  • 3 weeks later...

Fewer people catch Covid-19 on Mondays. You can observe it on statistics for Germany, there
n-tv.de
Diagram titled "Fallzahlen-Trend Deutschland", click on "Differenz absolut", you get the number of new badly sick people per day.

Mondays were 9, 16, 23, 30 of March 2020, 6, 13, 20, 27 of April 2020. This failed on 16 March, and the dip was a day early on 22 March. Tuesday 14 April dropped lower than Monday as it followed Easter Monday.

Does this imply that people get contaminated on weekdays, but not on Saturdays and Sundays when staying at home? Not necessarily. The number of deaths too drops on Thursdays and Fridays, even though the delay from contamination to possible death varies. See the diagram titled "Todesfälle in Deutschland". More likely there is some weekly fluctuation in the reporting tasks.

Since Monday 27 April, more companies and shops have been allowed to open in Germany. Some neighbouring countries that keep strong mandatory isolation allege that the number of cases in Germany increased as a consequence, but this is only the weekly fluctuation. Comparing with the week before, you observe a steady decline, nothing special since 27 April.
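The difference between the two comparisons shows up on invented numbers. In this sketch the daily counts are synthetic: a made-up weekday reporting pattern multiplied by a 10% decline per week. `relative_change` is a hypothetical helper and assumes that all counts are positive.

```python
def relative_change(counts, lag):
    """Relative change of each count versus the count `lag` days earlier."""
    return [later / earlier - 1 for earlier, later in zip(counts, counts[lag:])]

# Invented reporting pattern, Monday to Sunday: fewer reports around weekends.
WEEKDAY_FACTOR = [0.6, 1.0, 1.1, 1.2, 1.1, 0.9, 0.7]

# Four weeks of synthetic daily counts with a true decline of 10% per week.
daily = [1000 * 0.9 ** (d // 7) * WEEKDAY_FACTOR[d % 7] for d in range(28)]

day_over_day = relative_change(daily, 1)
week_over_week = relative_change(daily, 7)

print("largest day-over-day rise:", round(max(day_over_day), 2))
print("week-over-week changes:", sorted({round(c, 2) for c in week_over_week}))
```

Comparing a Tuesday with the Monday before shows a large apparent rise, while comparing each day with the same weekday of the previous week recovers the steady decline.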


  • 1 year later...

Trying to infer logic and means of actions against Covid from statistics is deceiving and disappointing.

When Czechs wore makeshift face masks and were spared, I said "Ze big solution". But the next strain of Covid took a horrible toll there, putting the country at once among the worst hit.

Poland too was spared but then badly hit by that same strain. Maybe avoiding contaminations during one wave only makes things worse during a following one. My thoughts are with Australia, which has had few cases but where the more contagious delta variant is presently spreading.

==========

I still stand behind my old claim that confinement acted far too late on the incidence in Italy and Spain to be considered a cause. The mean incubation time is allegedly well under two weeks, but strict confinement brought no improvement for a month, if any. The same happened in France in Autumn 2020.

But we do see many more cases after mass public gatherings: BLM demonstrations in the USA, Euro soccer championship. Are humans the main vector?

Slovakia tested 2/3 of its population in Oct 2020. One can't even see an effect on the curves.

I computed and wrote elsewhere "If the contact tracing application worked, it would mark every inhabitant"
20minutes.fr
and this started to happen in England.

The French expert Didier Raoult, much denigrated for disagreeing with the government, said:

  • Waves correspond to virus variants (that's commonplace meanwhile)
  • Masks and confinement bring little or nothing (looks true now, still not widely admitted)
  • Means to treat the patients make the difference (horribly true, compare the countries)
  • Vaccines are not the main solution (still looks wrong).

Common vaccines do little or very little against contamination by the present delta variant, but they still efficiently reduce the number of hospitalisations and deaths. The best example is the UK, where new cases run as high as in Spring but deaths are cut by a factor of 20. This corresponds to the vaccination proportion among people at risk. But if the general population remains as contagious, vaccinating it seems useless.

==========

I still try to understand why Sweden, with no confinement and nearly unrestricted public life, went through the pandemic better than the UK, Spain, Italy, France. They had many cases, initially many deaths but not the worst proportion, and since Spring nearly no deaths.

  • The amount of damage is predefined, public policies do nothing? Maybe.
  • Catching earlier strains of Covid disarms the following ones?
  • Everyone got flu previously, this protects against Covid?
  • People who survived flu resist Covid too?
  • Statistics from the USA, South America... tell climate and weather don't matter.

Hugo Zeberg and Svante Pääbo found an interesting correlation between a gene cluster inherited from Neanderthals and severe symptoms
nature.com
This seems to hold, and helps explain why Sweden is less affected than Southern Europe and pre-Columbian people.

Another correlation: Covid does less damage in countries where saunas are used regularly, map from Wiki

[Image: DeadPer100000Europe.png]

Nordic people would consider that obvious. Though, many Russians have died despite saunas since this map was made.


  • 3 weeks later...

I wrote "Maybe avoiding the contaminations during one wave makes things only worse during a following one" and this graph illustrates it.

[Image: EUCountriesCumulDeath202109small.png]
Notice the log scale. Bigger versions at wikipedia

The EU countries less hit after May 2020 (Poland, Czech Republic, Slovakia, Hungary and more) caught up with or exceeded the others in May 2021.

Exceptions are (as of Sep 2021!) Norway, Denmark, Finland.

Sweden, with a similar population but without confinement and with little social distancing, is hit worse than Norway and Denmark, but less than the EU average. Or maybe time will upset the situation again.

Slovakia tested 2/3 of its population in Oct 2020. It's the curve lowest on the graph in Sep 2020. All countries were hit at that time, so the Slovaks didn't get infected at test centres, but the tests didn't bring anything. Governments that make tests mandatory should observe and meditate that curve.

A persistent impression: we apply inadequate mental schemes to Covid, which doesn't spread like flu or cold do. Different vectors? Encrypted version while spreading?

