Anscombe's quartet and a statistical test to differentiate two cases

I would like to know whether there is a statistic that can differentiate between the case at the top left and the case at the top right of Anscombe's quartet. Clearly R2 does not do so. One could plot the residuals, and the non-random distribution sometimes becomes apparent. However, what I was hoping to find is some number, preferably one that a statistics program would calculate, that could be compared between the two situations. I am reading Motulsky's book Intuitive Biostatistics (that is where I first saw the Anscombe quartet), but I have not found anything in it yet. I am presently using ProStat, which calculates both COD (which I am pretty sure is R2) and "Corrl", which the user manual says indicates "how closely the two variables approximate a linear relationship to each other." I note the presence of squared differences in the numerator of COD, which are not found in Corrl.
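
To make the point about R2 concrete, here is a minimal Python sketch that computes R2 and Pearson's r for two of Anscombe's published datasets. It assumes that "top left" and "top right" mean datasets I and II in the usual layout of the figure, and the function names are made up for illustration. Both statistics come out at about 0.67 and 0.82 for either dataset, so neither number separates the two cases.

```python
import math
from statistics import mean

# Anscombe (1973), datasets I and II; they share the same x values.
# Assumption: these correspond to the top-left and top-right panels.
X = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
Y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
Y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]


def linear_fit(x, y):
    """Ordinary least-squares (intercept, slope) for y = a + b*x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def r_squared(x, y):
    """Coefficient of determination, 1 - SS_res / SS_tot, for the line fit."""
    intercept, slope = linear_fit(x, y)
    my = mean(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


for name, y in (("I", Y1), ("II", Y2)):
    print(f"dataset {name}: R2 = {r_squared(X, y):.3f}, r = {pearson_r(X, y):.3f}")
```

For a straight-line least-squares fit, R2 is simply the square of Pearson's r, so if ProStat's COD is R2 and Corrl is Pearson's r (an assumption on my part), the two values carry the same information here.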

Edited by BabcockHall

It has been suggested to me that Pearson's r is a good statistic for this situation. Thoughts?

Hi, I've been thinking about your question, and I'm not sure there is any specific test that can be extracted from tabulated data, which is partly why Anscombe (strongly) recommends sketching a plot first.

There are just so many different possible lines you could draw through a given set of points that comparing them pair by pair, or even class by class, is an overwhelming task.

Further there is the question of endslopes.

If you try to fit a straight line, then you cannot have zero slope at the origin or a turnover to an asymptote.

A quadratic (second-order polynomial) can do the first but not the second; you need at least a cubic to achieve both.

There may also be points that have more certain values than others.

For example consider a plank resting on two supports.

At the support points the plank must have zero deflection (or it is not resting on its support!)

Depending upon the support restraint it may also have a curvature or zero curvature.

So these values can never be rejected.

The Kolmogorov-Smirnov test should work, as it compares the sample's entire empirical distribution against a reference distribution of any shape. A visual check would be better, though.

I thank you both for some helpful comments. In our case we were fitting a straight line to gel electrophoresis data on proteins. The standard curve, which is mobility versus logarithm of molecular weight for the standards, had noticeable curvature. I am still looking into the biophysics, but the information that I have at present is that a slight deviation at high molecular weights is expected. I am not looking to explain the results so much as to describe them, in the sense of making a more formal statement to the effect that a linear fit leads to non-random residuals.
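
One number that speaks directly to "non-random residuals" is the runs test (Wald-Wolfowitz) applied to the signs of the residuals once they are ordered by x. The sketch below is only an illustration: it uses Anscombe's datasets I and II as stand-ins for a real standard curve, the function names are hypothetical, and the p-value relies on a normal approximation that is rough for so few points.

```python
import math
from statistics import mean

# Anscombe's datasets I and II, used here only as stand-in data.
X = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
Y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
Y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]


def residuals_by_x(x, y):
    """Residuals of the least-squares line, ordered by increasing x."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return [b - (intercept + slope * a) for a, b in sorted(zip(x, y))]


def runs_test(residuals):
    """Runs test on residual signs; returns (runs, z, one_sided_p).

    A small p means fewer sign runs than chance would predict, i.e. the
    residuals cluster by sign. Assumes both signs occur at least once.
    """
    signs = [r > 0 for r in residuals if r != 0]  # drop exact zeros
    n_pos = sum(signs)
    n_neg = len(signs) - n_pos
    n = n_pos + n_neg
    # A new run starts wherever the sign changes between neighbours.
    runs = 1 + sum(a != b for a, b in zip(signs, signs[1:]))
    expected = 2 * n_pos * n_neg / n + 1
    variance = 2 * n_pos * n_neg * (2 * n_pos * n_neg - n) / (n ** 2 * (n - 1))
    z = (runs - expected) / math.sqrt(variance)
    p = 0.5 * (1 + math.erf(z / math.sqrt(2)))  # lower-tail normal probability
    return runs, z, p


for name, y in (("I", Y1), ("II", Y2)):
    runs, z, p = runs_test(residuals_by_x(X, y))
    print(f"dataset {name}: runs = {runs}, z = {z:.2f}, one-sided p = {p:.3f}")
```

With these data the curved dataset II gives only 3 runs (a block of negative residuals, then positive, then negative again) and a z-score of about -2.1, while dataset I gives 7 runs, which is close to what chance predicts. For a real standard curve with few points, an exact runs table would be safer than the normal approximation.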

Since you're not worried about inference, you could just try fitting different curves to the data and seeing which gives the lowest sum of squared errors. The problem is over-fitting: with a high enough polynomial degree you'll always be able to find a 'perfect' fit. Stick to simple curves.
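
As a rough sketch of that idea, the Python below fits polynomials of degree 1 to 3 by least squares and reports the sum of squared errors (SSE) alongside an AIC-style score that charges for each extra coefficient. The penalty is my own addition as one possible guard against over-fitting, not something prescribed above; the data are Anscombe's dataset II used as a stand-in, and the function names are hypothetical.

```python
import math

# Anscombe's dataset II, used here only as stand-in curved data.
X = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
Y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]


def polyfit_sse(x, y, degree):
    """Least-squares polynomial fit; returns the sum of squared residuals."""
    mx = sum(x) / len(x)
    xc = [v - mx for v in x]  # centre x so the normal equations stay well conditioned
    m = degree + 1
    # Augmented normal equations: rows of sum(x^(i+j)) with sum(y * x^i) appended.
    a = [[sum(v ** (i + j) for v in xc) for j in range(m)]
         + [sum(yi * v ** i for v, yi in zip(xc, y))]
         for i in range(m)]
    # Gaussian elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, m):
            f = a[r][col] / a[col][col]
            for k in range(col, m + 1):
                a[r][k] -= f * a[col][k]
    # Back substitution for the coefficients (lowest power first).
    coef = [0.0] * m
    for i in reversed(range(m)):
        tail = sum(a[i][j] * coef[j] for j in range(i + 1, m))
        coef[i] = (a[i][m] - tail) / a[i][i]
    fitted = [sum(c * v ** k for k, c in enumerate(coef)) for v in xc]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))


def aic(sse, n, n_coef):
    """AIC for Gaussian errors, up to an additive constant; lower is better."""
    return n * math.log(sse / n) + 2 * n_coef


for degree in (1, 2, 3):
    sse = polyfit_sse(X, Y2, degree)
    print(f"degree {degree}: SSE = {sse:.5f}, AIC = {aic(sse, len(X), degree + 1):.1f}")
```

SSE can only go down as the degree goes up, so on its own it always favours the most flexible curve. A penalized score (or a held-out point or two) gives a reason to stop at the simplest curve that removes the pattern in the residuals.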

One of the reasons I am asking is that R2 is a little bit like electronegativity in chemistry: one teaches students about it, and they want to use it for everything, even when there are better tools. In this instance R2 is not ideal, because it is indifferent to the direction of each residual and responds only to its magnitude.
