
Statistics: reporting nonparametric test results


Function


Hello

 

I was wondering: when you use a parametric test on data (e.g. an unpaired Student's t-test), you may report t(df) and the p value, along with the means and a confidence interval, to give an idea of the direction of the (in)significance or the trend of the result.

 

But what about when you have to use a nonparametric test? Let's say you use the nonparametric equivalent of the Student's t-test mentioned above: the Mann-Whitney U test. What will you report? I may recall incorrectly that nonparametric tests are based on the median and IQR, rather than on means and CIs. Then again, I might recall correctly that they are based on (mean) ranks ...

 

So let's say you have U = 1,183.00, z = -3.488 and p < 0.001 (all statistics that I would report).

 

How will you give your readers an idea of the direction of the significance? Will you still report means and CI? Or will you report medians and IQRs?

 

Thank you very much for your insights.

 

Regards

 

Function

Edited by Function

Typically people use non-parametric tests because they have a skewed data-set, and for this reason the median and IQR are commonly used as they are more representative indicators of central tendency and spread in this case.

 

Sometimes you see it stated that there is no assumption of normality in non-parametric methods - but the claim is broader than this: the data are not assumed to come from any particular family of distributions (hence the name non-parametric - no parameters of an assumed distribution are being estimated). Typically these methods are based on ranking the data, but there are some cases where the median is used.

 

In terms of reporting, the pertinent question is why you used a non-parametric test. Most likely it is because of skewed data, so reporting the median and IQR is appropriate.
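To make the reporting concrete, here is a minimal sketch in plain Python of how U, z, p and the median/IQR per group could be computed side by side. The function names and the two samples are made up for illustration, and the z and p values use the plain normal approximation without tie or continuity correction, so they may differ slightly from what your statistics package prints (in practice `scipy.stats.mannwhitneyu` is the usual route).

```python
import math
from statistics import median, quantiles


def rank_with_ties(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def mann_whitney_report(a, b):
    """Hypothetical helper: test statistics plus median and quartiles per group."""
    n1, n2 = len(a), len(b)
    ranks = rank_with_ties(list(a) + list(b))
    rank_sum_a = sum(ranks[:n1])
    u_a = rank_sum_a - n1 * (n1 + 1) / 2
    u = min(u_a, n1 * n2 - u_a)
    # Normal approximation (assumption: no tie or continuity correction).
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p value
    q1_a, _, q3_a = quantiles(a, n=4)
    q1_b, _, q3_b = quantiles(b, n=4)
    return {
        "U": u, "z": z, "p": p,
        "median_a": median(a), "iqr_a": (q1_a, q3_a),
        "median_b": median(b), "iqr_b": (q1_b, q3_b),
    }


# Made-up example data, not taken from the thread.
group_a = [1.1, 2.3, 2.9, 3.4, 4.0, 4.8, 5.5, 7.2]
group_b = [3.9, 5.1, 6.0, 6.6, 7.9, 8.4, 9.7, 12.5]
report = mann_whitney_report(group_a, group_b)
print(report)
```

The two medians, each with its quartiles, are what tell the reader which group sits higher; U, z and p on their own do not show the direction.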


Thank you!

 

About your comment on (non-)normality of the distribution: at what point can a QQ-plot that, "in your eyes", shows the data approaching a normal (Gaussian) distribution overrule a significant Kolmogorov-Smirnov (n > 50) or Shapiro-Wilk (n < 50) test?

 

If K-S or S-W reports a departure from a Gaussian distribution with p < 0.0001, are you bound to report it as such, or can a QQ-plot still save your data by calling it 'normal enough' for use of a parametric test (with higher power than a nonparametric one)?


Well, that is the question: how normal is normal enough?

 

A lot will depend on what kind of analysis you want to perform: some tests that assume normality are quite robust to violations of that assumption (so even if your data is quite skewed or heavy/light tailed the test still performs well), while others are more sensitive. A two-tailed, two-sample t-test (which I guess you are doing?) is quite robust to non-normality - but bear it in mind if the p-value is marginal at your significance level. The sample size is also pertinent here: with smaller samples the test is less robust to non-normality. I found this blog which explains it quite well.

 

Typically statisticians will go by a normal QQ plot (from which you should be able to read whether there is right or left skew and heavy or light tails). If the points deviate only mildly from normality and you have a reasonable sample size, a t-test will perform well enough. Quite what 'mildly' and 'reasonable' mean seems more art than science, which is why people often prefer the KS or SW test - they give a veneer of objectivity. But they can be misleading: both tests make their own assumptions which need examining, and they also give binary results (at some chosen level of significance), so you cannot explore how close the data is to normality. If you insist on using such tests, SW is more powerful than KS.
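As a rough illustration of what a normal QQ plot is built from, the sketch below computes the plot's points using only the Python standard library and summarises them by the correlation between theoretical and observed quantiles. The function names, the plotting positions (i - 0.5)/n and both example samples are my own assumptions for the example; the correlation is only a crude numeric companion to looking at the plot, not a formal test (`scipy.stats.probplot` will draw the plot for you).

```python
import math
from statistics import NormalDist, mean


def normal_qq_points(sample):
    """Return (theoretical, observed) quantiles for a normal QQ plot."""
    n = len(sample)
    observed = sorted(sample)
    standard_normal = NormalDist()
    # Plotting positions (i - 0.5) / n are one common convention among several.
    theoretical = [standard_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return theoretical, observed


def qq_correlation(sample):
    """Pearson correlation of the QQ points; 1.0 means a perfectly straight line."""
    x, y = normal_qq_points(sample)
    mean_x, mean_y = mean(x), mean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Made-up deterministic samples: one exactly normal-shaped, one right-skewed.
n = 40
symmetric = [NormalDist(10, 2).inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
right_skewed = [-math.log(1 - (i - 0.5) / n) for i in range(1, n + 1)]

print(qq_correlation(symmetric))
print(qq_correlation(right_skewed))
```

Plotting `theoretical` against `observed` gives the QQ plot itself: a straight line for the symmetric sample and a curve bending upwards at the right end for the right-skewed one.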

 

Another thing you could try is to transform your data (log transformations are common) and see if it then more closely resembles a normal distribution: this is often preferred to non-parametric methods as it retains more power, i.e. a lower risk of a type 2 error.
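A small sketch of the effect of a log transformation, again in plain Python: it compares the moment-based sample skewness before and after taking logs. The data and the helper name are invented for the example, and skewness is only one rough indicator; you would still want to look at a QQ plot of the transformed values.

```python
import math
from statistics import mean


def sample_skewness(values):
    """Moment-based skewness m3 / m2**1.5 (about 0 for symmetric data)."""
    centre = mean(values)
    n = len(values)
    m2 = sum((v - centre) ** 2 for v in values) / n
    m3 = sum((v - centre) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Made-up, strictly positive, right-skewed measurements.
raw = [1.2, 1.5, 1.9, 2.4, 3.1, 4.0, 5.6, 8.3, 13.0, 27.5]
logged = [math.log(v) for v in raw]  # log requires values > 0

print(sample_skewness(raw))
print(sample_skewness(logged))
```

If you test on the log scale, remember that means and confidence intervals are then on the log scale too, so they are usually back-transformed before reporting.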

 

The last thing to consider is what kind of statistical error you would prefer to avoid: is it better to incorrectly say an experimental medicine (Investigational Medicinal Product or IMP in your medical lingo) works when it actually doesn't (type 1 error), or is it better to say an IMP doesn't work when actually it does (type 2 error)? Usually the former of these is considered worse, but now it's a question of ethics. If you would rather risk the latter, you might choose to avoid navigating the maze of statistical considerations and just opt for less powerful non-parametric methods - but you might be missing something.

 

Sorry for the long post but there is a massive body of literature on this subject and no quick answers. I'll try to find some web sites that give you practice interpreting QQ plots, as this is the only way of learning what the different looks of plots mean.

 

Here are a couple that look OK:

http://people.reed.edu/~jones/Courses/P14.pdf

http://emp.byui.edu/BrownD/Stats-intro/dscrptv/graphs/qq-plot_egs.htm

Edited by Prometheus