# Shapiro-Wilk and Bartlett test

Hi,

I have to do some statistics, and I don't understand how the Shapiro-Wilk test or the Bartlett test works.

I think I understand why we use them (Shapiro-Wilk to see if a sample follows a normal distribution, and Bartlett to see if our variances are equal).

But I am completely unable to do the math. Can somebody help me?

Thanks

##### Share on other sites
3 hours ago, sangui said:

Hi,

I have to do some statistics, and I don't understand how the Shapiro-Wilk test or the Bartlett test works.

I think I understand why we use them (Shapiro-Wilk to see if a sample follows a normal distribution, and Bartlett to see if our variances are equal).

But I am completely unable to do the math. Can somebody help me?

Thanks

Could you show some example of what you are unable to do?

Shapiro-Wilk and Bartlett tests are not really that closely related. So if you bunch them together like this, it may mean that you are in need of some of the basic concepts that underlie them both.

##### Share on other sites

For example, I have to check whether these samples follow a normal distribution.

| Temperature_1 | Temperature_2 | Temperature_3 | Temperature_4 | Temperature_5 |
|---|---|---|---|---|
| 2.56 | 2.28 | 2.73 | 2.44 | 2.54 |
| 2.92 | 2.78 | 2.97 | 2.81 | 2.67 |
| 2.00 | 2.74 | 2.00 | 2.08 | 2.43 |
| 2.83 | 2.47 | 2.13 | 2.90 | 2.10 |
| 2.61 | 2.52 | 2.09 | 3.05 | 2.78 |
| 3.10 | 2.16 | 1.90 | 2.69 | 2.85 |
| 2.42 | 2.70 | 3.04 | 3.03 | 2.76 |
| 2.28 | 2.70 | 2.57 | 2.91 | 2.47 |

I'm supposed to use Shapiro-Wilk.

For this one I need to find out whether the variances are equal.

| Temperature_1 | Temperature_2 | Temperature_3 |
|---|---|---|
| 2.42 | 3.05 | 1.95 |
| 2.83 | 2.21 | 2.23 |
| 2.25 | 2.18 | 2.54 |
| 3.02 | 2.35 | 2.56 |

I think I must use Bartlett.

##### Share on other sites
41 minutes ago, taeto said:

Shapiro-Wilk and Bartlett tests are not really that closely related. So if you bunch them together like this, it may mean that you are in need of some of the basic concepts that underlie them both.

Isn't it common practice to check for both normal distribution and equal variance before applying tests to see whether the null hypothesis can be rejected? I remember using Levene's and Shapiro (and on the one occasion I had a large sample size, Kolmogorov-Smirnov).

Aren't they related in the sense that you need these assumptions for follow-up tests (ANOVAs, for instance)?

##### Share on other sites
5 minutes ago, Dagl1 said:

Aren't they related in the sense that you need these assumptions for follow-up tests (ANOVAs, for instance)?

Actually, that's exactly why I need to understand those tests ^^

##### Share on other sites
Posted (edited)
17 minutes ago, Dagl1 said:

Isn't it common practice to check for both normal distribution and equal variance before applying tests to see whether the null hypothesis can be rejected? I remember using Levene's and Shapiro (and on the one occasion I had a large sample size, Kolmogorov-Smirnov).

Aren't they related in the sense that you need these assumptions for follow-up tests (ANOVAs, for instance)?

Yes, that seems to be what the assignment is about. But sangui said that he cannot do the math, and the math differs a lot. Bartlett is a standard $$\chi^2$$ test, whereas the Shapiro-Wilk test statistic W needs looking up in a table. That makes the mechanics a little different. The value of W is actually not that easy to compute, so that would be a place to start, if necessary.

Edited by taeto

##### Share on other sites
11 minutes ago, taeto said:

Yes, that seems to be what the assignment is about. But sangui said that he cannot do the math, and the math differs a lot. Bartlett is a standard $$\chi^2$$ test, whereas the Shapiro-Wilk test statistic W needs looking up in a table. That makes the mechanics a little different. The value of W is actually not that easy to compute, so that would be a place to start, if necessary.

Ahh I see, ye I didn't really consider the math differences when reading your post, apologies!

##### Share on other sites
14 minutes ago, taeto said:

Yes, that seems to be what the assignment is about. But sangui said that he cannot do the math, and the math differs a lot. Bartlett is a standard $$\chi^2$$ test, whereas the Shapiro-Wilk test statistic W needs looking up in a table. That makes the mechanics a little different. The value of W is actually not that easy to compute, so that would be a place to start, if necessary.

It's my fault, I haven't been clear.

I'm supposed to learn how to use those tests, and my teacher didn't give me more information. So I don't really know the difference.

##### Share on other sites
3 minutes ago, Dagl1 said:

Ahh I see, ye I didn't really consider the math differences when reading your post, apologies!

Don't be silly. I am happy you point out the obvious connection. When I see things from a math viewpoint, I sometimes forget the, well, obvious 🧐

3 minutes ago, sangui said:

It's my fault, I haven't been clear.

I'm supposed to learn how to use those tests, and my teacher didn't give me more information. So I don't really know the difference.

At least Bartlett is not so difficult, so let us go through it 🙂

But first, what is the meaning of your first table? We see five columns T1 to T5 (where T stands for "Temperature_") and eight rows with an entry for every column. Do you expect that you have to test whether all the 40 entries in the rows and columns follow the same normal distribution? Or just that the entries in the same column follow a normal distribution, possibly not the same for all columns? Or the entries in the same row?

And then, what is the similar meaning of the second table?

The two tables are not related in any way by coming from the same experiment or anything like that, is that right?

##### Share on other sites
Posted (edited)
54 minutes ago, taeto said:

But first, what is the meaning of your first table? We see five columns T1 to T5 (where T stands for "Temperature_") and eight rows with an entry for every column. Do you expect that you have to test whether all the 40 entries in the rows and columns follow the same normal distribution? Or just that the entries in the same column follow a normal distribution, possibly not the same for all columns? Or the entries in the same row?

We must see if all entries follow the same normal distribution.

And it's the same for the second table.

I don't think those tables are related (but I'm not sure, I just have the exercise and the value of Shapiro for the first and Bartlett for the second).

I'm sorry not to be more precise, but I don't have a lot of information (I need to understand these tests for the rest of my studies, but my teacher chose not to work on them).

Thank you for your help.

Edited by sangui

##### Share on other sites
4 minutes ago, sangui said:

We must see if all entries follow the same normal distribution.

I don't think those tables are related (but I'm not sure, I just have the exercise and the value of Shapiro for the first and Bartlett for the second).

If we take Bartlett first, then the purpose of the test is to figure out, for several sets of data and assuming that each set is normally distributed, whether they also have the same variance. If we think about the second table, it is possible that it represents four sets of data, one for each row, each set of data containing three values, for T1, T2 and T3.

Or, more likely, the table represents three sets of data, each containing four values.

Let us say that it makes sense that the second table represents experiments in which four measurements were made for each of three temperatures T1, T2, T3. Then for T1 it means that the values 2.42, 2.83, 2.25, 3.02 were measured, for T2 they were 3.05, 2.21, 2.18, 2.35, and for T3 they were 1.95, 2.23, 2.54, 2.56.

We can calculate the estimated variance of each of these samples in the standard way: for a sample of four values, it is 1/3 of the sum of the squared deviations from the sample average (equivalently, 1/3 of the sum of the squares minus four times the square of the average). I trust that this is familiar to you? Then we have three estimated variances V1, V2, V3, one for each T.
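As a rough check of this step, here is a small Python sketch that computes the three sample variances for the second table. The names `samples` and `sample_variance` are made up for illustration, and it assumes the columns are the data sets, as suggested above.

```python
from statistics import mean, variance

# Second table from the thread, one list per temperature column.
samples = {
    "T1": [2.42, 2.83, 2.25, 3.02],
    "T2": [3.05, 2.21, 2.18, 2.35],
    "T3": [1.95, 2.23, 2.54, 2.56],
}

def sample_variance(xs):
    """Unbiased variance: (sum of squares - n * mean^2) / (n - 1)."""
    n = len(xs)
    m = mean(xs)
    return (sum(x * x for x in xs) - n * m * m) / (n - 1)

variances = {name: sample_variance(xs) for name, xs in samples.items()}

for name, v in variances.items():
    # statistics.variance uses the same n - 1 denominator, so both columns
    # of the printout should agree up to rounding.
    print(name, round(v, 4), round(variance(samples[name]), 4))
```

The hand-written formula and `statistics.variance` should print the same numbers, which is a cheap way to catch a slip such as dividing by 4 instead of 3.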

We also have to compute the estimate of the common variance V in case they were actually all equal. That will be $$V = \frac{1}{12 - 3} \sum_{i=1,2,3} (4-1)V_i$$, where the 3 means that we have three data sets, the 4 means that we have four data in each set, and 12 is the total number of data in the table.
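The pooled estimate can be sketched in Python as follows. The function name `pooled_variance` is only an illustrative choice, and the data sets are again assumed to be the three columns of the second table.

```python
from statistics import variance

def pooled_variance(samples):
    """Weighted mean of the sample variances, with weights n_i - 1."""
    k = len(samples)                          # number of data sets (3 here)
    n_total = sum(len(s) for s in samples)    # total number of values (12 here)
    weighted = sum((len(s) - 1) * variance(s) for s in samples)
    return weighted / (n_total - k)

table = [
    [2.42, 2.83, 2.25, 3.02],   # T1
    [3.05, 2.21, 2.18, 2.35],   # T2
    [1.95, 2.23, 2.54, 2.56],   # T3
]

V = pooled_variance(table)
print(round(V, 4))
```

Because every set has four values here, V is simply the plain average of V1, V2, V3; the weights only matter when the sets have different sizes.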

I have not made the computations, since I have no good calculator handy, and I would probably make confusing mistakes, sorry.

Finally you have to compute the Bartlett test statistic itself. First we need the number $$D = (12-3)\log V - \sum_{i=1,2,3} (4-1)\log V_i,$$ where $$\log$$ is the natural logarithm. We can see from the formula above for V that it would be a pretty good match if all the V1, V2, V3 are the same, because then V would be equal to all of them, and this $$D$$ would be zero. So $$D$$ having a small value is good.
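To see how $$D$$ behaves, here is a short Python sketch. `bartlett_d` is a hypothetical helper name, `math.log` is the natural logarithm, and the `shifted` data is invented purely to produce three samples with identical variances.

```python
import math
from statistics import variance

def bartlett_d(samples):
    """D = (N - k) * log(V) - sum((n_i - 1) * log(V_i)), natural log."""
    k = len(samples)
    n_total = sum(len(s) for s in samples)
    variances = [variance(s) for s in samples]
    pooled = sum((len(s) - 1) * v for s, v in zip(samples, variances)) / (n_total - k)
    log_terms = sum((len(s) - 1) * math.log(v) for s, v in zip(samples, variances))
    return (n_total - k) * math.log(pooled) - log_terms

table = [
    [2.42, 2.83, 2.25, 3.02],   # T1
    [3.05, 2.21, 2.18, 2.35],   # T2
    [1.95, 2.23, 2.54, 2.56],   # T3
]
d_table = bartlett_d(table)

# Shifting a sample does not change its variance, so these three made-up
# samples have exactly equal variances and D should vanish up to rounding.
shifted = [[1.0, 2.0, 3.0, 4.0], [11.0, 12.0, 13.0, 14.0], [5.0, 6.0, 7.0, 8.0]]
d_equal = bartlett_d(shifted)

print(round(d_table, 4), abs(d_equal) < 1e-9)
```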

To compute the final Bartlett test statistic we also need to have $$C = 1 + \frac{1}{3(3-1)}\left(\sum_{i=1,2,3} \frac{1}{4-1} - \frac{1}{12-3}\right).$$ The test statistic becomes $$B = D/C$$.

Now you have to check $$B$$ against a $$\chi^2$$ distribution with 3 - 1 = 2 degrees of freedom.

Having typed all of that, maybe it is not as easy as I first thought. But try to compute as many of the numbers as you are able to.
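For anyone who wants to check their hand computation, the whole recipe can be put together in a few lines of Python. This is only a sketch under the assumptions made above (three data sets given by the columns, natural logarithm); `bartlett_statistic` is an illustrative name, and the p-value line relies on the fact that for 2 degrees of freedom the $$\chi^2$$ upper tail probability is exactly exp(-x/2).

```python
import math
from statistics import variance

def bartlett_statistic(samples):
    """Return B = D / C for a list of samples, following the steps above."""
    k = len(samples)
    sizes = [len(s) for s in samples]
    n_total = sum(sizes)
    variances = [variance(s) for s in samples]
    pooled = sum((n - 1) * v for n, v in zip(sizes, variances)) / (n_total - k)
    d = (n_total - k) * math.log(pooled) - sum(
        (n - 1) * math.log(v) for n, v in zip(sizes, variances)
    )
    c = 1 + (sum(1 / (n - 1) for n in sizes) - 1 / (n_total - k)) / (3 * (k - 1))
    return d / c

table = [
    [2.42, 2.83, 2.25, 3.02],   # T1
    [3.05, 2.21, 2.18, 2.35],   # T2
    [1.95, 2.23, 2.54, 2.56],   # T3
]

B = bartlett_statistic(table)

# Closed form valid only for 2 degrees of freedom (k - 1 = 2 here).
p_value = math.exp(-B / 2)

print(round(B, 3), round(p_value, 3))
```

A large p-value means the data give no reason to reject equal variances; for other numbers of data sets the exp(-x/2) shortcut no longer applies and a $$\chi^2$$ table is needed.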

##### Share on other sites

Stupid question, but: what is frac?

##### Share on other sites
Posted (edited)
2 hours ago, sangui said:

Stupid question, but: what is frac?

Just now, taeto said:

That is not a stupid question at all. It seems there is a particular quirk to this site, which sometimes forces you to reload a page to view something that was typeset in LaTeX. But if you reload the page and this thing still persists, then please reply with precise information about the piece of text where it occurs, typically somewhere that was supposed to be mathematical.

Edited by taeto

##### Share on other sites
Posted (edited)

Thanks

Is it possible that D is negative?

I don't understand why we use the sum for C. We don't have anything to sum over i = 1, 2, 3?

Edited by sangui

##### Share on other sites
Posted (edited)
8 hours ago, sangui said:

Is it possible that D is negative?

I don't understand why we use the sum for C. We don't have anything to sum over i = 1, 2, 3?

No, D cannot be negative.

And the sum for C is $$\sum_i \frac{1}{n_i-1}$$ where $$n_i$$ is the number of elements in the $$i$$'th data set. In our case each T contains 4 numbers. So we have to add 1/3 to itself three times, and the sum adds up to 1. And then when you have that sum, you subtract $$\frac{1}{n-3}$$, where $$n$$ is the sum of the $$n_i$$ and 3 is the number of data sets. Sorry that this was not so clearly written.
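Written with general sizes, the correction factor might look like this in Python. `correction_factor` is a made-up name, and the unequal sizes in the second call are hypothetical, included only to show that the sum really does something when the sets differ in size.

```python
def correction_factor(sizes):
    """C = 1 + (sum(1 / (n_i - 1)) - 1 / (n - k)) / (3 * (k - 1))."""
    k = len(sizes)        # number of data sets
    n = sum(sizes)        # total number of values
    inner = sum(1 / (n_i - 1) for n_i in sizes) - 1 / (n - k)
    return 1 + inner / (3 * (k - 1))

# Our case: three sets of four values, so the sum is 1/3 + 1/3 + 1/3 = 1.
c_equal = correction_factor([4, 4, 4])

# Hypothetical unequal sizes, where each term of the sum is different.
c_unequal = correction_factor([4, 6, 10])

print(round(c_equal, 4), round(c_unequal, 4))
```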

Edited by taeto

##### Share on other sites

I must be wrong somewhere, because I found this:

My B = D/C = 0.13257072

And in the solution I was given: Bartlett's K-squared = 0.305
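One possible explanation for the mismatch, offered as a guess rather than something confirmed in the thread: Bartlett's statistic is defined with the natural logarithm, while the "log" key on many calculators is base 10. The Python sketch below (with the illustrative helper `bartlett_b`, assuming the columns of the second table are the data sets) computes B both ways so the two results can be compared with the numbers above.

```python
import math
from statistics import variance

def bartlett_b(samples, log):
    """Bartlett's B = D / C, using the supplied logarithm function."""
    k = len(samples)
    sizes = [len(s) for s in samples]
    n_total = sum(sizes)
    variances = [variance(s) for s in samples]
    pooled = sum((n - 1) * v for n, v in zip(sizes, variances)) / (n_total - k)
    d = (n_total - k) * log(pooled) - sum(
        (n - 1) * log(v) for n, v in zip(sizes, variances)
    )
    c = 1 + (sum(1 / (n - 1) for n in sizes) - 1 / (n_total - k)) / (3 * (k - 1))
    return d / c

table = [
    [2.42, 2.83, 2.25, 3.02],   # T1
    [3.05, 2.21, 2.18, 2.35],   # T2
    [1.95, 2.23, 2.54, 2.56],   # T3
]

b_natural = bartlett_b(table, math.log)     # natural logarithm
b_base10 = bartlett_b(table, math.log10)    # base-10 logarithm

print(round(b_natural, 3), round(b_base10, 4))
```

The two values differ exactly by the factor ln(10), so a base-10 result can be converted by multiplying it by about 2.3026.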
