
Why is 30 the cut off for z vs t distribution?


CuriousBanker


It is not a magic number, just an empirical rule of thumb. If you look at something like coin flips and compare the normal to the binomial for 30 flips, the two will appear close enough.
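To put a rough number on "close enough", here's a minimal sketch (assuming Python with numpy and scipy available; the 30-flip setup is just the coin example above) comparing the Binomial(30, 0.5) probabilities with the matching normal density:

```python
# Compare the Binomial(n=30, p=0.5) pmf with the normal density that has the
# same mean (n*p) and variance (n*p*(1-p)); at 30 flips the two nearly overlap.
import numpy as np
from scipy.stats import binom, norm

n, p = 30, 0.5
mean, sd = n * p, np.sqrt(n * p * (1 - p))

k = np.arange(n + 1)
binom_pmf = binom.pmf(k, n, p)
normal_approx = norm.pdf(k, loc=mean, scale=sd)

for kk, b, g in zip(k, binom_pmf, normal_approx):
    print(f"k={kk:2d}  binomial={b:.4f}  normal={g:.4f}")

print("max absolute difference:", np.max(np.abs(binom_pmf - normal_approx)))
```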

 

Why not 29 or 31 or 35? Is there like a logical reason for 30 or was it just decided upon by somebody?


 

In the old days (B.C.: before computers), when calculations were done by hand, analysts would use the normal distribution if the degrees of freedom were greater than 30 (for 30 df, the proper multiplier is 2.04; for 60 df, it's 2.00). Otherwise, the t distribution was used. This says as much about the availability of tables of the t distribution as anything else.

 

Today, tables of distributions have been replaced by computer programs. The computer thinks nothing about looking up the t distribution with 2351 degrees of freedom, even if it is almost identical to the standard normal distribution. There is no magic number of degrees of freedom above which the computer switches over to the standard normal distribution. Computer programs that compare sample means use Student's t distribution for every sample size and the standard normal distribution never comes into play.

 

We find ourselves in a peculiar position. Before computers, analysts used the standard normal distribution to analyze every large data set. It was an approximation, but a good one. After computers, we use t distributions to analyze every large data set. It works for large non-normal samples because a t distribution with a large number of degrees of freedom is essentially the standard normal distribution. The output may say t test, but it's the large sample theory that makes the test valid and large sample theory says that the distribution of a sample mean is approximately normal, not t!

- http://www.jerrydallal.com/LHSP/student2.htm
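The multipliers quoted above (2.04 at 30 df, 2.00 at 60 df) are easy to check yourself. A minimal sketch, assuming scipy is available, that prints the two-sided 95% critical value of Student's t for a few degrees of freedom next to the normal's 1.96:

```python
# Two-sided 95% critical values: t quantiles converge toward the normal's ~1.96
# as the degrees of freedom grow, which is why the old tables stopped near 30.
from scipy.stats import t, norm

z_crit = norm.ppf(0.975)  # ~1.96
for df in (5, 10, 30, 60, 120, 2351):
    t_crit = t.ppf(0.975, df)
    print(f"df={df:5d}  t={t_crit:.3f}  z={z_crit:.3f}  diff={t_crit - z_crit:.3f}")
```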

 

My stats knowledge is rusty; I keep forgetting the stuff because I rarely use it. Anyway, I think that website author's logic is acceptable. This may be a historical issue rather than a mathematical one... but as a practical matter, working with more than 30 df by hand was labor intensive, and thus not economical without a computer to crunch the data.

 

Don't take the following as truth:

If I remember stats correctly, a t distribution with 30 df is darn close to a normal distribution curve in shape, so beyond 30 df you can really just use the normal curve. Something like that. The point is that computers can crunch data fast enough to keep everything within the realm of a t-test, rather than shifting things over to a normal distribution curve for ease of analysis. I took a look at this thread about 10 hours or so ago, and this is the best answer I can give you at the moment unless another math nerd pops in.

Edited by Genecks

Why not 29 or 31 or 35? Is there like a logical reason for 30 or was it just decided upon by somebody?

 

No, it is arbitrary. The nice thing is that you can define mathematically how closely it fits normality based exactly on the number of trials.
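For instance, one way to quantify that closeness as a function of the number of trials (a sketch, again assuming scipy; the continuity correction and the particular n values are my own choices) is the worst-case gap between the Binomial(n, 0.5) CDF and its normal approximation:

```python
# How closeness to normality scales with the number of trials: the largest gap
# between the Binomial(n, 0.5) CDF and its normal approximation shrinks as n grows.
import numpy as np
from scipy.stats import binom, norm

for n in (10, 30, 100, 300, 1000):
    k = np.arange(n + 1)
    exact = binom.cdf(k, n, 0.5)
    approx = norm.cdf(k + 0.5, loc=n / 2, scale=np.sqrt(n) / 2)  # continuity correction
    print(f"n={n:4d}  max |binomial CDF - normal CDF| = {np.max(np.abs(exact - approx)):.4f}")
```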

 

But, this is just one of many 'rules of thumb' that tend to be easy guidelines to remember. Pretty much all science and mathematics have them.

 

They are meant to be easy to remember so that when trying to estimate or 'back of the envelope' calculate something, you don't have to go to the exact formula. If the answer you need is okay with a large margin of error -- a rule of thumb can make the calculation a lot easier, or maybe even unnecessary.

 

Let's give an example. Say you had an unfair coin that you knew was weighted to come up on one side 75% of the time, but you didn't know which side was the weighted one. About how many flips would it take to be pretty sure which side is the weighted one?

 

In this case, 30 is a pretty good answer. It would be pretty unlikely for a 75%-weighted coin to come up with a 15 H and 15 T split. Not impossible (you can do the calculation if you want), but pretty unlikely. Now, doing 31 flips would improve the confidence in your final answer a little more, and 29 flips a little less. But 30 is a nice, 'round', easy-to-remember number.
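If you want the actual number behind "pretty unlikely", a short calculation with scipy (a sketch, assuming that library) gives the chance that the weighted side shows up in no more than half of 30 flips:

```python
# Probability that a 75%-weighted coin looks ambiguous or misleading after 30 flips:
# the chance the weighted side appears 15 times or fewer, i.e. no more often than
# the other side.
from scipy.stats import binom

n, p = 30, 0.75
prob_misleading = binom.cdf(15, n, p)  # P(weighted side appears <= 15 times in 30)
print(f"P(weighted side <= 15 of 30 flips) = {prob_misleading:.5f}")
```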

 

Now, let's change the question a little bit: say you knew a coin was weighted unfairly so that heads was more likely. How many flips would it take to estimate, with 95% confidence, what the probability of heads is?

 

This is a much more difficult question, and the answer actually depends on how unfair the coin really is. It will take a lot more flips to be 95% confident that the coin comes up heads 99% of the time than 60% of the time. If the coin is 60% weighted, 30 flips will yield something like 15 to 18 H and 12 to 15 T. But a 99% weighted coin will most likely give 30 H and 0 T, and so will a 98% coin. So you actually have to do a lot more flips to distinguish exactly how unfair a coin is when it is very unfair. In this case, just using the rule of thumb may not get you a good answer.
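To put rough numbers on that, here's a sketch using the usual normal-approximation margin of error for a proportion (the flips_needed helper and the particular margins are illustrative assumptions, not anything from the post):

```python
# How many flips to pin down the heads probability to a given precision?
# Using the normal-approximation margin of error z * sqrt(p*(1-p)/n) and solving
# for n. Telling a 99% coin apart from a 98% coin needs a margin well under 0.01,
# so far more flips than telling a 60% coin apart from a fair one.
import math

def flips_needed(p, margin, z=1.96):
    """Smallest n with z * sqrt(p*(1-p)/n) <= margin (95% confidence by default)."""
    return math.ceil((z / margin) ** 2 * p * (1 - p))

print("60% coin, +/- 0.10 margin:", flips_needed(0.60, 0.10))    # roughly 93 flips
print("99% coin, +/- 0.005 margin:", flips_needed(0.99, 0.005))  # roughly 1522 flips
```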

 

In short, you have to learn when a rule of thumb will work and when it won't. None of them are hard-and-fast laws; they are just easy-to-remember shortcuts that eliminate some calculation.

 

If there is ever any doubt, you shouldn't use a rule of thumb. And I would never consider a rule of thumb a conclusive answer. If I needed to use one because someone needed a 'good guess' type of answer quickly, I would always go back and do the full math later. Too often people have used rules of thumb as a final answer when the shortcuts behind them were never really valid in the first place.


  • 1 month later...
Why not 29 or 31 or 35? Is there like a logical reason for 30 or was it just decided upon by somebody?
The version of that rule of thumb handed to me, while I was helping with some field research in ecology, was 29, not 30.

 

Just an anecdote. It was presented as a lesson from experience in that particular field, to guide the rookie planning a research program who was uncertain how many repetitions or data-collection events would likely be needed to reach the magic 95% confidence level for the usual questions.

 

That kind of estimate was important for budgeting money, time, effort, etc. When we collected census data on tree species distribution across a series of islands and nearby mainlands in our research area, for example, we planned census visits to 29 islands and their nearest mainlands. That turned out to be slight overkill - we had the answer by around 25 or 26 - but it was impressively close: it saved us from wasting a lot of work on too few sites, without costing us much extra effort.

