# Hypothesis testing using samples


So the hypothesis test procedure pits the null hypothesis ($H_0$) against the alternative hypothesis ($H_a$). If a given sample provides strong enough evidence that $H_a$ is true, then $H_0$ is rejected. If there isn't enough evidence to reject $H_0$, then we can't conclude anything.

Two errors result from this:

Type I error: the error of rejecting $H_0$ when $H_0$ is true.
Type II error: the error of failing to reject $H_0$ when $H_0$ is false.

1. First example:

"

The U.S. Bureau of Transportation Statistics reports that for 2009, 72% of all
domestic passenger flights arrived on time (meaning within 15 minutes of the scheduled
arrival). Suppose that an airline with a poor on-time record decides to offer its
employees a bonus if, in an upcoming month, the airline’s proportion of on-time
flights exceeds the overall 2009 industry rate of .72. Let p be the actual proportion of
the airline’s flights that are on time during the month of interest. A random sample
of flights might be selected and used as a basis for choosing between
$H_0: p = .72$ and $H_a: p > .72$
In this context, a Type I error (rejecting a true H0) results in the airline rewarding its
employees when in fact the actual proportion of on-time flights did not exceed .72.
A Type II error (not rejecting a false H0) results in the airline employees not receiving
a reward that they deserved.

"

The book claims that a Type II error, failing to reject $H_0$ when $H_0$ is false, means that the airline employees did not receive the reward they deserved. But isn't that not always true? Failing to reject $H_0$ means that $p < 0.72$ or $p > 0.72$, i.e. $p \neq 0.72$, hence there could be fewer than 72% of domestic passenger flights on time as well as more than 72% on time, yet the book only considers one possibility. Is there a reason for considering only one outcome? This is repeated in other examples that analyze Type I and Type II errors.
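For reference, here is a minimal Python sketch of the large-sample (normal-approximation) z test that the airline example describes, $H_0: p = .72$ versus $H_a: p > .72$. The sample counts (160 on-time flights out of 200) are hypothetical, chosen only for illustration, and the function name `upper_tailed_prop_test` is my own.

```python
from math import sqrt
from statistics import NormalDist

def upper_tailed_prop_test(successes, n, p0, alpha=0.05):
    """Large-sample z test of H0: p = p0 versus Ha: p > p0 (hypothetical helper)."""
    p_hat = successes / n
    se = sqrt(p0 * (1 - p0) / n)        # standard error computed under H0
    z = (p_hat - p0) / se
    p_value = 1 - NormalDist().cdf(z)   # upper tail only, because Ha is p > p0
    return z, p_value, p_value <= alpha

# Hypothetical data: 160 of 200 sampled flights arrived on time
z, p_value, reject = upper_tailed_prop_test(160, 200, 0.72)
print(f"z = {z:.2f}, p-value = {p_value:.4f}, reject H0: {reject}")
```

Because only the upper tail counts as evidence, a sample proportion below .72 (for example 140 of 200) gives a large p-value and never leads to rejection.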

2. Another example I don't quite understand:

The probability of a Type I error is denoted by $\alpha$ and is called the significance
level of the test. For example, a test with $\alpha = .01$ is said to have a significance
level of .01.
The probability of a Type II error is denoted by $\beta$.

"

Women with ovarian cancer usually are not diagnosed until the disease is in an advanced
stage, when it is most difficult to treat. The paper “Diagnostic Markers for
Early Detection of Ovarian Cancer” (Clinical Cancer Research [2008]: 1065–1072)
describes a new approach to diagnosing ovarian cancer that is based on using six different
blood biomarkers (a blood biomarker is a biochemical characteristic that is
measured in laboratory testing). The authors report the following results using the six
biomarkers:
• For 156 women known to have ovarian cancer, the biomarkers correctly identified
151 as having ovarian cancer.
• For 362 women known not to have ovarian cancer, the biomarkers correctly
identified 360 of them as being ovarian cancer free.
We can think of using this blood test to choose between two hypotheses:

H0: woman has ovarian cancer
Ha: woman does not have ovarian cancer

Note that although these are not “statistical hypotheses” (statements about a population
characteristic), the possible decision errors are analogous to Type I and Type II errors.
In this situation, believing that a woman with ovarian cancer is cancer free would
be a Type I error—rejecting the hypothesis of ovarian cancer when it is in fact true.
Believing that a woman who is actually cancer free does have ovarian cancer is a
Type II error—not rejecting the null hypothesis when it is in fact false. Based on the
study results, we can estimate the error probabilities. The probability of a Type I error,
$\alpha$, is approximately 5/156 ≈ .032. The probability of a Type II error, $\beta$, is approximately
2/363 ≈ .006.
"

From the above example, a Type II error occurs when we fail to reject $H_0$ when $H_0$ is false. This means that the statistician conducting the research could accept $H_0$ or reject $H_0$, because failing to reject $H_0$ means $H_0$ could be true or false. Therefore, if by luck the statistician rejects $H_0$, then no error is made. The only Type II error made is if he accepts $H_0$ when it is false. So why doesn't this factor into the calculation of $\beta$, the probability of a Type II error?
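The arithmetic behind the book's two estimates can be written out as a short Python sketch. It uses only the counts quoted above. Note that the study reports 362 cancer-free women, so this sketch divides by 362; the quoted text prints 2/363, and both fractions round to .006.

```python
# Counts reported in the quoted study
cancer_total, cancer_detected = 156, 151    # women known to have ovarian cancer
healthy_total, healthy_cleared = 362, 360   # women known to be cancer free

# H0: the woman has ovarian cancer.
# Type I error (reject a true H0): a woman with cancer is classified as cancer free.
type1_count = cancer_total - cancer_detected
alpha_hat = type1_count / cancer_total

# Type II error (fail to reject a false H0): a cancer-free woman is classified as having cancer.
type2_count = healthy_total - healthy_cleared
beta_hat = type2_count / healthy_total

print(f"alpha ~ {type1_count}/{cancer_total} = {alpha_hat:.3f}")
print(f"beta  ~ {type2_count}/{healthy_total} = {beta_hat:.3f}")
```

Each estimate is simply the error count divided by the number of women for whom that kind of error was possible.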

3. Another example:

From the book:

"
The Environmental Protection Agency (EPA) has adopted what is known as the Lead
and Copper Rule, which defines drinking water as unsafe if the concentration of lead
is 15 parts per billion (ppb) or greater or if the concentration of copper is 1.3 parts
per million (ppm) or greater. With $\mu$ denoting the mean concentration of lead, the
manager of a community water system might use lead level measurements from a
sample of water specimens to test

$H_0: \mu = 15$ versus $H_a: \mu > 15$

The null hypothesis (which also implicitly includes the $\mu > 15$ case) states that the
mean lead concentration is excessive by EPA standards. The alternative hypothesis
states that the mean lead concentration is at an acceptable level and that the water
system meets EPA standards for lead. (How is this correct?)

...
"

Shouldn't $H_a$ be $\mu < 15$? If $H_a$ is $\mu > 15$, the water is still not safe by EPA standards. So we actually want $H_a$, right?


What's the book? The first example doesn't read the way I remember the concept.

BTW, Philip Stark at UC Berkeley has an excellent online statistics textbook. It covers his first-year stats option for scientists, IIRC.

This is my book:

http://www.amazon.com/Introduction-Statistics-Analysis-Available-Titles/dp/0840054904/ref=sr_1_1?s=books&ie=UTF8&qid=1416491336&sr=1-1&keywords=Introduction+to+Statistics+and+Data+Analysis+peck


I think you need to reread Section 10.1, especially passages such as this:

As a consequence, the conclusion when testing $H_0: \mu = 8000$ versus $H_a: \mu < 8000$ is always the same as the conclusion for a test where the null hypothesis is $H_0: \mu \geq 8000$. For these reasons, it is customary to state the null hypothesis $H_0$ as a claim of equality
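Why the equality form is enough can be checked numerically. The sketch below assumes a lower-tailed z test of $H_0: \mu = 8000$ versus $H_a: \mu < 8000$ with a known standard deviation; the values `sigma = 1500`, `n = 36`, and `alpha = 0.05` are hypothetical. It computes the probability of rejecting when the true mean is 8000 or larger and shows that this probability is largest at exactly 8000, so a test sized for $\mu = 8000$ keeps the Type I error at or below $\alpha$ for every $\mu \geq 8000$.

```python
from math import sqrt
from statistics import NormalDist

sigma, n, alpha = 1500, 36, 0.05   # hypothetical population SD, sample size, significance level
mu0 = 8000
se = sigma / sqrt(n)               # standard error of the sample mean

# Reject H0 when the sample mean falls below this cutoff (lower-tail area alpha when mu = mu0).
cutoff = mu0 + NormalDist().inv_cdf(alpha) * se

def prob_reject(true_mu):
    """Probability that the sample mean lands below the cutoff when the population mean is true_mu."""
    return NormalDist(true_mu, se).cdf(cutoff)

for mu in (8000, 8100, 8300, 8600):
    print(f"true mean {mu}: P(reject H0) = {prob_reject(mu):.4f}")
```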


The book claims that a Type II error, failing to reject $H_0$ when $H_0$ is false, means that the airline employees did not receive the reward they deserved. But isn't that not always true? Failing to reject $H_0$ means that $p < 0.72$ or $p > 0.72$, i.e. $p \neq 0.72$, hence there could be fewer than 72% of domestic passenger flights on time as well as more than 72% on time, yet the book only considers one possibility. Is there a reason for considering only one outcome? This is repeated in other examples that analyze Type I and Type II errors.

Failing to reject H0 does not mean that $p \neq 0.72$; it means p could be 0.72.

If you reject H0, yes, it could be that the proportion is smaller than 0.72. But you'd use a one-sided hypothesis test which only rejects when $p > 0.72$.

2. Another example I don't quite understand:

From the above example, a Type II error occurs when we fail to reject $H_0$ when $H_0$ is false. This means that the statistician conducting the research could accept $H_0$ or reject $H_0$, because failing to reject $H_0$ means $H_0$ could be true or false. Therefore, if by luck the statistician rejects $H_0$, then no error is made. The only Type II error made is if he accepts $H_0$ when it is false. So why doesn't this factor into the calculation of $\beta$, the probability of a Type II error?

I don't understand your reasoning here. Are you suggesting that when the statistician sees a statistically insignificant result, and hence fails to reject H0, the null might be true or false, so the statistician might decide to reject it anyway? Because that's not how testing is done.

I can't tell if you're trying to distinguish between "accepts H0" and "fails to reject H0". They're synonymous, though the latter is a better description.

3. Another example:

Shouldn't $H_a$ be $\mu < 15$? If $H_a$ is $\mu > 15$, the water is still not safe by EPA standards. So we actually want $H_a$, right?

Yeah, I think they got mixed up here. Typically you'd make the hypotheses as they describe them, so the alternative hypothesis represents an unacceptable level of lead. When you reject the null, you know something is wrong with the water.

Failing to reject H0 does not mean that $p \neq 0.72$; it means p could be 0.72.

If you reject H0, yes, it could be that the proportion is smaller than 0.72. But you'd use a one-sided hypothesis test which only rejects when $p > 0.72$.

The book's example assumes the case where we have in fact committed a Type II error, in that $p = 0.72$ is false and we fail to reject it. Then the actual value of $p$ would be $p < 0.72$ or $p > 0.72$. So the proportion of on-time flights can be greater or less than 0.72, but the book only considers the case $p > 0.72$. So why did they choose only one of the consequences? Is it because they just happened to pick one example of a consequence?

I don't understand your reasoning here. Are you suggesting that when the statistician sees a statistically insignificant result, and hence fails to reject H0, the null might be true or false, so the statistician might decide to reject it anyway? Because that's not how testing is done.

I can't tell if you're trying to distinguish between "accepts H0" and "fails to reject H0". They're synonymous, though the latter is a better description.

Now that I thought about it again, it makes a little more sense. This was my problem if you're still interested:

Suppose we fail to reject $H_0$ given that $H_0$ is false. The fact that $H_0$ is false is something a statistician won't know for certain. The statistician only knows the probability that $H_0$ is false and he fails to reject it. This error probability is denoted by $\beta$, the probability of a Type II error. Therefore, the statistician could have chosen either to accept $H_0$ or to reject $H_0$, because the samples he collected are inconclusive about the truth of $H_0$. The formula for $\beta$ is (# of failed rejections given that $H_0$ is false) / sample size. The sample size should be 362, so I believe the book made another error. As you can see, this calculation does not take into account the fact that the statistician could have accepted or rejected $H_0$. But I now understand that $\beta$ is just the probability that we fail to reject a false $H_0$. So whether the statistician ultimately accepts or rejects $H_0$, the error of failing to reject a false $H_0$ has still been made, and the formula for $\beta$ is concerned only with failing to reject a false $H_0$, regardless of what happens afterwards.


I don't know this book.

Has the text said anything about the acceptance criteria or decision rules or whatever?

These are an indispensable part of hypothesis testing, and your text in green doesn't seem to include them, so I don't see how you can come to any conclusion.

As regards one-tailed and two-tailed tests, it is possible for only one tail of the two-tailed test to fall within the area considered for Type II error.


The book's example assumes the case where we have in fact committed a Type II error, in that $p = 0.72$ is false and we fail to reject it. Then the actual value of $p$ would be $p < 0.72$ or $p > 0.72$. So the proportion of on-time flights can be greater or less than 0.72, but the book only considers the case $p > 0.72$. So why did they choose only one of the consequences? Is it because they just happened to pick one example of a consequence?

If $p < 0.72$ but we fail to reject, I wouldn't consider that a type II error. That's intentional, since we're testing for the alternative that $p > 0.72$. A one-tailed test would specifically try not to reject when $p < 0.72$.
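A small Python sketch of this point, assuming the upper-tailed large-sample test for the airline example at $\alpha = 0.05$ and a hypothetical sample of `N = 200` flights. It approximates the probability of rejecting $H_0: p = 0.72$ for several true values of $p$: below 0.72 that probability is even smaller than $\alpha$, which is the intended behavior of a one-tailed test rather than a Type II error.

```python
from math import sqrt
from statistics import NormalDist

P0, N, ALPHA = 0.72, 200, 0.05  # N is a hypothetical sample size
Z = NormalDist()                # standard normal

# Reject H0 when the sample proportion exceeds this cutoff (normal approximation under H0).
cutoff = P0 + Z.inv_cdf(1 - ALPHA) * sqrt(P0 * (1 - P0) / N)

def prob_reject(true_p):
    """Approximate P(sample proportion > cutoff) when the true on-time proportion is true_p."""
    se = sqrt(true_p * (1 - true_p) / N)
    return 1 - Z.cdf((cutoff - true_p) / se)

for p in (0.66, 0.70, 0.72, 0.76, 0.80):
    print(f"true p = {p:.2f}: P(reject H0) = {prob_reject(p):.3f}")
```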

Suppose we fail to reject $H_0$ given that $H_0$ is false. The fact that $H_0$ is false is something a statistician won't know for certain. The statistician only knows the probability that $H_0$ is false and he fails to reject it.

The statistician does not know the probability that H0 is false; that's not what a p value is.

Therefore, the statistician could have chosen either to accept $H_0$ or to reject $H_0$, because the samples he collected are inconclusive about the truth of $H_0$.

That's not the typical practice. If the sample is inconclusive, we "accept" (fail to reject) H0. We don't have the choice to reject it -- we have no evidence to justify the choice.

Significance testing is designed to avoid rejecting H0 unless we're really sure. When the evidence is inconclusive, we don't reject.

I think you're making this much more complicated than it needs to be by interpreting "fail to reject" as "I could accept or reject." That's not the case. "Accept" and "fail to reject" are synonymous, and if you fail to reject, you don't have any choice of what to do. You fail to reject. You're done.


I don't know this book.

Has the text said anything about the acceptance criteria or decision rules or whatever?

These are an indispensable part of hypothesis testing, and your text in green doesn't seem to include them, so I don't see how you can come to any conclusion.

As regards one-tailed and two-tailed tests, it is possible for only one tail of the two-tailed test to fall within the area considered for Type II error.

I looked through the book and I don't think I see anything called acceptance criteria or decision rules. I am still reading an early section on hypothesis testing, so it might be somewhere later on.

The statistician does not know the probability that H0 is false; that's not what a p value is.

I was referring to $\beta$, the probability of a Type II error (for the test procedure on the parameter being tested, $p$). The Type I error probability $\alpha$ and the Type II error probability $\beta$ are inversely related. The statistician directly controls only $\alpha$, so he can control $\beta$ only indirectly, by decreasing or increasing $\alpha$. Even knowing this, in practice won't the statistician still be unable to find the value of $\beta$? (So far I haven't learned how to calculate $\beta$, if there is even a way.) I am still in an early part of the hypothesis testing chapter, so what I have learned so far should be the basic concepts of error and how to interpret a problem.
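The trade-off between $\alpha$ and $\beta$ can be illustrated with a minimal Python sketch. It assumes an upper-tailed z test and a hypothetical true mean sitting 2 standard errors above the $H_0$ value (the `shift` parameter is my own). For that fixed alternative and sample size, lowering $\alpha$ raises $\beta$, although the relationship is not an exact inverse proportion: the product $\alpha\beta$ is not constant.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal

def beta_upper(alpha, shift):
    """Type II error probability of an upper-tailed z test at level alpha.

    shift: hypothetical distance of the true mean from the H0 mean, in standard errors.
    """
    z_crit = Z.inv_cdf(1 - alpha)
    # We fail to reject when the z statistic is below z_crit;
    # under this alternative the z statistic is distributed N(shift, 1).
    return Z.cdf(z_crit - shift)

shift = 2.0
for alpha in (0.10, 0.05, 0.01):
    beta = beta_upper(alpha, shift)
    print(f"alpha = {alpha:.2f}  beta = {beta:.3f}  alpha*beta = {alpha * beta:.4f}")
```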


I'm just shutting down for the night, so look again when I have had a chance to post something.


I was referring to $\beta$, the probability of a Type II error (for the test procedure on the parameter being tested, $p$). The Type I error probability $\alpha$ and the Type II error probability $\beta$ are inversely related. The statistician directly controls only $\alpha$, so he can control $\beta$ only indirectly, by decreasing or increasing $\alpha$. Even knowing this, in practice won't the statistician still be unable to find the value of $\beta$? (So far I haven't learned how to calculate $\beta$, if there is even a way.) I am still in an early part of the hypothesis testing chapter, so what I have learned so far should be the basic concepts of error and how to interpret a problem.

The probability of type II error depends on the size of the true effect, so you can't calculate it. You can, however, calculate it for different assumed sizes of true effect, so you could say "If the true effect is this big, then I have a 50% chance of detecting it."
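A sketch of that idea in Python: for an upper-tailed z test at $\alpha = 0.05$, compute the power (and $\beta = 1 - \text{power}$) for several assumed true effects. The population SD `sigma = 10`, the sample size `n = 25`, and the helper name `power_upper` are hypothetical. The last line finds the effect size at which this test has exactly a 50% chance of detecting it, which is the critical value times the standard error.

```python
from math import sqrt
from statistics import NormalDist

def power_upper(effect, sigma, n, alpha=0.05):
    """P(reject H0: mu = mu0) for an upper-tailed z test when the true mean is mu0 + effect."""
    se = sigma / sqrt(n)
    z_crit = NormalDist().inv_cdf(1 - alpha)
    return 1 - NormalDist().cdf(z_crit - effect / se)

sigma, n = 10, 25  # hypothetical population SD and sample size
for effect in (1, 2, 3, 5):
    pw = power_upper(effect, sigma, n)
    print(f"assumed true effect {effect}: power = {pw:.2f}, beta = {1 - pw:.2f}")

# Effect size at which the test has a 50% chance of detecting it
half_power_effect = NormalDist().inv_cdf(0.95) * sigma / sqrt(n)
print(f"50% power at an effect of about {half_power_effect:.2f}")
```

With an effect of zero the "power" collapses to $\alpha$ itself, which is why $\beta$ can only be stated for an assumed effect size.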


If the Type I error probability $\alpha$ for a hypothesis test is 0.05 and the p-value is 0.01, we reject $H_0$ because p-value $\leq \alpha$. What is the reasoning or intuition for this?


OK, I said I'd post some more. I have tried to show things in diagrams, and I am assuming normal distributions. Bignose will no doubt wish to generalise this if he comments again.

If we go through my diagram, hopefully it will become clear.

Most texts show a diagram like sketch A, but do not make it clear that there are two distributions in play, not one.

And few show you the sequence from sketch B through sketch F.

I cannot stress this enough.

My sketch A refers to the distribution of the means of all possible samples (of the sample size we are taking), i.e. the sampling distribution of the mean.

All the others refer to the population distribution.

Unlike in your previous thread, we either have the population mean and standard deviation or we are assuming the mean as H0.

So in my example the population mean is postulated as being 8.0.

So in this case the mean of all possible sample means should equal the mean of the population, and we will take this as our null hypothesis and examine the errors that may arise if it is not true.

This allows us to set the cut off points which are also called critical values.

In my example they are 7.9 and 8.1

These cutoff points are where the acceptance criteria / decision rules I mentioned arise.

If we take a sample and its mean falls between 7.9 and 8.1, we accept H0.

Since H0 and H1 are mutually exclusive, accepting H0 means rejecting H1.

So we don't test for H1.

That is, our acceptance criterion is

$7.9 \le {\mu _{sample}} \le 8.1$

Outside this acceptance range we reject H0 and the area of the tails gives us the probability of a TYPE I error

This reappears in sketch D.
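A minimal Python sketch of this decision rule and the Type I error it implies. The critical values 7.9 and 8.1 and the null mean 8.0 come from the example; the standard error of the sample mean (`SE = 0.05`) is a hypothetical value, since the post does not give the population standard deviation or the sample size. The Type I error probability is the area of both tails of the sampling distribution outside the acceptance region.

```python
from statistics import NormalDist

LOWER, UPPER = 7.9, 8.1   # critical values from the example
MU0 = 8.0                 # population mean postulated by H0
SE = 0.05                 # hypothetical standard error of the sample mean

def accepts_h0(sample_mean):
    """Decision rule: accept (fail to reject) H0 when LOWER <= sample mean <= UPPER."""
    return LOWER <= sample_mean <= UPPER

# Sampling distribution of the sample mean if H0 is true
null_dist = NormalDist(MU0, SE)
# Type I error: the sample mean lands in either tail even though H0 is true
alpha = null_dist.cdf(LOWER) + (1 - null_dist.cdf(UPPER))
print(f"P(Type I error) = {alpha:.4f}")
```

With this assumed standard error the critical values sit 2 standard errors either side of 8.0, so $\alpha$ comes out near 0.046; a different sample size would move it.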

OK so now we ask what happens if the population mean is not 8, because there is something wrong that requires action.

In sketches B, C, D, E and F I have successively moved the population curve along the axis to show it in various positions in relation to the critical values, which I have projected down from above as dashed lines.

Note these have not moved from the original sample basis in sketch A

So in sketch B, if ${\mu_{pop}} = 7.7$, only the right-hand tail enters the acceptance region.

That is there is a small probability that a sample drawn from this population could have a mean within the acceptance region.

This only occurs for a small % of cases, but if our sample mean lies between 7.9 and 8.1 we will (wrongly) accept H0.

This is a TYPE II error

The area of the tail that intrudes into the acceptance region gives the % or probability of this.

This is the reason for what you were asking about: why a one-tailed value was calculated in one of your examples.

In sketch C I have moved the population mean along to the first critical point at 7.9.

Now there is a considerable probability that a sample mean drawn from this population could fall within the acceptance region.

The right hand tail may even extend beyond the upper critical value.

In sketch D the curve has moved back to a mean of 8 and we are back to TYPE I error possibilities.

Sketches E and F are simple mirror images of C and B as the lower tail moves past the acceptance region.
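The sequence of sketches B to F can also be traced numerically. This Python sketch uses the example's acceptance region [7.9, 8.1] and assumes a hypothetical standard error of 0.05 for the sample mean (not given in the post). For each assumed true population mean it computes the probability that the sample mean still falls in the acceptance region. When the true mean is not 8.0, that probability is $\beta$, the probability of a Type II error; at 8.0 it is simply $1 - \alpha$.

```python
from statistics import NormalDist

LOWER, UPPER = 7.9, 8.1   # acceptance region from the example
SE = 0.05                 # hypothetical standard error of the sample mean

def prob_accept(true_mean):
    """P(LOWER <= sample mean <= UPPER) when the population mean is true_mean."""
    dist = NormalDist(true_mean, SE)
    return dist.cdf(UPPER) - dist.cdf(LOWER)

# 7.7 (B), 7.9 (C), 8.0 (D), and mirror positions 8.1 (E), 8.3 (F); 7.8 and 8.2 are extra points
for mu in (7.7, 7.8, 7.9, 8.0, 8.1, 8.2, 8.3):
    label = "1 - alpha" if mu == 8.0 else "beta"
    print(f"true mean {mu:.1f}: P(accept H0) = {prob_accept(mu):.4f}  ({label})")
```

The output is symmetric about 8.0, matching the mirror-image sketches, and $\beta$ is close to 0.5 when the true mean sits exactly on a critical value.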


If the Type I error probability $\alpha$ for a hypothesis test is 0.05 and the p-value is 0.01, we reject $H_0$ because p-value $\leq \alpha$. What is the reasoning or intuition for this?

"If H0 were true, we would get this sort of data less than 5% of the time. But we got it. H0 must be wrong."

You can think of it as nearly a proof by contradiction. If $p = 0$ exactly, then it is a proof by contradiction: if H0 were true, we would never get this data, but we did, so H0 must be false.

Another way to phrase it is "Either we're very lucky and got unlikely results, or H0 is wrong." At some point you're more willing to reject the null than assume you have incredible luck.
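The reasoning can be checked with a small simulation. The Python sketch below assumes a two-sided z test with known $\sigma = 1$, samples of size 30, and $\alpha = 0.05$ (all hypothetical choices; the helper names are my own). When $H_0$ is true, a p-value at or below 0.05 turns up in only about 5% of samples; when $H_0$ is false (true mean 0.5 here), it turns up far more often, which is why a small p-value is read as evidence against $H_0$.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

def two_sided_p_value(sample, mu0, sigma):
    """p-value of a two-sided z test of H0: mu = mu0 when sigma is known."""
    z = (fmean(sample) - mu0) / (sigma / sqrt(len(sample)))
    return 2 * (1 - NormalDist().cdf(abs(z)))

def rejection_rate(true_mu, mu0=0.0, sigma=1.0, n=30, alpha=0.05, trials=10_000, seed=1):
    """Fraction of simulated samples whose p-value is <= alpha."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mu, sigma) for _ in range(n)]
        if two_sided_p_value(sample, mu0, sigma) <= alpha:
            rejections += 1
    return rejections / trials

print("H0 true  (mu = 0.0):", rejection_rate(0.0))
print("H0 false (mu = 0.5):", rejection_rate(0.5))
```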


"If H0 were true, we would get this sort of data less than 5% of the time. But we got it. H0 must be wrong."

You can think of it as nearly a proof by contradiction. If $p = 0$ exactly, then it is a proof by contradiction: if H0 were true, we would never get this data, but we did, so H0 must be false.

Another way to phrase it is "Either we're very lucky and got unlikely results, or H0 is wrong." At some point you're more willing to reject the null than assume you have incredible luck.

I was thinking of it in terms of the definition of $\alpha$. It took some time, but I formulated the following intuition:

If p-value $\leq \alpha$:

Getting our observed statistic from the sample is at most as likely as the chance of rejecting a true $H_0$.

In other words, the chance that $H_0$ is false is very likely higher than the chance of rejecting a true $H_0$.

This simplifies to: we very likely have a higher chance of rejecting a false $H_0$ than of rejecting a true $H_0$.


This simplifies to: we very likely have a higher chance of rejecting a false $H_0$ than of rejecting a true $H_0$.

Not necessarily true. The chance of rejecting a false H0 is the power of the test. It depends on how different HA is from H0. If it's not very different, it may be very difficult to tell the difference, so you have a very small chance of rejecting a false null.
