Hypothesis Testing (Extended Treatment)
This document provides a rigorous treatment of hypothesis testing methodology, including null and alternative hypotheses, significance levels, Type I and II errors, one-tailed and two-tailed tests, and critical regions.
Hypothesis testing is a formal procedure for making decisions about population parameters based on sample evidence. It provides a principled framework for quantifying uncertainty.
1. The Hypothesis Testing Framework
1.1 Null and alternative hypotheses
The null hypothesis $H_0$ is the default assumption, typically a statement of "no effect" or "no difference." It is assumed to be true unless the evidence is sufficiently compelling to reject it.
The alternative hypothesis $H_1$ specifies what we believe might be true instead. It is only accepted if the evidence against $H_0$ is strong enough.
1.2 Test statistic
A test statistic is a function of the sample data whose distribution is known under $H_0$. Common test statistics include sample proportions, sample means, and sample correlation coefficients.
1.3 Significance level
The significance level $\alpha$ is the maximum probability of rejecting $H_0$ when it is actually true. Common values are $\alpha = 0.05$ (5%), $\alpha = 0.01$ (1%), and $\alpha = 0.10$ (10%).
1.4 The decision rule
- Assume $H_0$ is true.
- Calculate the probability of obtaining a test statistic at least as extreme as the observed value, assuming $H_0$.
- If this probability (the $p$-value) is less than $\alpha$, reject $H_0$. Otherwise, do not reject $H_0$.
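The three steps above can be sketched in Python for the simplest case: an upper-tailed test whose statistic is standard normal under $H_0$. The function name `upper_tail_decision` and the observed value 1.80 are illustrative assumptions, not taken from the text.

```python
from statistics import NormalDist

def upper_tail_decision(z_obs, alpha=0.05):
    """Return (p_value, reject) for an upper-tailed z-test.

    Assumes the test statistic is N(0, 1) under H0 (an assumption of this sketch).
    """
    # p-value: probability of a statistic at least as extreme as z_obs under H0
    p_value = 1.0 - NormalDist().cdf(z_obs)
    # Decision rule: reject H0 only when the p-value is below alpha
    return p_value, p_value < alpha

p, reject = upper_tail_decision(1.80)  # 1.80 is a hypothetical observed statistic
print(f"p-value = {p:.4f}, reject H0: {reject}")
```

For a lower-tailed test the same idea applies with `NormalDist().cdf(z_obs)` as the $p$-value.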
1.5 Critical value approach
Alternatively, find the critical value $c$ such that $P(T > c \mid H_0) = \alpha$ (for an upper-tailed test), where $T$ denotes the test statistic. If the observed test statistic exceeds $c$, reject $H_0$.
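For a continuous test statistic the critical value approach and the $p$-value approach always give the same decision. The sketch below illustrates this for a statistic assumed to be standard normal under $H_0$; the observed values in the loop are hypothetical.

```python
from statistics import NormalDist

Z = NormalDist()  # assumed distribution of the test statistic under H0
alpha = 0.05
c = Z.inv_cdf(1 - alpha)  # critical value with P(Z > c) = alpha

# For each hypothetical observed statistic, apply both decision rules
results = []
for z_obs in (1.20, 1.64, 1.65, 2.50):
    by_p_value = (1 - Z.cdf(z_obs)) < alpha   # p-value approach
    by_critical_value = z_obs > c             # critical value approach
    results.append((z_obs, by_p_value, by_critical_value))
    print(z_obs, by_p_value, by_critical_value)
```

The two boolean columns agree on every row, because $z > c$ exactly when the upper-tail probability beyond $z$ is below $\alpha$.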
1.6 Interpreting the conclusion
- "Reject $H_0$": there is sufficient evidence at the $\alpha$ significance level to support $H_1$.
- "Do not reject $H_0$": there is insufficient evidence to reject $H_0$. This does not mean $H_0$ is true.
Common Pitfall: "Accepting $H_0$" is not the same as "not rejecting $H_0$." We never prove $H_0$; we merely fail to find sufficient evidence against it. The conclusion should always be stated carefully.
2. Type I and Type II Errors
2.1 Definitions
| Decision | $H_0$ true | $H_0$ false |
|---|---|---|
| Reject $H_0$ | Type I error | Correct decision |
| Do not reject $H_0$ | Correct decision | Type II error |
Type I error: Rejecting $H_0$ when it is true (false positive).
Type II error: Failing to reject $H_0$ when it is false (false negative).
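A small simulation can make the two error types concrete. The sketch below repeatedly draws samples and applies a two-tailed z-test; all numerical settings (a null mean of 0, $\sigma = 1$, $n = 16$, an alternative mean of 0.5, the seed) are hypothetical choices made only for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

def rejection_rate(true_mu, mu0=0.0, sigma=1.0, n=16, alpha=0.05,
                   trials=10_000, seed=1):
    """Fraction of simulated samples for which a two-tailed z-test rejects H0: mu = mu0."""
    rng = random.Random(seed)                    # fixed seed for repeatability
    crit = NormalDist().inv_cdf(1 - alpha / 2)   # two-tailed critical value
    rejections = 0
    for _ in range(trials):
        xbar = fmean(rng.gauss(true_mu, sigma) for _ in range(n))
        z = (xbar - mu0) / (sigma / sqrt(n))
        if abs(z) > crit:
            rejections += 1
    return rejections / trials

# H0 true: every rejection is a Type I error
type_i_rate = rejection_rate(true_mu=0.0)
# H0 false: every non-rejection is a Type II error
type_ii_rate = 1 - rejection_rate(true_mu=0.5)
print(f"Type I rate ≈ {type_i_rate:.3f}, Type II rate ≈ {type_ii_rate:.3f}")
```

The simulated Type I rate should sit close to the chosen 5% level, while the Type II rate depends on how far the true mean is from the null value.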
2.2 The power of a test
The power of a test is the probability of correctly rejecting $H_0$ when it is false:
$$\text{Power} = 1 - P(\text{Type II error}) = 1 - \beta$$
The power depends on:
- The significance level $\alpha$ (increasing $\alpha$ increases power).
- The sample size $n$ (increasing $n$ increases power).
- The true value of the parameter (the further it is from the value specified by $H_0$, the greater the power).
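These three dependencies can be checked numerically. The sketch below computes the power of an upper-tailed z-test for a normal population with known standard deviation; the function name `power_upper_z` and every numerical input are hypothetical, chosen only to show the direction of each effect.

```python
from math import sqrt
from statistics import NormalDist

def power_upper_z(mu0, mu1, sigma, n, alpha):
    """Power of an upper-tailed z-test of H0: mu = mu0 when the true mean is mu1.

    Assumes a normal population with known sigma (assumption of this sketch).
    """
    z = NormalDist()
    crit = z.inv_cdf(1 - alpha)            # reject when the z-statistic exceeds this
    shift = (mu1 - mu0) * sqrt(n) / sigma  # mean of the z-statistic when mu = mu1
    return 1 - z.cdf(crit - shift)

base = power_upper_z(mu0=0, mu1=0.5, sigma=1, n=16, alpha=0.05)
more_alpha = power_upper_z(0, 0.5, 1, 16, 0.10)   # larger significance level
more_n = power_upper_z(0, 0.5, 1, 36, 0.05)       # larger sample
further = power_upper_z(0, 1.0, 1, 16, 0.05)      # true mean further from mu0
print(round(base, 3), round(more_alpha, 3), round(more_n, 3), round(further, 3))
```

Each of the last three values exceeds the baseline, matching the three bullet points.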
2.3 Worked example
Problem. A machine produces bolts with mean length . The standard deviation is . A sample of 16 bolts has mean . Test at the 5% significance level whether the mean length has changed.
, (two-tailed).
Under : .
Critical values: such that .
$z = \pm 1.96$ for a two-tailed 5% test.
Critical region: or .
Test statistic: .
Since the observed test statistic does not lie in the critical region, we do not reject $H_0$. There is insufficient evidence at the 5% level to conclude that the mean length has changed.
2.4 Finding the probability of Type II error
Continuing the example above, suppose the true mean is .
Power .
3. One-Tailed and Two-Tailed Tests
3.1 One-tailed tests
A one-tailed test is used when $H_1$ specifies a direction for the parameter $\theta$ relative to its hypothesised value $\theta_0$:
- $H_1: \theta > \theta_0$ (upper-tailed): critical region in the upper tail.
- $H_1: \theta < \theta_0$ (lower-tailed): critical region in the lower tail.
The entire significance level $\alpha$ is in one tail, making it easier to detect an effect in the specified direction.
3.2 Two-tailed tests
A two-tailed test is used when $H_1$ does not specify a direction:
- $H_1: \theta \neq \theta_0$: critical region split between both tails, with $\alpha/2$ in each.
3.3 Choosing one-tailed vs two-tailed
- Use a two-tailed test unless there is a strong prior reason to expect a specific direction.
- A one-tailed test has greater power in the specified direction but cannot detect effects in the opposite direction.
- The choice must be made before examining the data.
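The trade-off in the second point can be illustrated with a z-test. In the sketch below, `shift` is the mean of the standardised test statistic under the true parameter value (a hypothetical quantity introduced for this example); a positive shift is an effect in the direction the upper-tailed test is looking for.

```python
from statistics import NormalDist

Z = NormalDist()

def power_one_tailed(shift, alpha):
    """Power of an upper-tailed z-test when the z-statistic has mean `shift`."""
    return 1 - Z.cdf(Z.inv_cdf(1 - alpha) - shift)

def power_two_tailed(shift, alpha):
    """Power of a two-tailed z-test when the z-statistic has mean `shift`."""
    c = Z.inv_cdf(1 - alpha / 2)
    # Probability of landing in the upper tail plus the lower tail
    return (1 - Z.cdf(c - shift)) + Z.cdf(-c - shift)

for shift in (2.0, -2.0):
    one = power_one_tailed(shift, 0.05)
    two = power_two_tailed(shift, 0.05)
    print(f"shift={shift:+.1f}: one-tailed {one:.3f}, two-tailed {two:.3f}")
```

With a positive shift the one-tailed test is the more powerful of the two; with a negative shift its power is close to zero while the two-tailed test is unaffected by the sign.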
3.4 Critical values table
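For a standard normal test at significance level $\alpha$:
| Test type | $\alpha = 0.10$ | $\alpha = 0.05$ | $\alpha = 0.01$ |
|---|---|---|---|
| Two-tailed | $\pm 1.645$ | $\pm 1.960$ | $\pm 2.576$ |
| Upper-tailed | $1.282$ | $1.645$ | $2.326$ |
| Lower-tailed | $-1.282$ | $-1.645$ | $-2.326$ |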
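Standard normal critical values like these can be computed from the inverse cumulative distribution function rather than read from tables. A minimal sketch using Python's standard library (the dictionary layout is just one convenient choice):

```python
from statistics import NormalDist

z = NormalDist()  # standard normal distribution
table = {}
for alpha in (0.10, 0.05, 0.01):
    table[alpha] = {
        "two-tailed": z.inv_cdf(1 - alpha / 2),  # reject when |Z| exceeds this
        "upper-tailed": z.inv_cdf(1 - alpha),    # reject when Z exceeds this
        "lower-tailed": z.inv_cdf(alpha),        # reject when Z falls below this
    }

for alpha, row in table.items():
    print(alpha, {name: round(value, 3) for name, value in row.items()})
```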
4. Hypothesis Tests for the Binomial Proportion
4.1 Setting up the test
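To test whether a population proportion $p$ equals a specified value $p_0$:
$$H_0: p = p_0 \quad \text{against} \quad H_1: p \neq p_0 \text{ (or } p > p_0, \text{ or } p < p_0 \text{)}$$
Under $H_0$, if $X$ is the number of successes in $n$ trials, then $X \sim B(n, p_0)$.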
4.2 Worked example: binomial test
Problem. A coin is tossed 20 times and lands heads 15 times. Test at the 5% significance level whether the coin is biased.
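$H_0: p = 0.5$, $H_1: p \neq 0.5$ (two-tailed).
Under $H_0$: $X \sim B(20, 0.5)$.
For a two-tailed test at 5%, we need the critical region in each tail to have probability at most $0.025$.
Lower tail: $P(X \leq 5) = 0.0207 \leq 0.025$.
$P(X \leq 6) = 0.0577 > 0.025$. So $c_1 = 5$ (critical region: $X \leq 5$).
Upper tail: $P(X \geq 15) = 0.0207 \leq 0.025$.
$P(X \geq 14) = 0.0577 > 0.025$. So $c_2 = 15$ (critical region: $X \geq 15$).
Since $x = 15$ falls in the critical region, we reject $H_0$. There is sufficient evidence at the 5% level to conclude the coin is biased.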
4.3 Finding the actual significance level
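The actual significance level is the probability of being in the critical region under $H_0$:
$$P(X \leq 5) + P(X \geq 15) = 0.0207 + 0.0207 = 0.0414$$
This is approximately 4.14%, which is the closest we can get to 5% with this discrete distribution while keeping each tail at or below 2.5%.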
Warning: For discrete distributions, the actual significance level may differ from the nominal level $\alpha$. The critical region is chosen so that the actual significance level does not exceed $\alpha$ and is as close as possible to $\alpha$.
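The coin example can be reproduced with exact binomial probabilities. The sketch below assumes the convention used in that example (at most $\alpha/2$ in each tail); the function names are hypothetical.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def two_tailed_binomial_region(n, p0, alpha):
    """Return (lower_c, upper_c, actual_level) with at most alpha/2 in each tail.

    The region is X <= lower_c or X >= upper_c; a bound is None if that tail is empty.
    """
    pmf = [binom_pmf(k, n, p0) for k in range(n + 1)]
    half = alpha / 2
    lower_c, lower_prob, cum = None, 0.0, 0.0
    for k in range(n + 1):             # grow the lower tail while it still fits
        cum += pmf[k]
        if cum > half:
            break
        lower_c, lower_prob = k, cum
    upper_c, upper_prob, cum = None, 0.0, 0.0
    for k in range(n, -1, -1):         # grow the upper tail while it still fits
        cum += pmf[k]
        if cum > half:
            break
        upper_c, upper_prob = k, cum
    return lower_c, upper_c, lower_prob + upper_prob

lo, hi, level = two_tailed_binomial_region(20, 0.5, 0.05)
print(lo, hi, round(level, 4))
```

For 20 tosses of a fair coin this gives the region $X \leq 5$ or $X \geq 15$, with an actual significance level of about 4.14%.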
5. Critical Regions
5.1 Definition
The critical region (or rejection region) is the set of values of the test statistic that lead to rejection of $H_0$. The acceptance region is its complement.
5.2 Finding the critical region
Procedure:
- Identify the distribution of the test statistic under $H_0$.
- Determine whether the test is one-tailed or two-tailed.
- Find the smallest region containing the most extreme values whose total probability under $H_0$ does not exceed $\alpha$.
5.3 Worked example: Poisson critical region
Problem. A receptionist receives on average 2 calls per 5 minutes. Over a 5-minute period, she receives 7 calls. Test at the 5% level whether the rate has increased.
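$H_0: \lambda = 2$, $H_1: \lambda > 2$ (upper-tailed).
Under $H_0$: $X \sim \text{Po}(2)$.
Critical region: smallest $c$ such that $P(X \geq c) \leq 0.05$.
$P(X \geq 5) = 1 - P(X \leq 4) = 1 - 0.9473 = 0.0527 > 0.05$, while $P(X \geq 6) = 1 - P(X \leq 5) = 1 - 0.9834 = 0.0166 \leq 0.05$.
Critical region: $X \geq 6$. Actual significance level: $P(X \geq 6) = 0.0166$.
Since $7 \geq 6$, we reject $H_0$. There is sufficient evidence at the 5% level that the call rate has increased.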
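The search for an upper-tail critical region follows the procedure in 5.2 directly: keep raising the cut-off until the tail probability under $H_0$ drops to the significance level or below. A minimal sketch for a Poisson test statistic, with a hypothetical function name:

```python
from math import exp, factorial

def poisson_upper_region(lam0, alpha):
    """Smallest c with P(X >= c) <= alpha for X ~ Po(lam0), and that tail probability."""
    c = 0
    cdf_below = 0.0  # running value of P(X <= c - 1)
    while 1 - cdf_below > alpha:
        cdf_below += exp(-lam0) * lam0**c / factorial(c)  # add P(X = c)
        c += 1
    return c, 1 - cdf_below

c, level = poisson_upper_region(2, 0.05)
print(c, round(level, 4))
```

With a mean of 2 and a 5% level this returns the region $X \geq 6$ and an actual significance level of about 0.0166, so an observation of 7 calls falls inside it.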
5.4 Worked example: normal critical region
Problem. The masses of packets of biscuits are normally distributed with standard deviation . A sample of 10 packets has mean mass . Find the critical region for testing whether the mean mass is less than at the 1% significance level.
, .
Under : .
Critical region: .
Since the observed sample mean does not lie in the critical region, we do not reject $H_0$ at the 1% level.
6. Practice Problems
Problem 1
A die is rolled 30 times and a six appears 9 times. Test at the 5% significance level whether the die is biased towards showing a six.
Solution
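$H_0: p = \frac{1}{6}$, $H_1: p > \frac{1}{6}$ (upper-tailed).
Under $H_0$: $X \sim B\left(30, \frac{1}{6}\right)$.
Find the smallest $c$ such that $P(X \geq c) \leq 0.05$.
$P(X \geq 8) = 1 - P(X \leq 7) = 1 - 0.8863 = 0.1137 > 0.05$.
$P(X \geq 9) = 1 - P(X \leq 8) = 1 - 0.9494 = 0.0506 > 0.05$.
$P(X \geq 10) = 1 - P(X \leq 9) = 1 - 0.9803 = 0.0197 \leq 0.05$.
Critical region: $X \geq 10$. Since $x = 9$ is not in the critical region, do not reject $H_0$. There is insufficient evidence at the 5% level that the die is biased towards six, although the result is borderline ($P(X \geq 9) = 0.0506$).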
Problem 2
A manufacturer claims that the mean lifetime of a component is 500 hours. A sample of 25 components has mean lifetime 490 hours with standard deviation 15 hours. Test the claim at the 5% significance level.
Solution
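$H_0: \mu = 500$, $H_1: \mu \neq 500$ (two-tailed).
Under $H_0$: $\bar{X} \sim N\left(500, \frac{15^2}{25}\right) = N(500, 9)$ approximately.
$z = \dfrac{490 - 500}{15/\sqrt{25}} = \dfrac{-10}{3} = -3.33$.
Critical values: $\pm 1.96$.
Since $|z| = 3.33 > 1.96$, reject $H_0$. Sufficient evidence the mean lifetime differs from 500 hours.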
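The arithmetic for this problem can be checked in a few lines. The sketch follows the solution's normal approximation, with the sample standard deviation standing in for the population value; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu0, xbar, s, n, alpha = 500, 490, 15, 25, 0.05

z = (xbar - mu0) / (s / sqrt(n))               # standardised test statistic
crit = NormalDist().inv_cdf(1 - alpha / 2)     # two-tailed critical value
p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # two-tailed p-value

print(f"z = {z:.2f}, critical = ±{crit:.2f}, p = {p_value:.4f}, reject = {abs(z) > crit}")
```

The $p$-value is well below 0.05, so the critical value approach and the $p$-value approach both lead to rejecting the claim.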
Problem 3
The number of accidents per week at a factory is thought to follow a Poisson distribution with mean 3. In a particular week, 8 accidents occur. Test at the 5% level whether the accident rate has increased.
Solution
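$H_0: \lambda = 3$, $H_1: \lambda > 3$.
Under $H_0$: $X \sim \text{Po}(3)$.
$P(X \geq 6) = 1 - P(X \leq 5) = 1 - 0.9161 = 0.0839 > 0.05$.
$P(X \geq 7) = 1 - P(X \leq 6) = 1 - 0.9665 = 0.0335 \leq 0.05$.
Critical region: $X \geq 7$. Since $8 \geq 7$, reject $H_0$. Sufficient evidence the rate has increased.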
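The same conclusion can be reached through the $p$-value, which is the probability of 8 or more accidents in a week when the mean is 3. A short sketch with a hypothetical helper function:

```python
from math import exp, factorial

def poisson_upper_tail(x, lam):
    """P(X >= x) for X ~ Po(lam), computed as 1 - P(X <= x - 1)."""
    return 1.0 - sum(exp(-lam) * lam**k / factorial(k) for k in range(x))

p_value = poisson_upper_tail(8, 3)
print(round(p_value, 4), p_value < 0.05)
```

The $p$-value is roughly 0.012, which is below 0.05, in agreement with the critical region argument.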
Problem 4
In a hypothesis test with and based on a sample of size 20, find: (a) the critical region at the 5% level; (b) the actual significance level; (c) the probability of a Type II error if the true .
Solution
Under : .
(a) .
.
Critical region: .
(b) Actual significance level .
(c) Under : .
.