Hypothesis Testing (Extended Treatment)

This document provides a rigorous treatment of hypothesis testing methodology, including null and alternative hypotheses, significance levels, Type I and II errors, one-tailed and two-tailed tests, and critical regions.

Hypothesis testing is a formal procedure for making decisions about population parameters based on sample evidence. It provides a principled framework for quantifying uncertainty.


1. The Hypothesis Testing Framework

1.1 Null and alternative hypotheses

The null hypothesis $H_0$ is the default assumption, typically a statement of "no effect" or "no difference." It is assumed to be true unless the evidence is sufficiently compelling to reject it.

The alternative hypothesis $H_1$ specifies what we believe might be true instead. It is only accepted if the evidence against $H_0$ is strong enough.

1.2 Test statistic

A test statistic is a function of the sample data whose distribution is known under $H_0$. Common test statistics include sample proportions, sample means, and sample correlation coefficients.

1.3 Significance level

The significance level $\alpha$ is the maximum probability of rejecting $H_0$ when it is actually true. Common values are $\alpha = 0.05$ (5%), $\alpha = 0.01$ (1%), and $\alpha = 0.10$ (10%).

1.4 The decision rule

  1. Assume $H_0$ is true.
  2. Calculate the probability of obtaining a test statistic at least as extreme as the observed value, assuming $H_0$.
  3. If this probability (the $p$-value) is less than $\alpha$, reject $H_0$. Otherwise, do not reject $H_0$.
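
The three steps above can be sketched in code for the special case of a test statistic that is standard normal under $H_0$. This is a minimal illustration, not a general-purpose routine: the function names and the observed value `2.1` are hypothetical, and it uses only Python's standard library.

```python
from statistics import NormalDist


def z_test_p_value(z: float, tail: str = "two") -> float:
    """p-value for an observed z statistic, assuming Z ~ N(0, 1) under H0."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "upper":
        return 1.0 - std_normal.cdf(z)
    if tail == "lower":
        return std_normal.cdf(z)
    # Two-tailed: double the area beyond |z| in one tail.
    return 2.0 * (1.0 - std_normal.cdf(abs(z)))


def reject_null(p_value: float, alpha: float = 0.05) -> bool:
    """Decision rule from the text: reject H0 when the p-value is below alpha."""
    return p_value < alpha


p = z_test_p_value(2.1)  # hypothetical observed statistic
print(f"p-value = {p:.4f}, reject H0: {reject_null(p)}")
```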

1.5 Critical value approach

Alternatively, find the critical value $c$ such that $P(\mathrm{test\ statistic} \geq c \mid H_0) = \alpha$ (for an upper-tailed test). If the observed test statistic exceeds $c$, reject $H_0$.
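
For a continuous test statistic, the critical value approach and the $p$-value approach always give the same decision. The sketch below checks this for an upper-tailed standard normal test; the helper name and the list of observed values are hypothetical, chosen only for illustration.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def upper_tail_decisions(z: float, alpha: float) -> tuple[bool, bool]:
    """Return (reject by critical value, reject by p-value) for an upper-tailed test."""
    c = STD_NORMAL.inv_cdf(1 - alpha)  # critical value with P(Z >= c) = alpha
    p = 1 - STD_NORMAL.cdf(z)          # p-value of the observed statistic
    # For a continuous statistic the boundary case z == c has probability zero.
    return z > c, p < alpha


for z in (0.5, 1.5, 1.7, 2.5):  # hypothetical observed statistics
    by_critical, by_p = upper_tail_decisions(z, alpha=0.05)
    print(z, by_critical, by_p)
```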

1.6 Interpreting the conclusion

  • "Reject $H_0$": there is sufficient evidence at the $\alpha$ significance level to support $H_1$.
  • "Do not reject $H_0$": there is insufficient evidence to reject $H_0$. This does not mean $H_0$ is true.

Common Pitfall: "Accepting $H_0$" is not the same as "not rejecting $H_0$." We never prove $H_0$; we merely fail to find sufficient evidence against it. The conclusion should always be stated carefully.


2. Type I and Type II Errors

2.1 Definitions

| Decision | $H_0$ true | $H_0$ false |
|---|---|---|
| Reject $H_0$ | Type I error | Correct decision |
| Do not reject $H_0$ | Correct decision | Type II error |

Type I error: Rejecting $H_0$ when it is true (false positive).

$$P(\mathrm{Type\ I}) = \alpha$$

Type II error: Failing to reject $H_0$ when it is false (false negative).

$$P(\mathrm{Type\ II}) = \beta$$

2.2 The power of a test

The power of a test is the probability of correctly rejecting $H_0$ when it is false:

$$\mathrm{Power} = 1 - \beta$$

The power depends on:

  • The significance level $\alpha$ (increasing $\alpha$ increases power).
  • The sample size $n$ (increasing $n$ increases power).
  • The true value of the parameter (the further from $H_0$, the greater the power).
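
These three dependencies can be seen numerically. The following is a rough sketch for a two-tailed $z$-test with known standard deviation; the function name and all parameter values are hypothetical and chosen only to show the direction of each effect.

```python
from math import sqrt
from statistics import NormalDist


def two_tailed_power(mu0: float, mu_true: float, sigma: float, n: int, alpha: float) -> float:
    """Power of a two-tailed z-test of H0: mu = mu0 when the true mean is mu_true."""
    std_normal = NormalDist()
    se = sigma / sqrt(n)                       # standard error of the sample mean
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = (mu_true - mu0) / se               # mean of the z statistic under the truth
    # Reject when the z statistic falls above z_crit or below -z_crit.
    return (1 - std_normal.cdf(z_crit - shift)) + std_normal.cdf(-z_crit - shift)


base = two_tailed_power(mu0=0.0, mu_true=0.5, sigma=1.0, n=16, alpha=0.05)
print("baseline:       ", round(base, 3))
print("larger alpha:   ", round(two_tailed_power(0.0, 0.5, 1.0, 16, 0.10), 3))
print("larger n:       ", round(two_tailed_power(0.0, 0.5, 1.0, 64, 0.05), 3))
print("further from H0:", round(two_tailed_power(0.0, 1.0, 1.0, 16, 0.05), 3))
```

When `mu_true` equals `mu0`, the same function returns $\alpha$, since rejecting a true $H_0$ is exactly a Type I error.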

2.3 Worked example

Problem. A machine produces bolts with mean length $50\;\mathrm{mm}$. The standard deviation is $0.5\;\mathrm{mm}$. A sample of 16 bolts has mean $50.18\;\mathrm{mm}$. Test at the 5% significance level whether the mean length has changed.

$H_0: \mu = 50$, $H_1: \mu \neq 50$ (two-tailed).

Under $H_0$: $\bar{X} \sim N(50, 0.5^2/16) = N(50, 0.015625)$.

Critical values: we need $c$ such that $P(|\bar{X} - 50| \geq c) = 0.05$.

$z = \pm 1.96$ for a two-tailed 5% test, and the standard error is $\sqrt{0.015625} = 0.125$, so

$$c = 1.96 \times 0.125 = 0.245$$

and the critical values are $50 \pm 0.245$.

Critical region: $\bar{X} \lt 49.755$ or $\bar{X} \gt 50.245$.

Test statistic: $z = \dfrac{50.18 - 50}{0.125} = 1.44$.

Since $1.44 \lt 1.96$, we do not reject $H_0$. There is insufficient evidence at the 5% level to conclude that the mean length has changed.
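
The arithmetic in this example can be checked with a few lines of Python. This is only a verification sketch using the numbers from the problem; the variable names are illustrative, and the $p$-value is an extra quantity not computed in the text.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n, sample_mean, alpha = 50.0, 0.5, 16, 50.18, 0.05

se = sigma / sqrt(n)                              # standard error: 0.125
z = (sample_mean - mu0) / se                      # observed test statistic
z_crit = NormalDist().inv_cdf(1 - alpha / 2)      # two-tailed critical value
p_value = 2 * (1 - NormalDist().cdf(abs(z)))      # two-tailed p-value

print(f"z = {z:.2f}, critical values = ±{z_crit:.3f}, p-value = {p_value:.4f}")
print("reject H0" if abs(z) > z_crit else "do not reject H0")
```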

2.4 Finding the probability of Type II error

Continuing the example above, suppose the true mean is $\mu = 50.2$.

$$
\begin{aligned}
\beta &= P(\mathrm{do\ not\ reject}\ H_0 \mid \mu = 50.2) \\
&= P(49.755 \lt \bar{X} \lt 50.245 \mid \bar{X} \sim N(50.2, 0.015625)) \\
&= P\!\left(\dfrac{49.755 - 50.2}{0.125} \lt Z \lt \dfrac{50.245 - 50.2}{0.125}\right) = P(-3.56 \lt Z \lt 0.36) \\
&= \Phi(0.36) - \Phi(-3.56) \approx 0.6406 - 0.0002 = 0.6404
\end{aligned}
$$

Power $= 1 - 0.6404 = 0.3596$.
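
The same calculation can be reproduced by treating the acceptance region as an interval and evaluating its probability under the true sampling distribution. This is a sketch with illustrative variable names; it uses the unrounded critical value, so the result may differ from the hand calculation in the fourth decimal place.

```python
from math import sqrt
from statistics import NormalDist

mu0, mu_true, sigma, n, alpha = 50.0, 50.2, 0.5, 16, 0.05

se = sigma / sqrt(n)
z_crit = NormalDist().inv_cdf(1 - alpha / 2)
lower, upper = mu0 - z_crit * se, mu0 + z_crit * se   # acceptance region for the sample mean

# Distribution of the sample mean when the true mean is mu_true.
sampling = NormalDist(mu_true, se)
beta = sampling.cdf(upper) - sampling.cdf(lower)      # P(do not reject H0 | mu = mu_true)
power = 1 - beta

print(f"acceptance region: ({lower:.3f}, {upper:.3f})")
print(f"beta = {beta:.3f}, power = {power:.3f}")
```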


3. One-Tailed and Two-Tailed Tests

3.1 One-tailed tests

A one-tailed test is used when $H_1$ specifies a direction:

  • $H_1: \mu \gt \mu_0$ (upper-tailed): critical region in the upper tail.
  • $H_1: \mu \lt \mu_0$ (lower-tailed): critical region in the lower tail.

The entire significance level $\alpha$ is in one tail, making it easier to detect an effect in the specified direction.

3.2 Two-tailed tests

A two-tailed test is used when $H_1$ does not specify a direction:

  • $H_1: \mu \neq \mu_0$: critical region split between both tails, with $\alpha/2$ in each.

3.3 Choosing one-tailed vs two-tailed

  • Use a two-tailed test unless there is a strong prior reason to expect a specific direction.
  • A one-tailed test has greater power in the specified direction but cannot detect effects in the opposite direction.
  • The choice must be made before examining the data.

3.4 Critical values table

For a standard normal test at significance level $\alpha$:

| Test type | $\alpha = 0.10$ | $\alpha = 0.05$ | $\alpha = 0.01$ |
|---|---|---|---|
| Two-tailed | $\pm 1.645$ | $\pm 1.960$ | $\pm 2.576$ |
| Upper-tailed | $1.282$ | $1.645$ | $2.326$ |
| Lower-tailed | $-1.282$ | $-1.645$ | $-2.326$ |
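
The table entries are quantiles of the standard normal distribution, so they can be regenerated rather than memorised. The sketch below uses `statistics.NormalDist.inv_cdf` from the Python standard library; the helper name `critical_values` is hypothetical.

```python
from statistics import NormalDist

std_normal = NormalDist()


def critical_values(alpha: float) -> dict[str, float]:
    """Standard normal critical values for the three test types."""
    return {
        "two-tailed": std_normal.inv_cdf(1 - alpha / 2),  # use as ± this value
        "upper-tailed": std_normal.inv_cdf(1 - alpha),
        "lower-tailed": std_normal.inv_cdf(alpha),
    }


for alpha in (0.10, 0.05, 0.01):
    row = critical_values(alpha)
    print(alpha, {name: round(value, 3) for name, value in row.items()})
```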

4. Hypothesis Tests for the Binomial Proportion

4.1 Setting up the test

To test whether a population proportion $p$ equals a specified value $p_0$:

$$H_0: p = p_0, \qquad H_1: p \neq p_0\ (\mathrm{or}\ p \gt p_0\ \mathrm{or}\ p \lt p_0)$$

Under $H_0$, if $X$ is the number of successes in $n$ trials, then $X \sim B(n, p_0)$.

4.2 Worked example: binomial test

Problem. A coin is tossed 20 times and lands heads 15 times. Test at the 5% significance level whether the coin is biased.

$H_0: p = 0.5$, $H_1: p \neq 0.5$ (two-tailed).

Under $H_0$: $X \sim B(20, 0.5)$.

For a two-tailed test at 5%, we need the critical region in each tail to have probability $\leq 0.025$.

Lower tail: find the largest $k$ such that $P(X \leq k) \leq 0.025$.

$P(X \leq 5) = 0.0207 \leq 0.025$, whereas $P(X \leq 6) = 0.0577 \gt 0.025$. So $k = 5$ (critical region: $X \leq 5$).

Upper tail: find the smallest $k$ such that $P(X \geq k) \leq 0.025$.

By symmetry, $P(X \geq 15) = P(X \leq 5) = 0.0207 \leq 0.025$. So $k = 15$ (critical region: $X \geq 15$).

Since $X = 15$ falls in the critical region, we reject $H_0$. There is sufficient evidence at the 5% level to conclude the coin is biased.

4.3 Finding the actual significance level

The actual significance level is the probability of being in the critical region under $H_0$:

$$\alpha_{\mathrm{actual}} = P(X \leq 5) + P(X \geq 15) = 2(0.0207) = 0.0414$$

This is approximately 4.14%, which is the closest we can get to 5% with this discrete distribution without either tail exceeding 2.5%.

Warning: For discrete distributions, the actual significance level may differ from the nominal level. The critical region is chosen so that $P(\mathrm{critical\ region} \mid H_0)$ does not exceed $\alpha$, and is as close as possible to $\alpha$.
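
The coin example and its actual significance level can be checked by summing binomial probabilities directly. The following is a minimal sketch using only `math.comb`; the function names are hypothetical, and it follows the convention used above of allowing at most $\alpha/2$ in each tail.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def lower_tail(k: int, n: int, p: float) -> float:
    """P(X <= k)."""
    return sum(binom_pmf(r, n, p) for r in range(0, k + 1))


def upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k)."""
    return sum(binom_pmf(r, n, p) for r in range(k, n + 1))


def two_tailed_region(n: int, p0: float, alpha: float):
    """Return (lower_k, upper_k, actual_level): reject if X <= lower_k or X >= upper_k."""
    half = alpha / 2
    lower_ks = [k for k in range(n + 1) if lower_tail(k, n, p0) <= half]
    upper_ks = [k for k in range(n + 1) if upper_tail(k, n, p0) <= half]
    lower_k = max(lower_ks) if lower_ks else None  # None: no lower critical region
    upper_k = min(upper_ks) if upper_ks else None  # None: no upper critical region
    actual = 0.0
    if lower_k is not None:
        actual += lower_tail(lower_k, n, p0)
    if upper_k is not None:
        actual += upper_tail(upper_k, n, p0)
    return lower_k, upper_k, actual


lower_k, upper_k, actual = two_tailed_region(n=20, p0=0.5, alpha=0.05)
print(f"reject if X <= {lower_k} or X >= {upper_k}; actual level = {actual:.4f}")
```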


5. Critical Regions

5.1 Definition

The critical region (or rejection region) is the set of values of the test statistic that lead to rejection of $H_0$. The acceptance region is its complement.

5.2 Finding the critical region

Procedure:

  1. Identify the distribution of the test statistic under $H_0$.
  2. Determine whether the test is one-tailed or two-tailed.
  3. Find the largest region consisting of the most extreme values whose total probability under $H_0$ does not exceed $\alpha$.

5.3 Worked example: Poisson critical region

Problem. A receptionist receives on average 2 calls per 5 minutes. Over a 5-minute period, she receives 7 calls. Test at the 5% level whether the rate has increased.

$H_0: \lambda = 2$, $H_1: \lambda \gt 2$ (upper-tailed).

Under $H_0$: $X \sim \mathrm{Po}(2)$.

Critical region: smallest $k$ such that $P(X \geq k) \leq 0.05$.

$$P(X \geq 5) = 1 - P(X \leq 4) = 1 - e^{-2}\!\left(1 + 2 + 2 + \frac{4}{3} + \frac{2}{3}\right) = 1 - 7e^{-2} = 1 - 0.9473 = 0.0527 \gt 0.05$$

$$P(X \geq 6) = 1 - P(X \leq 5) = 1 - 0.9834 = 0.0166 \leq 0.05$$

Critical region: $X \geq 6$. Actual significance level: $1.66\%$.

Since $X = 7 \geq 6$, we reject $H_0$. There is sufficient evidence at the 5% level that the call rate has increased.
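
The search for the smallest $k$ can be automated by stepping $k$ upward until the upper-tail probability drops to $\alpha$ or below. The sketch below does this for a Poisson null; the function names are hypothetical and only the standard library is used.

```python
from math import exp, factorial


def poisson_cdf(k: int, lam: float) -> float:
    """P(X <= k) for X ~ Po(lam); returns 0 for k < 0."""
    return sum(exp(-lam) * lam**r / factorial(r) for r in range(0, k + 1))


def poisson_upper_critical(lam0: float, alpha: float) -> int:
    """Smallest k with P(X >= k) <= alpha under H0: lambda = lam0."""
    k = 0
    while 1 - poisson_cdf(k - 1, lam0) > alpha:
        k += 1
    return k


k = poisson_upper_critical(lam0=2, alpha=0.05)
actual_level = 1 - poisson_cdf(k - 1, 2)
observed = 7
print(f"critical region: X >= {k}, actual level = {actual_level:.3f}")
print("reject H0" if observed >= k else "do not reject H0")
```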

5.4 Worked example: normal critical region

Problem. The masses of packets of biscuits are normally distributed with standard deviation $3\;\mathrm{g}$. A sample of 10 packets has mean mass $248\;\mathrm{g}$. Find the critical region for testing whether the mean mass is less than $250\;\mathrm{g}$ at the 1% significance level.

$H_0: \mu = 250$, $H_1: \mu \lt 250$.

Under $H_0$: $\bar{X} \sim N(250, 3^2/10) = N(250, 0.9)$.

$$P(\bar{X} \lt c) = 0.01 \implies \dfrac{c - 250}{\sqrt{0.9}} = -2.326$$

$$c = 250 - 2.326\sqrt{0.9} = 250 - 2.207 = 247.79$$

Critical region: $\bar{X} \lt 247.79$.

Since $\bar{x} = 248 \gt 247.79$, we do not reject $H_0$ at the 1% level.
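
The critical value here is simply the 1% quantile of the sampling distribution of $\bar{X}$ under $H_0$, which can be read off directly. A short check with illustrative variable names:

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n, alpha = 250.0, 3.0, 10, 0.01
sample_mean = 248.0

# Sampling distribution of the mean under H0: N(250, 0.9), i.e. sd = sqrt(0.9).
sampling = NormalDist(mu0, sigma / sqrt(n))
c = sampling.inv_cdf(alpha)  # lower-tail critical value with P(mean < c) = alpha

print(f"critical region: sample mean < {c:.2f}")
print("reject H0" if sample_mean < c else "do not reject H0")
```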


6. Practice Problems

Problem 1

A die is rolled 30 times and a six appears 9 times. Test at the 5% significance level whether the die is biased towards showing a six.

Solution

$H_0: p = 1/6$, $H_1: p \gt 1/6$ (upper-tailed).

Under $H_0$: $X \sim B(30, 1/6)$.

Find the smallest $k$ such that $P(X \geq k) \leq 0.05$.

$P(X \leq 8) = \displaystyle\sum_{r=0}^{8}\binom{30}{r}(1/6)^r(5/6)^{30-r} \approx 0.9494$, so $P(X \geq 9) = 1 - P(X \leq 8) \approx 0.0506 \gt 0.05$.

$P(X \leq 9) \approx 0.9803$, so $P(X \geq 10) \approx 0.0197 \leq 0.05$.

Critical region: $X \geq 10$.

Since $X = 9$ is not in the critical region, do not reject $H_0$. There is insufficient evidence at the 5% level that the die is biased towards six, although the result is borderline (the $p$-value is about 0.0506).

Problem 2

A manufacturer claims that the mean lifetime of a component is 500 hours. A sample of 25 components has mean lifetime 490 hours with standard deviation 15 hours. Test the claim at the 5% significance level.

Solution

$H_0: \mu = 500$, $H_1: \mu \neq 500$ (two-tailed).

$\bar{X} \sim N(500, 15^2/25) = N(500, 9)$ approximately.

$z = \dfrac{490 - 500}{3} = -3.33$.

Critical values: $\pm 1.96$.

Since $|-3.33| = 3.33 \gt 1.96$, reject $H_0$. Sufficient evidence the mean lifetime differs from 500 hours.

Problem 3

The number of accidents per week at a factory is thought to follow a Poisson distribution with mean 3. In a particular week, 8 accidents occur. Test at the 5% level whether the accident rate has increased.

Solution

$H_0: \lambda = 3$, $H_1: \lambda \gt 3$.

$X \sim \mathrm{Po}(3)$.

$P(X \geq 7) = 1 - P(X \leq 6) \approx 1 - 0.9665 = 0.0335 \leq 0.05$.

$P(X \geq 6) = 1 - P(X \leq 5) \approx 1 - 0.9161 = 0.0839 \gt 0.05$.

Critical region: $X \geq 7$. Since $X = 8 \geq 7$, reject $H_0$. Sufficient evidence the rate has increased.

Problem 4

In a hypothesis test with $H_0: p = 0.4$ and $H_1: p \gt 0.4$ based on a sample of size 20, find: (a) the critical region at the 5% level; (b) the actual significance level; (c) the probability of a Type II error if the true $p = 0.6$.

Solution

Under $H_0$: $X \sim B(20, 0.4)$.

(a) $P(X \geq 12) = 1 - P(X \leq 11) \approx 1 - 0.9435 = 0.0565 \gt 0.05$.

$P(X \geq 13) = 1 - P(X \leq 12) \approx 1 - 0.9790 = 0.0210 \leq 0.05$.

Critical region: $X \geq 13$.

(b) Actual significance level $\approx 2.10\%$.

(c) Under $p = 0.6$: $X \sim B(20, 0.6)$.

$\beta = P(X \leq 12 \mid p = 0.6) \approx 0.5841$.
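
Parts (a) to (c) can be checked numerically by summing binomial probabilities instead of reading cumulative tables. This is a sketch only; the helper names are hypothetical and it relies on nothing beyond `math.comb`.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ B(n, p)."""
    return sum(binom_pmf(r, n, p) for r in range(k, n + 1))


def upper_critical_value(n: int, p0: float, alpha: float):
    """Smallest k with P(X >= k) <= alpha under H0, or None if there is none."""
    for k in range(0, n + 1):
        if upper_tail(k, n, p0) <= alpha:
            return k
    return None


n, p0, p_true, alpha = 20, 0.4, 0.6, 0.05

k = upper_critical_value(n, p0, alpha)   # (a) critical region is X >= k
actual_level = upper_tail(k, n, p0)      # (b) P(reject H0 | H0 true)
beta = 1 - upper_tail(k, n, p_true)      # (c) P(X < k | p = p_true)

print(f"(a) critical region: X >= {k}")
print(f"(b) actual significance level: {actual_level:.4f}")
print(f"(c) P(Type II error): {beta:.4f}")
```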