Hypothesis Testing

Board Coverage

| Board | Paper | Notes |
| --- | --- | --- |
| AQA | Paper 1, 2 | Binomial tests in P1; normal tests in P2 |
| Edexcel | P1, P2 | Similar |
| OCR (A) | Paper 1, 2 | Includes critical regions |
| CIE (9709) | P1, P6 | Basic hypothesis testing in P6 |
info

Hypothesis testing requires clear, structured answers. Always state your hypotheses, test statistic, critical value/region, comparison, and conclusion in context.


1. Hypotheses

1.1 Null and alternative hypotheses

Definition.

  • The null hypothesis $H_0$ is the default assumption (usually "no effect" or "no change").
  • The alternative hypothesis $H_1$ is what we are trying to find evidence for.

1.2 One-tailed and two-tailed tests

  • One-tailed: $H_1: p \gt{} p_0$ (right-tailed) or $H_1: p \lt{} p_0$ (left-tailed).
  • Two-tailed: $H_1: p \neq p_0$.

The choice depends on the research question. Use a one-tailed test only when you have a specific directional prediction before seeing the data.

warning

Choosing a one-tailed test after seeing the data (because the results happen to go in one direction) is a form of $p$-hacking and is statistically invalid. The tail direction must be decided before the data are collected.


2. Critical Values and Significance Levels

2.1 Significance level

Definition. The significance level $\alpha$ is the maximum probability of incorrectly rejecting $H_0$ when it is true. Common values: 1%, 5%, 10%.

2.2 Critical value

The critical value is the boundary between the acceptance and rejection regions.

2.3 Critical region

The critical region (rejection region) is the set of values of the test statistic that lead to rejection of $H_0$.

2.4 Actual significance level

For discrete distributions, the actual significance level may differ from the nominal level $\alpha$, because the attainable tail probabilities jump in steps and usually none of them equals $\alpha$ exactly.

Example. For $X \sim B(15, 0.5)$, a right-tailed test at $\alpha = 0.05$:

We find the smallest $c$ such that $P(X \geq c) \leq 0.05$, using $P(X \geq c) = 1 - P(X \leq c - 1)$.

$P(X \geq 12) \approx 0.0176$, $P(X \geq 11) \approx 0.0592$.

Critical region: $X \geq 12$. Actual significance level: 1.76%.
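The search for the critical value can be checked numerically. The following is a minimal sketch using only the Python standard library; the helper names `binom_upper_tail` and `right_tail_critical_region` are illustrative choices, not part of any syllabus or library.

```python
from math import comb


def binom_upper_tail(n: int, p: float, c: int) -> float:
    """Return P(X >= c) for X ~ B(n, p) by summing the exact pmf."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(c, n + 1))


def right_tail_critical_region(n: int, p: float, alpha: float):
    """Smallest c with P(X >= c) <= alpha, plus the actual significance level.

    Returns (None, 0.0) if no value of X qualifies.
    """
    for c in range(n + 1):
        tail = binom_upper_tail(n, p, c)
        if tail <= alpha:
            return c, tail
    return None, 0.0


c, actual = right_tail_critical_region(15, 0.5, 0.05)
print(c, round(actual, 4))  # 12 0.0176
```

In an exam the same values come from cumulative binomial tables or a calculator; the loop simply automates the "try $c$, check the tail" search.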


3. Type I and Type II Errors

3.1 Definitions

Definition.

  • Type I error: Rejecting $H_0$ when $H_0$ is true (false positive). $P(\mathrm{Type\ I}) = \alpha$ (the significance level).

  • Type II error: Failing to reject $H_0$ when $H_0$ is false (false negative). $P(\mathrm{Type\ II}) = \beta$.

  • The power of a test is $1 - \beta = P(\mathrm{reject\ } H_0 \mid H_0 \mathrm{\ is\ false})$.

3.2 Relationship

For a fixed sample size, decreasing $\alpha$ (making the test stricter) generally increases $\beta$ (more false negatives), so there is a trade-off between Type I and Type II errors. Both can be reduced together only by collecting more data.

Intuition. Think of a courtroom: a Type I error is convicting an innocent person; a Type II error is acquitting a guilty person. Making the standard of proof higher (beyond reasonable doubt) reduces Type I errors but increases Type II errors. You cannot eliminate both simultaneously.
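The trade-off can be made concrete for a right-tailed test of a normal mean with known variance. This is a sketch under assumed, purely illustrative values ($\mu_0 = 50$, true mean 52, $\sigma = 4$, $n = 16$); the function name `type_ii_error` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def type_ii_error(mu0, mu_true, sigma, n, alpha):
    """Beta for a right-tailed z-test of H0: mu = mu0 when the true mean is mu_true."""
    se = sigma / sqrt(n)
    critical_mean = mu0 + Z.inv_cdf(1 - alpha) * se  # reject H0 above this value
    # Type II error: the sample mean falls below the critical value anyway.
    return NormalDist(mu_true, se).cdf(critical_mean)


# Illustrative values (assumed): mu0 = 50, true mean 52, sigma = 4, n = 16.
betas = {a: type_ii_error(50, 52, 4, 16, a) for a in (0.10, 0.05, 0.01)}
for a, b in betas.items():
    print(f"alpha = {a:.2f}  beta = {b:.3f}")
```

As $\alpha$ shrinks, the critical value moves further from the null mean and $\beta$ grows, which is the courtroom trade-off in numbers.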


4. Hypothesis Testing Procedure

4.1 Standard method

  1. Define the random variable and its distribution under $H_0$.
  2. State $H_0$ and $H_1$.
  3. State the significance level $\alpha$.
  4. Calculate the critical region (or critical value).
  5. Determine the test statistic from the data.
  6. Compare the test statistic to the critical value.
  7. Conclude in context.

4.2 Using $p$-values

Alternatively:

  1–3. Same as above.
  4. Calculate the $p$-value: the probability of obtaining a result at least as extreme as the observed value, assuming $H_0$ is true.
  5. If $p$-value $\lt{} \alpha$, reject $H_0$. Otherwise, do not reject $H_0$.
  6. Conclude in context.


5. Binomial Hypothesis Tests

5.1 Single proportion test

Example. A coin is tossed 20 times and lands heads 15 times. Test at the 5% significance level whether the coin is biased towards heads.

$X \sim B(20, p)$.

$H_0: p = 0.5$, $H_1: p \gt{} 0.5$. One-tailed, $\alpha = 0.05$.

Under $H_0$: $X \sim B(20, 0.5)$.

Find the smallest $c$ such that $P(X \geq c) \leq 0.05$.

$P(X \geq 14) = 1 - P(X \leq 13) \approx 0.0577 \gt{} 0.05$. $P(X \geq 15) = 1 - P(X \leq 14) \approx 0.0207 \lt{} 0.05$.

Critical region: $X \geq 15$. Since $X = 15$ is in the critical region, we reject $H_0$.

There is sufficient evidence at the 5% level that the coin is biased towards heads.
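The same conclusion follows from the $p$-value route. A short sketch, using only the standard library and the numbers from the coin example, computes the exact probability of 15 or more heads under $H_0$:

```python
from math import comb

n, p0, observed = 20, 0.5, 15

# Exact p-value for a right-tailed test: P(X >= observed) under H0.
p_value = sum(
    comb(n, k) * p0**k * (1 - p0) ** (n - k) for k in range(observed, n + 1)
)

print(round(p_value, 4))  # 0.0207
reject = p_value < 0.05   # True: same decision as the critical-region method
```

The $p$-value equals the tail probability at the boundary of the critical region because the observed value sits exactly on that boundary.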


6. Normal Hypothesis Tests

6.1 Test for a mean (known variance)

Example. A machine fills bags with mean weight 500 g. A sample of 30 bags gives $\bar{x} = 497$ g. Test at the 5% level whether the mean weight has decreased, given $\sigma = 6$ g.

$H_0: \mu = 500$, $H_1: \mu \lt{} 500$. $\alpha = 0.05$.

Under $H_0$: $\bar{X} \sim N(500, 6^2/30) = N(500, 1.2)$.

$z = \dfrac{497 - 500}{\sqrt{1.2}} = \dfrac{-3}{1.0954} = -2.739$.

Critical value: $P(Z \lt{} -1.645) = 0.05$.

Since $-2.739 \lt{} -1.645$, we reject $H_0$.

There is sufficient evidence that the mean weight has decreased.
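The arithmetic above can be reproduced with `statistics.NormalDist` from the Python standard library. This is a sketch of the calculation for this example only, not a general-purpose testing routine.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n, sample_mean, alpha = 500, 6, 30, 497, 0.05

z = (sample_mean - mu0) / (sigma / sqrt(n))  # standardised test statistic
critical = NormalDist().inv_cdf(alpha)       # left-tail critical value
p_value = NormalDist().cdf(z)                # P(Z <= z) under H0

print(round(z, 3), round(critical, 3))  # -2.739 -1.645
reject = z < critical                   # True
```

The $p$-value is well below 0.05, which matches the decision reached by comparing $z$ with the critical value.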

6.2 Large sample test for a proportion

For large $n$: $\hat{p} \sim N\!\left(p, \dfrac{p(1-p)}{n}\right)$ approximately.

Test statistic: $z = \dfrac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}}$.


7. Interpreting Results

warning

"Failing to reject $H_0$" is not the same as "proving $H_0$ is true." It means the data do not provide sufficient evidence against $H_0$. The test may lack power (sample too small, effect too weak).


8. One-Tailed vs Two-Tailed Tests in Depth

8.1 Choosing between one-tailed and two-tailed

Use a one-tailed test when:

  • The research question has a specific directional prediction established before data collection.
  • Only one direction of deviation is practically meaningful.
  • The consequence of missing an effect in the unexpected direction is negligible.

Use a two-tailed test when:

  • You are interested in any difference from $H_0$, regardless of direction.
  • You want a more conservative test that is harder to reach significance with.
  • There is no strong prior reason to expect the effect in one specific direction.

Example. Testing whether a new teaching method changes exam scores:

  • One-tailed ($H_1: \mu \gt{} \mu_0$): justified only if prior research strongly suggests the method improves scores, and you would not act on a decrease.
  • Two-tailed ($H_1: \mu \neq \mu_0$): appropriate if the method is new and could either help or harm, and either outcome matters.

8.2 Critical region comparison

For a test at significance level $\alpha$, the allocation of the significance level differs:

  • One-tailed: The entire $\alpha$ goes into one tail. The critical value is at the $1 - \alpha$ quantile (right-tailed) or $\alpha$ quantile (left-tailed).
  • Two-tailed: $\alpha/2$ goes into each tail. The critical values are at the $\alpha/2$ and $1 - \alpha/2$ quantiles.

This means the two-tailed test has a higher bar for each individual tail.

Example. Standard normal test at $\alpha = 0.05$:

  • One-tailed ($H_1: \mu \gt{} \mu_0$): reject if $z \gt{} 1.645$.
  • Two-tailed ($H_1: \mu \neq \mu_0$): reject if $z \gt{} 1.960$ or $z \lt{} -1.960$.

An observed $z = 1.80$ is significant for the one-tailed test ($1.80 \gt{} 1.645$) but not for the two-tailed test ($|1.80| \lt{} 1.960$).

info

A two-tailed test at level $\alpha$ requires a more extreme test statistic than a one-tailed test at the same $\alpha$, because the significance "budget" is split between two tails. A two-tailed test at $\alpha = 0.05$ uses the same critical values as a pair of one-tailed tests, one in each direction, each at $\alpha = 0.025$.
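The critical values quoted in this section are quantiles of the standard normal distribution. A minimal sketch, assuming Python's `statistics.NormalDist`, recovers them and repeats the $z = 1.80$ comparison:

```python
from statistics import NormalDist

Z = NormalDist()
alpha = 0.05

one_tailed = Z.inv_cdf(1 - alpha)      # all of alpha in the upper tail
two_tailed = Z.inv_cdf(1 - alpha / 2)  # alpha/2 in each tail

z_obs = 1.80
print(round(one_tailed, 3), round(two_tailed, 3))    # 1.645 1.96
print(z_obs > one_tailed, abs(z_obs) > two_tailed)   # True False
```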

8.3 Effect on power

For the same $\alpha$, a one-tailed test has greater power than a two-tailed test against an alternative in the predicted direction, because the critical value is closer to the null value. However, a one-tailed test has essentially no power to detect an effect in the opposite direction: its rejection probability there is below $\alpha$.
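The power comparison can be sketched for a $z$-test by expressing the true effect as a shift measured in standard errors. The shift of 2 used here is an assumed, illustrative value, and the function names are hypothetical.

```python
from statistics import NormalDist

Z = NormalDist()


def power_one_tailed(shift, alpha=0.05):
    """Power of a right-tailed z-test; shift = (true mean - null mean) / standard error."""
    return 1 - Z.cdf(Z.inv_cdf(1 - alpha) - shift)


def power_two_tailed(shift, alpha=0.05):
    """Power of a two-tailed z-test for the same standardised shift."""
    c = Z.inv_cdf(1 - alpha / 2)
    return (1 - Z.cdf(c - shift)) + Z.cdf(-c - shift)


for shift in (2.0, -2.0):  # illustrative shifts, in standard errors
    print(shift, round(power_one_tailed(shift), 4), round(power_two_tailed(shift), 4))
```

For the positive shift the one-tailed test is the more powerful of the two; for the negative shift its power falls to a tiny fraction of a percent, while the two-tailed test is unaffected by the direction.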


9. Binomial Tests with Normal Approximation

9.1 When to use the normal approximation

When $n$ is sufficiently large, the binomial distribution $B(n, p)$ can be approximated by a normal distribution. The standard conditions are:

$$np \gt{} 5 \quad \mathrm{and} \quad n(1 - p) \gt{} 5$$

Under these conditions:

$$X \approx N(np, np(1-p))$$

Equivalently, for the sample proportion $\hat{p} = X/n$:

$$\hat{p} \approx N\!\left(p, \dfrac{p(1-p)}{n}\right)$$

warning

In a hypothesis test, check these conditions using the hypothesised value $p_0$ (the value of $p$ under $H_0$), not the observed sample proportion $\hat{p}$.

9.2 Continuity correction

Since the binomial distribution is discrete and the normal distribution is continuous, a continuity correction improves the accuracy of the approximation:

  • For $P(X \leq k)$, use $P\!\left(Z \leq \dfrac{k + 0.5 - np}{\sqrt{np(1-p)}}\right)$.
  • For $P(X \geq k)$, use $P\!\left(Z \geq \dfrac{k - 0.5 - np}{\sqrt{np(1-p)}}\right)$.
  • For $P(X = k)$, use $P(k - 0.5 \lt{} X \lt{} k + 0.5)$ in the approximating normal distribution.
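To see what the correction buys, the sketch below compares an exact binomial tail with the normal approximation, with and without the half-unit adjustment. The values $n = 120$, $p = 0.4$, $k = 58$ are chosen for illustration.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p, k = 120, 0.4, 58  # illustrative values
mean, sd = n * p, sqrt(n * p * (1 - p))

# Exact P(X >= k) from the binomial pmf.
exact = sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))

plain = 1 - NormalDist(mean, sd).cdf(k)            # no continuity correction
corrected = 1 - NormalDist(mean, sd).cdf(k - 0.5)  # P(X >= k): use k - 0.5

print(f"exact={exact:.4f} plain={plain:.4f} corrected={corrected:.4f}")
```

For these values the corrected approximation lands closer to the exact tail than the uncorrected one.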

9.3 Worked example

Example. Historically, 40% of students at a school take the bus. In a survey of 120 students, 58 take the bus. Test at the 5% level whether the proportion has changed.

$X \sim B(120, p)$. $H_0: p = 0.4$, $H_1: p \neq 0.4$. Two-tailed, $\alpha = 0.05$.

Check conditions using $p_0 = 0.4$: $np_0 = 120 \times 0.4 = 48 \gt{} 5$ and $n(1 - p_0) = 120 \times 0.6 = 72 \gt{} 5$. Conditions satisfied.

Under $H_0$: $X \approx N(48, 28.8)$, so $\sigma = \sqrt{28.8} \approx 5.367$.

Using continuity correction (the observed 58 lies above the mean, so subtract 0.5):

$$z = \dfrac{58 - 0.5 - 48}{5.367} = \dfrac{9.5}{5.367} = 1.770$$

Two-tailed critical values: $\pm 1.96$. Since $|1.770| \lt{} 1.96$, do not reject $H_0$.

There is insufficient evidence at the 5% level that the proportion of bus users has changed.
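The worked example can be checked with a few lines of standard-library Python. This is a sketch of this one calculation; the rule for the sign of the correction (move the observed count half a unit towards the mean) is the usual convention for a two-tailed test.

```python
from math import sqrt
from statistics import NormalDist

n, p0, observed, alpha = 120, 0.4, 58, 0.05

mean = n * p0
sd = sqrt(n * p0 * (1 - p0))

# Continuity correction: move the observed count half a unit towards the mean.
if observed > mean:
    correction = -0.5
elif observed < mean:
    correction = 0.5
else:
    correction = 0.0
z = (observed + correction - mean) / sd

critical = NormalDist().inv_cdf(1 - alpha / 2)
print(round(z, 3), abs(z) > critical)  # 1.77 False
```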


10. Confidence Intervals

10.1 Definition

A confidence interval gives a range of plausible values for a population parameter, together with a specified level of confidence.

Definition. A $100(1 - \alpha)\%$ confidence interval for a parameter $\theta$ is an interval $(L, U)$ constructed from sample data such that, in repeated sampling, $100(1 - \alpha)\%$ of such intervals would contain the true value of $\theta$.

warning

A 95% confidence interval does not mean there is a 95% probability that $\theta$ lies in the interval. The parameter $\theta$ is fixed; it either is or is not in the interval. The 95% refers to the long-run proportion of intervals (across many repeated samples) that capture $\theta$.

10.2 95% confidence interval for a population proportion

For large $n$ where $n\hat{p} \gt{} 5$ and $n(1 - \hat{p}) \gt{} 5$, the sample proportion $\hat{p}$ is approximately normal. The $100(1-\alpha)\%$ confidence interval for $p$ is:

$$\hat{p} \pm z_{\alpha/2}\sqrt{\dfrac{\hat{p}(1 - \hat{p})}{n}}$$

For a 95% confidence interval, $z_{\alpha/2} = 1.96$:

$$\hat{p} \pm 1.96\sqrt{\dfrac{\hat{p}(1 - \hat{p})}{n}}$$

The margin of error is $1.96\sqrt{\hat{p}(1-\hat{p})/n}$, which decreases as $n$ increases.

10.3 Connection to hypothesis testing

There is a direct and important link between confidence intervals and two-tailed hypothesis tests:

  • A $100(1-\alpha)\%$ confidence interval contains those values of $p_0$ that would not be rejected by a two-tailed test of $H_0: p = p_0$ at level $\alpha$. For proportions the match is approximate rather than exact, because the interval estimates the standard error from $\hat{p}$ while the test uses $p_0$.
  • If $p_0$ falls outside the confidence interval, then $H_0$ is rejected at level $\alpha$.
  • If $p_0$ falls inside the confidence interval, then $H_0$ is not rejected at level $\alpha$.

Example. Using the bus survey data: $\hat{p} = 58/120 \approx 0.483$, $n = 120$.

$$95\%\ \mathrm{CI} = 0.483 \pm 1.96\sqrt{\dfrac{0.483 \times 0.517}{120}} = 0.483 \pm 1.96 \times 0.0456$$

$$95\%\ \mathrm{CI} = 0.483 \pm 0.0894 = (0.394, 0.573)$$

Since $p_0 = 0.4$ lies inside $(0.394, 0.573)$, we do not reject $H_0: p = 0.4$ at the 5% level. This is consistent with the hypothesis test result in Section 9.3.
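The interval can be reproduced with a small helper. This is a sketch of the normal-approximation interval described in Section 10.2; the function name `proportion_ci` is an illustrative choice.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(successes: int, n: int, confidence: float = 0.95):
    """Normal-approximation confidence interval for a population proportion."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # 1.96 for 95%
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


low, high = proportion_ci(58, 120)
print(round(low, 3), round(high, 3))  # 0.394 0.573
```

Checking whether a hypothesised value such as 0.4 lies between `low` and `high` gives the confidence-interval version of the two-tailed test decision.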


11. Interpreting p-Values

11.1 Formal definition

Definition. The $p$-value is the probability of obtaining a test statistic at least as extreme as the observed value, assuming $H_0$ is true.

For a right-tailed test:

$$p\mathrm{-value} = P(\mathrm{test\ statistic} \geq \mathrm{observed} \mid H_0 \mathrm{\ true})$$

For a left-tailed test the inequality is reversed. For a two-tailed test, "at least as extreme" means at least as far from the null value in either direction, so for a symmetric distribution such as the normal the one-tail probability is doubled.

11.2 Decision rule

  • If $p\mathrm{-value} \lt{} \alpha$: reject $H_0$. The result is statistically significant.
  • If $p\mathrm{-value} \geq \alpha$: do not reject $H_0$. The result is not statistically significant.

11.3 Strength of evidence

The smaller the $p$-value, the stronger the evidence against $H_0$:

| $p$-value range | Strength of evidence against $H_0$ |
| --- | --- |
| $p \geq 0.10$ | Little to no evidence |
| $0.05 \leq p \lt{} 0.10$ | Weak evidence |
| $0.01 \leq p \lt{} 0.05$ | Moderate evidence |
| $0.001 \leq p \lt{} 0.01$ | Strong evidence |
| $p \lt{} 0.001$ | Very strong evidence |

11.4 Common misinterpretations

warning
  • The $p$-value is not the probability that $H_0$ is true.
  • The $p$-value is not the probability that the observed result occurred by chance.
  • A large $p$-value does not prove $H_0$ is true; it only means the data are consistent with $H_0$.
  • Statistical significance does not imply practical or scientific importance.
  • The $p$-value depends on sample size: with a very large sample, even trivially small effects can produce tiny $p$-values.
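The last point is easy to demonstrate. The sketch below holds a tiny observed effect fixed (an assumed 0.02 population standard deviations away from the null value) and varies only the sample size in a two-tailed $z$-test; the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_tailed_p(effect_in_sd: float, n: int) -> float:
    """Two-tailed z-test p-value when the sample mean sits effect_in_sd
    population standard deviations away from the null value."""
    z = effect_in_sd * sqrt(n)
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Illustrative: the same tiny observed effect (0.02 sd) at three sample sizes.
p_values = [two_tailed_p(0.02, n) for n in (100, 10_000, 1_000_000)]
print([f"{p:.4f}" for p in p_values])
```

The effect is identical in all three cases, yet the result moves from clearly non-significant to overwhelmingly significant as $n$ grows, which is why significance alone says nothing about practical importance.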

11.5 Worked example

Example. A factory produces components with mean length 50 mm. A sample of 40 components gives $\bar{x} = 50.8$ mm. Given $\sigma = 3$ mm, find the $p$-value for testing $H_0: \mu = 50$ vs $H_1: \mu \neq 50$.

Under $H_0$: $\bar{X} \sim N(50, 3^2/40) = N(50, 0.225)$.

$$z = \dfrac{50.8 - 50}{\sqrt{0.225}} = \dfrac{0.8}{0.4743} = 1.687$$

$$p\mathrm{-value} = 2 \times P(Z \gt{} 1.687) = 2 \times (1 - 0.9542) = 0.0916$$

Since $0.0916 \gt{} 0.05$, we do not reject $H_0$ at the 5% level.

Interpretation: If the true mean were 50 mm, there would be approximately a 9.2% chance of observing a sample mean at least as far from 50 mm as 50.8 mm. This is not unusual enough to provide convincing evidence against $H_0$.
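A short check of this two-tailed $p$-value, sketched with `statistics.NormalDist`. The software value can differ from the table-based 0.0916 in the fourth decimal place because tables round $z$ and $\Phi(z)$.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n, sample_mean = 50, 3, 40, 50.8

z = (sample_mean - mu0) / (sigma / sqrt(n))
p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # double the one-tail probability

print(round(z, 3))  # 1.687
```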


Problem Set

Problem 1 A die is rolled 60 times and a 6 appears 16 times. Test at the 5% level whether the die is biased.

Solution 1 $X \sim B(60, p)$. $H_0: p = 1/6$, $H_1: p \neq 1/6$. Two-tailed, $\alpha = 0.05$.

Under $H_0$: $X \sim B(60, 1/6)$. $\mu = 10$, $\sigma = \sqrt{60 \times \frac{1}{6} \times \frac{5}{6}} = \sqrt{50/6} \approx 2.887$.

Using normal approximation: $z = \dfrac{16 - 10}{2.887} = 2.078$.

Two-tailed: critical values $\pm 1.96$. $|2.078| \gt{} 1.96$, so reject $H_0$.

There is evidence at the 5% level that the die is biased.

Note that this calculation omits the continuity correction. With it, $z = \dfrac{16 - 0.5 - 10}{2.887} \approx 1.905 \lt{} 1.96$ and $H_0$ would not be rejected, so the evidence here is borderline.

If you get this wrong, revise: Binomial Hypothesis Tests — Section 5.

Problem 2 A manufacturer claims that 90% of their products pass quality control. In a sample of 200, 170 pass. Test the claim at the 5% significance level.

Solution 2 $X \sim B(200, p)$. $H_0: p = 0.9$, $H_1: p \lt{} 0.9$. Left-tailed, $\alpha = 0.05$.

$\hat{p} = 170/200 = 0.85$.

Under $H_0$: $\hat{p} \sim N\!\left(0.9, \frac{0.9 \times 0.1}{200}\right) = N(0.9, 0.00045)$ approximately.

$z = \dfrac{0.85 - 0.9}{\sqrt{0.00045}} = \dfrac{-0.05}{0.02121} = -2.357$.

Critical value: $-1.645$. Since $-2.357 \lt{} -1.645$, reject $H_0$.

There is evidence at the 5% level that the proportion passing quality control is less than 90%.

If you get this wrong, revise: Normal Hypothesis Tests — Section 6.

Problem 3 Explain the difference between a Type I error and a Type II error in the context of medical testing.

Solution 3 Type I error: The test says a healthy person is sick (false positive). This leads to unnecessary treatment and anxiety.

Type II error: The test says a sick person is healthy (false negative). This means the person goes untreated, potentially with serious consequences.

If you get this wrong, revise: Type I and Type II Errors — Section 3.

Problem 4 Find the critical region for a test of $H_0: p = 0.3$ vs $H_1: p \gt{} 0.3$ using $X \sim B(10, p)$ at the 5% level.

Solution 4 Under $H_0$: $X \sim B(10, 0.3)$.

$P(X \geq 6) = 1 - P(X \leq 5) = 1 - 0.9527 = 0.0473 \lt{} 0.05$. $P(X \geq 5) = 1 - P(X \leq 4) = 1 - 0.8497 = 0.1503 \gt{} 0.05$.

Critical region: $X \geq 6$. Actual significance level: 4.73%.

If you get this wrong, revise: Critical Region — Section 2.3.

Problem 5 The mean lifetime of a bulb is claimed to be 1000 hours. A sample of 50 bulbs gives $\bar{x} = 985$ hours with $s = 40$ hours. Test at the 1% level whether the mean lifetime is less than 1000 hours.

Solution 5 $H_0: \mu = 1000$, $H_1: \mu \lt{} 1000$. $\alpha = 0.01$.

$\bar{X} \sim N(1000, 40^2/50) = N(1000, 32)$ approximately.

$z = \dfrac{985 - 1000}{\sqrt{32}} = \dfrac{-15}{5.657} = -2.652$.

Critical value at 1%: $-2.326$. Since $-2.652 \lt{} -2.326$, reject $H_0$.

There is evidence at the 1% level that the mean lifetime is less than 1000 hours.

If you get this wrong, revise: Normal Hypothesis Tests — Section 6.

Problem 6 For $X \sim B(20, 0.5)$, find the critical region for a two-tailed test at the 10% significance level.

Solution 6 Under $H_0$: $X \sim B(20, 0.5)$.

For each tail, we need $P(X \leq c_L) \leq 0.05$ and $P(X \geq c_U) \leq 0.05$.

Lower: $P(X \leq 5) \approx 0.0207 \leq 0.05$, $P(X \leq 6) \approx 0.0577 \gt{} 0.05$. So $c_L = 5$.

Upper: $P(X \geq 15) \approx 0.0207 \leq 0.05$, $P(X \geq 14) \approx 0.0577 \gt{} 0.05$. So $c_U = 15$.

Critical region: $X \leq 5$ or $X \geq 15$. Actual significance level: $2 \times 0.0207 = 0.0414$.

If you get this wrong, revise: Critical Values and Significance Levels — Section 2.

Problem 7 A teacher claims that the average score on a test is 70%. In a class of 25, the mean score is 66% with standard deviation 12%. Test at the 5% level.

Solution 7 $H_0: \mu = 70$, $H_1: \mu \neq 70$. Two-tailed, $\alpha = 0.05$.

$\bar{X} \sim N(70, 12^2/25) = N(70, 5.76)$ approximately.

$z = \dfrac{66 - 70}{\sqrt{5.76}} = \dfrac{-4}{2.4} = -1.667$.

Two-tailed critical values: $\pm 1.96$. $|-1.667| \lt{} 1.96$, so do not reject $H_0$.

There is insufficient evidence at the 5% level that the mean score differs from 70%.

If you get this wrong, revise: Normal Hypothesis Tests — Section 6.

Problem 8 A drug is effective for 60% of patients. After a new treatment, 18 out of 25 patients are cured. Test whether the new treatment is more effective at the 5% level.

Solution 8 $X \sim B(25, p)$. $H_0: p = 0.6$, $H_1: p \gt{} 0.6$. Right-tailed, $\alpha = 0.05$.

Under $H_0$: $X \sim B(25, 0.6)$.

$P(X \geq 19) = 1 - P(X \leq 18) \approx 1 - 0.9264 = 0.0736 \gt{} 0.05$. $P(X \geq 20) = 1 - P(X \leq 19) \approx 1 - 0.9706 = 0.0294 \lt{} 0.05$.

Critical region: $X \geq 20$. Since $X = 18 \lt{} 20$, do not reject $H_0$.

There is insufficient evidence at the 5% level that the new treatment is more effective.

If you get this wrong, revise: Binomial Hypothesis Tests — Section 5.

Problem 9 Explain why failing to reject $H_0$ does not mean $H_0$ is true.

Solution 9 Failing to reject $H_0$ means the data are consistent with $H_0$ but do not prove it. The test may lack sufficient power to detect a real effect. For example, if a drug has a small but real benefit, a small sample may not detect it, leading us to fail to reject $H_0$ even though the drug is effective. The absence of evidence is not evidence of absence.

If you get this wrong, revise: Interpreting Results — Section 7.

Problem 10 For a test of $H_0: \mu = 50$ vs $H_1: \mu \gt{} 50$ at the 5% level with $\sigma = 4$ and $n = 16$, find the probability of a Type II error if the true mean is $\mu = 52$.

Solution 10 Under $H_0$: $\bar{X} \sim N(50, 16/16) = N(50, 1)$.

Critical value for the sample mean: $P(\bar{X} \gt{} c) = 0.05 \implies c = 50 + 1.645(1) = 51.645$.

Type II error = failing to reject $H_0$ when $\mu = 52$.

$\bar{X} \sim N(52, 1)$ under the true distribution.

$P(\bar{X} \lt{} 51.645 \mid \mu = 52) = P\!\left(Z \lt{} \dfrac{51.645 - 52}{1}\right) = P(Z \lt{} -0.355) \approx 0.3613$.

So $\beta \approx 0.361$ and the power is $1 - \beta \approx 0.639$.

If you get this wrong, revise: Type I and Type II Errors — Section 3.

Problem 11 A researcher tests whether a new drug changes recovery time. She uses a two-tailed test of $H_0: \mu = \mu_0$ vs $H_1: \mu \neq \mu_0$ at $\alpha = 0.05$ and obtains $z = 1.85$. (a) What is her conclusion? (b) If she had instead used a right-tailed test $H_1: \mu \gt{} \mu_0$ at the same level, would her conclusion change? Explain.

Solution 11 (a) Two-tailed test: critical values $\pm 1.96$. $|1.85| = 1.85 \lt{} 1.96$, so do not reject $H_0$. There is insufficient evidence that recovery time has changed.

(b) One-tailed test: critical value $1.645$. Since $1.85 \gt{} 1.645$, we reject $H_0$. There is sufficient evidence that recovery time has increased.

The conclusion changes because a one-tailed test allocates the entire 5% significance level to one tail, making the critical value less extreme. This illustrates why the choice between one-tailed and two-tailed must be made before seeing the data.

If you get this wrong, revise: One-Tailed vs Two-Tailed Tests in Depth — Section 8.

Problem 12 A survey of 200 households in a town finds that 45 regularly recycle. The national recycling rate is 20%. Test at the 5% level whether the recycling rate in this town differs from the national rate, using a normal approximation with continuity correction.

Solution 12 $X \sim B(200, p)$. $H_0: p = 0.20$, $H_1: p \neq 0.20$. Two-tailed, $\alpha = 0.05$.

Check conditions using $p_0 = 0.20$: $np_0 = 200 \times 0.20 = 40 \gt{} 5$ and $n(1 - p_0) = 160 \gt{} 5$. Conditions satisfied.

Under $H_0$: $X \approx N(40, 200 \times 0.20 \times 0.80) = N(40, 32)$, $\sigma = \sqrt{32} \approx 5.657$.

Using continuity correction:

$$z = \dfrac{45 - 0.5 - 40}{5.657} = \dfrac{4.5}{5.657} = 0.795$$

Two-tailed critical values: $\pm 1.96$. $|0.795| \lt{} 1.96$, so do not reject $H_0$.

There is insufficient evidence at the 5% level that the recycling rate differs from 20%.

If you get this wrong, revise: Binomial Tests with Normal Approximation — Section 9.

Problem 13 In a random sample of 150 voters, 87 support a new policy. (a) Construct a 95% confidence interval for the true proportion of support. (b) Since the interval does not contain 0.5, a politician claims "a majority of voters support the policy." Is this claim justified?

Solution 13 (a) $\hat{p} = 87/150 = 0.58$.

Check: $n\hat{p} = 150 \times 0.58 = 87 \gt{} 5$ and $n(1 - \hat{p}) = 150 \times 0.42 = 63 \gt{} 5$.

$$95\%\ \mathrm{CI} = 0.58 \pm 1.96\sqrt{\dfrac{0.58 \times 0.42}{150}} = 0.58 \pm 1.96 \times 0.0403$$

$$95\%\ \mathrm{CI} = 0.58 \pm 0.0790 = (0.501, 0.659)$$

(b) The 95% CI is $(0.501, 0.659)$. Since the entire interval lies above 0.5, we can reject $H_0: p = 0.5$ at the 5% level. However, the lower bound is only 0.501, so the evidence for a majority is borderline. The claim is technically supported by the test, but the narrow margin should be communicated carefully.

If you get this wrong, revise: Confidence Intervals — Section 10.

Problem 14 A 95% confidence interval for a population mean is $(48.2, 53.8)$. State whether $H_0$ would be rejected or not rejected at the 5% level for each of the following null values: (a) $\mu_0 = 50$, (b) $\mu_0 = 47$, (c) $\mu_0 = 54$. Justify using the connection between confidence intervals and hypothesis tests.

Solution 14 A 95% confidence interval contains exactly those values of $\mu_0$ that would not be rejected by a two-tailed test at the 5% level.

(a) $\mu_0 = 50$: $50 \in (48.2, 53.8)$, so do not reject $H_0$.

(b) $\mu_0 = 47$: $47 \notin (48.2, 53.8)$, so reject $H_0$.

(c) $\mu_0 = 54$: $54 \notin (48.2, 53.8)$, so reject $H_0$.

If you get this wrong, revise: Confidence Intervals — Section 10.

Problem 15 A sample of 35 students has mean score 62.4 with known population standard deviation $\sigma = 8$. (a) Find the $p$-value for testing $H_0: \mu = 60$ vs $H_1: \mu \gt{} 60$. (b) State your conclusion at the 5% significance level and interpret the $p$-value.

Solution 15 (a) Under $H_0$: $\bar{X} \sim N(60, 8^2/35) = N(60, 1.829)$.

$$z = \dfrac{62.4 - 60}{\sqrt{1.829}} = \dfrac{2.4}{1.352} = 1.775$$

$$p\mathrm{-value} = P(Z \gt{} 1.775) = 1 - 0.9620 = 0.0380$$

(b) Since $0.038 \lt{} 0.05$, reject $H_0$ at the 5% level. There is sufficient evidence that the true mean score exceeds 60. The $p$-value of 0.038 means that if the true mean were 60, there would be a 3.8% chance of observing a sample mean of 62.4 or higher. This provides moderate evidence against $H_0$.

If you get this wrong, revise: Interpreting p-Values — Section 11.

Problem 16 For a test of $H_0: \mu = 100$ vs $H_1: \mu \gt{} 100$ with $\sigma = 15$, $n = 25$, and $\alpha = 0.05$: (a) Find the critical value in terms of $\bar{x}$. (b) Find the probability of a Type II error and the power of the test if the true mean is $\mu = 108$. (c) How would the power change if $\alpha$ were increased to 0.10?

Solution 16 (a) Under $H_0$: $\bar{X} \sim N(100, 15^2/25) = N(100, 9)$, so $\sigma_{\bar{X}} = 3$.

Critical value: $c = 100 + 1.645 \times 3 = 104.935$. Reject $H_0$ if $\bar{X} \gt{} 104.935$.

(b) Type II error when $\mu = 108$: $P(\bar{X} \leq 104.935 \mid \mu = 108)$.

$\bar{X} \sim N(108, 9)$ under the true distribution.

$$P\!\left(Z \leq \dfrac{104.935 - 108}{3}\right) = P(Z \leq -1.022) \approx 0.153$$

So $\beta \approx 0.153$ and power $= 1 - 0.153 = 0.847$.

(c) If $\alpha = 0.10$, the critical value becomes $c = 100 + 1.282 \times 3 = 103.846$.

$$\beta_{\mathrm{new}} = P\!\left(Z \leq \dfrac{103.846 - 108}{3}\right) = P(Z \leq -1.385) \approx 0.083$$

Power $= 1 - 0.083 = 0.917$. Increasing $\alpha$ from 0.05 to 0.10 increases the power (from 0.847 to 0.917) but also increases the probability of a Type I error. This illustrates the trade-off between Type I and Type II errors.

If you get this wrong, revise: Type I and Type II Errors — Section 3.


tip

Diagnostic Test

Ready to test your understanding of Hypothesis Testing? The diagnostic test contains the hardest questions within the A-Level specification for this topic, each with a full worked solution.

Unit tests probe edge cases and common misconceptions. Integration tests combine Hypothesis Testing with other topics to test synthesis under exam conditions.

See Diagnostic Guide for instructions on self-marking and building a personal test matrix.