Chi-Squared Tests
The chi-squared test is a non-parametric statistical test used to determine whether observed data deviates significantly from expected values. It has two main applications: testing goodness of fit (to a theoretical distribution) and testing for independence (between two categorical variables).
Board Coverage
| Board | Paper | Notes |
|---|---|---|
| AQA | Paper 2 | Goodness of fit and contingency tables |
| Edexcel | S3 | Goodness of fit and test for independence |
| OCR (A) | Paper 2 | Both applications covered |
| CIE (9231) | S2 | Goodness of fit; independence with tables |
:::note Exam tip
The chi-squared test must be carried out on raw frequencies, never on percentages or proportions. Always check the conditions (expected frequency $\geq 5$) before applying the test. The formula booklet provides the chi-squared distribution table.
:::
1. The Chi-Squared Distribution
1.1 Definition
Definition. If $Z_1, Z_2, \dots, Z_n$ are independent standard normal random variables, then
$$X = Z_1^2 + Z_2^2 + \cdots + Z_n^2$$
follows a chi-squared distribution with $n$ degrees of freedom, written $X \sim \chi^2_n$.
1.2 Properties
- The distribution is defined only for $x \geq 0$
- It is positively skewed, becoming more symmetric as $n$ increases
- As $n \to \infty$, the distribution approaches a normal distribution
- The distribution is additive: if $X \sim \chi^2_m$ and $Y \sim \chi^2_n$ are independent, then $X + Y \sim \chi^2_{m+n}$
1.3 Critical values
Critical values are found from chi-squared tables. For a test at significance level $\alpha$ with $\nu$ degrees of freedom, the critical value $\chi^2_\nu(\alpha)$ satisfies:
$$P\left(\chi^2_\nu > \chi^2_\nu(\alpha)\right) = \alpha$$
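The definition in Section 1.1 can be checked numerically. The following Python sketch (purely illustrative, not part of any syllabus) builds chi-squared draws as sums of squared standard normals and confirms that the sample mean and variance match the theoretical values $\nu$ and $2\nu$ from Section 1.2:

```python
import random

random.seed(42)  # fixed seed so the illustration is reproducible

def chi_squared_sample(nu: int) -> float:
    """One draw from chi-squared with nu degrees of freedom, built
    directly from the definition: a sum of nu squared N(0,1) draws."""
    return sum(random.gauss(0.0, 1.0) ** 2 for _ in range(nu))

nu = 5
draws = [chi_squared_sample(nu) for _ in range(20000)]
mean = sum(draws) / len(draws)
var = sum((x - mean) ** 2 for x in draws) / len(draws)

# Theory: mean = nu and variance = 2 * nu
print(round(mean, 2), round(var, 2))
```

With 20,000 draws the estimates land close to 5 and 10, as the properties above predict.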
2. Goodness of Fit Test
2.1 Hypotheses
- $H_0$: The data follows the specified distribution
- $H_1$: The data does not follow the specified distribution
2.2 Test statistic
$$X^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$
where $O_i$ is the observed frequency and $E_i$ is the expected frequency for category $i$.
2.3 Degrees of freedom
$$\nu = k - 1 - p$$
where $k$ is the number of categories and $p$ is the number of parameters estimated from the data.
- If no parameters are estimated: $\nu = k - 1$
- If the mean of a Poisson is estimated from the data: $\nu = k - 2$
- If both mean and standard deviation of a normal are estimated: $\nu = k - 3$
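The rule $\nu = k - 1 - p$ is mechanical enough to express as a one-line helper. The function name below is my own, shown only to make the three cases above concrete:

```python
def gof_degrees_of_freedom(categories: int, estimated_params: int) -> int:
    """Degrees of freedom for a goodness-of-fit test: nu = k - 1 - p."""
    return categories - 1 - estimated_params

# The three cases listed above, with illustrative category counts
print(gof_degrees_of_freedom(6, 0))  # no parameters estimated: k - 1
print(gof_degrees_of_freedom(6, 1))  # Poisson mean estimated:   k - 2
print(gof_degrees_of_freedom(8, 2))  # normal mean and sd:       k - 3
```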
2.4 Conditions
For the chi-squared approximation to be valid:
- Expected frequencies should be $\geq 5$ for each category
- If any expected frequency is $< 5$, merge adjacent categories before carrying out the test
- The observations must be independent
2.5 Yates' correction (continuity correction)
For a $2 \times 2$ contingency table with small expected frequencies, Yates' correction adjusts the test statistic:
$$X^2 = \sum_i \frac{\left(|O_i - E_i| - 0.5\right)^2}{E_i}$$
This correction makes the test more conservative (less likely to reject $H_0$).
2.6 Worked example: Poisson goodness of fit
Example. Over 100 days, the number of accidents per day at a factory was recorded:
| Accidents ($x$) | 0 | 1 | 2 | 3 | 4 | $\geq 5$ |
|---|---|---|---|---|---|---|
| Days ($f$) | 38 | 32 | 18 | 8 | 3 | 1 |
Test at the 5% level whether the data follows a Poisson distribution.
Step 1: Estimate $\lambda$ from the data:
$$\hat\lambda = \frac{\sum xf}{\sum f} = \frac{0(38) + 1(32) + 2(18) + 3(8) + 4(3) + 5(1)}{100} = \frac{109}{100} = 1.09$$
Step 2: Calculate expected frequencies using $E_x = 100 \times \frac{e^{-1.09}(1.09)^x}{x!}$:
$$E_0 = 33.62, \quad E_1 = 36.65, \quad E_2 = 19.97, \quad E_3 = 7.26, \quad E_4 = 1.98, \quad E_{\geq 5} = 0.52$$
Step 3: Merge categories so all $E_i \geq 5$. Merge $x \geq 3$:
| $x$ | 0 | 1 | 2 | $\geq 3$ |
|---|---|---|---|---|
| $O_i$ | 38 | 32 | 18 | 12 |
| $E_i$ | 33.62 | 36.65 | 19.97 | 9.76 |
Step 4: Calculate the test statistic:
$$X^2 = \frac{(38-33.62)^2}{33.62} + \frac{(32-36.65)^2}{36.65} + \frac{(18-19.97)^2}{19.97} + \frac{(12-9.76)^2}{9.76} = 0.571 + 0.589 + 0.194 + 0.514 = 1.87$$
Step 5: Degrees of freedom: $\nu = 4 - 1 - 1 = 2$ (4 categories, 1 parameter estimated).
Step 6: Critical value: $\chi^2_2(5\%) = 5.991$.
Since $1.87 < 5.991$, do not reject $H_0$.
There is insufficient evidence to suggest the data does not follow a Poisson distribution.
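The six steps of this goodness-of-fit test can be automated. This Python sketch re-runs the accident data end to end; it assumes (as in Step 3) that small classes are merged inward from the upper tail:

```python
from math import exp, factorial

obs = [38, 32, 18, 8, 3, 1]    # days with 0, 1, 2, 3, 4, >=5 accidents
n = sum(obs)                    # 100 days in total

# Step 1: estimate lambda from the sample mean (treating ">=5" as 5)
lam = sum(x * f for x, f in enumerate(obs)) / n

# Step 2: expected frequencies under Po(lam); last class takes the tail
expected = [n * exp(-lam) * lam**x / factorial(x) for x in range(5)]
expected.append(n - sum(expected))   # P(X >= 5) by complement

# Step 3: merge upper-tail classes until every expected frequency >= 5
while expected[-1] < 5:
    e_tail = expected.pop()
    o_tail = obs.pop()
    expected[-1] += e_tail
    obs[-1] += o_tail

# Step 4: Pearson test statistic
x2 = sum((o - e) ** 2 / e for o, e in zip(obs, expected))

# Steps 5-6: k - 1 - 1 degrees of freedom (one estimated parameter)
nu = len(obs) - 1 - 1
critical = 5.991                # chi-squared table value, nu = 2, 5% level
print(round(lam, 2), round(x2, 2), nu, x2 > critical)
```

Running it reproduces $\hat\lambda = 1.09$, four merged classes, $X^2 \approx 1.87$ with $\nu = 2$, and the do-not-reject conclusion.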
3. Test for Independence
3.1 Contingency tables
Definition. A contingency table (or two-way table) displays the frequency distribution of two categorical variables.
3.2 Hypotheses
- $H_0$: The two variables are independent
- $H_1$: The two variables are not independent
3.3 Expected frequencies
For a contingency table with entries $O_{ij}$ (row $i$, column $j$), the expected frequency is:
$$E_{ij} = \frac{(\text{row } i \text{ total}) \times (\text{column } j \text{ total})}{\text{grand total}}$$
3.4 Test statistic
$$X^2 = \sum_{i}\sum_{j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$
3.5 Degrees of freedom
$$\nu = (r - 1)(c - 1)$$
where $r$ is the number of rows and $c$ is the number of columns.
3.6 Worked example
Example. A survey of 200 people records their age group and preferred news source:
|  | TV | Online | Newspaper | Row total |
|---|---|---|---|---|
| Under 30 | 20 | 60 | 10 | 90 |
| 30 to 50 | 30 | 25 | 15 | 70 |
| Over 50 | 20 | 5 | 15 | 40 |
| Col total | 70 | 90 | 40 | 200 |
Test at the 5% level whether age group and preferred news source are independent.
Expected frequencies:
$E_{11} = \frac{90 \times 70}{200} = 31.5$, $E_{12} = \frac{90 \times 90}{200} = 40.5$, $E_{13} = \frac{90 \times 40}{200} = 18$
$E_{21} = \frac{70 \times 70}{200} = 24.5$, $E_{22} = \frac{70 \times 90}{200} = 31.5$, $E_{23} = \frac{70 \times 40}{200} = 14$
$E_{31} = \frac{40 \times 70}{200} = 14$, $E_{32} = \frac{40 \times 90}{200} = 18$, $E_{33} = \frac{40 \times 40}{200} = 8$
All expected frequencies $\geq 5$, so the test is valid.
$$X^2 = \frac{(20-31.5)^2}{31.5} + \frac{(60-40.5)^2}{40.5} + \cdots + \frac{(15-8)^2}{8} = 37.88$$
Degrees of freedom: $\nu = (3-1)(3-1) = 4$.
Critical value: $\chi^2_4(5\%) = 9.488$.
Since $37.88 > 9.488$, reject $H_0$.
There is strong evidence that age group and preferred news source are not independent.
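A minimal sketch of the whole independence calculation (the function name is my own, not standard), run on the age and news-source table from this section:

```python
def chi2_independence(table):
    """Pearson chi-squared statistic and df for an r x c contingency
    table, using E_ij = (row total * column total) / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    x2 = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n
            x2 += (o - e) ** 2 / e
    nu = (len(row_totals) - 1) * (len(col_totals) - 1)
    return x2, nu

# Observed counts from the worked example (rows: age groups)
x2, nu = chi2_independence([[20, 60, 10],
                            [30, 25, 15],
                            [20, 5, 15]])
print(round(x2, 2), nu)
```

The output matches the hand calculation: $X^2 \approx 37.88$ with $(3-1)(3-1) = 4$ degrees of freedom.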
4. Chi-Squared Test Procedure Summary
- State $H_0$ and $H_1$
- Calculate expected frequencies
- Check that all expected frequencies are $\geq 5$ (merge categories if necessary)
- Compute the test statistic $X^2 = \sum \frac{(O - E)^2}{E}$
- Determine the degrees of freedom
- Compare with the critical value at the given significance level
- Conclude in context
:::note
Never apply the test to percentages or proportions; convert them to raw frequencies first. The test relies on the multinomial distribution, which requires count data.
:::
Problems
Problem 1
A die is rolled 60 times. The observed frequencies are: 1: 8, 2: 12, 3: 9, 4: 11, 5: 13, 6: 7. Test at the 5% level whether the die is fair.
Solution 1
$H_0$: The die is fair. $H_1$: The die is not fair. Expected frequency for each face: $E = 60/6 = 10$.
$$X^2 = \frac{(8-10)^2 + (12-10)^2 + (9-10)^2 + (11-10)^2 + (13-10)^2 + (7-10)^2}{10} = \frac{28}{10} = 2.8$$
$\nu = 6 - 1 = 5$. Critical value: $\chi^2_5(5\%) = 11.070$.
$2.8 < 11.070$: do not reject $H_0$. No evidence the die is biased.
If you get this wrong, revise: Goodness of Fit Test — Section 2.
Problem 2
In a $2 \times 2$ contingency table, the observed frequencies are: Row 1: 30, 20; Row 2: 15, 35. Test at the 5% level whether the two variables are independent.
Solution 2
Row totals: 50, 50. Column totals: 45, 55. Grand total: 100. Expected: $E_{11} = 22.5$, $E_{12} = 27.5$, $E_{21} = 22.5$, $E_{22} = 27.5$.
$$X^2 = \frac{(30-22.5)^2}{22.5} + \frac{(20-27.5)^2}{27.5} + \frac{(15-22.5)^2}{22.5} + \frac{(35-27.5)^2}{27.5} = 2.5 + 2.05 + 2.5 + 2.05 = 9.09$$
$\nu = (2-1)(2-1) = 1$. Critical value: $\chi^2_1(5\%) = 3.841$.
$9.09 > 3.841$: reject $H_0$. The variables are not independent.
If you get this wrong, revise: Test for Independence — Section 3.
Problem 3
Explain why the chi-squared test statistic uses $\sum \frac{(O_i - E_i)^2}{E_i}$ rather than simply $\sum (O_i - E_i)$.
Solution 3
The sum $\sum (O_i - E_i) = 0$ always, since $\sum O_i = \sum E_i$ (both sum to the total number of observations). This provides no information about the discrepancy between observed and expected. Squaring removes the sign, and dividing by $E_i$ standardises the contribution of each category. Categories with larger expected frequencies naturally have larger absolute deviations, so dividing by $E_i$ gives each category appropriate weight. This leads to a test statistic whose distribution under $H_0$ is approximately $\chi^2$.
If you get this wrong, revise: Test statistic — Section 2.2.
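A quick numerical check of the first point, using the die data from Problem 1:

```python
observed = [8, 12, 9, 11, 13, 7]   # Problem 1: 60 rolls of a die
expected = [10.0] * 6               # fair-die expectation

raw_sum = sum(o - e for o, e in zip(observed, expected))
x2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# The raw differences always cancel; the chi-squared statistic does not
print(raw_sum, round(x2, 2))
```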
Problem 4
The number of emails received per day was recorded over 80 days: 0: 15, 1: 25, 2: 20, 3: 12, 4: 5, $\geq 5$: 3. Test at the 5% level whether the data follows a Poisson distribution.
Solution 4
Estimate $\lambda$: $\hat\lambda = \frac{0(15) + 1(25) + 2(20) + 3(12) + 4(5) + 5(3)}{80} = \frac{136}{80} = 1.7$.
Expected (Po(1.7)): $E_0 = 14.62$, $E_1 = 24.85$, $E_2 = 21.12$, $E_3 = 11.97$, $E_4 = 5.09$, $E_{\geq 5} = 2.37$.
Merge $x \geq 4$ (since $E_{\geq 5} < 5$): $O = 8$, $E = 7.45$.
After merging: categories 0, 1, 2, 3, $\geq 4$ with $O_i$: 15, 25, 20, 12, 8 and $E_i$: 14.62, 24.85, 21.12, 11.97, 7.45.
$$X^2 = \frac{(15-14.62)^2}{14.62} + \frac{(25-24.85)^2}{24.85} + \frac{(20-21.12)^2}{21.12} + \frac{(12-11.97)^2}{11.97} + \frac{(8-7.45)^2}{7.45} = 0.11$$
$\nu = 5 - 1 - 1 = 3$. Critical value: $\chi^2_3(5\%) = 7.815$.
$0.11 < 7.815$: do not reject $H_0$. There is no evidence that the data does not follow a Poisson distribution.
If you get this wrong, revise: Worked example: Poisson goodness of fit — Section 2.6.
Problem 5
In a $3 \times 3$ contingency table, the test statistic is calculated as $X^2 = 11.2$. State the degrees of freedom and determine whether $H_0$ is rejected at the 5% level.
Solution 5
$\nu = (3-1)(3-1) = 4$. Critical value: $\chi^2_4(5\%) = 9.488$.
Since $11.2 > 9.488$, reject $H_0$.
If you get this wrong, revise: Degrees of freedom — Section 3.5.
Problem 6
A $2 \times 2$ table has observed frequencies: Row 1: 40, 60; Row 2: 55, 45. Apply Yates' correction and compare with the uncorrected statistic.
Solution 6
Row totals: 100, 100. Column totals: 95, 105. Grand total: 200. Expected: $E_{11} = 47.5$, $E_{12} = 52.5$, $E_{21} = 47.5$, $E_{22} = 52.5$.
Uncorrected: $X^2 = 2 \times \frac{7.5^2}{47.5} + 2 \times \frac{7.5^2}{52.5} = 2.37 + 2.14 = 4.51$.
Yates' corrected: $X^2 = 2 \times \frac{7^2}{47.5} + 2 \times \frac{7^2}{52.5} = 2.06 + 1.87 = 3.93$.
$\nu = 1$. Critical value: $\chi^2_1(5\%) = 3.841$. Both reject $H_0$, but Yates' gives a more conservative result.
If you get this wrong, revise: Yates' correction — Section 2.5.
Problem 7
A uniform distribution is fitted to data with 5 categories and expected frequency 20 per category. The observed frequencies are 15, 22, 18, 25, 20. Test at the 1% level.
Solution 7
$H_0$: Uniform distribution. $H_1$: Not uniform. All $E_i = 20 \geq 5$.
$$X^2 = \frac{(15-20)^2 + (22-20)^2 + (18-20)^2 + (25-20)^2 + (20-20)^2}{20} = \frac{58}{20} = 2.9$$
$\nu = 5 - 1 = 4$. Critical value at 1%: $\chi^2_4(1\%) = 13.277$.
$2.9 < 13.277$: do not reject $H_0$.
If you get this wrong, revise: Goodness of Fit Test — Section 2.
Problem 8
In a test for independence with a $2 \times 3$ contingency table, the test statistic is $X^2 = 7.2$. Test at the 5% level.
Solution 8
$\nu = (2-1)(3-1) = 2$. Critical value: $\chi^2_2(5\%) = 5.991$.
$7.2 > 5.991$: reject $H_0$. There is evidence that the variables are associated.
If you get this wrong, revise: Degrees of freedom — Section 3.5.
Problem 9
State three conditions that must be satisfied before carrying out a chi-squared test, and explain the consequence of violating each.
Solution 9
- Expected frequencies $\geq 5$: violating this makes the $\chi^2$ approximation to the true distribution inaccurate, increasing the risk of Type I errors. Remedy: merge adjacent categories.
- Independence of observations: violating this means the multinomial model underlying the test does not apply, invalidating the result. Remedy: ensure the sampling method produces independent observations.
- Sufficiently large sample: with very small total samples, even large relative discrepancies can produce non-significant results; the test has low power. Remedy: increase sample size.
If you get this wrong, revise: Conditions — Section 2.4.
Problem 10
Data is fitted to a normal distribution with mean and standard deviation both estimated from the data. There are 8 categories. The test statistic is $X^2 = 9.2$. Determine the degrees of freedom and test at the 5% level.
Solution 10
Two parameters estimated (mean and standard deviation), so $\nu = 8 - 1 - 2 = 5$. Critical value: $\chi^2_5(5\%) = 11.070$.
$9.2 < 11.070$: do not reject $H_0$. There is insufficient evidence that the data does not follow a normal distribution.
If you get this wrong, revise: Degrees of freedom — Section 2.3.
5. Yates' Correction: When and Why
5.1 The problem with small tables
The chi-squared distribution is a continuous approximation to the discrete multinomial distribution. For $2 \times 2$ tables (1 degree of freedom), this approximation is poor when expected frequencies are small. The uncorrected chi-squared test tends to reject $H_0$ too often (it is too liberal).
Yates' correction adjusts each term by subtracting 0.5 from the absolute difference before squaring:
$$X^2 = \sum_i \frac{\left(|O_i - E_i| - 0.5\right)^2}{E_i}$$
This reduces the test statistic, making it harder to reject $H_0$.
5.2 When to apply Yates' correction
- Apply it to $2 \times 2$ contingency tables
- It is most important when the total sample size is small (typically $n < 40$) or when any expected frequency is below 10
- Some exam boards require it for all $2 \times 2$ tables; check the specific mark scheme
- Do not apply it to tables larger than $2 \times 2$
5.3 Limitations
Yates' correction can be overly conservative — it may fail to detect a real association. For very small samples, Fisher's exact test is preferred (but this is beyond the A-Level syllabus).
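The corrected and uncorrected statistics can be compared side by side. The helper below (an illustrative name, not a standard API) applies the rules above to a 2×2 table; the counts are those from Problem 6:

```python
def chi2_2x2(a, b, c, d, yates=True):
    """Chi-squared statistic for a 2x2 table [[a, b], [c, d]],
    optionally with Yates' continuity correction."""
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    x2 = 0.0
    for i, o in enumerate((a, b, c, d)):
        e = row_totals[i // 2] * col_totals[i % 2] / n
        diff = abs(o - e)
        if yates:
            diff = max(diff - 0.5, 0.0)  # subtract 0.5 before squaring
        x2 += diff ** 2 / e
    return x2

# A 2x2 table (rows 40, 60 and 55, 45): the correction shrinks the statistic
print(round(chi2_2x2(40, 60, 55, 45, yates=False), 2))  # uncorrected
print(round(chi2_2x2(40, 60, 55, 45, yates=True), 2))   # Yates-corrected
```

The corrected value (about 3.93) is always no larger than the uncorrected one (about 4.51), which is exactly what "more conservative" means here.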
6. Worked Examples
6.1 Goodness of fit: dice fairness
Example. A die is rolled 120 times with the following results:
| Face | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| Obs | 25 | 18 | 20 | 22 | 15 | 20 |
Test at the 5% level whether the die is fair.
$H_0$: The die is fair (uniform distribution). $H_1$: The die is not fair.
Expected: $E = 120/6 = 20$ for all faces.
$$X^2 = \frac{(25-20)^2 + (18-20)^2 + (20-20)^2 + (22-20)^2 + (15-20)^2 + (20-20)^2}{20} = \frac{58}{20} = 2.9$$
$\nu = 6 - 1 = 5$. Critical value: $\chi^2_5(5\%) = 11.070$.
Since $2.9 < 11.070$, do not reject $H_0$. There is insufficient evidence that the die is biased.
6.2 Goodness of fit: genetic ratios
Example. In a genetics experiment, 200 plants are expected to show a 9:3:3:1 phenotypic ratio. The observed counts are 115, 38, 30, 17. Test at the 5% level.
$H_0$: The 9:3:3:1 ratio holds. $H_1$: The ratio does not hold.
Expected: $E_1 = 200 \times \frac{9}{16} = 112.5$, $E_2 = 200 \times \frac{3}{16} = 37.5$, $E_3 = 37.5$, $E_4 = 200 \times \frac{1}{16} = 12.5$.
All $E_i \geq 5$.
$$X^2 = \frac{(115-112.5)^2}{112.5} + \frac{(38-37.5)^2}{37.5} + \frac{(30-37.5)^2}{37.5} + \frac{(17-12.5)^2}{12.5} = 0.06 + 0.01 + 1.50 + 1.62 = 3.18$$
$\nu = 4 - 1 = 3$. Critical value: $\chi^2_3(5\%) = 7.815$.
Since $3.18 < 7.815$, do not reject $H_0$. The data is consistent with the 9:3:3:1 ratio.
6.3 Test for independence: smoking and disease
Example. A study of 300 adults records smoking status and whether they have a respiratory disease:
|  | Disease | No disease | Row total |
|---|---|---|---|
| Smoker | 45 | 55 | 100 |
| Non-smoker | 30 | 170 | 200 |
| Column total | 75 | 225 | 300 |
Test at the 1% level whether smoking status and respiratory disease are independent.
$H_0$: Smoking and disease are independent. $H_1$: They are not independent.
Expected frequencies:
$E_{11} = \frac{100 \times 75}{300} = 25$, $E_{12} = \frac{100 \times 225}{300} = 75$.
$E_{21} = \frac{200 \times 75}{300} = 50$, $E_{22} = \frac{200 \times 225}{300} = 150$.
All $E_{ij} \geq 5$.
$$X^2 = \frac{(45-25)^2}{25} + \frac{(55-75)^2}{75} + \frac{(30-50)^2}{50} + \frac{(170-150)^2}{150} = 16 + 5.33 + 8 + 2.67 = 32$$
$\nu = 1$. Critical value at 1%: $\chi^2_1(1\%) = 6.635$.
Since $32 > 6.635$, reject $H_0$ at the 1% level. There is very strong evidence that smoking status and respiratory disease are associated.
With Yates' correction:
$$X^2 = \frac{19.5^2}{25} + \frac{19.5^2}{75} + \frac{19.5^2}{50} + \frac{19.5^2}{150} = 30.42$$
Still highly significant ($30.42 > 6.635$).
7. Degrees of Freedom: Systematic Calculation
7.1 Goodness of fit
| Distribution fitted | Parameters estimated | $\nu$ formula |
|---|---|---|
| Uniform (known) | 0 | $k - 1$ |
| Binomial (known $n$, known $p$) | 0 | $k - 1$ |
| Binomial (known $n$, estimate $p$) | 1 | $k - 2$ |
| Poisson (estimate $\lambda$) | 1 | $k - 2$ |
| Normal (estimate $\mu$, $\sigma$) | 2 | $k - 3$ |
7.2 Test for independence
| Table size | $\nu$ |
|---|---|
| $2 \times 2$ | 1 |
| $2 \times 3$ | 2 |
| $3 \times 3$ | 4 |
| $3 \times 4$ | 6 |
| $4 \times 5$ | 12 |
7.3 Intuition for degrees of freedom
The degrees of freedom represent the number of independent pieces of information in the data, after accounting for constraints. In a contingency table:
- Each row total is fixed, so each row has one fewer free value
- Each column total is fixed, so each column has one fewer free value
- The grand total is automatically determined
This gives $(r-1)(c-1)$ free cells.
8. Interpretation: What "Significant" Means
8.1 In plain language
When we reject $H_0$ at the 5% level, we are saying:
"If the null hypothesis were true, there would be less than a 5% chance of obtaining a test statistic at least as extreme as the one observed."
This is not the same as saying $H_0$ is false with 95% probability. It is a statement about the probability of the data given the hypothesis, not the probability of the hypothesis given the data.
8.2 Common misinterpretations
| Statement | Correct? | Why |
|---|---|---|
| "There is a 5% chance the null hypothesis is true" | No | This confuses $P(\text{data} \mid H_0)$ with $P(H_0 \mid \text{data})$ |
| "The probability of getting this result by chance is 5%" | Approximately | More precisely: the probability of getting a result at least this extreme by chance is 5% |
| "We have proved the alternative hypothesis" | No | We have only found evidence against $H_0$; the alternative could still be wrong |
| "A significant result means the effect is practically important" | Not necessarily | With a very large sample, even tiny deviations become significant |
8.3 Context matters
A significant chi-squared test tells you the observed data is unlikely under $H_0$, but it does not tell you how the data differs or whether the difference is meaningful. Always inspect the observed vs expected frequencies to understand the nature of any discrepancy.
9. Relationship to the Normal Approximation
9.1 Chi-squared as a sum of squared normals
The fundamental connection: if $X \sim \chi^2_1$ (1 degree of freedom), then $X = Z^2$ where $Z \sim N(0, 1)$.
This means $\sqrt{X} = |Z|$, i.e., the square root of a chi-squared statistic with 1 df follows a half-normal distribution.
9.2 $2 \times 2$ tables and the normal approximation
For a $2 \times 2$ table, the chi-squared test is equivalent to a two-proportion $z$-test. If $\hat{p}_1$ and $\hat{p}_2$ are the sample proportions:
$$X^2 = z^2, \quad \text{where } z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$
and $\hat{p}$ is the pooled proportion.
9.3 Large degrees of freedom
As $\nu$ increases, $\chi^2_\nu$ approaches $N(\nu, 2\nu)$. This means for large tables:
$$\chi^2_\nu(\alpha) \approx \nu + z_\alpha\sqrt{2\nu}$$
where $z_\alpha$ is the upper $\alpha$ point of $N(0, 1)$. This approximation is useful when chi-squared tables do not list the required value.
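As a rough numerical check of this crude $\nu + z_\alpha\sqrt{2\nu}$ form (sharper approximations such as Wilson-Hilferty exist but are beyond this level), the sketch below compares it with the tabulated 5% point for $\nu = 100$:

```python
from math import sqrt

def chi2_critical_approx(nu: int, z_alpha: float = 1.645) -> float:
    """Approximate upper-tail critical value via chi2_nu ~ N(nu, 2*nu).
    z_alpha = 1.645 is the upper 5% point of N(0, 1)."""
    return nu + z_alpha * sqrt(2 * nu)

approx = chi2_critical_approx(100)
table_value = 124.342            # chi-squared table, nu = 100, 5% level
print(round(approx, 2), table_value)
```

The approximation gives about 123.26 against the tabulated 124.34: close, and slightly too small, which is typical of this normal approximation.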
10. Common Pitfalls
Using percentages instead of frequencies
The chi-squared test requires raw count data. If you are given percentages, you must convert back to frequencies using the sample size. Using percentages directly produces a test statistic that is off by a factor of $n/100$ and gives completely wrong $p$-values.
Wrong degrees of freedom
For goodness of fit, forgetting to subtract the number of estimated parameters is the most common error. For independence tests, using $rc - 1$ instead of $(r-1)(c-1)$ overestimates the degrees of freedom and makes the test too conservative.
Small expected values
If any expected frequency is below 5, the chi-squared approximation breaks down. The remedy is to merge adjacent categories before computing the test statistic. Do not simply discard categories — this loses information and biases the result.
Not checking all expected frequencies
After merging categories to fix one small expected value, you must recheck all remaining expected values. The merge may create new expected values below 5.
Merging non-adjacent categories
When merging categories for a goodness of fit test, merge categories that are logically adjacent (e.g., "4" and "$\geq 5$" in a Poisson fit). Merging non-adjacent categories (e.g., "0" and "5") destroys the structure of the distribution and makes the test invalid.
Confusing one-tailed and two-tailed
The chi-squared test is inherently one-tailed (right-tailed only). Large values of $X^2$ indicate discrepancy from $H_0$. Small values (close to 0) indicate good fit and are not significant. There is no such thing as a "left-tailed" chi-squared test.
11. Problem Set
Q1. A die is rolled 240 times. The observed frequencies are 1: 52, 2: 38, 3: 40, 4: 36, 5: 44, 6: 30. Test at the 5% level whether the die is fair, and identify which face(s) contribute most to the test statistic.
$H_0$: Fair die. $H_1$: Not fair.
$E = 240/6 = 40$ for all faces.
$$X^2 = \frac{(52-40)^2 + (38-40)^2 + (40-40)^2 + (36-40)^2 + (44-40)^2 + (30-40)^2}{40} = \frac{280}{40} = 7.0$$
$\nu = 5$. Critical value: $\chi^2_5(5\%) = 11.070$.
$7.0 < 11.070$: do not reject $H_0$.
Contributions: face 1 contributes $144/40 = 3.6$, face 6 contributes $100/40 = 2.5$. These two faces account for $6.1$ out of $7.0$ (87% of the statistic).
Q2. The number of customers arriving at a shop per hour was recorded over 120 hours: 0: 12, 1: 30, 2: 35, 3: 25, 4: 12, $\geq 5$: 6. Test at the 5% level whether the data follows a Poisson distribution.
$H_0$: Poisson. $H_1$: Not Poisson.
$\hat\lambda = \frac{0(12) + 1(30) + 2(35) + 3(25) + 4(12) + 5(6)}{120} = \frac{253}{120} = 2.108$.
Expected using Po(2.108): $E_0 = 14.57$, $E_1 = 30.73$, $E_2 = 32.39$, $E_3 = 22.76$, $E_4 = 12.00$, $E_{\geq 5} = 7.55$.
All $E_i \geq 5$, so no merging is needed.
$$X^2 = \frac{(12-14.57)^2}{14.57} + \frac{(30-30.73)^2}{30.73} + \frac{(35-32.39)^2}{32.39} + \frac{(25-22.76)^2}{22.76} + \frac{(12-12.00)^2}{12.00} + \frac{(6-7.55)^2}{7.55} = 1.22$$
$\nu = 6 - 1 - 1 = 4$. Critical value: $\chi^2_4(5\%) = 9.488$.
$1.22 < 9.488$: do not reject $H_0$.
Q3. A survey of 400 adults records education level and voting preference. Test at the 5% level whether the two variables are independent.
|  | Party A | Party B | Party C | Non-voter | Total |
|---|---|---|---|---|---|
| No qualifications | 20 | 25 | 10 | 45 | 100 |
| A-levels | 30 | 40 | 25 | 30 | 125 |
| Degree | 45 | 30 | 40 | 10 | 125 |
| Postgraduate | 20 | 15 | 15 | 0 | 50 |
| Total | 115 | 110 | 90 | 85 | 400 |
$H_0$: Independent. $H_1$: Not independent.
Expected: $E_{ij} = \frac{(\text{row total}) \times (\text{column total})}{400}$.
Expected frequencies by row: 28.75, 27.50, 22.50, 21.25; 35.94, 34.38, 28.13, 26.56; 35.94, 34.38, 28.13, 26.56; 14.38, 13.75, 11.25, 10.63. All $E_{ij} \geq 5$.
Key contributions: $\frac{(45-21.25)^2}{21.25} = 26.54$, $\frac{(0-10.63)^2}{10.63} = 10.63$, $\frac{(10-26.56)^2}{26.56} = 10.33$, $\frac{(10-22.5)^2}{22.5} = 6.94$, $\frac{(40-28.13)^2}{28.13} = 5.01$, $\frac{(20-28.75)^2}{28.75} = 2.66$.
$X^2 = 71.44$.
$\nu = (4-1)(4-1) = 9$. Critical value: $\chi^2_9(5\%) = 16.919$.
$71.44 > 16.919$: reject $H_0$. Strong evidence that education level and voting preference are associated.
Q4. Explain why merging categories changes the degrees of freedom, and calculate the new df when a 6-category Poisson goodness of fit test (with $\lambda$ estimated) has its last 3 categories merged into one.
Originally: $\nu = 6 - 1 - 1 = 4$ (6 categories, 1 parameter estimated).
After merging the last 3 into 1: we now have 4 categories total (first 3 individual + 1 merged).
New $\nu = 4 - 1 - 1 = 2$.
Merging reduces the number of categories, which reduces the degrees of freedom and hence the critical value. However, merging also discards information and tends to reduce the test statistic, so the test typically loses power.
Q5. A $2 \times 2$ table has observed frequencies: Row 1: 10, 40; Row 2: 30, 20. Apply Yates' correction and test at the 5% level. Compare with the uncorrected test.
Row totals: 50, 50. Column totals: 40, 60. Grand total: 100.
Expected: $E_{11} = 20$, $E_{12} = 30$, $E_{21} = 20$, $E_{22} = 30$.
Uncorrected:
$$X^2 = \frac{(10-20)^2}{20} + \frac{(40-30)^2}{30} + \frac{(30-20)^2}{20} + \frac{(20-30)^2}{30} = 5 + 3.33 + 5 + 3.33 = 16.67$$
Yates' corrected:
$$X^2 = \frac{9.5^2}{20} + \frac{9.5^2}{30} + \frac{9.5^2}{20} + \frac{9.5^2}{30} = 4.51 + 3.01 + 4.51 + 3.01 = 15.04$$
$\nu = 1$. Critical value: $\chi^2_1(5\%) = 3.841$.
Both reject $H_0$. The corrected value (15.04) is smaller than the uncorrected (16.67), as expected.
Q6. A normal distribution is fitted to 200 observations with mean and standard deviation estimated from the data. The expected and observed frequencies for 7 categories are calculated, with all $E_i \geq 5$. The test statistic is $X^2 = 6.8$. Test at the 5% level and explain your choice of degrees of freedom.
Two parameters estimated ($\mu$ and $\sigma$), so $\nu = 7 - 1 - 2 = 4$.
$H_0$: Data follows a normal distribution. $H_1$: Data does not follow a normal distribution.
Critical value: $\chi^2_4(5\%) = 9.488$.
$6.8 < 9.488$: do not reject $H_0$. Insufficient evidence to conclude the data is non-normal.
The degrees of freedom calculation accounts for the fact that estimating parameters from the data makes the fit appear better than it truly is. Each estimated parameter reduces the df by 1 because it uses up one piece of information from the data.
8. Advanced Worked Examples
Example 8.1: Goodness-of-fit test with merging of classes
Problem. A die is rolled 120 times. The observed frequencies are:
| Face | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| Observed | 25 | 17 | 15 | 23 | 18 | 22 |
Test at the 5% level whether the die is fair.
Solution. $H_0$: die is fair (uniform with $p = \frac{1}{6}$ for each face). Expected: $E = 120/6 = 20$ for each face.
All $E_i \geq 5$, so no merging needed.
$$X^2 = \frac{(25-20)^2 + (17-20)^2 + (15-20)^2 + (23-20)^2 + (18-20)^2 + (22-20)^2}{20} = \frac{76}{20} = 3.8$$
$\nu = 6 - 1 = 5$. Critical value at 5%: $\chi^2_5(5\%) = 11.070$.
$3.8 < 11.070$: do not reject $H_0$. There is insufficient evidence to conclude the die is biased.
Example 8.2: Test for independence in a 3×3 contingency table
Problem. 300 people are classified by hair colour and eye colour:
|  | Blue | Brown | Green |
|---|---|---|---|
| Blonde | 40 | 20 | 10 |
| Brown | 30 | 60 | 20 |
| Black | 10 | 40 | 70 |
Test at the 1% level whether hair colour and eye colour are independent.
Solution. $H_0$: hair colour and eye colour are independent.
Row totals: Blonde 70, Brown 110, Black 120. Column totals: Blue 80, Brown 120, Green 100. Grand total: 300.
Expected values: $E_{ij} = \frac{(\text{row total}) \times (\text{column total})}{300}$.
$E_{11} = 18.67$, $E_{12} = 28$, $E_{13} = 23.33$, $E_{21} = 29.33$, $E_{22} = 44$, $E_{23} = 36.67$, $E_{31} = 32$, $E_{32} = 48$, $E_{33} = 40$.
$$X^2 = \frac{(40-18.67)^2}{18.67} + \frac{(20-28)^2}{28} + \cdots + \frac{(70-40)^2}{40} = 86.65$$
$\nu = (3-1)(3-1) = 4$. Critical value at 1%: $\chi^2_4(1\%) = 13.277$.
$86.65 > 13.277$: reject $H_0$. Very strong evidence that hair colour and eye colour are associated.
Example 8.3: Yates' continuity correction for a 2×2 table
Problem. In a drug trial, 40 out of 100 patients on drug A recovered, and 55 out of 100 on drug B recovered. Test at 5% whether the recovery rates differ, using Yates' correction.
Solution. $H_0$: recovery rate is the same for both drugs.
|  | Recovered | Not recovered |
|---|---|---|
| Drug A | 40 | 60 |
| Drug B | 55 | 45 |
Row totals: 100, 100. Column totals: 95, 105. Expected: $E_{11} = E_{21} = 47.5$, $E_{12} = E_{22} = 52.5$.
With Yates' correction:
$$X^2 = \sum \frac{\left(|O - E| - 0.5\right)^2}{E} = 2 \times \frac{7^2}{47.5} + 2 \times \frac{7^2}{52.5} = 3.93$$
$\nu = 1$. Critical value at 5%: $\chi^2_1(5\%) = 3.841$.
$3.93 > 3.841$: reject $H_0$. Significant difference in recovery rates.
Example 8.4: Determining degrees of freedom with estimated parameters
Problem. Data is tested against a normal distribution with mean and variance estimated from the data. The data is grouped into 8 classes. One class has expected frequency 3 and is merged with an adjacent class. State the degrees of freedom.
Solution. Original classes: 8. After merging: 7 classes.
Restrictions: total frequency (1), estimated mean (1), estimated variance (1). Total restrictions: 3.
$\nu = 7 - 3 = 4$.
Example 8.5: Chi-squared test for a geometric distribution
Problem. Customers arrive at a till. The number of customers served before the first complaint is recorded over 200 shifts:
| Count | 0 | 1 | 2 | 3 | $\geq 4$ |
|---|---|---|---|---|---|
| Observed | 90 | 60 | 30 | 12 | 8 |
Test at 5% whether the data follows a geometric distribution.
Solution. $H_0$: data follows a geometric distribution $\mathrm{Geo}(p)$ on $\{0, 1, 2, \dots\}$.
Estimate $p$ from the sample mean (treating $\geq 4$ as 4): $\bar{x} = \frac{0(90) + 1(60) + 2(30) + 3(12) + 4(8)}{200} = \frac{188}{200} = 0.94$.
For $\mathrm{Geo}(p)$ with mean $\frac{1-p}{p}$: $\hat{p} = \frac{1}{1 + 0.94} = 0.5155$.
Expected: $E_x = 200 \, \hat{p}(1-\hat{p})^x$ for $x = 0, 1, 2, 3$, and $E_{\geq 4} = 200(1-\hat{p})^4$.
$E_0 = 103.09$, $E_1 = 49.95$, $E_2 = 24.20$, $E_3 = 11.73$, $E_{\geq 4} = 11.02$.
All $E_i \geq 5$, so no merging needed.
$$X^2 = \frac{(90-103.09)^2}{103.09} + \frac{(60-49.95)^2}{49.95} + \frac{(30-24.20)^2}{24.20} + \frac{(12-11.73)^2}{11.73} + \frac{(8-11.02)^2}{11.02} = 5.91$$
$\nu = 5 - 1 - 1 = 3$ (5 classes, 1 for total, 1 for estimated $p$). Critical value at 5%: $\chi^2_3(5\%) = 7.815$.
$5.91 < 7.815$: do not reject $H_0$.
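The same fit can be checked in a few lines of Python, following the estimate-then-compare steps of the solution (a sketch, with ">= 4" treated as 4 when estimating the mean):

```python
observed = [90, 60, 30, 12, 8]    # shifts with 0, 1, 2, 3, >=4 customers
n = sum(observed)                  # 200 shifts

# Estimate p from the sample mean of Geo(p) on {0, 1, 2, ...}: mean = (1-p)/p
mean = sum(x * f for x, f in enumerate(observed)) / n
p = 1 / (1 + mean)
q = 1 - p

# Expected frequencies: n*p*q^x for x = 0..3, tail class n*q^4 by complement
expected = [n * p * q**x for x in range(4)]
expected.append(n * q**4)

x2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
nu = len(observed) - 1 - 1         # one estimated parameter
print(round(x2, 2), nu)
```

This reproduces $X^2 \approx 5.91$ with $\nu = 3$, below the 5% critical value 7.815.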
Example 8.6: Interpreting a very small expected frequency
Problem. A 2×5 contingency table has several expected frequencies below 5. What action should be taken?
Solution. Adjacent rows or columns should be merged to ensure all expected frequencies are at least 5. The degrees of freedom must be recalculated based on the new table dimensions. If merging destroys the structure of the test (e.g., merging distinct categories that have different meanings), Fisher's exact test should be used instead.
Example 8.7: Calculating the chi-squared statistic from raw proportions
Problem. In a survey of 500 people, 60% prefer tea in the North and 45% prefer tea in the South. There are 300 Northerners and 200 Southerners. Test at 5% whether preference differs by region.
Solution. : no association between region and preference.
Observed table:
|  | Tea | No Tea |
|---|---|---|
| North | 180 | 120 |
| South | 90 | 110 |
Expected: $E_{11} = \frac{300 \times 270}{500} = 162$, $E_{12} = \frac{300 \times 230}{500} = 138$, $E_{21} = \frac{200 \times 270}{500} = 108$, $E_{22} = \frac{200 \times 230}{500} = 92$.
$$X^2 = \frac{(180-162)^2}{162} + \frac{(120-138)^2}{138} + \frac{(90-108)^2}{108} + \frac{(110-92)^2}{92} = 2.00 + 2.35 + 3.00 + 3.52 = 10.87$$
$\nu = 1$. Critical value at 5%: $\chi^2_1(5\%) = 3.841$.
$10.87 > 3.841$: reject $H_0$. Significant association between region and tea preference.
9. Common Pitfalls
| Pitfall | Correct Approach |
|---|---|
| Using observed frequencies instead of expected in the denominator | $\sum \frac{(O-E)^2}{E}$, not $\sum \frac{(O-E)^2}{O}$ |
| Forgetting to merge classes with $E_i < 5$ | Always check expected frequencies first; merge adjacent classes |
| Miscounting degrees of freedom | $(r-1)(c-1)$ for independence; $k - 1 - p$ for goodness-of-fit with estimated parameters |
| Applying Yates' correction to tables larger than 2×2 | Yates' correction is only for 2×2 contingency tables |
10. Additional Exam-Style Questions
Question 8
A tetrahedral die is rolled 200 times. The observed frequencies for faces 1--4 are 38, 62, 55, 45. Test at the 10% level whether the die is fair.
Solution
$H_0$: fair ($p = \frac{1}{4}$ for each face). $E = 200/4 = 50$.
$$X^2 = \frac{(38-50)^2 + (62-50)^2 + (55-50)^2 + (45-50)^2}{50} = \frac{338}{50} = 6.76$$
$\nu = 4 - 1 = 3$. Critical value at 10%: $\chi^2_3(10\%) = 6.251$.
$6.76 > 6.251$: reject $H_0$. The die appears biased at the 10% level.
Question 9
Explain why the chi-squared test is an approximate test and describe when it may not be appropriate.
Solution
The chi-squared distribution is a continuous approximation to the discrete distribution of the test statistic. The approximation is poor when:
- Expected frequencies are small (typically $E_i < 5$), as the continuous approximation breaks down.
- The total sample size is very small.
- The number of classes is very large relative to the sample size.
In these cases, Fisher's exact test or exact multinomial methods should be used instead.
Question 10
In a test of independence on a 4×3 contingency table, the calculated statistic is 18.7. Determine whether to reject at the 5% significance level.
Solution
$\nu = (4-1)(3-1) = 6$. Critical value at 5%: $\chi^2_6(5\%) = 12.592$.
$18.7 > 12.592$: reject $H_0$. There is significant evidence of an association between the row and column variables.
11. Connections to Other Topics
11.1 Chi-squared tests and Poisson/geometric distributions
Goodness-of-fit tests are commonly used to test whether data follows a Poisson or geometric distribution. See Poisson and Geometric Distributions.
11.2 Chi-squared and continuous distributions
The chi-squared distribution itself is used in confidence intervals for variance. See Exponential and Continuous Random Variables.
11.3 Chi-squared and probability
Hypothesis testing relies on understanding significance levels, $p$-values, and Type I/II errors.
12. Key Results Summary
| Test Type | Statistic | Degrees of Freedom | Conditions |
|---|---|---|---|
| Goodness-of-fit | $X^2 = \sum \frac{(O_i - E_i)^2}{E_i}$ | $\nu = k - 1 - p$ | All $E_i \geq 5$; $p$ = estimated parameters |
| Independence | $X^2 = \sum \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$ | $\nu = (r-1)(c-1)$ | All $E_{ij} \geq 5$ |
| Yates' correction | $X^2 = \sum \frac{(|O - E| - 0.5)^2}{E}$ | $\nu = 1$ | Only for 2×2 tables |
| Step | Action |
|---|---|
| 1 | State $H_0$ and $H_1$ |
| 2 | Calculate expected frequencies |
| 3 | Merge classes if any $E_i < 5$ |
| 4 | Compute $X^2 = \sum \frac{(O-E)^2}{E}$ |
| 5 | Determine degrees of freedom |
| 6 | Compare with critical value or find $p$-value |
| 7 | Conclude in context |
13. Further Exam-Style Questions
Question 11
A teacher believes that grades in a class follow a specific distribution: 10% A, 30% B, 40% C, 20% D. In a sample of 200 students, the observed frequencies are: A: 15, B: 70, C: 80, D: 35. Test at the 5% level.
Solution
$H_0$: grades follow the specified distribution.
Expected: A: 20, B: 60, C: 80, D: 40. All $E_i \geq 5$.
$$X^2 = \frac{(15-20)^2}{20} + \frac{(70-60)^2}{60} + \frac{(80-80)^2}{80} + \frac{(35-40)^2}{40} = 1.25 + 1.67 + 0 + 0.63 = 3.54$$
$\nu = 4 - 1 = 3$. Critical value at 5%: $\chi^2_3(5\%) = 7.815$.
$3.54 < 7.815$: do not reject $H_0$. The data is consistent with the teacher's belief.
Question 12
Explain the difference between a Type I error and a Type II error in the context of a chi-squared test.
Solution
Type I error: Rejecting $H_0$ when $H_0$ is true (false positive). The probability is the significance level $\alpha$.
Type II error: Failing to reject $H_0$ when $H_0$ is false (false negative). The probability depends on the true distribution and sample size; it is denoted $\beta$, and $1 - \beta$ is the power of the test.
14. Advanced Topics
14.1 The chi-squared distribution
The chi-squared distribution with $\nu$ degrees of freedom is the distribution of $Z_1^2 + Z_2^2 + \cdots + Z_\nu^2$ where the $Z_i$ are independent $N(0, 1)$.
Key properties:
- Mean: $\nu$
- Variance: $2\nu$
- Additivity: if $X \sim \chi^2_m$ and $Y \sim \chi^2_n$ are independent, then $X + Y \sim \chi^2_{m+n}$
14.2 Chi-squared confidence intervals for variance
For a sample of size $n$ from $N(\mu, \sigma^2)$, the quantity $\frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}$.
A $100(1-\alpha)\%$ confidence interval for $\sigma^2$ is:
$$\left( \frac{(n-1)s^2}{\chi^2_{n-1}(\alpha/2)}, \; \frac{(n-1)s^2}{\chi^2_{n-1}(1-\alpha/2)} \right)$$
where $\chi^2_{n-1}(\alpha/2)$ denotes the upper $\alpha/2$ point.
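A numerical sketch with hypothetical data: $n = 20$ observations with sample variance $s^2 = 4.0$, and the two 2.5% points of $\chi^2_{19}$ taken from tables:

```python
# Hypothetical normal sample: n = 20, sample variance s^2 = 4.0
n, s2 = 20, 4.0

# Upper and lower 2.5% points of chi-squared with n - 1 = 19 df (tables)
chi2_upper, chi2_lower = 32.852, 8.907

# 95% confidence interval for sigma^2
lower = (n - 1) * s2 / chi2_upper
upper = (n - 1) * s2 / chi2_lower
print(round(lower, 2), round(upper, 2))
```

Note the interval (roughly 2.31 to 8.53) is not symmetric about $s^2 = 4$, reflecting the skew of the chi-squared distribution.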
14.3 Relationship to other tests
The chi-squared test is related to:
- The $G$-test (log-likelihood ratio test), which uses $G = 2 \sum_i O_i \ln\left(\frac{O_i}{E_i}\right)$
- Fisher's exact test for small samples
- The $z$-test for proportions (for 2×2 tables, $X^2 = z^2$)
15. Further Exam-Style Questions
Question 13
A 3×2 contingency table yields $X^2 = 4.5$. At what significance levels would $H_0$ be rejected?
Solution
$\nu = (3-1)(2-1) = 2$.
Critical values: 1% level: $\chi^2_2(1\%) = 9.210$; 5% level: $\chi^2_2(5\%) = 5.991$; 10% level: $\chi^2_2(10\%) = 4.605$.
$4.5 < 4.605$: $H_0$ would not be rejected at the 10% level (or any conventional level).
The $p$-value is slightly above 10%.
Question 14
Explain why merging classes in a chi-squared test reduces the degrees of freedom and may reduce the power of the test.
Solution
Merging reduces the number of classes $k$, which reduces $\nu = k - 1 - p$. Fewer degrees of freedom means the critical value is lower, making it easier to reject $H_0$, but merging also discards information about the differences between the merged classes. If the true deviation from $H_0$ is in the merged classes, the test loses the ability to detect it, reducing power.
Question 15
A goodness-of-fit test of a normal distribution uses 10 classes with mean and variance estimated from the data. The calculated statistic is $X^2 = 15.3$. Test at the 5% level.
Solution
$\nu = 10 - 1 - 2 = 7$ (10 classes, 1 for total, 2 estimated parameters).
Critical value at 5%: $\chi^2_7(5\%) = 14.067$.
$15.3 > 14.067$: reject $H_0$. There is sufficient evidence to conclude the data does not follow a normal distribution.
16. Further Advanced Topics
16.1 The chi-squared distribution properties
- $\chi^2_\nu$ is the distribution of $\sum_{i=1}^{\nu} Z_i^2$ where $Z_i \sim N(0, 1)$ i.i.d.
- Mean $\nu$, Variance $2\nu$
- For large $\nu$: $\chi^2_\nu \approx N(\nu, 2\nu)$ (by CLT)
- Additivity: $\chi^2_m + \chi^2_n \sim \chi^2_{m+n}$ (independent)
16.2 The -test (log-likelihood ratio)
An alternative to the chi-squared test using:
$$G = 2 \sum_i O_i \ln\left(\frac{O_i}{E_i}\right)$$
For large samples, $G \approx X^2$ and both are approximately $\chi^2_\nu$ under $H_0$.
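Comparing the two statistics on the fair-die data from the dice-fairness worked example (120 rolls, expected 20 per face):

```python
from math import log

observed = [25, 18, 20, 22, 15, 20]   # 120 rolls of a die
expected = [20.0] * 6                  # fair-die expectation

pearson = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
g = 2 * sum(o * log(o / e) for o, e in zip(observed, expected))

# For data this close to H0, the two statistics nearly coincide
print(round(pearson, 2), round(g, 2))
```

Here Pearson's $X^2$ is 2.90 and $G$ is about 2.93; both would be compared with the same $\chi^2_5$ critical value.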
16.3 Fisher's exact test
For 2×2 tables with small expected frequencies, the probability of a table with cell counts $a, b, c, d$ (with all margins fixed, $n = a+b+c+d$) is hypergeometric:
$$p = \frac{\binom{a+b}{a}\binom{c+d}{c}}{\binom{n}{a+c}}$$
This gives the exact $p$-value without approximation.
16.4 Post-hoc analysis
After rejecting $H_0$ in a goodness-of-fit test, standardised residuals identify which classes contribute most:
$$r_i = \frac{O_i - E_i}{\sqrt{E_i}}$$
Values with $|r_i| > 2$ indicate significant deviations.
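Applied to the 240-roll die data from Q1, where the overall test was not significant, no individual face is flagged either (a consistency check, not a separate test):

```python
from math import sqrt

observed = [52, 38, 40, 36, 44, 30]   # Q1 die data: 240 rolls
expected = [40.0] * 6                  # fair-die expectation

residuals = [(o - e) / sqrt(e) for o, e in zip(observed, expected)]
flagged = [face for face, r in enumerate(residuals, start=1) if abs(r) > 2]

# Largest residual is face 1 at about 1.90, below the |r| > 2 threshold
print([round(r, 2) for r in residuals], flagged)
```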
17. Further Exam-Style Questions
Question 16
In a goodness-of-fit test with 8 classes, 1 parameter estimated, and $X^2 = 11.3$, find the approximate $p$-value.
Solution
$\nu = 8 - 1 - 1 = 6$.
From chi-squared tables: $\chi^2_6(10\%) = 10.645$, $\chi^2_6(5\%) = 12.592$.
$10.645 < 11.3 < 12.592$, so $0.05 < p < 0.10$.
The $p$-value is approximately 0.08.
Question 17
Explain when it is appropriate to use Fisher's exact test instead of the chi-squared test.
Solution
Fisher's exact test should be used when:
- The sample size is small (typically $n < 20$ for 2×2 tables)
- Expected frequencies are less than 5 and cannot be fixed by merging
- An exact $p$-value is required rather than an approximation
- The table is 2×2 (for larger tables, Fisher's test becomes computationally expensive)
18. Further Exam-Style Questions
Question 18
A researcher tests whether a die is fair. In 120 rolls, the observed frequencies are:
| Face | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| Freq | 15 | 22 | 18 | 25 | 20 | 20 |
Carry out a chi-squared goodness-of-fit test at the 5% significance level.
Solution
$H_0$: Die is fair. $H_1$: Die is not fair.
Expected frequency for each face: $E = 120/6 = 20$.
$$X^2 = \frac{(15-20)^2 + (22-20)^2 + (18-20)^2 + (25-20)^2 + (20-20)^2 + (20-20)^2}{20} = \frac{58}{20} = 2.9$$
$\nu = 6 - 1 = 5$. Critical value at 5%: $\chi^2_5(5\%) = 11.070$.
$2.9 < 11.070$, so we do not reject $H_0$. There is insufficient evidence to suggest the die is unfair.