Statistical Distributions

Board Coverage
| Board | Paper | Notes |
|---|---|---|
| AQA | Paper 1, 2 | Binomial and normal in P1; Poisson in P2 |
| Edexcel | P1, P2 | Similar |
| OCR (A) | Paper 1, 2 | Binomial in P1; normal and Poisson in P2 |
| CIE (9709) | P1, P6 | Binomial in P1; normal and Poisson in P6 |
The formula booklet gives the probability mass functions for the Binomial and Poisson distributions, and the normal distribution function. You must know when to use each distribution and how to find probabilities.
1. Discrete Random Variables
1.1 Definition
Definition. A discrete random variable $X$ takes values from a countable set with probabilities $P(X = x_i) = p_i$ satisfying:
- $p_i \geq 0$ for all $i$
- $\sum_i p_i = 1$
1.2 Expectation and variance
$E(X) = \mu = \sum x_i\,p_i$

$\mathrm{Var}(X) = \sigma^2 = E(X^2) - [E(X)]^2 = \sum x_i^2\,p_i - \mu^2$
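These sums are quick to check numerically. Below is a minimal Python sketch; the probability table is a made-up illustrative example, not one from this section:

```python
# Expectation and variance of a discrete random variable from its
# probability table. The pmf here is a made-up illustrative example.
pmf = {0: 0.1, 1: 0.3, 2: 0.4, 3: 0.2}

assert abs(sum(pmf.values()) - 1.0) < 1e-12  # probabilities sum to 1

mu = sum(x * p for x, p in pmf.items())       # E(X) = sum of x_i p_i
ex2 = sum(x**2 * p for x, p in pmf.items())   # E(X^2)
var = ex2 - mu**2                             # Var(X) = E(X^2) - [E(X)]^2

print(f"E(X) = {mu}, Var(X) = {var:.4f}")     # E(X) = 1.7, Var(X) = 0.8100
```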
2. The Binomial Distribution
2.1 Derivation from Bernoulli trials
A Bernoulli trial is an experiment with exactly two outcomes: success (probability $p$) and failure (probability $1-p$).

If we perform $n$ independent Bernoulli trials, the number of successes $X$ follows a Binomial distribution: $X \sim B(n, p)$.

Derivation of the PMF. Each sequence of $k$ successes and $n-k$ failures has probability $p^k(1-p)^{n-k}$. The number of such sequences is $\binom{n}{k}$ (choosing which $k$ of the $n$ trials are successes). Therefore:

$$P(X = k) = \binom{n}{k}p^k(1-p)^{n-k}, \quad k = 0, 1, \ldots, n$$
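As a sanity check, the PMF can be evaluated directly with Python's standard library; a sketch, not an exam method:

```python
from math import comb

def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ B(n, p), straight from the formula."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# The n+1 probabilities sum to 1, as the derivation requires.
n, p = 10, 0.3
assert abs(sum(binom_pmf(k, n, p) for k in range(n + 1)) - 1.0) < 1e-12
print(binom_pmf(4, n, p))  # ~0.2001, matching Problem 1 below
```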
2.2 Proof that $E(X) = np$
Proof. Let $X_i$ be the indicator variable for the $i$-th trial: $X_i = 1$ if success, $0$ if failure.

$X = X_1 + X_2 + \cdots + X_n$.

$E(X_i) = 1 \cdot p + 0 \cdot (1-p) = p$.

By linearity of expectation: $E(X) = \sum E(X_i) = np$. $\blacksquare$
2.3 Proof that $\mathrm{Var}(X) = np(1-p)$
Proof. $E(X_i^2) = 1^2 \cdot p + 0^2 \cdot (1-p) = p$.

$\mathrm{Var}(X_i) = E(X_i^2) - [E(X_i)]^2 = p - p^2 = p(1-p)$.

Since the $X_i$ are independent: $\mathrm{Var}(X) = \sum \mathrm{Var}(X_i) = np(1-p)$. $\blacksquare$
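The indicator-variable argument can also be illustrated by simulation. A rough Monte Carlo sketch (sample size chosen arbitrarily) should land near $np = 3$ and $np(1-p) = 2.1$:

```python
import random

random.seed(1)  # reproducible runs
n, p, trials = 10, 0.3, 100_000

# Each sample is a sum of n Bernoulli indicators, exactly as in the proof.
samples = [sum(random.random() < p for _ in range(n)) for _ in range(trials)]

mean = sum(samples) / trials
var = sum((x - mean) ** 2 for x in samples) / trials
print(f"sample mean ~ {mean:.3f} (np = {n * p})")
print(f"sample variance ~ {var:.3f} (np(1-p) = {n * p * (1 - p):.2f})")
```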
2.4 Properties
- The distribution is symmetric when $p = 0.5$.
- It is skewed left when $p > 0.5$ and skewed right when $p < 0.5$.
- The mode is at $\lfloor(n+1)p\rfloor$ (if $(n+1)p$ is an integer, both $(n+1)p$ and $(n+1)p - 1$ are modes).
2.5 Direct derivation of $E(X) = np$ from the PMF
The proofs in Sections 2.2 and 2.3 use indicator variables. Here we derive the same results directly
from the probability mass function using algebraic identities.
Proof. Starting from the definition of expectation applied to the binomial PMF:

$$E(X) = \sum_{k=0}^{n} k \binom{n}{k}p^k(1-p)^{n-k}$$

The $k=0$ term vanishes, so begin the sum at $k=1$. Apply the identity $k\binom{n}{k} = n\binom{n-1}{k-1}$:

$$E(X) = \sum_{k=1}^{n} n\binom{n-1}{k-1}p^k(1-p)^{n-k} = np\sum_{k=1}^{n}\binom{n-1}{k-1}p^{k-1}(1-p)^{(n-1)-(k-1)}$$

Substitute $j = k - 1$:

$$E(X) = np\sum_{j=0}^{n-1}\binom{n-1}{j}p^j(1-p)^{n-1-j}$$

By the binomial theorem, $\sum_{j=0}^{n-1}\binom{n-1}{j}p^j(1-p)^{n-1-j} = [p + (1-p)]^{n-1} = 1$.

Therefore $E(X) = np$. $\blacksquare$
2.6 Direct derivation of $\mathrm{Var}(X) = np(1-p)$ from the PMF
Proof. First compute $E(X(X-1))$:

$$E(X(X-1)) = \sum_{k=0}^{n} k(k-1)\binom{n}{k}p^k(1-p)^{n-k}$$

Terms with $k = 0, 1$ are zero. Apply the identity $k(k-1)\binom{n}{k} = n(n-1)\binom{n-2}{k-2}$:

$$E(X(X-1)) = \sum_{k=2}^{n} n(n-1)\binom{n-2}{k-2}p^k(1-p)^{n-k} = n(n-1)p^2\sum_{k=2}^{n}\binom{n-2}{k-2}p^{k-2}(1-p)^{(n-2)-(k-2)}$$

Substitute $j = k - 2$:

$$E(X(X-1)) = n(n-1)p^2\sum_{j=0}^{n-2}\binom{n-2}{j}p^j(1-p)^{n-2-j} = n(n-1)p^2$$

The final equality follows from the binomial theorem: $\sum_{j=0}^{n-2}\binom{n-2}{j}p^j(1-p)^{n-2-j} = 1$.

Now $E(X^2) = E(X(X-1)) + E(X) = n(n-1)p^2 + np$.

$$\mathrm{Var}(X) = E(X^2) - [E(X)]^2 = n(n-1)p^2 + np - n^2p^2 = np - np^2 = np(1-p) \quad \blacksquare$$
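A quick numerical check of the factorial-moment identity, summing the PMF directly (the values of $n$ and $p$ are arbitrary choices):

```python
from math import comb, isclose

n, p = 12, 0.35
pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

ex = sum(k * pmf[k] for k in range(n + 1))              # E(X)
exx1 = sum(k * (k - 1) * pmf[k] for k in range(n + 1))  # E(X(X-1))
var = exx1 + ex - ex**2                                 # Var(X)

assert isclose(ex, n * p)                 # E(X) = np
assert isclose(exx1, n * (n - 1) * p**2)  # E(X(X-1)) = n(n-1)p^2
assert isclose(var, n * p * (1 - p))      # Var(X) = np(1-p)
```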
3. The Normal Distribution
3.1 Motivation from the Central Limit Theorem
The Central Limit Theorem (CLT) states that the sum (or mean) of a large number of independent, identically distributed random variables (with finite variance) is approximately normally distributed, regardless of the original distribution.
This is why the normal distribution appears so widely in nature: any quantity that is the sum of
many small independent effects (height, measurement error, etc.) will be approximately normal.
3.2 Definition
$X \sim N(\mu, \sigma^2)$ has PDF

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$
3.3 Properties
- Bell-shaped, symmetric about $\mu$.
- $E(X) = \mu$, $\mathrm{Var}(X) = \sigma^2$.
- Approximately 68% of data lie within $\mu \pm \sigma$, 95% within $\mu \pm 2\sigma$, and 99.7% within $\mu \pm 3\sigma$.
3.4 Standard normal
If $X \sim N(\mu, \sigma^2)$, then $Z = \dfrac{X - \mu}{\sigma} \sim N(0, 1)$.
Probabilities are found using the standard normal table or a calculator's inverse normal function.
3.5 Finding probabilities
$$P(a < X < b) = P\!\left(\frac{a-\mu}{\sigma} < Z < \frac{b-\mu}{\sigma}\right) = \Phi\!\left(\frac{b-\mu}{\sigma}\right) - \Phi\!\left(\frac{a-\mu}{\sigma}\right)$$
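On a computer, $\Phi$ is available through Python's standard library (`statistics.NormalDist`). A sketch of the same calculation, with illustrative values borrowed from Problem 2 below:

```python
from statistics import NormalDist

# P(a < X < b) for X ~ N(mu, sigma^2), via the standardised Z.
mu, sigma = 175, 8            # illustrative values (cf. Problem 2)
X = NormalDist(mu, sigma)
Z = NormalDist()              # standard normal, N(0, 1)

a, b = 170, 185
direct = X.cdf(b) - X.cdf(a)
standardised = Z.cdf((b - mu) / sigma) - Z.cdf((a - mu) / sigma)
print(direct, standardised)   # identical, as the formula asserts
```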
3.6 Normal approximation to Binomial
For large $n$ with $np > 5$ and $n(1-p) > 5$:

$$B(n, p) \approx N(np,\, np(1-p))$$

with continuity correction:

$$P(X \leq k) \approx P\!\left(Z < \frac{k + 0.5 - np}{\sqrt{np(1-p)}}\right)$$
warning
Always apply a continuity correction when approximating a discrete distribution (Binomial) with a continuous one (Normal). Add or subtract 0.5 depending on the inequality direction.
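A sketch comparing the exact binomial probability with the corrected and uncorrected normal approximations (parameters chosen arbitrarily) shows why the 0.5 matters:

```python
from math import comb, sqrt
from statistics import NormalDist

n, p, k = 40, 0.4, 18
mu, sigma = n * p, sqrt(n * p * (1 - p))

exact = sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))
Z = NormalDist()
with_cc = Z.cdf((k + 0.5 - mu) / sigma)  # continuity-corrected
without = Z.cdf((k - mu) / sigma)        # no correction

print(f"exact P(X<=18) = {exact:.4f}")
print(f"normal with correction = {with_cc:.4f}, without = {without:.4f}")
```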
4. The Poisson Distribution
4.1 Definition
$X \sim \mathrm{Po}(\lambda)$ models the number of events in a fixed interval when events occur independently at a constant average rate $\lambda$.

$$P(X = k) = \frac{e^{-\lambda}\lambda^k}{k!}, \quad k = 0, 1, 2, \ldots$$
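A direct translation of the PMF into Python (standard library only), as a sketch:

```python
from math import exp, factorial

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Po(lam)."""
    return exp(-lam) * lam**k / factorial(k)

print(poisson_pmf(6, 4.5))  # ~0.1281 (cf. Problem 3 below)
```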
4.2 Derivation as a limit of the Binomial
Theorem. If $n \to \infty$ and $p \to 0$ such that $np = \lambda$ remains constant, then $B(n, p) \to \mathrm{Po}(\lambda)$.

Proof.

$$\begin{aligned} P(X = k) &= \binom{n}{k}p^k(1-p)^{n-k} \\ &= \frac{n(n-1)\cdots(n-k+1)}{k!} \cdot \frac{\lambda^k}{n^k} \cdot \left(1-\frac{\lambda}{n}\right)^{n-k} \end{aligned}$$

Consider each factor as $n \to \infty$:
- $\dfrac{n(n-1)\cdots(n-k+1)}{n^k} \to 1$ (each term $n-i \approx n$)
- $\left(1 - \dfrac{\lambda}{n}\right)^{n-k} \to e^{-\lambda}$ (using $\lim_{n\to\infty}(1+a/n)^n = e^a$)

Therefore:

$$P(X = k) \to \frac{1}{k!} \cdot \lambda^k \cdot e^{-\lambda} = \frac{e^{-\lambda}\lambda^k}{k!} \quad \blacksquare$$
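The pointwise convergence can be watched numerically: holding $\lambda = np$ fixed and growing $n$, the binomial PMF approaches the Poisson PMF. A sketch ($\lambda = 2$, $k = 3$, and the grid of $n$ values are arbitrary choices):

```python
from math import comb, exp, factorial

lam, k = 2.0, 3  # watch P(X = 3) with np = 2 held fixed

poisson = exp(-lam) * lam**k / factorial(k)
for n in (10, 100, 1000, 10000):
    p = lam / n
    binom = comb(n, k) * p**k * (1 - p)**(n - k)
    print(f"n = {n:>5}: binomial = {binom:.6f}, Poisson = {poisson:.6f}")
```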
4.3 Proof that $E(X) = \lambda$
Proof.

$$\begin{aligned} E(X) &= \sum_{k=0}^{\infty}k \cdot \frac{e^{-\lambda}\lambda^k}{k!} = \sum_{k=1}^{\infty}\frac{e^{-\lambda}\lambda^k}{(k-1)!} \\ &= \lambda e^{-\lambda}\sum_{k=1}^{\infty}\frac{\lambda^{k-1}}{(k-1)!} = \lambda e^{-\lambda}\sum_{j=0}^{\infty}\frac{\lambda^j}{j!} \\ &= \lambda e^{-\lambda} \cdot e^{\lambda} = \lambda \quad \blacksquare \end{aligned}$$
4.4 Proof that $\mathrm{Var}(X) = \lambda$
Proof. First compute $E(X(X-1))$:

$$\begin{aligned} E(X(X-1)) &= \sum_{k=2}^{\infty}k(k-1)\frac{e^{-\lambda}\lambda^k}{k!} = \sum_{k=2}^{\infty}\frac{e^{-\lambda}\lambda^k}{(k-2)!} \\ &= \lambda^2 e^{-\lambda}\sum_{j=0}^{\infty}\frac{\lambda^j}{j!} = \lambda^2 e^{-\lambda} \cdot e^{\lambda} = \lambda^2 \end{aligned}$$

$E(X^2) = E(X(X-1)) + E(X) = \lambda^2 + \lambda$.

$\mathrm{Var}(X) = E(X^2) - [E(X)]^2 = \lambda^2 + \lambda - \lambda^2 = \lambda$. $\blacksquare$
4.5 Additivity
If $X \sim \mathrm{Po}(\lambda)$ and $Y \sim \mathrm{Po}(\mu)$ are independent, then $X + Y \sim \mathrm{Po}(\lambda + \mu)$.
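Additivity can be checked by convolving the two PMFs: $P(X+Y=m) = \sum_k P(X=k)\,P(Y=m-k)$. A sketch with $\lambda = 3$, $\mu = 5$ (the values used in Problem 10 below):

```python
from math import exp, factorial, isclose

def po(k, lam):
    return exp(-lam) * lam**k / factorial(k)

lam, mu, m = 3.0, 5.0, 6

# Convolution of the independent PMFs versus the claimed Po(lam + mu) PMF.
conv = sum(po(k, lam) * po(m - k, mu) for k in range(m + 1))
assert isclose(conv, po(m, lam + mu))
print(conv)  # ~0.1221
```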
4.6 Conditions for the Poisson model
The Poisson distribution is appropriate when all of the following hold:
- Events occur independently of one another.
- Events occur at a constant average rate $\lambda$ in a fixed interval of time, space, or volume.
- The probability of more than one event occurring in a sufficiently small sub-interval is negligible.

These are sometimes called the Poisson postulates. When they are satisfied, the number of events in any interval of length $t$ follows $\mathrm{Po}(\lambda t)$.
Typical applications include: calls arriving at a call centre per hour, typing errors per page,
radioactive decays per second, and cars passing a checkpoint per minute.
tip
When justifying a Poisson model, check that the average rate is constant over the interval and that events do not cluster. If events tend to occur in bursts, the Poisson model is not appropriate.
4.7 Poisson approximation to the Binomial
Practical rule. When $n > 50$ and $p < 0.1$, we may approximate $B(n, p)$ by $\mathrm{Po}(\lambda)$ where $\lambda = np$.

Justification. The theoretical result in Section 4.2 shows that as $n \to \infty$ and $p \to 0$ with $np = \lambda$ held constant, the binomial PMF converges pointwise to the Poisson PMF. The conditions $n > 50$ and $p < 0.1$ are practical thresholds that ensure:
- $n$ is large enough that the discrete binomial is well-approximated by a limit distribution.
- $p$ is small enough that the "rare event" assumption of the Poisson model is satisfied.
- $\lambda = np$ is moderate (typically $0 < \lambda < 10$), so that neither distribution is heavily concentrated at a single point.

The approximation improves as $n$ increases and $p$ decreases while $\lambda = np$ remains fixed.
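A sketch quantifying the approximation error near the practical thresholds; the distance measure (largest pointwise PMF gap) and the parameter values are my choices, not prescribed by the rule:

```python
from math import comb, exp, factorial

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, lam):
    return exp(-lam) * lam**k / factorial(k)

# Largest pointwise PMF gap between B(n, p) and Po(np).
ks = range(40)  # both PMFs are negligible beyond this for these lambdas
for n, p in [(50, 0.1), (100, 0.04), (500, 0.01)]:
    lam = n * p
    gap = max(abs(binom_pmf(k, n, p) - poisson_pmf(k, lam)) for k in ks)
    print(f"n = {n:>3}, p = {p}: max |B - Po| = {gap:.5f}")
```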
warning
If $p$ is not small and $n$ is large, use the normal approximation (Section 3.6) instead. The two approximations are complementary: Poisson handles the case of many trials with rare success, while normal handles the case of many trials with moderate success probability.
5. Choosing the Right Distribution
| Situation | Distribution |
|---|---|
| Fixed $n$ trials, success/failure | Binomial $B(n,p)$ |
| Events in continuous interval, rare events | Poisson $\mathrm{Po}(\lambda)$ |
| Continuous, bell-shaped | Normal $N(\mu,\sigma^2)$ |
6. Coding of Random Variables
6.1 Definition
A coding (or linear transformation) of a discrete random variable $X$ is a new random variable $Y = aX + b$, where $a$ and $b$ are constants with $a \neq 0$.
Coding arises naturally when changing units (e.g. centimetres to metres, or Celsius to Fahrenheit)
or when shifting and scaling a distribution.
6.2 Effect on expectation
Theorem. If $Y = aX + b$, then $E(Y) = aE(X) + b$.

Proof. Applying the definition of expectation to $Y$:

$$E(Y) = \sum (ax_i + b)\,p_i = a\sum x_i\,p_i + b\sum p_i = aE(X) + b \cdot 1 = aE(X) + b \quad \blacksquare$$

The key step is $\sum p_i = 1$, since the probabilities sum to 1.
6.3 Effect on variance
Theorem. If $Y = aX + b$, then $\mathrm{Var}(Y) = a^2\mathrm{Var}(X)$.

Proof.

$$\begin{aligned} \mathrm{Var}(Y) &= E(Y^2) - [E(Y)]^2 \\ &= E[(aX + b)^2] - [aE(X) + b]^2 \\ &= E[a^2X^2 + 2abX + b^2] - \left\{a^2[E(X)]^2 + 2abE(X) + b^2\right\} \\ &= a^2E(X^2) + 2abE(X) + b^2 - a^2[E(X)]^2 - 2abE(X) - b^2 \\ &= a^2\left\{E(X^2) - [E(X)]^2\right\} \\ &= a^2\mathrm{Var}(X) \quad \blacksquare \end{aligned}$$

Note how the terms $2abE(X)$ and $b^2$ cancel between $E(Y^2)$ and $[E(Y)]^2$.

Adding a constant $b$ (a location shift) has no effect on variance. Only multiplying by $a$ (a scale change) affects variance, and it does so by a factor of $a^2$. This is why variance is measured in squared units of the original variable.
6.4 Effect on standard deviation
Since $\mathrm{Var}(Y) = a^2\mathrm{Var}(X)$, taking square roots gives:

$$\mathrm{SD}(Y) = |a|\,\mathrm{SD}(X)$$

The absolute value ensures the standard deviation remains non-negative regardless of the sign of $a$.
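The three coding results can be confirmed together on a small probability table; the table and the constants are made-up examples, with $a < 0$ chosen deliberately to exercise the absolute value:

```python
pmf = {1: 0.2, 2: 0.5, 3: 0.3}  # illustrative made-up distribution
a, b = -4, 10                    # a < 0 to exercise SD(Y) = |a| SD(X)

def moments(pmf):
    mu = sum(x * p for x, p in pmf.items())
    var = sum(x**2 * p for x, p in pmf.items()) - mu**2
    return mu, var

mx, vx = moments(pmf)
coded = {a * x + b: p for x, p in pmf.items()}  # pmf of Y = aX + b
my, vy = moments(coded)

assert abs(my - (a * mx + b)) < 1e-12            # E(Y) = aE(X) + b
assert abs(vy - a**2 * vx) < 1e-12               # Var(Y) = a^2 Var(X)
assert abs(vy**0.5 - abs(a) * vx**0.5) < 1e-12   # SD(Y) = |a| SD(X)
```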
Problem Set
Problem 1
$X \sim B(10, 0.3)$. Find $P(X = 4)$, $P(X \leq 3)$, and $P(X \geq 7)$.
Solution 1
$P(X=4) = \binom{10}{4}(0.3)^4(0.7)^6 = 210 \times 0.0081 \times 0.1176 \approx 0.2001$.
$P(X \leq 3) = P(X=0)+P(X=1)+P(X=2)+P(X=3) \approx 0.0282 + 0.1211 + 0.2335 + 0.2668 \approx 0.6496$.
$P(X \geq 7) = P(X=7)+P(X=8)+P(X=9)+P(X=10) \approx 0.0090 + 0.0014 + 0.0001 + 0.0000 \approx 0.0106$.
If you get this wrong, revise: The Binomial Distribution —
Section 2.
Problem 2
Heights of men are normally distributed with mean 175 cm and standard deviation 8 cm. Find the probability that a randomly chosen man is taller than 185 cm.
Solution 2
$X \sim N(175, 64)$.
$P(X > 185) = P\!\left(Z > \dfrac{185-175}{8}\right) = P(Z > 1.25) = 1 - \Phi(1.25) \approx 1 - 0.8944 = 0.1056$.
If you get this wrong, revise: The Normal Distribution —
Section 3.
Problem 3
A call centre receives an average of 4.5 calls per minute. Find the probability of receiving exactly 6 calls in a given minute, and the probability of receiving more than 8 calls.
Solution 3
$X \sim \mathrm{Po}(4.5)$.
$P(X=6) = \dfrac{e^{-4.5}(4.5)^6}{6!} = \dfrac{0.01111 \times 8303.77}{720} \approx 0.1281$.
$P(X > 8) = 1 - P(X \leq 8) = 1 - \sum_{k=0}^{8}\dfrac{e^{-4.5}(4.5)^k}{k!} \approx 1 - 0.9597 = 0.0403$.
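A quick standard-library check of these values (a sketch, not an exam method):

```python
from math import exp, factorial

def pmf(k, lam=4.5):
    return exp(-lam) * lam**k / factorial(k)

print(pmf(6))                              # ~0.1281
print(1 - sum(pmf(k) for k in range(9)))   # P(X > 8) ~ 0.0403
```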
If you get this wrong, revise: The Poisson Distribution —
Section 4.
Problem 4
$X \sim B(100, 0.04)$. Use the Poisson approximation to find $P(X \leq 2)$.
Solution 4
$\lambda = np = 4$, so $X \approx \mathrm{Po}(4)$.
$P(X \leq 2) = e^{-4}\left(1 + 4 + \dfrac{16}{2}\right) = e^{-4}(1 + 4 + 8) = 13e^{-4} \approx 0.2381$.
If you get this wrong, revise:
Derivation as a Limit — Section 4.2.
Problem 5
Find $c$ such that $P(-c < Z < c) = 0.95$, where $Z \sim N(0,1)$.
Solution 5
$P(-c < Z < c) = 2\Phi(c) - 1 = 0.95 \implies \Phi(c) = 0.975$.
From tables: $c \approx 1.96$.
If you get this wrong, revise: Standard Normal — Section 3.4.
Problem 6
The number of emails received per hour follows $\mathrm{Po}(12)$. Find the probability of receiving between 10 and 15 emails (inclusive) in a given hour.
Solution 6
$X \sim \mathrm{Po}(12)$.
$P(10 \leq X \leq 15) = P(X \leq 15) - P(X \leq 9)$.
From Poisson tables: $P(X \leq 15) \approx 0.8444$ and $P(X \leq 9) \approx 0.2424$.
$P(10 \leq X \leq 15) \approx 0.8444 - 0.2424 = 0.6020$.
If you get this wrong, revise: The Poisson Distribution —
Section 4.
Problem 7
A machine produces bolts with lengths $X \sim N(50, 0.04)$, in cm. Bolts with length less than 49.7 cm or greater than 50.3 cm are rejected. Find the proportion of bolts rejected.
Solution 7
$\sigma = \sqrt{0.04} = 0.2$.
$P(X < 49.7) = P(Z < (49.7-50)/0.2) = P(Z < -1.5) = 0.0668$.
$P(X > 50.3) = P(Z > 1.5) = 0.0668$.
Proportion rejected $= 0.0668 + 0.0668 = 0.1336$ (13.36%).
If you get this wrong, revise: Finding Probabilities — Section 3.5.
Problem 8
Prove that $E(aX + b) = aE(X) + b$ and $\mathrm{Var}(aX + b) = a^2\mathrm{Var}(X)$.
Solution 8
$E(aX+b) = \sum(a x_i + b)p_i = a\sum x_i p_i + b\sum p_i = aE(X) + b$. ✓
$\mathrm{Var}(aX+b) = E[(aX+b)^2] - [E(aX+b)]^2 = E[a^2X^2 + 2abX + b^2] - [aE(X)+b]^2$
$= a^2E(X^2) + 2abE(X) + b^2 - a^2[E(X)]^2 - 2abE(X) - b^2$
$= a^2[E(X^2) - (E(X))^2] = a^2\mathrm{Var}(X)$. ✓
If you get this wrong, revise: Expectation and Variance —
Section 1.2.
Problem 9
$X \sim B(200, 0.15)$. Use the normal approximation with continuity correction to approximate $P(X > 35)$.
Solution 9
$\mu = 200(0.15) = 30$, $\sigma^2 = 200(0.15)(0.85) = 25.5$, $\sigma \approx 5.05$.
$P(X > 35) \approx P\!\left(Z > \dfrac{35.5 - 30}{5.05}\right) = P(Z > 1.089) \approx 1 - 0.8621 = 0.1379$.
If you get this wrong, revise:
Normal Approximation to Binomial — Section 3.6.
Problem 10
If $X \sim \mathrm{Po}(3)$ and $Y \sim \mathrm{Po}(5)$ are independent, find $P(X + Y = 6)$.
Solution 10
By additivity: $X + Y \sim \mathrm{Po}(3+5) = \mathrm{Po}(8)$.
$P(X + Y = 6) = \dfrac{e^{-8} \times 8^6}{6!} = \dfrac{e^{-8} \times 262144}{720} \approx \dfrac{0.000335 \times 262144}{720} \approx 0.1221$.
If you get this wrong, revise: Additivity — Section 4.5.
Problem 11
Starting from the definition $E(X) = \sum_{k=0}^{n} k\binom{n}{k}p^k(1-p)^{n-k}$, derive $E(X) = np$ using the identity $k\binom{n}{k} = n\binom{n-1}{k-1}$ and the binomial theorem.
Solution 11
$E(X) = \sum_{k=0}^{n} k\binom{n}{k}p^k(1-p)^{n-k} = \sum_{k=1}^{n} n\binom{n-1}{k-1}p^k(1-p)^{n-k}$
$= np\sum_{k=1}^{n}\binom{n-1}{k-1}p^{k-1}(1-p)^{(n-1)-(k-1)} = np\sum_{j=0}^{n-1}\binom{n-1}{j}p^j(1-p)^{n-1-j}$
By the binomial theorem: $\sum_{j=0}^{n-1}\binom{n-1}{j}p^j(1-p)^{n-1-j} = [p+(1-p)]^{n-1} = 1$.
Therefore $E(X) = np$.
If you get this wrong, revise: Direct derivation of $E(X) = np$ from the PMF — Section 2.5.
Problem 12
$X \sim \mathrm{Po}(7)$. Let $Y = 3X - 2$. Find $E(Y)$ and $\mathrm{Var}(Y)$.
Solution 12
For $X \sim \mathrm{Po}(7)$: $E(X) = 7$ and $\mathrm{Var}(X) = 7$.
Using the coding formulae $E(aX+b) = aE(X)+b$ and $\mathrm{Var}(aX+b) = a^2\mathrm{Var}(X)$:
$E(Y) = 3(7) - 2 = 19$.
$\mathrm{Var}(Y) = 3^2 \times 7 = 63$.
Note that the additive constant $-2$ affects the mean but not the variance.
If you get this wrong, revise: Coding of Random Variables —
Section 6.
Problem 13
$X \sim B(80, 0.03)$. State whether the Poisson approximation is valid, giving reasons. If valid, use it to find $P(X \leq 1)$.
Solution 13
Check conditions: $n = 80 > 50$ and $p = 0.03 < 0.1$. Both conditions are satisfied, so the Poisson approximation is valid with $\lambda = np = 80 \times 0.03 = 2.4$, i.e. $X \approx \mathrm{Po}(2.4)$.
$P(X \leq 1) = P(X=0) + P(X=1) = e^{-2.4}(1 + 2.4) = 3.4\,e^{-2.4} \approx 3.4 \times 0.0907 \approx 0.3084$.
If you get this wrong, revise:
Poisson approximation to the Binomial — Section 4.7.
Problem 14
A discrete random variable $X$ has $E(X) = 5$ and $\mathrm{Var}(X) = 4$. Let $W = 2X + 3$. Find $E(W)$ and $\mathrm{Var}(W)$.
Solution 14
$E(W) = 2E(X) + 3 = 2(5) + 3 = 13$.
$\mathrm{Var}(W) = 2^2 \times \mathrm{Var}(X) = 4 \times 4 = 16$.
$\mathrm{SD}(W) = \sqrt{16} = 4$.
If you get this wrong, revise: Coding of Random Variables —
Section 6.
Problem 15
Starting from $E(X(X-1)) = \sum_{k=0}^{n} k(k-1)\binom{n}{k}p^k(1-p)^{n-k}$, derive $\mathrm{Var}(X) = np(1-p)$ for $X \sim B(n,p)$.
Solution 15
Using $k(k-1)\binom{n}{k} = n(n-1)\binom{n-2}{k-2}$:
$E(X(X-1)) = \sum_{k=2}^{n} n(n-1)\binom{n-2}{k-2}p^k(1-p)^{n-k} = n(n-1)p^2\sum_{j=0}^{n-2}\binom{n-2}{j}p^j(1-p)^{n-2-j} = n(n-1)p^2$
Then $E(X^2) = E(X(X-1)) + E(X) = n(n-1)p^2 + np$.
$\mathrm{Var}(X) = E(X^2) - [E(X)]^2 = n(n-1)p^2 + np - n^2p^2 = np - np^2 = np(1-p)$.
If you get this wrong, revise: Direct derivation of $\mathrm{Var}(X) = np(1-p)$ from the PMF — Section 2.6.
Problem 16
$X \sim B(120, 0.025)$. (a) Show that the Poisson approximation is appropriate. (b) Use it to find $P(X = 5)$. (c) State why the normal approximation would not be appropriate here.
Solution 16
(a) $n = 120 > 50$ and $p = 0.025 < 0.1$, so the Poisson approximation is appropriate, with $\lambda = np = 120 \times 0.025 = 3$.
(b) $X \approx \mathrm{Po}(3)$.
$P(X = 5) = \dfrac{e^{-3} \times 3^5}{5!} = \dfrac{e^{-3} \times 243}{120} = 2.025\,e^{-3} \approx 2.025 \times 0.0498 \approx 0.1008$
(c) For the normal approximation we need $np > 5$ and $n(1-p) > 5$. Here $np = 3 < 5$, so the normal approximation is not appropriate. The Poisson approximation is the correct choice since $p$ is small.
If you get this wrong, revise:
Poisson approximation to the Binomial — Section 4.7.
Problem 17
Temperatures in a city are modelled by $X \sim N(15, 9)$ in degrees Celsius. The temperature in Fahrenheit is $F = \frac{9}{5}X + 32$. Find $E(F)$, $\mathrm{Var}(F)$, and $P(F > 68)$.
Solution 17
$E(F) = \frac{9}{5}E(X) + 32 = \frac{9}{5}(15) + 32 = 27 + 32 = 59^\circ\mathrm{F}$.
$\mathrm{Var}(F) = \left(\frac{9}{5}\right)^2 \times 9 = \frac{81}{25} \times 9 = \frac{729}{25} = 29.16$.
$\mathrm{SD}(F) = \sqrt{29.16} = 5.4$.
$P(F > 68) = P\!\left(Z > \dfrac{68 - 59}{5.4}\right) = P(Z > 1.667) \approx 1 - 0.9522 = 0.0478$.
If you get this wrong, revise: Coding of Random Variables —
Section 6.
tip
Ready to test your understanding of Statistical Distributions? The diagnostic test contains the hardest questions within the A-Level specification for this topic, each with a full worked solution.
Unit tests probe edge cases and common misconceptions. Integration tests combine Statistical Distributions with other topics to test synthesis under exam conditions.
See Diagnostic Guide for instructions on self-marking and building a personal test matrix.