FACULTY OF MATHEMATICAL STUDIES MATHEMATICS FOR PART I ENGINEERING Lectures MODULE 24
STATISTICS I
1. Probability 2. Conditional probability 3. Combinations and permutations 4. Random variables 5. Mean, median and mode 6. Variance, standard deviation and quartiles
1. Probability
Many events are subject to chance – they are not entirely predictable. An experiment is any process in which the result of each performance depends on chance. For example, throwing a die and observing the number uppermost, or testing 10 fuses from a box of 100. The result of a single performance of a statistical experiment is called an outcome. The set of measured outcomes of an experiment is called data. Data can be displayed using histograms, stem and leaf plots, cumulative percentage plots etc., but none of these methods is considered further in this module.
The set of all possible outcomes of an experiment is called the sample space – denote this by S. We must assign a probability to each outcome in S. Let us use the frequency interpretation – then if an experiment is carried out N times, under the same experimental conditions, and the outcome $e_1$ occurs $n_1$ times, then $n_1/N$ is called the relative frequency of $e_1$.
Def. The probability of $e_1$ occurring is $p_1 = \lim_{N \to \infty} n_1/N$.
The sample space S can be discrete – forms a finite list, e.g. {1, 2, 3, 4, 5, 6}, or continuous – e.g. the height of a person. Events are what we observe – ranging from S itself (the certain event) to the empty set φ (the impossible event). The usual operations of set theory apply. If A and B are events then so are
(a) union $A \cup B$ ("A or B occurs"),
(b) intersection $A \cap B$ ("A and B occur"),
(c) complement $\bar{A}$ $(= S - A)$ ("not A occurs").
There are three axioms of probability:
(1) $P(S) = 1$, the certain event S has probability 1;
(2) $P(A) \ge 0$, all probabilities are non-negative;
(3) if A and B are disjoint events (i.e. $A \cap B = \phi$) then $P(A \cup B) = P(A) + P(B)$ (addition rule).
From the above axioms it can be shown that:
(4) $P(\bar{A}) = P(S - A) = 1 - P(A)$;
(5) $P(\phi) = 0$;
(6) if $A \subseteq B$ (i.e. A contained in B) then $P(A) \le P(B)$;
(7) $P(A \cup B) = P(A) + P(B) - P(A \cap B)$.
Venn diagrams are very useful to illustrate probability results. In Venn diagrams the sample space is represented by a rectangle, and the event A by a closed curve inside the rectangle.
[Figures 1a to 1d: Venn diagrams of the sample space S. Figure 1a: an event A and its complement. Figure 1b: A ∪ B. Figure 1c: A ∩ B. Figure 1d: disjoint events A and B.]
Figure 1a shows an event A and its complement $\bar{A}$. Similarly the sets $A \cup B$ and $A \cap B$ are shown in figures 1b and 1c respectively. Two disjoint, or mutually exclusive, sets A and B are shown in figure 1d. Finally property 7 is illustrated below.
[Figure: Venn diagrams illustrating P(A or B) + P(A and B) = P(A) + P(B).]
Ex 1. A fair six-sided die is tossed. Find the probability of the event "even number or number less than 4".
The sample space here is $S = \{1, 2, 3, 4, 5, 6\}$. We require
$P\{(\text{even number}) \text{ or } (\text{number} < 4)\} = P\{\{2, 4, 6\} \text{ or } \{1, 2, 3\}\} = P\{1, 2, 3, 4, 6\}$
$= P\{S - \{5\}\} = P(S) - P(5) = 1 - P(5) = 1 - \frac{1}{6} = \frac{5}{6}$.
[Note that in proving the above we have used the result $P\{1, 2, 3, 4, 5, 6\} = P(1) + P(2) + P(3) + P(4) + P(5) + P(6)$, all outcomes disjoint, $= 1$, since one number must occur. All numbers are equally likely so $P(1) = P(2) = P(3) = P(4) = P(5) = P(6) = \frac{1}{6}$.]
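The result of Ex 1 can be checked by direct enumeration. The sketch below is only an illustration: the helper name `prob` and the variable names are assumptions introduced here, and the counting rule is valid only because all six outcomes are equally likely.

```python
from fractions import Fraction

# Sample space for one toss of a fair six-sided die (Ex 1).
sample_space = {1, 2, 3, 4, 5, 6}

def prob(event):
    """Probability of an event when all outcomes are equally likely.

    Hypothetical helper: counts favourable outcomes over total outcomes.
    """
    return Fraction(len(event & sample_space), len(sample_space))

even = {x for x in sample_space if x % 2 == 0}
less_than_4 = {x for x in sample_space if x < 4}

# Direct count of the union "even or less than 4".
p_union = prob(even | less_than_4)

# Property (7): P(A or B) = P(A) + P(B) - P(A and B).
p_rule7 = prob(even) + prob(less_than_4) - prob(even & less_than_4)

print(p_union, p_rule7)  # both print 5/6
```

Exact fractions are used instead of floating point so that the two routes to the answer can be compared for equality.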
Ex 2. For a student assessment 80% passed the examination in mathematics, 85% passed the laboratory work and 75% passed both. For a student chosen at random find the probability that the student (a) passed in either mathematics or laboratory work; (b) failed in both.
(a) Define P(pass maths) = P(M) and P(pass lab) = P(L); then the given data imply P(M) = 0.80, P(L) = 0.85 and P(M ∩ L) = 0.75. Hence
P(M ∪ L) = P(pass maths or pass lab) = P(M) + P(L) − P(M ∩ L) = 0.80 + 0.85 − 0.75 = 0.90.
(b) [Figure: Venn diagram of S containing the overlapping events M and L; the region outside both is labelled "fail both".]
From the Venn diagram above it follows that P(fail both) + P(M ∪ L) = P(S) = 1, so
P(fail both) = 1 − P(M ∪ L) = 1 − 0.90 = 0.10.
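The arithmetic of Ex 2 can be reproduced in a few lines. This is a minimal sketch; the variable names are assumptions chosen for illustration, and exact fractions are used to avoid rounding noise.

```python
from fractions import Fraction

# Data given in Ex 2.
p_m = Fraction(80, 100)     # P(M): passed mathematics
p_l = Fraction(85, 100)     # P(L): passed laboratory work
p_both = Fraction(75, 100)  # P(M and L): passed both

# (a) Property (7) for the union.
p_either = p_m + p_l - p_both

# (b) "Fail both" is the complement of "pass at least one".
p_fail_both = 1 - p_either

print(float(p_either), float(p_fail_both))  # 0.9 0.1
```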
N.B. It is important to note that in most probability problems there are many ways of obtaining the correct answer.
2. Conditional probability
The probability of an outcome may depend on what has happened in previous events. Suppose we know that an event A has occurred. The probabilities of possible future events are now measured relative to the fact that A has happened. In these situations the probability of B given that A has occurred, denoted by $P(B \mid A)$, is defined by
$P(B \mid A) = \frac{P(A \cap B)}{P(A)}$, where $P(A) > 0$.
(The denominator on the right-hand side gives the new sample space, and the numerator is the probability that both A and B occur.)
Ex 3. Someone tosses a die, covers it up and tells you the number is less than 4. How does this change the probability that the number is even?
In the usual situation
$P(\text{even number}) = P\{2, 4, 6\} = \frac{1}{6} + \frac{1}{6} + \frac{1}{6} = \frac{1}{2}$.
In the case described in the question
$P(\text{even} \mid \text{number} < 4) = \frac{P(\text{even and} < 4)}{P(< 4)} = \frac{P(\{2\})}{P(\{1, 2, 3\})} = \frac{1/6}{3/6} = \frac{1}{3}$.
Thus the probability of an even number drops from $\frac{1}{2}$ to $\frac{1}{3}$.
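The definition of conditional probability translates directly into a counting rule when outcomes are equally likely. The following is a sketch only; `prob` and `cond_prob` are hypothetical helper names introduced here, not part of the notes.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}  # one toss of a fair die

def prob(event):
    """P(event) for equally likely outcomes (hypothetical helper)."""
    return Fraction(len(event & sample_space), len(sample_space))

def cond_prob(b, a):
    """P(B | A) = P(A and B) / P(A), defined only when P(A) > 0."""
    if prob(a) == 0:
        raise ValueError("P(A) must be positive")
    return prob(a & b) / prob(a)

even = {2, 4, 6}
less_than_4 = {1, 2, 3}

print(prob(even), cond_prob(even, less_than_4))  # 1/2 1/3
```

Conditioning on "less than 4" shrinks the sample space to {1, 2, 3}, in which only one outcome is even.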
Ex 4. The probability that a regularly scheduled flight departs on time is P(D) = 0.83, the probability that it arrives on time is P(A) = 0.92 and the probability that it both departs and arrives on time is P(A ∩ D) = 0.78. Find the probability that a plane
(a) arrives on time given that it departed on time;
(b) did not depart on time given that it did not arrive on time.
(a) $P(A \mid D) = \frac{P(A \cap D)}{P(D)} = \frac{0.78}{0.83} = 0.9398$ $(\approx 0.94)$.
(b) $P(\bar{D} \mid \bar{A}) = \frac{P(\bar{D} \cap \bar{A})}{P(\bar{A})}$. Now the set $\bar{D} \cap \bar{A}$ denotes outcomes which are not in D and not in A. Hence, using the Venn diagram below,
[Figure: Venn diagram of S containing the overlapping events D and A.]
$\bar{D} \cap \bar{A} = S - (D \cup A)$ → $P(\bar{D} \cap \bar{A}) = P(S - (D \cup A)) = 1 - P(D \cup A)$.
Thus
$P(\bar{D} \mid \bar{A}) = \frac{P(\bar{D} \cap \bar{A})}{P(\bar{A})} = \frac{1 - P(D \cup A)}{P(\bar{A})} = \frac{1 - \{P(D) + P(A) - P(D \cap A)\}}{1 - P(A)} = \frac{1 - (0.83 + 0.92 - 0.78)}{1 - 0.92} = \frac{0.03}{0.08} = 0.375$.
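Both parts of Ex 4 use only the three given probabilities, so they can be checked numerically. A minimal sketch, with variable names that are assumptions made for illustration:

```python
from fractions import Fraction

# Data given in Ex 4.
p_d = Fraction(83, 100)   # P(D): departs on time
p_a = Fraction(92, 100)   # P(A): arrives on time
p_ad = Fraction(78, 100)  # P(A and D): departs and arrives on time

# (a) P(A | D) = P(A and D) / P(D).
p_a_given_d = p_ad / p_d

# (b) P(not D and not A) = 1 - P(D or A), with P(D or A) from property (7).
p_d_or_a = p_d + p_a - p_ad
p_neither = 1 - p_d_or_a
p_notd_given_nota = p_neither / (1 - p_a)

print(round(float(p_a_given_d), 4), float(p_notd_given_nota))  # 0.9398 0.375
```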
Independence – events A and B are called independent when $P(B \mid A) = P(B)$, in which case the event A conveys no information about B. Combining the above definition with the definition of conditional probability leads to
$P(B \mid A) = P(B) = \frac{P(B \cap A)}{P(A)}$, or $P(A \cap B) = P(A)\,P(B)$, the usual definition of independence.
Ex 5. If two fair dice are tossed what is the probability of at least 1 six occurring?
P(at least 1 six) = 1 − P(no six) = 1 − P((first die not 6) and (second die not 6))
= 1 − P(first die not 6) P(second die not 6), since tosses independent,
$= 1 - \frac{5}{6} \cdot \frac{5}{6} = \frac{11}{36}$.
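Ex 5 can be confirmed by listing all 36 equally likely outcomes for two dice and comparing the count with the product rule. This is an illustrative sketch; the variable names are assumptions.

```python
from fractions import Fraction
from itertools import product

# All ordered outcomes (first die, second die) for two fair dice.
outcomes = list(product(range(1, 7), repeat=2))

# Direct count of outcomes containing at least one six.
at_least_one_six = [o for o in outcomes if 6 in o]
p_enumerated = Fraction(len(at_least_one_six), len(outcomes))

# Complement plus independence: 1 - P(first not 6) * P(second not 6).
p_not_six = Fraction(5, 6)
p_product_rule = 1 - p_not_six * p_not_six

print(p_enumerated, p_product_rule)  # both print 11/36
```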
Consider now an example involving conditional probabilities.
Ex 6. A manufacturing plant has three machines producing floppy disks. Machine A manufactures 60% of the disks, machine B 30% and machine C 10%. However, 1% of disks from A are faulty, 2% from B are faulty and 5% from C are faulty. (a) What is the probability that the next disk manufactured is faulty? (b) Suppose a floppy disk picked up at random is faulty. What is the probability it was manufactured on machine C?
(a) Define P(A) to be the probability that a disk is manufactured by machine A, with similar definitions for P(B) and P(C). The given information implies that
$P(A) = \frac{60}{100}$, $P(B) = \frac{30}{100}$, $P(C) = \frac{10}{100}$.
$P(F \mid A)$ denotes the probability that a disk from machine A is faulty, in which case the given information leads to
$P(F \mid A) = \frac{1}{100}$, and similarly $P(F \mid B) = \frac{2}{100}$, $P(F \mid C) = \frac{5}{100}$.
The sample space S can be divided into 3 mutually exclusive subspaces A, B and C (see the corresponding Venn diagram below).
[Figure: Venn diagram of S divided into the regions A, B and C, with the event F overlapping all three.]
It follows, from the figure for instance, that the set F can be split into three parts – a part in A, a part in B and the third part in C, and this leads to
$F = (F \cap A) \cup (F \cap B) \cup (F \cap C)$.
Hence
$P(F) = P(F \cap A) + P(F \cap B) + P(F \cap C) = P(F \mid A)P(A) + P(F \mid B)P(B) + P(F \mid C)P(C)$
$= \frac{1}{100} \cdot \frac{60}{100} + \frac{2}{100} \cdot \frac{30}{100} + \frac{5}{100} \cdot \frac{10}{100} = 0.017$, i.e. 1.7%.
(b) Here we need to find the probability that a disk that is faulty has come from machine C, i.e. $P(C \mid F)$. Now
$P(C \mid F) = \frac{P(C \cap F)}{P(F)}$ and $P(F \mid C) = \frac{P(F \cap C)}{P(C)}$,
and since $P(C \cap F) = P(F \cap C)$ it then follows that
$P(C \mid F) = \frac{P(C \cap F)}{P(F)} = \frac{P(F \cap C)}{P(F)} = \frac{P(F \mid C)P(C)}{P(F)} = \frac{\frac{5}{100} \left( \frac{10}{100} \right)}{0.017} = \frac{5}{17} \approx 0.294$.
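The two steps of Ex 6 (splitting F over the machines, then reversing the conditioning) can be sketched as follows. The dictionary names `share` and `fault_rate` are assumptions introduced for this illustration.

```python
from fractions import Fraction

# Data given in Ex 6: share of production and fault rate per machine.
share = {"A": Fraction(60, 100), "B": Fraction(30, 100), "C": Fraction(10, 100)}
fault_rate = {"A": Fraction(1, 100), "B": Fraction(2, 100), "C": Fraction(5, 100)}

# (a) P(F) = sum over machines of P(F | machine) * P(machine).
p_faulty = sum(fault_rate[m] * share[m] for m in share)

# (b) P(C | F) = P(F | C) * P(C) / P(F).
p_c_given_faulty = fault_rate["C"] * share["C"] / p_faulty

print(float(p_faulty), p_c_given_faulty)  # 0.017 5/17
```

Although machine C makes only 10% of the disks, it accounts for about 29% of the faulty ones because its fault rate is the highest.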
3. Combinations and permutations
A permutation is an ordered arrangement of objects – n distinct objects → n! permutations.
A combination is a subset without regard to order.
Def. The number of combinations of size r that can be drawn from n objects is
$\binom{n}{r} = \frac{n!}{r!\,(n - r)!}$.
The quantity $\binom{n}{r}$ is often said "n choose r" or written $^nC_r$.
Ex 7. Find the number of ways of choosing 3 objects from 8.
Number of ways $= \binom{8}{3} = \frac{8!}{3!\,5!} = \frac{8(7)6}{3(2)1} = 56$.
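Ex 7 can be checked in three equivalent ways: the factorial formula, the standard-library function `math.comb` (available from Python 3.8), and an explicit listing of the subsets. The variable names below are assumptions made for illustration.

```python
import math
from itertools import combinations

n, r = 8, 3

# Factorial formula from the definition: n! / (r! (n - r)!).
by_formula = math.factorial(n) // (math.factorial(r) * math.factorial(n - r))

# Standard-library binomial coefficient.
by_comb = math.comb(n, r)

# Explicit enumeration of all unordered selections of r objects from n.
by_listing = len(list(combinations(range(n), r)))

print(by_formula, by_comb, by_listing)  # 56 56 56
```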
4. Random variables
A random variable is a sample space of possible numerical values together with a probability over these values. In section 1 we considered discrete and continuous sample spaces, and in a similar way we can consider discrete and continuous random variables.
Discrete random variables
Def. The probability function, $P_X(x)$, is defined by $P_X(x) = P(X = x)$, $-\infty < x < \infty$.
Def. The (cumulative) distribution function, $F_X(x)$, is defined by $F_X(x) = P(X \le x)$, $-\infty < x < \infty$.
To illustrate the above for a discrete distribution suppose that
$P_X(0) = 0.1$, $P_X(1) = 0.3$, $P_X(2) = 0.3$, $P_X(3) = 0.2$, $P_X(4) = 0.1$, $P_X(x) = 0$ for all other x.
These probabilities lead to the distribution function shown in the figure below.
[Figure: step plot of $F_X(x)$ against x for 0 ≤ x ≤ 5, with the vertical axis marked from 0 to 1 in steps of 0.2.]
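The step heights of the distribution function follow from accumulating the probability function. The sketch below assumes the probabilities listed above; `pmf` and `cdf` are hypothetical names introduced for illustration.

```python
from fractions import Fraction

# Probability function P_X(x) from the example (zero for all other x).
pmf = {
    0: Fraction(1, 10),
    1: Fraction(3, 10),
    2: Fraction(3, 10),
    3: Fraction(2, 10),
    4: Fraction(1, 10),
}

def cdf(x):
    """F_X(x) = P(X <= x): sum the probabilities of all values not exceeding x."""
    return sum((p for value, p in pmf.items() if value <= x), Fraction(0))

for x in range(5):
    print(x, float(cdf(x)))
# 0 0.1
# 1 0.4
# 2 0.7
# 3 0.9
# 4 1.0
```

Between the listed values the function is flat, e.g. `cdf(2.5)` equals `cdf(2)`, which is why the plot is a staircase.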
Continuous random variables
A continuous random variable X can take any value in some interval $(v_1, v_2)$. If this interval is finite, rather than the infinite interval $(-\infty, +\infty)$, then X can be taken to have zero probability outside this finite range. As in the discrete case the distribution function is defined by $F_X(x) = P(X \le x)$, $-\infty < x < \infty$. Introduce the probability density function $f_X(x)$, where the probability that the variable lies between x and x + δx equals $f_X(x)\,\delta x$ (for small δx). The distribution function $F_X(x)$ and the density function $f_X(x)$ are connected through the relations
$F_X(x) = \int_{-\infty}^{x} f_X(x)\,dx$, $f_X(x) = \frac{d}{dx}\left(F_X(x)\right)$.
It follows from the above that
$\lim_{x \to -\infty} F_X(x) = 0$, $\lim_{x \to +\infty} F_X(x) = 1$, $P(x_1 \le X \le x_2) = \int_{x_1}^{x_2} f_X(x)\,dx$.
Ex 8. The lifetime of an electronic component is a continuous random variable with density function
$f_X(x) = \begin{cases} \frac{1}{2} e^{-x/2}, & x \ge 0, \\ 0, & x < 0. \end{cases}$
(a) Find the distribution function. (b) Find the proportion of components for which X > 6.
(a) For x < 0 the density is zero, so $F_X(x) = 0$. For x ≥ 0,
$F_X(x) = \int_{-\infty}^{x} f_X(x)\,dx = \int_{-\infty}^{0} 0\,dx + \int_{0}^{x} \frac{1}{2} e^{-x/2}\,dx = 0 + \frac{1}{2} \left[ \frac{e^{-x/2}}{-1/2} \right]_0^x = \frac{1}{2}(-2)\left( e^{-x/2} - e^{-0} \right) = -1\left( e^{-x/2} - 1 \right) = 1 - e^{-x/2}$.
[N.B. Using the result in (a), $\lim_{x \to +\infty} F_X(x) = 1 - 0 = 1$, as expected: i.e. "area under curve = 1".]
(b) $P(X > 6) = 1 - P(X \le 6) = 1 - F_X(6) = 1 - [1 - e^{-6/2}] = e^{-3} \approx 0.05$.
5. Mean, median and mode
The properties of a random variable are determined by its probability distribution (if discrete) or density function (if continuous). In order to describe the nature of a distribution it is useful to have a measure of its location (i.e. know a typical value) and a measure of its dispersion (or spread). In this section three measures of location will be defined.
Def. The mean, $\mu_X$, is defined by
$\mu_X = \sum_{k=1}^{m} x_k\,P(X = x_k)$ (for X discrete),
and by
$\mu_X = \int_{-\infty}^{+\infty} x\,f_X(x)\,dx$ (for X continuous).
Def. The median, $m_X$, is defined by
$P(X \le m_X) \ge \frac{1}{2}$ and $P(X \ge m_X) \ge \frac{1}{2}$ (for X discrete),
and by
$P(X \le m_X) = F_X(m_X) = \frac{1}{2}$ (for X continuous).
(N.B. Note that for a continuous distribution the above definition gives equal chances of being above and below the median, but that for a discrete distribution $m_X$ need not be unique.)
Def. The mode, $x_{\text{mode}}$, is any point for which $P_X(x_{\text{mode}})$ is an overall maximum (for X discrete), and any point for which $f_X(x_{\text{mode}})$ is an overall maximum (for X continuous).
Consider the density function displayed in the figure below for a continuous distribution. Then calculation of the mean, median and mode would lead to the answers shown.
[Figure: density $f_X(x)$ plotted against x, with the positions of $x_{\text{mode}}$, $m_X$ and $\mu_X$ marked on the x-axis.]
6. Variance, standard deviation and quartiles
These quantities measure the spread of a distribution.
Def. The variance, $\sigma_X^2$, is defined by
$\mathrm{Var}(X) = \sigma_X^2 = \sum_{k=1}^{m} (x_k - \mu_X)^2\,P(X = x_k)$ (for X discrete),
and by
$\sigma_X^2 = \int_{-\infty}^{+\infty} (x - \mu_X)^2\,f_X(x)\,dx$ (for X continuous).
[N.B. It is possible to show that the last two formulae can be rewritten
$\sigma_X^2 = \sum_{k=1}^{m} x_k^2\,P(X = x_k) - \mu_X^2$, and $\sigma_X^2 = \int_{-\infty}^{+\infty} x^2\,f_X(x)\,dx - \mu_X^2$.]
Def. The standard deviation, $\sigma_X$, is defined as the positive square root of the variance above.
Quartiles
Consider only a continuous random variable X. In the previous section the median, $m_X$, was defined by $F_X(m_X) = 1/2$. In a similar way the quartiles $q_1$ and $q_3$ are defined through $F_X(q_1) = 1/4$ and $F_X(q_3) = 3/4$. The interquartile range is defined as $q_3 - q_1$. Many other measures of spread can be used.
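As an illustration of these definitions, the sketch below computes the variance of the discrete distribution from section 4 in both forms, and the quartiles of the lifetime distribution of Ex 8 by solving $F_X(q) = 1 - e^{-q/2} = p$ for q. The names `pmf` and `quantile` are assumptions introduced here, and the inverse formula $q = -2\ln(1 - p)$ applies only to that particular distribution.

```python
import math
from fractions import Fraction

# Discrete example from section 4: P_X(0..4) = 0.1, 0.3, 0.3, 0.2, 0.1.
pmf = {0: Fraction(1, 10), 1: Fraction(3, 10), 2: Fraction(3, 10),
       3: Fraction(2, 10), 4: Fraction(1, 10)}

mean = sum(x * p for x, p in pmf.items())

# Variance from the definition, and from the rewritten form E[X^2] - mean^2.
var_definition = sum((x - mean) ** 2 * p for x, p in pmf.items())
var_shortcut = sum(x ** 2 * p for x, p in pmf.items()) - mean ** 2
std_dev = math.sqrt(var_definition)

def quantile(p):
    """Solve 1 - exp(-q/2) = p for q (Ex 8 distribution only)."""
    return -2 * math.log(1 - p)

q1, q3 = quantile(0.25), quantile(0.75)
iqr = q3 - q1

print(mean, var_definition, var_shortcut)           # 19/10 129/100 129/100
print(round(q1, 3), round(q3, 3), round(iqr, 3))    # 0.575 2.773 2.197
```

The two variance formulae agree exactly, which is the content of the N.B. above.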
Ex 9. Find the mean, median and mode for (a) the toss of a die, (b) the continuous distribution in Ex 8.
(a) $\mu_X = 1\left(\frac{1}{6}\right) + 2\left(\frac{1}{6}\right) + 3\left(\frac{1}{6}\right) + 4\left(\frac{1}{6}\right) + 5\left(\frac{1}{6}\right) + 6\left(\frac{1}{6}\right) = \frac{7}{2}$.
median – any point in the interval [3, 4].
mode – any of the six values on the die.
(b) $\mu_X = \int_0^{\infty} x\,\frac{1}{2} e^{-x/2}\,dx = \frac{1}{2} \left\{ \left[ \frac{x e^{-x/2}}{(-1/2)} \right]_0^{\infty} - \int_0^{\infty} \frac{e^{-x/2}}{-1/2}\,dx \right\} = \frac{1}{2} \left\{ 0 + 2 \left[ \frac{e^{-x/2}}{-1/2} \right]_0^{\infty} \right\} = -2[0 - 1] = 2$.
median: $m_X$ satisfies $F_X(m_X) = 1 - e^{-m_X/2} = \frac{1}{2}$, → $e^{-m_X/2} = \frac{1}{2}$, i.e. $-\frac{m_X}{2} = \ln \frac{1}{2}$, which leads to $m_X = 2 \ln 2 = 1.386$.
mode: $x_{\text{mode}} = 0$, since the density function $f_X$ has a maximum at x = 0.
In sections 5 and 6 the mean and variance have been calculated for random variables whose probability distributions are known. The Formula sheet contains formulae for the mean and variance of data.
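The answers to Ex 8 and Ex 9 can be checked numerically. The sketch below is not part of the original solution: it uses a simple midpoint rule, and the helper names, the upper limit of 60 (beyond which the integrand is negligible) and the number of steps are all assumptions chosen for illustration.

```python
import math
from fractions import Fraction

# (a) Fair die: mean of the six equally likely values.
die_mean = sum(Fraction(k, 6) for k in range(1, 7))

def density(x):
    """Density from Ex 8: (1/2) exp(-x/2) for x >= 0, zero otherwise."""
    return 0.5 * math.exp(-x / 2) if x >= 0 else 0.0

def cdf(x):
    """Distribution function from Ex 8(a): 1 - exp(-x/2) for x >= 0."""
    return 1 - math.exp(-x / 2) if x >= 0 else 0.0

def midpoint_integral(g, a, b, n=100_000):
    """Approximate the integral of g over [a, b] with the midpoint rule."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

# (b) Mean as the integral of x f_X(x); the tail beyond 60 is negligible.
exp_mean = midpoint_integral(lambda x: x * density(x), 0.0, 60.0)

# Median from F_X(m) = 1/2, i.e. m = 2 ln 2.
exp_median = 2 * math.log(2)

# Ex 8(b): proportion of components with lifetime above 6.
p_over_6 = 1 - cdf(6)

print(die_mean, round(exp_mean, 4), round(exp_median, 3), round(p_over_6, 3))
# 7/2 2.0 1.386 0.05
```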