1 Introduction

1.1 Syllabus

1.1.1 Level of the course
The course is given at an intermediate level. It requires one year of calculus and a certain degree of mathematical maturity. This is an ambitious course in that we cover both probability and statistics in one semester. Because so much material is covered, it is impossible to go over, in class, enough examples to illustrate the subject as it is being developed. Students should therefore expect to spend at least five hours a week reading the book and the references and working through the book examples, the recommended problems, and the assigned problems. It is very important not to fall behind, because the material builds up very quickly. On the positive side, the reward is that after one semester you will have a working knowledge of probability and statistics. The course requires students to use the textbook for examples and details that cannot be covered in class because of time limitations and coverage requirements. To alleviate this problem I will be holding voluntary-attendance recitation sessions in which the TA will go over questions and exercises.

1.1.2 Encourage students to ask questions
1.1.3 Use of statistical computer packages
Throughout the course, students will need to crunch data. Students can use the statistical package of their choice, e.g. Minitab, SPSS, S, etc. The statistical modules embedded in Excel for Windows are powerful enough to work most of the problems in the textbook.

1.1.4 Motivation
Why study probability and statistics? One answer, of course, for many of you is that this is a required course. But why? The reason is that we live in a world where uncertainty is everywhere. Will it rain tomorrow? Which candidate will win the election? Is treatment A better than treatment B? Is production out of control? Should we target generation Y instead of generation X? While we cannot give a definitive answer to most of these questions, we can observe the underlying process, collect data, and give an answer couched in probabilistic terms. Thus, we may say that there is an eighty percent chance that it will rain tomorrow, and we may reject the claim that treatment A is better than treatment B while at the same time announcing the probability that we are wrong.

In general, in statistical inference we collect data and want to make intelligent and rigorous statements about the population from which the data come. Examples include polling, quality control, medical treatments, risk management, etc. Data are subject to statistical variation, and we want to use data to reach fairly reliable conclusions despite that variation. Statistical statements are therefore couched in terms of probabilities, and so we need to study probability in order to understand statistics. The study of probability is also interesting in itself: it prepares students for courses in stochastic processes, quality control, reliability, and risk management, and adds to their understanding of random phenomena.

Interpretation of a probability statement: "With probability 60% we will find oil in this site." There are two interpretations: the frequency interpretation and the degree-of-belief interpretation. The frequency interpretation means that if you were to look for oil in a large number of similar sites you would find it in about 60% of them. The degree-of-belief interpretation is that you are slightly more willing to believe that there is oil than that there is not. Fortunately, the rules for manipulating probabilities are the same under both interpretations.
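The frequency interpretation lends itself to a short simulation. The sketch below (not part of the original notes) assumes, purely for illustration, that each of a large number of similar sites independently contains oil with probability 0.6; the observed fraction of sites with oil then settles near 0.6.

```python
import random

random.seed(1)  # fixed seed for reproducibility

# Hypothetical setup: each site independently contains oil with
# probability 0.6 (the 60% figure from the probability statement above).
n_sites = 10_000
hits = sum(random.random() < 0.6 for _ in range(n_sites))
frequency = hits / n_sites

# The relative frequency of oil-bearing sites is close to 0.6,
# which is what the frequency interpretation asserts.
print(f"fraction of sites with oil: {frequency:.3f}")
```

With more sites the fluctuation around 0.6 shrinks, which is the law-of-large-numbers behavior underlying the frequency interpretation.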
1.1.5 How to Study Probability
One way is simply to attack probability problems from scratch and develop a body of knowledge from the experience gained in solving them. While this may be fun, and the deep understanding acquired in this way should not be underestimated, it is a very slow and sometimes tortuous path. Many people have walked that path, and we are now in a position to take advantage of the useful concepts and language they have developed. Thus, our approach to the study of probability will be axiomatic: from a few axioms we will develop important branches of an immense field of knowledge. Two books worth mentioning here: Fifty Challenging Problems in Probability with Solutions, by Frederick Mosteller, Dover, 1965, and Against the Gods: The Remarkable Story of Risk, by Peter L. Bernstein.
2 Review of Set Theory
To build a probability model we usually have an experiment in mind and are interested in assigning probabilities to certain events of interest. For example, the experiment may be to roll two dice, and an event of interest may be that both numbers are even and the first is larger than the second. To be able to build a probability model we will need to review some basic concepts of set theory and then state the axioms of probability. After this we will discuss the special case where the sample space has a finite number of elements.

The symbol ∅ denotes the empty set and S denotes the universal set. We say that E is a subset of F (written E ⊂ F) if all the elements of E belong to F. Two sets E and F are said to be equal if E ⊂ F and F ⊂ E. The union E ∪ F consists of all the elements that belong to E, to F, or to both. The intersection E ∩ F consists of all the elements that belong to both E and F. Two sets E and F are said to be mutually exclusive if E ∩ F = ∅. E′ (the complement of E) denotes all the elements of S that are not in E. The basic properties of sets are the commutative laws, the associative laws, the distributive laws, and De Morgan's laws.

Example: Let S be the set of all possible outcomes of tossing a coin three times, A the event that there is at least one head, B the event that the first two tosses are heads, and C the event that the third toss is a tail. Find A′, A ∩ B, and A ∪ B. Here S = {hhh, hht, hth, htt, thh, tht, tth, ttt}.

In probability, S is the set of all possible outcomes of an experiment. Interesting subsets of S to which we may want to assign probabilities, such as A = {hhh, hht, hth, htt, thh, tht, tth} (at least one head) and B = {tth, ttt} (the first two tosses are tails), are known as events. An event A is said to have occurred if any outcome ω ∈ A occurs when the experiment is conducted. If an experiment results in the outcome ω = tth, then every event (subset of S) containing ω occurs.
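The three-coin example can be checked mechanically with Python's built-in set operations (a sketch, not part of the original notes; outcomes are written as strings such as "hht"):

```python
from itertools import product

# Sample space for three coin tosses: all strings of h/t of length 3.
S = {"".join(t) for t in product("ht", repeat=3)}

A = {w for w in S if "h" in w}        # at least one head
B = {w for w in S if w[:2] == "hh"}   # first two tosses are heads
C = {w for w in S if w[2] == "t"}     # third toss is a tail

A_complement = S - A                  # set difference gives the complement

print(A_complement)   # the single outcome with no heads: {'ttt'}
print(A & B)          # intersection: B itself, since B is a subset of A
print(A | B == A)     # union adds nothing new, so A ∪ B = A
```

Note that A ∩ B = B and A ∪ B = A, reflecting the fact that B ⊂ A: any outcome whose first two tosses are heads certainly has at least one head.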
2.1 Set functions
Let F be a set of events of S. When S has a finite number of elements, F is often the set of all subsets of S. For example, if S = {a, b, c} then F contains the empty set, all singletons, all pairs, and S itself. A real-valued set function maps elements of F (subsets of S) to the real numbers.

Example: Let S = {a, b, c}. For any A ⊂ S define the set function N(A) as the number of elements in A. Then N(∅) = 0, N({b}) = 1, N({a, b}) = 2, etc.

Example: Let S = {(x, y) : 0 ≤ x ≤ a, 0 ≤ y ≤ b}. For any R = {(x, y) : x1 ≤ x ≤ x2, y1 ≤ y ≤ y2} ⊂ S, let A(R) = (x2 − x1)(y2 − y1) be the area of the set R.

In probability we are often interested in set functions P(A) that map subsets of S into the interval [0, 1].

Example: Let S = {a, b, c}. For any A ⊂ S define P(A) = N(A)/N(S) = N(A)/3, the number of elements in A divided by the number of elements of S. Then P(∅) = 0, P({b}) = 1/3, P({a, b}) = 2/3, etc.

Example: Let S = {(x, y) : 0 ≤ x ≤ a, 0 ≤ y ≤ b}. For any R = {(x, y) : x1 ≤ x ≤ x2, y1 ≤ y ≤ y2} ⊂ S, let P(R) = A(R)/A(S) = (x2 − x1)(y2 − y1)/(ab), the area of R divided by the area of S.
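The counting and area set functions above are easy to code directly. The following is a minimal sketch (the function names `N`, `P`, and `P_rect` are my own labels, not from the notes):

```python
# Counting set function N(A) and the counting probability P(A) = N(A)/N(S)
# on the finite sample space S = {a, b, c}.
def N(A):
    return len(A)

S = {"a", "b", "c"}

def P(A, S=S):
    return N(A) / N(S)

print(N(set()), N({"b"}), N({"a", "b"}))   # 0 1 2
print(P({"b"}), P({"a", "b"}))             # 1/3 and 2/3

# Geometric analogue: the probability of an axis-aligned rectangle
# [x1, x2] x [y1, y2] inside S = [0, a] x [0, b] is its area over ab.
def P_rect(x1, x2, y1, y2, a, b):
    return (x2 - x1) * (y2 - y1) / (a * b)

print(P_rect(0, 1, 0, 1, 2, 2))  # 0.25: a unit square inside a 2-by-2 square
```

Both functions return values in [0, 1] and assign probability 1 to the whole space, anticipating the axioms of the next section.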
2.2 The Axioms of Probability
See section 2.4. In probability we are interested in assigning numbers in [0, 1] to certain subsets of S (called events). The following axioms allow us to do this in a consistent way.

A1 If A ⊂ S then P(A) ≥ 0.

A2 P(S) = 1.

A3 If A1, A2, . . . is a countable collection of mutually exclusive sets, then P(A1 ∪ A2 ∪ · · ·) = P(A1) + P(A2) + · · ·. In particular, P(A1 ∪ · · · ∪ An) = P(A1) + · · · + P(An) for every n ≥ 1 (take the remaining sets to be empty).

Note: in order for a set of outcomes to be an event we need to be able to assign a probability to the set. The set S is known as the sample space; it contains all the possible outcomes of an experiment. In general an outcome of an experiment is denoted by ω. An experiment can have a finite number of outcomes, a countable number of outcomes, e.g. S = {ω1, ω2, ω3, . . .}, or an uncountable number of outcomes, e.g. S = {ω : 0 ≤ ω ≤ 1}.

Before discussing some of the implications of these axioms, let us verify that they indeed hold in a couple of simple examples.

1. Suppose S = {h, t}. This is the sample space corresponding to the experiment of tossing a coin, where h represents heads and t represents tails. The relevant subsets of S are ∅, {h}, {t}, {h, t}. If the coin is fair we would have P({h}) = P({t}) = 0.5, P(∅) = 0, P({h, t}) = 1, which agrees with A1, A2, and A3.

2. Suppose S = {(x, y) : 0 ≤ x ≤ 1, 0 ≤ y ≤ 1} and for each A = {(x, y) : x1 ≤ x ≤ x2, y1 ≤ y ≤ y2} ⊂ S let P(A) = (x2 − x1)(y2 − y1). Notice that P(S) = 1. For each z ∈ [0, 1] let Rz = {(x, y) : x = z, 0 ≤ y ≤ 1} and notice that P(Rz) = 0 for all z ∈ [0, 1]. Also, S = ∪z∈[0,1] Rz. Because Rx ∩ Rz = ∅ for all x ≠ z, the sets Rz, z ∈ [0, 1], are mutually exclusive, yet their probabilities sum to 0 while P(S) = 1. This seems to contradict A3, except that A3 is only valid for countable collections of mutually exclusive subsets, and the index set [0, 1] is uncountable.
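The geometric probability in example 2 can also be checked empirically: drawing uniform random points in the unit square, the fraction that land in a given rectangle should approach its area. The sketch below (an illustration, not part of the original notes; the particular rectangle is my own choice) estimates P(R) for R = [0.2, 0.5] × [0.1, 0.9], whose exact probability is 0.3 × 0.8 = 0.24.

```python
import random

random.seed(0)  # fixed seed for reproducibility

# Rectangle R = [0.2, 0.5] x [0.1, 0.9] inside the unit square:
# exact probability P(R) = (0.5 - 0.2) * (0.9 - 0.1) = 0.24.
n = 100_000
inside = 0
for _ in range(n):
    x, y = random.random(), random.random()  # uniform point in the square
    if 0.2 <= x <= 0.5 and 0.1 <= y <= 0.9:
        inside += 1

estimate = inside / n
print(f"Monte Carlo estimate: {estimate:.3f}  (exact value: 0.24)")
```

This also illustrates the frequency interpretation from section 1.1.4: probability as a long-run relative frequency. Note that any single vertical segment Rz has area 0, so a sampled point lands on a prescribed Rz with probability 0, consistent with example 2.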