Introduction to the Course
Statistics (Vietnamese-Belgium program)
Teaching staff:
Dr. Tran Thi Bich, Department of Statistics, NEU
Email: [email protected]
Lectures:
16th of Aug: 6-9:00pm
17th of Aug: 6-9:00pm
18th of Aug: 9-12:00pm; 14-17:00pm
19th of Aug: 9-12:00pm; 14-17:00pm
Tutorials: 30 minutes - 1 hour, at the end of the lecture, from lecture 2 to 6
Textbook:
- Statistics for Management and Economics. 7th Edition, Keller.
Assessment: one in-class exam at the end of the course
Section 1
Introduction to Statistics and SPSS
Reading materials: Chap 1, 2 (Keller)
Outline
Introduction to statistics
Basic concepts: variables and data
Getting acquainted with SPSS
What is statistics?
Statistics is all about collecting, organising and interpreting data
Statistics is a way to get information from data and make decisions under uncertainty
Statistical analysis of data uses statistical modelling and probability; our main focus is on data and techniques for analysing data

Why is statistics important?
Applications include:
Financial management (capital budgeting)
Marketing management (pricing)
Marketing research (consumer behaviour)
Operations management (inventory)
Accounting (forecasting sales)
Human resources management (performance appraisal)
Information systems
Economics (summarising, predicting)
Types of statistics
Descriptive statistics:
Collecting, organising, summarising, and presenting data
E.g: graphical techniques; numerical techniques
Inferential statistics:
Estimating, predicting, and making decisions about a population based on sample data
E.g: estimation; hypothesis testing

Basic concepts: variables and data
A variable is some characteristic of a population or sample
Eg: • Height of female students • Occupation of students in this class
Data are the observed values of a variable
Eg: • Height of 10 female students: 1.6, 1.7, 1.55, 1.59, 1.5, 1.58, 1.64, 1.67, 1.58, 1.55 • Occupation of 5 students: teller, accountant, IT, marketing manager, teacher
Types of data
Data can be classified as:
Qualitative (Nominal, Ordinal)
Quantitative, also called Interval (Discrete, Continuous)

Qualitative data
Qualitative data are the kind of data that cannot be measured (quantified)
Eg: Marital status: single, married, divorced, and widowed
Study performance of students: poor, fair, good, very good, excellent
Further classification: qualitative data can be classified as Nominal and Ordinal
Nominal data (also called categorical data): cannot be quantified with any meaningful unit
- Marital status: single, married, divorced, and widowed
Ordinal data: a sort of nominal data whose values have a natural order
- Study performance of students: poor, fair, good, very good, excellent
- Opinions of consumers: strongly disagree, somewhat disagree, neither disagree nor agree, agree, strongly agree
Quantitative data
Quantitative (interval) data are real numbers (can be measured)
Eg: Mid-term test marks of 10 students: 7, 8, 10, 5, 5, 6, 8, 9, 9, 7
Weights of postal packages
Monthly salary
Further classification: quantitative data can be divided into two types, discrete or continuous
◦ Discrete data: take only integer values
Eg: Number of children in a family: 1, 2, 4, 7, 2
Number of owned houses
◦ Continuous data: can take any value
Eg: Weights of postal packages
Monthly salary

Activity 1
For each of the following examples of data, determine the type:
i. The number of miles joggers run per week
ii. The starting salaries of graduates of advanced program
iii. The months in which a firm's employees choose to take their vacations
iv. The occupation of graduates of advanced program
v. Teachers' ranking
Population versus sample
Population is the set of all items or people that share some common characteristics
A sample is a smaller group drawn from the population
Sampling: taking a sample from the population
A census is obtained by collecting information about every member of a population
- Collect the height of all Vietnamese citizens
- Verify the quality of all products that are produced by factory X
A sample survey is obtained by collecting information from some members of the population
- Collect the height of 1,000 Vietnamese citizens
- Verify the quality of a proportion of products that are produced by factory X
An important requirement: a sample must be representative of the population; that is, the profile of the sample is the same as that of the population
Parameter: a descriptive measure of a population (μ, σ²)
Statistic: a descriptive measure of a sample (x̄, s²)
Reasons to take a sample
A census can give accurate data, but collecting information from the entire population is sometimes impossible
A census is time-consuming and expensive
A sample allows one to investigate more detailed information
A sufficient sample size ensures that results from the sample are as accurate as those of the population

Moving from population to sample
Population => Sampling frame (a list of all items of the population) => Sample
Types of sample
Random sampling => random sample
Quasi-random sampling => systematic sample; stratified sample; multistage sample
Non-random sampling => quota sample; cluster sample

Getting acquainted with SPSS
Import the file 'assignment 1 data set.xls' into SPSS and get familiar with SPSS.
Data presentation: tables and charts
Reading materials: Chap 2, 3 (Keller)
Outline
Frequency distribution
- Simple frequency table
- Grouped frequency table
Charts
- Bar and pie charts
- Histograms
- Boxplot
- Stem-and-leaf
- Ogive
Why do we have to summarise data?
Recap
◦ In the previous chapter you learned how to collect data. Data collected through surveys are called 'raw' data.
◦ Raw data may include thousands of observations and often provide too much information => they need to be summarised before being presented to an audience
Requirements
◦ A data summary clears away details but should give the overall pattern.
◦ Summarised information is concise but should reflect an accurate view of the original data
Methods to summarise and present data
◦ Tables
◦ Charts
◦ Numerical summaries (measures of location and dispersion)

Tables: frequency distribution
Frequency is the number of times a certain event has happened
A frequency distribution records the number of times each value occurs and is presented in the form of a table
Types of frequency distribution:
• Simple frequency distribution
• Grouped frequency distribution
• Cumulative, percentage, and cumulative percentage frequency distribution
Simple frequency distribution
Applications:
• Qualitative data
• Discrete variables with few values

Simple frequency table: example 1
Example of a discrete variable with few values:
• You are given the raw data of the midterm marks of 20 students as follows: 7, 7, 10, 8, 5, 4, 5, 6, 4, 9, 8, 7, 6, 4, 8, 5, 7, 10, 10, 9
• Create a simple frequency table manually

Marks | Number of students (frequency)
4     | 3
5     | 3
6     | 2
7     | 4
8     | 3
9     | 2
10    | 3
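The manual tally above can be reproduced in a few lines of code. This is an illustrative sketch in Python (the course itself uses SPSS, so this is only an aside):

```python
from collections import Counter

# Midterm marks of 20 students (raw data from example 1)
marks = [7, 7, 10, 8, 5, 4, 5, 6, 4, 9, 8, 7, 6, 4, 8, 5, 7, 10, 10, 9]

# A simple frequency table counts how many times each value occurs
freq = Counter(marks)

# Print the table in ascending order of the mark
for mark in sorted(freq):
    print(mark, freq[mark])
```

The frequencies must add up to the number of observations (20), which is a quick sanity check on any hand tally.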
Simple frequency distribution: example 2 (nominal variable)
Example 2: We have a data set of 686 international students studying at UNSW, Australia. Create a frequency table.
The data set is large => we can't create the frequency table manually.
Creating a simple frequency table using SPSS:
Go to 'Analyse' => 'Tables' => 'Tables of frequency'
When the dialog box appears, choose a variable for the box 'Frequencies for', then click OK
Copy the table to Excel for more manipulations

Nationality       | Number of students (frequency)
Australia         | 179
New Zealand       | 1
Hong Kong         | 21
Singapore         | 48
Malaysia          | 70
Indonesia         | 76
Philippines       | 6
Thailand          | 18
China             | 99
Vietnam           | 9
India             | 11
USA, Canada       | 14
UK, Ireland       | 35
Other Europe      | 42
Rest of the world | 57
Total             | 686
Grouped frequency table: discrete variable with many values
Example 3: the marks scored by 58 candidates seeking promotion in a personnel selection test were recorded as follows. Construct a frequency table using a class width of ten marks.

37 49 58 59 56 79 62 82 53 58 34 45
40 43 44 50 42 61 54 30 49 54 76 47
64 53 64 54 60 39 49 44 47 44 25 38
55 57 54 55 59 40 31 41 53 47 58 55
59 64 56 42 38 37 33 33 47 50

Marks (class interval) | Number of candidates (frequency)
21 - 30                | 2
31 - 40                | 11
41 - 50                | 17
51 - 60                | 20
61 - 70                | 5
71 - 80                | 2
81 - 90                | 1
Total                  | 58

Note: The decision on the number of classes and the class intervals is subjective, but the number should be chosen carefully.
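The grouping step above is mechanical once the classes are fixed. A minimal Python sketch of the binning (illustrative only; the lecture constructs the table by hand):

```python
# Raw marks of the 58 candidates from example 3
marks = [37, 49, 58, 59, 56, 79, 62, 82, 53, 58, 34, 45,
         40, 43, 44, 50, 42, 61, 54, 30, 49, 54, 76, 47,
         64, 53, 64, 54, 60, 39, 49, 44, 47, 44, 25, 38,
         55, 57, 54, 55, 59, 40, 31, 41, 53, 47, 58, 55,
         59, 64, 56, 42, 38, 37, 33, 33, 47, 50]

# Classes 21-30, 31-40, ..., 81-90 (class width of ten marks)
classes = [(21, 30), (31, 40), (41, 50), (51, 60), (61, 70), (71, 80), (81, 90)]

# Count how many marks fall in each class (bounds inclusive for integer marks)
table = {f"{lo} - {hi}": sum(lo <= m <= hi for m in marks) for lo, hi in classes}
```

Because the marks are integers, classes with inclusive bounds 21-30, 31-40, ... cannot overlap; for continuous data the convention "lower <= x < upper" is used instead.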
Grouped frequency table: continuous variable
Example 4: draw a frequency table of the wages (in USD) paid to 30 people as follows:

429 216 398 282 338 209
202 277 554 145 361 457
87  94  240 144 310 391
362 437 176 325 221 374
480 120 274 153 470 303

Wages (class interval) | Number of people (frequency)
< $100                 | 2
$100 - <$200           | 5
$200 - <$300           | 8
$300 - <$400           | 9
$400 - <$500           | 5
$500 - <$600           | 1
Total                  | 30

Terminology:
Lower value: the lowest value of a class. Upper value: the highest value of a class.
Class interval: the range from the lower to the upper value.
Open-ended class: the first or last class in the range may be open-ended, i.e. it has no lower or upper value (e.g. <$100). Open-ended classes are designed for uncommon values: too low or too high.
Frequency distribution: summary
1. Simple frequency distribution: an easy task that can be done manually or with statistical software.
2. Grouped frequency distribution: more difficult. The hardest task is to decide the number of classes and the class width (class intervals). Ideally, each class reflects differences in the nature of the data. The more you work on it, the more reasonable your choice of the number and size of classes becomes.
3. The upper value of one class should not coincide with the lower value of the following class, so that each value belongs to only one class.

Cumulative, percentage, and cumulative percentage frequency distribution

Wages (class interval) | Number of people (frequency) | Cumulative frequency | Percentage frequency | Cumulative percentage frequency
< $100                 | 2                            | 2                    | 6.7                  | 6.7
$100 - <$200           | 5                            | 7                    | 16.7                 | 23.3
$200 - <$300           | 8                            | 15                   | 26.7                 | 50.0
$300 - <$400           | 9                            | 24                   | 30.0                 | 80.0
$400 - <$500           | 5                            | 29                   | 16.7                 | 96.7
$500 - <$600           | 1                            | 30                   | 3.3                  | 100.0
Total                  | 30                           |                      |                      |
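The cumulative and percentage columns are simple running computations on the frequency column. A short Python sketch of the arithmetic (the course does this in Excel/SPSS):

```python
# Frequencies of the six wage classes from the table above
freqs = [2, 5, 8, 9, 5, 1]
n = sum(freqs)  # 30 people in total

# Cumulative frequency: a running total of the frequencies
cumulative = []
running = 0
for f in freqs:
    running += f
    cumulative.append(running)

# Percentage and cumulative percentage frequencies (rounded to 1 dp)
percentage = [round(100 * f / n, 1) for f in freqs]
cum_percentage = [round(100 * c / n, 1) for c in cumulative]
```

The last cumulative frequency must equal n and the last cumulative percentage must equal 100, which is a useful check on the table.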
Charts
Tools for qualitative and discrete data:
• Simple bar charts
• Pie charts
Tools for continuous data:
• Histograms
• Stem-and-leaf plots
• Cumulative frequency curve (ogive)
• Boxplots (discussed in lecture 3)

Bar and pie charts
Back to the UNSW survey example: create a bar chart and a pie chart.
Reduce the number of classes for an easier visual look.

Nationality       | Number of students (frequency) | Percentage frequency
Australia & NZ    | 180                            | 26.24%
China             | 120                            | 17.49%
South East Asia   | 227                            | 33.09%
India             | 11                             | 1.60%
USA & Canada      | 14                             | 2.04%
UK & Ireland      | 35                             | 5.10%
Other Europe      | 42                             | 6.12%
Rest of the world | 57                             | 8.31%
Total             | 686                            | 100.00%
Bar chart: number of international students at UNSW, by region (frequencies as in the table above)
Pie chart: percentage of international students at UNSW, by region
Histograms
Raw data => frequency table => histogram
A histogram looks like a bar chart, except that the bars are joined together
Two types of histograms: equal-width and unequal-width

Equal-width histograms
All bars have the same width (the same class intervals)
The height of each bar represents the frequency of the class interval
Using the raw data in example 4, draw a histogram representing wages
Shapes of histograms
Symmetric: the two halves of the histogram mirror each other
Positive skew: long tail to the right
Negative skew: long tail to the left
Bimodal: two distinct peaks
(Four example histograms illustrate these shapes.)
Histogram terms
Modal class: the class with the highest number of observations
Uni-modal, bi-modal, tri-modal, multi-modal
Skewness, symmetry
Relative frequency histogram: replace the frequency of each class by class frequency / total number of observations

Stem-and-leaf display
Raw data: 24, 26, 24, 21, 27, 27, 30, 41, 32, 38
Rearranged data: 21, 24, 24, 26, 27, 27, 30, 32, 38, 41
Stem-and-leaf display:
2 | 144677
3 | 028
4 | 1
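The display above can be built mechanically: the stem is the tens digit and the leaves are the sorted units digits. An illustrative Python sketch:

```python
# Raw data from the stem-and-leaf example above
data = [24, 26, 24, 21, 27, 27, 30, 41, 32, 38]

# Group the sorted values by their tens digit (the stem);
# the remainders (units digits) become the leaves
stems = {}
for value in sorted(data):
    stems.setdefault(value // 10, []).append(value % 10)

# Render each row as "stem | leaves"
display = {stem: "".join(str(leaf) for leaf in leaves)
           for stem, leaves in stems.items()}
```

Unlike a histogram, the display retains every original value: the row "2 | 144677" can be read back as 21, 24, 24, 26, 27, 27.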
Ogive
An ogive is a cumulative frequency curve which shows the number of items less than a particular value
E.g. given a frequency table of salaries => draw an ogive

How to draw an ogive
An ogive is a line chart of the cumulative frequency and can be drawn in Excel using a line graph

Class    | Frequency | Cumulative frequency
<100     | 22        | 22
100-<150 | 44        | 66
150-<200 | 79        | 145
200-<250 | 96        | 241
250-<300 | 44        | 285
300-<350 | 15        | 300

(The ogive plots the cumulative frequency, rising from 22 to 300, against the class upper values.)
Numerical summaries: central tendency and dispersion
Reading materials: Chap 4 (Keller)
Outline
Measures of location: mean, median, mode; selection of measures of location
Measures of dispersion: range, quartile range, quartile deviation, variance, standard deviation
Chebyshev's law
Coefficient of variation
Coefficient of skewness
Measures of location (central tendency)
A measure of location shows where the centre of the data is
Three most useful measures of location:
Arithmetic mean/average
Median
Mode

Arithmetic mean
Arithmetic mean of a population: μ = (Σ Xi) / N
Arithmetic mean of a sample: x̄ = (Σ xi) / n
Where:
Xi, xi - the value of each item
N, n - the total number of items
Easy example - mean
Data: 5, 7, 1, 2, 4
x̄ = (1/n) Σ xi = (5 + 7 + 1 + 2 + 4)/5 = 19/5 = 3.8

Advantages and disadvantages of arithmetic mean
Advantages:
◦ Easy to understand and calculate
◦ The value of every item is included => representative of the whole set of data
Disadvantages:
◦ Sensitive to outliers:
Sample: (43; 38; 37; : : : ; 27; 34) => x̄ = 33.5
Contaminated sample: (43; 38; 37; : : : ; 27; 1934) => x̄ = 71.5
(Source: Slide #23, Dehon's statistics lecture, Universite libre de Bruxelles, SBS-EM)
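The easy example above is a one-liner in code; a minimal Python sketch:

```python
# Data from the easy mean example: x̄ = (5+7+1+2+4)/5 = 19/5 = 3.8
data = [5, 7, 1, 2, 4]
mean = sum(data) / len(data)
```

Every value enters the sum, which is exactly why the mean is representative of the whole data set and, at the same time, why a single extreme value can drag it far from the bulk of the data.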
Median
The median is the value of the observation located in the middle of the data set
Steps to find the median:
1. Arrange the observations in order of size (normally ascending order)
2. Find the number of observations and hence the middle observation
3. The median is the value of the middle observation

Calculating the median from raw data
◦ If the data has an odd number of observations, the middle observation is the ((n+1)/2)th, so:
Median = x((n+1)/2)th
◦ If the data has an even number of observations, there are two observations located in the middle, and:
Median = (x(n/2)th + x(n/2+1)th) / 2

Example
E.g 1. Raw data: 11, 11, 13, 14, 17 => find the median
E.g 2. Raw data: 11, 11, 13, 14, 16, 17 => find the median

Advantages and disadvantages of median
Advantages:
◦ Easy to understand and calculate
◦ Not affected by outlying values => can be used when the mean would be misleading
Disadvantages:
◦ It is the value of one observation => fails to reflect the whole data set
◦ Not easy to use in further analysis
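The odd/even rule above can be written directly as code. An illustrative Python sketch applied to the two examples:

```python
def median(values):
    """Median following the steps above: sort, then take the middle."""
    s = sorted(values)
    n = len(s)
    if n % 2 == 1:
        # Odd n: the single middle observation (0-based index n//2)
        return s[n // 2]
    # Even n: average the two middle observations
    return (s[n // 2 - 1] + s[n // 2]) / 2

m1 = median([11, 11, 13, 14, 17])      # e.g. 1: odd number of observations
m2 = median([11, 11, 13, 14, 16, 17])  # e.g. 2: even number of observations
```

For e.g. 1 the middle (3rd) observation is 13; for e.g. 2 the median averages the 3rd and 4th observations, (13 + 14)/2 = 13.5.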
Mode
The mode is the value which occurs most frequently in the data set
Steps to find the mode:
1. Draw a frequency table for the data
2. Identify the mode as the most frequent value

Example to calculate mode
X  | Frequency
8  | 3
12 | 7
16 | 12
17 | 8
19 | 5
The mode is 16 (the value with the highest frequency, 12).

Bimodal and multimodal data
Bimodal: two modes
Multimodal: several modes

Mean, median and mode in normal and skewed distributions
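The two steps above (frequency table, then most frequent value) map directly onto code. A minimal Python sketch using the frequency table from the example:

```python
from collections import Counter

# Frequency table from the mode example: value -> frequency
freq = Counter({8: 3, 12: 7, 16: 12, 17: 8, 19: 5})

# The mode is the value with the highest frequency
mode = freq.most_common(1)[0][0]
```

Note that `most_common(1)` returns only one value even when the data are bimodal or multimodal; with ties, all values sharing the top frequency are modes.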
Which measure of centre is best?
The mean is generally the most commonly used, but it is sensitive to extreme values
If the data are skewed or extreme values are present, the median is better, e.g. real estate prices
The mode is generally best for categorical (ordinal) data, e.g. restaurant service quality (below): the mode is 'very good'

Rating       | # customers
Excellent    | 20
Very good    | 50
Good         | 30
Satisfactory | 12
Poor         | 10
Very Poor    | 6

Why do we need measures of dispersion?
Two data sets of the midterm marks of 5 students:
◦ First set: 100, 40, 40, 35, 35 => Mean: 50 => the measure of location is less representative, and thus less reliable
◦ Second set: 70, 55, 50, 40, 35 => Mean: 50 => the measure of location is more representative, and thus more reliable
We need to know the spread of the other values around the central tendency; this is especially important in analysing the stock market.

Measures of dispersion
Measures of dispersion tell you how spread out the values of the distribution are around the central tendency
• The range, quartile range, and quartile deviation
• Variance and standard deviation
Range
The range is the difference between the largest and smallest value => sort the data before computing the range
Formula: Range = maximum value - minimum value
Advantage of the range: easy to calculate for ungrouped data
Disadvantages:
◦ Takes into account only two values
◦ Affected by one or two extreme values
◦ More difficult to calculate for grouped data

Variance
Variance of a population: σ² = Σ(Xi - μ)² / N
Variance of a sample: s² = Σ(xi - x̄)² / (n - 1)
Advantages:
• Takes into account all values
• Easy to interpret the result
Disadvantage: the unit of variance has no meaning
Standard deviation (σ)
The standard deviation (S.D) is the square root of the variance
S.D of a population: σ = √σ²
S.D of a sample: s = √s²
Advantages:
• Overcomes the meaningless unit of the variance
• The most widely used measure of dispersion (the bigger its value => the more spread out the data)

Application in finance
The variance (or S.D) of an investment can be used as a measure of risk, e.g. on profits/returns
Larger variance => larger risk
Usually, a higher rate of return comes with higher risk
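The sample variance and standard deviation formulas above can be checked on the first set of midterm marks from the dispersion example. An illustrative Python sketch:

```python
import math

# First set of midterm marks from the dispersion example (mean 50)
data = [100, 40, 40, 35, 35]

n = len(data)
mean = sum(data) / n                               # x̄ = 50
var = sum((x - mean) ** 2 for x in data) / (n - 1)  # s² = Σ(x - x̄)²/(n - 1)
sd = math.sqrt(var)                                # s = √s²
```

The deviations from 50 are 50, -10, -10, -15, -15, so s² = (2500 + 100 + 100 + 225 + 225)/4 = 787.5; the large value reflects how unrepresentative the mean of 50 is for this set.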
Example - two funds over 10 years
Rates of return:
Fund A: 8.3, -6.2, 20.9, -2.7, 33.6, 42.9, 24.4, 5.2, 3.1, 30.5
Fund B: 12.1, -2.8, 6.4, 12.2, 27.8, 25.3, 18.2, 10.7, -1.3, 11.4
x̄A = 16%, sA² = 280.34(%²)
x̄B = 12%, sB² = 99.37(%²)
Fund A: higher risk, but also a higher average rate of return.

Chebyshev's law, or the law of 3σ
For a normal or symmetrical distribution:
◦ 68.26% of all observations fall within 1 standard deviation of the mean, i.e. in the range (x̄ - 1s) to (x̄ + 1s)
◦ 95.45% of all observations fall within 2 standard deviations of the mean, i.e. in the range (x̄ - 2s) to (x̄ + 2s)
◦ 99.73% of all observations fall within 3 standard deviations of the mean, i.e. in the range (x̄ - 3s) to (x̄ + 3s)
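The fund figures quoted in the example can be verified directly. A short Python sketch computing the sample mean and variance of each fund's rates of return:

```python
# Rates of return (%) of the two funds over 10 years
fund_a = [8.3, -6.2, 20.9, -2.7, 33.6, 42.9, 24.4, 5.2, 3.1, 30.5]
fund_b = [12.1, -2.8, 6.4, 12.2, 27.8, 25.3, 18.2, 10.7, -1.3, 11.4]

def sample_mean_var(data):
    """Sample mean and sample variance s² = Σ(x - x̄)²/(n - 1)."""
    n = len(data)
    m = sum(data) / n
    v = sum((x - m) ** 2 for x in data) / (n - 1)
    return m, v

mean_a, var_a = sample_mean_var(fund_a)
mean_b, var_b = sample_mean_var(fund_b)
```

Fund A's variance is almost three times Fund B's, which is the sense in which it carries "higher risk" despite its higher average return.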
Boxplots
We need the MEDIAN and QUARTILES to create a boxplot
MEDIAN = the middle of the observations, i.e. halfway through the observations
QUARTILES = mark the quarter points of the observations, i.e. 1/4 (Q1) and 3/4 (Q3) of the way through the data [(n+1)/4; 3(n+1)/4]
INTERQUARTILE RANGE (IQR) = Q3 - Q1
Whiskers: maximum length 1.5*IQR; they stretch from the box to the furthest data point (within this range)
Points further out from the box are marked with stars; these are called outliers

Boxplot: example
Here is the boxplot of the height of international students studying at UNSW: the box stretches from the lower quartile to the upper quartile with the median marked inside it, and whiskers extend above and below (heights roughly 150-200 cm).
Shapes of boxplots
Boxplots can also reveal the shape of a distribution: symmetric, positive skew, negative skew, bimodal (plus its skewness/symmetry, modality, and range)

Coefficient of variation (C of V)
The standard deviation can compare the dispersion of two distributions with similar means
For distributions with different means, we use the coefficient of variation to compare their dispersions
The bigger the coefficient of variation, the wider the dispersion
Eg: two sets of data have the following information:

                   | A   | B
Mean               | 120 | 125
Standard deviation | 50  | 51

Which one is more spread out?
Coefficient of variation (cont.)
Formula: coefficient of variation = standard deviation/mean = s / x̄
C of V_A = 50/120 = 0.417 and C of V_B = 51/125 = 0.408 => A is more spread out than B

Coefficient of skewness (C of S)
This measures the shape of the distribution
There are several measures of skewness. Below is a common one, Pearson's coefficient of skewness:
Coefficient of skewness = 3 x (mean - median)/standard deviation
If C of S is nearly +1 or -1, the distribution is highly skewed
If C of S is positive => the distribution is skewed to the right (positive skew)
If C of S is negative => the distribution is skewed to the left (negative skew)
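Both coefficients above are one-line formulas. An illustrative Python sketch applied to the A/B comparison:

```python
def coeff_of_variation(sd, mean):
    """C of V = standard deviation / mean."""
    return sd / mean

def pearson_skew(mean, median, sd):
    """Pearson's coefficient of skewness = 3(mean - median)/sd."""
    return 3 * (mean - median) / sd

# Data sets A and B from the coefficient-of-variation example
cv_a = coeff_of_variation(50, 120)
cv_b = coeff_of_variation(51, 125)
```

Even though B has the larger standard deviation (51 vs 50), A has the larger coefficient of variation, so relative to its mean A is the more spread out of the two.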
Activity 1
Summary statistics of two data sets are as follows:
Set 1 (ages of students studying at UNSW): mean 22.4839, median 21, standard deviation 6.3756
Set 2 (wages of staff): mean 294.3, median 292.5, standard deviation 125.93
Compute Pearson's coefficient of skewness for these data sets and describe the shapes of their distributions.

Distribution shapes
The histogram of age is nearly normal; the histogram of wages is skewed to the right.

Measuring correlation between two variables
Sometimes we have two measurements on one observation, e.g. the height and weight of a person, or weekly income and the amount spent on rent per week. Tools:
◦ Scatterplot (discussed in lecture 8)
◦ Covariance
◦ Correlation

Values of covariance
If cov > 0, then as X increases, Y increases; as X decreases, Y decreases (positive slope)
Covariance
Covariance measures the strength of the linear relationship between X and Y. Calculated as:
cov(X, Y) = Σ (Xi - X̄)(Yi - Ȳ) / (n - 1) = (1/(n - 1)) [Σ XiYi - n X̄ Ȳ]

Values of covariance (cont.)
If cov < 0, then as X increases, Y decreases; as X decreases, Y increases (negative slope)
(Scatterplots of a positive and of a negative linear relationship illustrate the two cases.)
Values of covariance (cont.)
If cov = 0, then as X changes, Y doesn't change: the variables are not linearly related (a scatterplot of such data shows no linear pattern)

Coefficient of correlation
Also measures the strength of the linear relationship between X and Y, and is bounded between -1 and +1. Calculated as:
ρ = cov(X, Y) / (σX σY) for a population; r = cov(X, Y) / (sX sY) for a sample

If correlation equals...
If r = -1, perfect negative linear relationship
If r = +1, perfect positive linear relationship
If r = 0, no LINEAR relationship
Example
Calculate the covariance and correlation for the following data.

xi    | yi | xi - x̄ | (xi - x̄)² | yi - ȳ | (yi - ȳ)² | (xi - x̄)(yi - ȳ)
1     | 7  | -2.5   | 6.25      | 3      | 9         | -7.5
2     | 5  | -1.5   | 2.25      | 1      | 1         | -1.5
3     | 5  | -0.5   | 0.25      | 1      | 1         | -0.5
4     | 4  | 0.5    | 0.25      | 0      | 0         | 0
5     | 2  | 1.5    | 2.25      | -2     | 4         | -3
6     | 1  | 2.5    | 6.25      | -3     | 9         | -7.5
Total | 21 | 24 | 0 | 17.5 | 0 | 24 | -20

x̄ = 21/6 = 3.5, ȳ = 24/6 = 4
sx² = Σ(xi - x̄)²/(n - 1) = 17.5/5 = 3.5
sy² = Σ(yi - ȳ)²/(n - 1) = 24/5 = 4.8
cov(x, y) = Σ(xi - x̄)(yi - ȳ)/(n - 1) = -20/5 = -4
r = cov(x, y)/(sx sy) = -4/(√3.5 × √4.8) = -0.976
The correlation implies a strong negative linear relationship, as a scatterplot of the data confirms.

Summary
Techniques for summarising data:
Bar charts, pie charts
Histograms and boxplots: shape of distribution (centre, spread, modality, skewness)
Cumulative relative frequency curve (ogive)
Numerical measures:
◦ Central tendency: mean, median, mode
◦ Dispersion: variance, standard deviation, coefficient of variation, range, interquartile range
Two sets of data: scatterplot, covariance, correlation
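The worked covariance/correlation example can be reproduced in a few lines. An illustrative Python sketch:

```python
import math

# Data from the covariance/correlation example
x = [1, 2, 3, 4, 5, 6]
y = [7, 5, 5, 4, 2, 1]

n = len(x)
mx, my = sum(x) / n, sum(y) / n  # x̄ = 3.5, ȳ = 4

# Sample covariance: Σ(xi - x̄)(yi - ȳ)/(n - 1) = -20/5
cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n - 1)

# Sample standard deviations: √3.5 and √4.8
sx = math.sqrt(sum((xi - mx) ** 2 for xi in x) / (n - 1))
sy = math.sqrt(sum((yi - my) ** 2 for yi in y) / (n - 1))

# Correlation coefficient r = cov/(sx sy)
r = cov / (sx * sy)
```

Note how the covariance (-4) depends on the units of x and y, while r (about -0.976) is unit-free and bounded in [-1, +1], which is why r is easier to interpret.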
Section 2
Probability and Random Variables
Reading materials: Chap 6, 7, 8 (Keller)

Why do we need to study probability and probability distributions?
Probability is a crucial component in obtaining information about populations from samples
Probability provides the link between populations and samples. Eg:
◦ From sample means => infer population means
◦ From a known population => measure the likelihood of obtaining a particular event or sample

Terminology (1)
A random experiment is a process that results in a number of possible outcomes, none of which can be predicted with certainty. Eg:
◦ Roll a die: outcomes 1, 2, 3, 4, 5, 6
◦ Flip a coin: outcomes Heads, Tails
◦ Take an exam: pass or fail

Terminology (2)
The sample space of a random experiment is a list of all possible outcomes
Outcomes must be mutually exclusive and exhaustive:
◦ No two outcomes can both occur on any one trial
◦ All possible outcomes must be included
E.g. roll a die: sample space S = {1, 2, 3, 4, 5, 6}

Probabilities
An event is a collection of one or more simple (individual) outcomes. E.g. roll a die: event A = an odd number comes up; then A = {1, 3, 5}.
In general, use the sample space S = {E1, E2, ..., En} where there are n possible outcomes. The probability of an event Ei occurring on a single trial is written as P(Ei).
Probability of an event = (number of favourable outcomes) / (total number of outcomes)
For the sample space S, P(S) = 1
E.g. roll a die: sample space S = {1, 2, 3, 4, 5, 6}. Examples of events:
Obtain the number '1': A = {1} and P(A) = 1/6
Obtain an odd number: B = {1, 3, 5} and P(B) = 1/2
Obtain a number larger than 6: C = {} and P(C) = 0
Obtain a number smaller than 7: D = {1, 2, 3, 4, 5, 6} and P(D) = 1
Two rules about probabilities
The probability assigned to each simple event Ei must satisfy:
1. 0 ≤ P(Ei) ≤ 1 for all i
2. Σ P(Ei) = 1, summing over i = 1, ..., n

Probabilities of combined events
Consider two events, A and B.
P(A or B) = P(A ∪ B) = P(A union with B) = P(A occurs, or B occurs, or both occur)
P(A and B) = P(A ∩ B) = P(A intersection with B) = P(A and B both occur)
P(Ā) = P(Ac) = P(A complement) = P(A does not occur)
P(A|B) = P(A occurs given that B has occurred)
Joint probabilities
Eg: mutual funds (http://www.howtosavemoney.com/howdo-mutual-funds-work/)

Probabilities               | B1 = Fund outperforms market | B2 = Fund does not outperform market
A1 = Top-20 MBA program     | 0.11                         | 0.29
A2 = Not top-20 MBA program | 0.06                         | 0.54

Joint probabilities = P(A ∩ B):
P(Mutual fund outperforms AND top-20 MBA) = 0.11
P(Mutual fund outperforms AND not top-20) = 0.06
P(Mutual fund does not outperform AND top-20) = 0.29
P(Mutual fund does not outperform AND not top-20) = 0.54

Marginal probabilities
◦ Computed by adding across rows or down columns
◦ So named because they are calculated in the margins of the table

Probabilities | B1   | B2   | Totals
A1            | 0.11 | 0.29 | 0.40
A2            | 0.06 | 0.54 | 0.60
Totals        | 0.17 | 0.83 | 1.00

P(A1) = P(A1 and B1) + P(A1 and B2) = 0.11 + 0.29 = 0.40
P(A2) = P(A2 and B1) + P(A2 and B2) = 0.06 + 0.54 = 0.60
P(B1) = P(B1 and A1) + P(B1 and A2) = 0.11 + 0.06 = 0.17
P(B2) = P(B2 and A1) + P(B2 and A2) = 0.29 + 0.54 = 0.83

Conditional probability
The conditional probability that A occurs, given that B has occurred:
P(A | B) = P(A and B) / P(B)
We want to see whether a fund managed by a graduate of a top-20 MBA program will outperform the market:
P(B1 | A1) = P(B1 and A1) / P(A1) = 0.11 / 0.40 = 0.275
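The marginal and conditional calculations above are just sums and a ratio over the joint table. An illustrative Python sketch:

```python
# Joint probabilities from the mutual fund example: P(Ai and Bj)
joint = {("A1", "B1"): 0.11, ("A1", "B2"): 0.29,
         ("A2", "B1"): 0.06, ("A2", "B2"): 0.54}

# Marginal probabilities: add across a row or down a column
p_a1 = joint[("A1", "B1")] + joint[("A1", "B2")]  # P(A1) = 0.40
p_b1 = joint[("A1", "B1")] + joint[("A2", "B1")]  # P(B1) = 0.17

# Conditional probability: P(B1 | A1) = P(A1 and B1) / P(A1)
p_b1_given_a1 = joint[("A1", "B1")] / p_a1
```

Since P(B1 | A1) = 0.275 differs from the marginal P(B1) = 0.17, knowing the manager's MBA program changes the probability of outperforming, which anticipates the independence check in Activity 1 below.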
Some rules of probability
Complement rule: A and its complement Ā satisfy P(A) + P(Ā) = 1; therefore P(Ā) = 1 - P(A)
Additive rule (for the union of two events): P(A or B) = P(A) + P(B) - P(A and B)
Multiplicative rule (for the joint probability of two events): P(A and B) = P(A|B) P(B) = P(B|A) P(A)

Independence
Two events are independent if P(A|B) = P(A) or P(B|A) = P(B)
Note: if A and B are independent, then P(A and B) = P(A)*P(B) (only if independent!). Then P(A|B) = P(A and B)/P(B) = [P(A)*P(B)]/P(B) = P(A)

Activity 1
Check whether the event that the manager graduated from a top-20 MBA program is independent of the event that the fund outperforms the market.

Random Variables
Imagine tossing three unbiased coins. S = {HHH, HHT, HTH, THH, HTT, THT, TTH, TTT}: 8 equally likely outcomes.
Let X = the number of heads that occur. X can take the values 0, 1, 2, 3. The actual value of X depends on chance, so we call it a random variable (r.v.).
Definition: a random variable is a function that assigns a numeric value to each simple event in a sample space.
Notation
Denote random variables (X, Y, ...) in upper case
Denote actual realised values (x, y, ...) in lower case

Discrete vs continuous random variables
A discrete random variable has a countable number of possible values, e.g. number of heads, number of sales, etc.
A continuous random variable has an infinite number of possible values; the number of elements in the sample space is infinite as a result of continuous variation, e.g. height, weight, etc.
Discrete probability distributions
Definition: a table or formula listing all possible values that a discrete r.v. can take, together with the associated probabilities.
E.g. for our toss-three-coins example:

x      | 0   | 1   | 2   | 3
P(X=x) | 1/8 | 3/8 | 3/8 | 1/8

More on discrete probability distributions
If x is a value taken by an r.v. X, then p(x) = P(X=x) = the sum of all the probabilities associated with the simple events for which X = x.
If an r.v. X can take values xi, then:
1. 0 ≤ p(xi) ≤ 1 for all xi
2. Σ p(xi) = 1, summing over all xi
(Check these conditions against the probabilities in the table.)
Activity 2
o What is the probability of at most one head?
o What is the probability of at least one head?

Describing the probability distribution
The expected value, or mean, of a discrete random variable X, which takes on values xi with probability p(xi), is:
E(X) = Σ xi p(xi), summing over all xi

Back to the coin tossing:
x      | 0   | 1   | 2   | 3
P(X=x) | 1/8 | 3/8 | 3/8 | 1/8
E(X) = Σ xi p(xi) = 0*(1/8) + 1*(3/8) + 2*(3/8) + 3*(1/8) = 12/8 = 1.5

Rules for expectations
If X and Y are random variables, and c is any constant, then the following hold:
E(c) = c
E(cX) = cE(X)
E(X-Y) = E(X) - E(Y)
E(X+Y) = E(X) + E(Y)
E(XY) = E(X)*E(Y) only if X and Y are independent
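The expected-value calculation above can be checked exactly using rational arithmetic. An illustrative Python sketch:

```python
from fractions import Fraction

# Distribution of X = number of heads in three coin tosses
dist = {0: Fraction(1, 8), 1: Fraction(3, 8),
        2: Fraction(3, 8), 3: Fraction(1, 8)}

# The probabilities must sum to one (rule 2 for discrete distributions)
assert sum(dist.values()) == 1

# E(X) = Σ x p(x) = 12/8 = 1.5
ex = sum(x * p for x, p in dist.items())
```

Using `Fraction` keeps the eighths exact, so the result is literally 3/2 rather than a floating-point approximation.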
Variance
Variance measures the spread/dispersion of a distribution.
Let X be a discrete random variable with values xi that occur with probability p(xi), and E(X) = μ. The variance of X is defined as:
σ² = E[(X - μ)²] = Σ (xi - μ)² p(xi), summing over all xi

Variance continued
Equivalently: σ² = E(X²) - μ² = Σ xi² p(xi) - μ²

Tossing three coins again:
x      | 0   | 1   | 2   | 3
P(X=x) | 1/8 | 3/8 | 3/8 | 1/8
V(X) = Σ xi² p(xi) - μ² = 0²(1/8) + 1²(3/8) + 2²(3/8) + 3²(1/8) - 1.5² = 3 - 2.25 = 0.75
Std Dev(X) = √0.75 = 0.866 (to 3dp)

Laws for variances
If X and Y are r.v.s and c is a constant:
1. V(c) = 0
2. V(cX) = c²V(X)
3. V(X+c) = V(X)
4. V(X+Y) = V(X) + V(Y) if X and Y are independent
5. V(X-Y) = V(X) + V(Y) if X and Y are independent
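The shortcut formula V(X) = Σ x²p(x) - μ² used above is easy to check in code. An illustrative Python sketch for the three-coin distribution:

```python
import math

# Distribution of X = number of heads in three coin tosses
dist = {0: 1/8, 1: 3/8, 2: 3/8, 3: 1/8}

mu = sum(x * p for x, p in dist.items())                 # E(X) = 1.5
var = sum(x**2 * p for x, p in dist.items()) - mu**2     # E(X²) - μ² = 0.75
sd = math.sqrt(var)                                      # ≈ 0.866
```

The same value of 0.75 results from the definitional form Σ(x - μ)²p(x); the shortcut just avoids computing each deviation separately.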
Bivariate distributions
Distribution of a single variable: univariate
Distribution of two variables together: bivariate
So, if X and Y are discrete random variables, we say that p(x,y) = P(X=x and Y=y) is the joint probability that X=x and Y=y.

Example
Toss three coins. Let X be the number of heads. Let Y be the number of changes of sequence, i.e. the number of times we change from H→T or T→H:
◦ HHH: x=3, y=0    ◦ TTT: x=0, y=0
◦ HHT: x=2, y=1    ◦ TTH: x=1, y=1
◦ HTH: x=2, y=2    ◦ THT: x=1, y=2
◦ THH: x=2, y=1    ◦ HTT: x=1, y=1

Bivariate probability distribution
x \ y | 0   | 1   | 2   | px(x)
0     | 1/8 | 0   | 0   | 1/8
1     | 0   | 2/8 | 1/8 | 3/8
2     | 0   | 2/8 | 1/8 | 3/8
3     | 1/8 | 0   | 0   | 1/8
py(y) | 2/8 | 4/8 | 2/8 | 1
Independence of random variables
If the random variables X and Y are independent, then
P(X=x and Y=y) = P(X=x) . P(Y=y), i.e. p(x,y) = px(x) . py(y)
In the previous example, X and Y are clearly not independent:
p(0, 0) = 1/8, but px(0) . py(0) = 1/8 * 2/8 = 1/32, so p(0, 0) ≠ px(0) . py(0)

Covariance
Consider the r.v.s X and Y with joint pdf p(x,y); x = x1, ..., xm; y = y1, ..., yn. If E(X) = μx and E(Y) = μy, then the covariance between X and Y is given by:
σxy = cov(X, Y) = E[(X - μx)(Y - μy)] = Σi Σj xi yj p(xi, yj) - μx μy

Correlation coefficient
Associated with covariance: ρ = cov(X, Y) / (σx σy)

Return to the three tossed coins
X = number of heads, Y = number of sequence changes. Check for yourself:
μx = 3/2, σx² = 3/4; μy = 1, σy² = 1/2
Covariance for the example
σxy = Σi Σj xi yj p(xi, yj) - μx μy
    = [0·0·(1/8) + 1·1·(2/8) + 1·2·(1/8) + 2·1·(2/8) + 2·2·(1/8) + 3·0·(1/8)] - (3/2)·1
    = 12/8 - 3/2 = 0
cov(x, y) = 0 and ρxy = 0: X and Y are uncorrelated.

The sum of two random variables
Consider two real estate agents:
o X = the number of houses sold by Albert in a week
o Y = the number of houses sold by Beatrice in a week
The bivariate distribution of X and Y is shown below.
Bivariate distribution of X and Y
y \ x | 0    | 1    | 2    | py(y)
0     | 0.12 | 0.42 | 0.06 | 0.60
1     | 0.21 | 0.06 | 0.03 | 0.30
2     | 0.07 | 0.02 | 0.01 | 0.10
px(x) | 0.40 | 0.50 | 0.10 | 1

We can show (check these at home!):
E(X) = 0.7, V(X) = 0.41
E(Y) = 0.5, V(Y) = 0.45
Suppose interest is in X+Y
That is, the total number of houses Albert and Beatrice sell in a week. Possible values of X+Y: 0, 1, 2, 3, 4. Then P(X+Y=2) = sum of all joint probabilities for which x+y=2; that is,
P(X+Y=2) = p(0,2) + p(1,1) + p(2,0) = 0.07 + 0.06 + 0.06 = 0.19

Repeating this for 0, 1, 2, 3, 4 gives:

 x+y       0      1      2      3      4
 p(x+y)   0.12   0.63   0.19   0.05   0.01

Can evaluate the mean and variance of (X+Y): E(X+Y) = 1.2, V(X+Y) = 0.56 (check these at home!)
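The distribution of X+Y, and its mean and variance, can be recomputed from the joint table; a minimal sketch:

```python
# Joint pmf of weekly sales, keys (x, y): x = Albert, y = Beatrice.
p = {(0, 0): 0.12, (1, 0): 0.42, (2, 0): 0.06,
     (0, 1): 0.21, (1, 1): 0.06, (2, 1): 0.03,
     (0, 2): 0.07, (1, 2): 0.02, (2, 2): 0.01}

# pmf of the total T = X + Y.
pt = {}
for (x, y), q in p.items():
    pt[x + y] = pt.get(x + y, 0.0) + q

mean_t = sum(t * q for t, q in pt.items())
var_t = sum(t ** 2 * q for t, q in pt.items()) - mean_t ** 2
print({t: round(pt[t], 2) for t in sorted(pt)})  # {0: 0.12, 1: 0.63, 2: 0.19, 3: 0.05, 4: 0.01}
print(round(mean_t, 2), round(var_t, 2))         # 1.2 0.56
```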
7
Law of expected value and variance of the sum of two variables
If a and b are constants, and X and Y are random variables, then
E(aX + bY) = a·E(X) + b·E(Y)
V(aX + bY) = a²·V(X) + b²·V(Y) + 2ab·cov(X,Y)

Application of this – portfolio diversification and asset allocation
See Keller, pages 210–214 (7th edition). In finance, variance and standard deviation are used to assess the risk of an investment. Analysts reduce risk by diversifying their investments – that is, combining investments whose correlation is small.
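The variance law can be verified on the house-sales example: with cov(X,Y) computed from the joint table, V(X) + V(Y) + 2·cov(X,Y) should reproduce the V(X+Y) = 0.56 found directly. A sketch:

```python
# Joint pmf of house sales, keys (x, y): x = Albert, y = Beatrice.
p = {(0, 0): 0.12, (1, 0): 0.42, (2, 0): 0.06,
     (0, 1): 0.21, (1, 1): 0.06, (2, 1): 0.03,
     (0, 2): 0.07, (1, 2): 0.02, (2, 2): 0.01}

ex = sum(x * q for (x, y), q in p.items())                 # E(X) = 0.7
ey = sum(y * q for (x, y), q in p.items())                 # E(Y) = 0.5
vx = sum(x ** 2 * q for (x, y), q in p.items()) - ex ** 2  # V(X) = 0.41
vy = sum(y ** 2 * q for (x, y), q in p.items()) - ey ** 2  # V(Y) = 0.45
cov = sum(x * y * q for (x, y), q in p.items()) - ex * ey  # cov(X,Y) = -0.15

# V(X + Y) = V(X) + V(Y) + 2 cov(X, Y)
print(round(vx + vy + 2 * cov, 2))  # 0.56 -- matches the direct calculation
```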
Continuous probability distribution
Remember: discrete data have a limited (finite) number of possible values, so discrete probability distributions can be put in tables. Continuous data have an infinite number of possible values, so we use a smooth function f(x) to describe the probabilities.

About the function
f(x) must satisfy the following:
1. f(x) ≥ 0 for all x, that is, it must be nonnegative.
2. The total area underneath the curve representing f(x) equals 1.

Notes about continuous pdfs
1) P(a < X < b) is the area under f(x) between a and b:
P(a < X < b) = ∫ab f(x) dx
2) For a continuous pdf, the probability that X takes any specific value is zero: let a → b and see that the area → 0.
Notes about continuous pdfs
3) A continuous random variable has a mean and a variance! The mean measures the location of the distribution; the variance measures the spread of the distribution.

The Normal Distribution
Bell-shaped, symmetric about µ, reaches its highest point at x = µ, tends to zero as x → ±∞.

Different means
[Figure: normal curves with different means, shifted along the x-axis.]

Notes about the Normal Distribution
1. E(X) = µ; V(X) = σ².
2. Area under the curve = 1.
3. Different means shift the curve along the x-axis.
4. Different variances make the curve more or less peaked.
5. Shorthand notation: X ~ N(µ, σ²).
Different variances
[Figure: normal curves with σ = 0.5, σ = 1 and σ = 2 – the smaller the variance, the more peaked the curve.]

Probabilities from the Normal Distribution (1)
Generally, we require probabilities such as P(X < a) or P(a < X < b).
9
OR we require (2)
P(a < X < b).

So, need to find the area under the curve…(3)
That is, need to integrate as follows:
Area = ∫ab f(x) dx = ∫ab 1/(σ√(2π)) · exp(−½((x−µ)/σ)²) dx
Not easy to do!
Tabulated values
Tables are made to provide probabilities. However, different values would obviously be needed for each different μ and σ² – infinitely many possible values, so it is impossible to have all the tables needed! So we select one particular normal distribution – μ = 0, σ² = 1 – call this the Standard Normal Distribution, and tabulate all the probabilities for it. Call a r.v. from this a Standard Normal r.v., with notation Z ~ N(0,1). Now we just need a way to convert any other normal distribution to the standard normal – then we can use the existing tables.

Standardising
The process of converting any Normal random variable to a Standard Normal random variable. If X ~ N(μ, σ²), then use the linear transformation:
Z = (X − μ)/σ ~ N(0,1)
58
Standardising (cont.)
So, for ANY random variable that comes from a normal distribution, if we subtract the mean and divide by the standard deviation, we get a r.v. ~ N(0,1). See the Z-table in Appendix B-8. This table provides P(Z < z) for various values of z.

Rules to find probabilities from normal tables
Symmetry:
P(Z < −a) = P(Z > a)
P(Z > a) = 1 − P(Z < a)
Total area under the curve is 1; total area under each half of the curve is 0.5, i.e. P(Z < 0) = P(Z > 0) = 0.5.
Draw the curve, shade the area, break it up into areas you can find (differences or sums).
10
Examples using tables
1) P(Z < 1.5) = 0.9332 (from table)
2) P(Z > 1) = 1 − P(Z < 1) = 1 − 0.8413 (from tables) = 0.1587
3) P(Z < −1) = P(Z > 1) by symmetry = 0.1587 (from (2))
4) P(1 < Z < 1.5) = P(Z < 1.5) − P(Z < 1) = 0.9332 − 0.8413 = 0.0919
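The tabulated values can be reproduced with the error function, since Φ(z) = ½(1 + erf(z/√2)); a sketch, not part of the slides:

```python
from math import erf, sqrt

def phi(z):
    """Standard normal CDF, P(Z < z)."""
    return 0.5 * (1 + erf(z / sqrt(2)))

print(round(phi(1.5), 4))             # 0.9332
print(round(1 - phi(1.0), 4))         # 0.1587
print(round(phi(-1.0), 4))            # 0.1587
print(round(phi(1.5) - phi(1.0), 4))  # 0.0918 (tables give 0.0919 because each CDF value is rounded first)
```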
64
In general
Given X ~ N(μ, σ²), suppose we require P(X < a). We know that Z = (X − μ)/σ ~ N(0,1). So,
P(X < a) = P((X − μ)/σ < (a − μ)/σ) = P(Z < (a − μ)/σ), where Z ~ N(0,1).
Section 3
Sampling Distribution
Reading materials: Chap 9 (Keller)
Outline
Distribution of sample means
The central limit theorem
Distribution of Sample Means: example (1)–(2)
Data were collected on the time taken for a pizza order to be completed in minutes (from order taken to pizza handed over to customer). [Figures: histograms of pizza times for samples of 50 and 1000 observations.] Summary statistics for the variable Pizza time:

 Sample            N      Mean     Median   StDev
 First 50 obs       50    17.256   17.041   3.743
 Another 50 obs     50    17.585   17.374   3.872
 1000 obs         1000    17.934   17.627   4.009
10,000 observations on the time to complete a pizza order (3)

 Variable     N       Mean     Median   StDev
 Pizza time   10000   18.046   17.744   4.006

In general (4)
One thousand datasets, each with 10 observations in it (that is, 1000 samples of size 10), are generated (simulated data) from this model, and for each sample the average (sample mean), median (sample median) and sample standard deviation are calculated and recorded. [Figures: histograms of the 1000 sample means and sample medians.]

 Statistic   N      Mean     Median   StDev
 average     1000   18.007   18.020   1.231
 median      1000   17.757   17.804   1.433
1
S.D. for the 1000 random samples of size 10
[Figure: histogram of the 1000 sample standard deviations.]

 Statistic   N      Mean     Median   StDev
 stdev       1000   3.8183   3.7282   0.9505

More random numbers
Another thousand datasets are generated from the same model, but this time each dataset has 25 observations. [Figures: histograms of the sample means and sample medians for samples of size 25.]

 Statistic   N      Mean     Median   StDev
 average     1000   17.991   17.982   0.814
 median      1000   17.711   17.675   1.017
8
S.D. for samples of size 25
[Figure: histogram of the sample standard deviations for samples of size 25.]

 Statistic   N      Mean     Median   StDev
 stdev       1000   3.9637   3.9391   0.6048

Notice as we take larger samples…
The histograms for all three statistics (sample mean, sample median and sample standard deviation) become more and more symmetric and bell-shaped, and less variable – particularly those for the sample mean. Also notice that the estimated standard deviation of the sample mean is not only decreasing as the sample size increases, but is also approximately the same for the same sample sizes.
A general result of great importance: The Central Limit Theorem
No matter what model a random sample is taken from, as the sample size (number of random observations) increases, the distribution of the sample mean becomes closer and closer to the normal distribution; and no matter what model a random sample is taken from, for any sample size n, the standard deviation of the sample mean is the model (theoretical) standard deviation σ divided by √n, that is, σ/√n. This is called the standard error of the mean (SE).
Whatever the population distribution looks like (normal or not), when the sample size is large enough, the distribution of sample means will be normal and we can use the Z-statistic to calculate the probability of any mean value.
2
This is the Central Limit Theorem
If X is a random variable with mean µ and variance σ², then in general
X̄ ~ N(µ, σ²/n), and Z = (X̄ − µ)/(σ/√n) ~ N(0,1) as n → ∞.

So, how large does n need to be?
Generally, it depends on the original distribution of X.
◦ If X has a normal distribution, then the sample mean has a normal distribution for all sample sizes.
◦ If X has a distribution that is close to normal, the approximation is good for small sample sizes (e.g. n=20).
◦ If X has a distribution that is far from normal, the approximation requires larger sample sizes (e.g. n=50).
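The theorem is easy to see by simulation. The sketch below is illustrative only (the skewed exponential population with mean 17 loosely echoes the pizza-time example): it draws many samples of size 25 and checks that the sample means have mean ≈ µ and standard deviation ≈ σ/√n.

```python
import random
from statistics import mean, stdev

random.seed(1)
n, reps = 25, 2000
# A clearly non-normal (skewed, exponential) population with mean = sd = 17.
lam = 1 / 17
means = [mean(random.expovariate(lam) for _ in range(n)) for _ in range(reps)]

print(round(mean(means), 1))   # close to the population mean, 17
print(round(stdev(means), 1))  # close to sigma / sqrt(n) = 17/5 = 3.4
```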
Activity 1
The average height of Vietnamese women is 1.6m, with a standard deviation of 0.2m. If I choose 25 women at random, what is the probability that their average height is less than 1.53m?
15
3
Estimation
Reading materials: Chap 10 (Keller)
Outline
Concepts of estimation – point and interval estimators; unbiasedness and consistency
Estimating the population mean when the population variance is known
Estimating the population mean when the population variance is unknown
Selecting the sample size
Recap: The Central Limit Theorem
As n → ∞, the distribution of the sample mean becomes Normal, with centre µ and standard deviation σ/√n. This happens regardless of the shape of the original population; i.e. X̄ follows a Normal distribution with
E(X̄) = µ and var(X̄) = σ²/n.

Recap: What size n?
If the distribution of X is normal, then the sample mean follows a normal distribution for all n. If the distribution of X is VERY far from normal, then we need a large n before we see the normality of the distribution of the sample mean. In all cases, as n gets larger, the distribution of the mean gets more normal.
How does this help?
This means that if we have a large enough sample, we can always find probabilities to do with the mean, since it will have a normal distribution no matter what the original distribution.

Estimation
The aim of estimation is to determine the approximate value of a parameter of the population using statistics calculated from a sample drawn from that population.
• As an example, we estimate the mean of a population using the mean of a sample drawn from that population. That is, the sample mean is an estimator of the population mean.
• The actual statistic we calculate from the sample is called an estimate of the population parameter. For example, a calculated sample mean is an estimate of the population mean.
1
Estimators
There are two types of estimators:
Point estimate: a single value or point, i.e. sample mean = 4 is a point estimate of the population mean µ.
Interval estimate: draws inferences about a population by estimating a parameter using an interval (range).
• E.g. we are 95% confident that the unknown mean score lies between 56 and 78.

Desirable qualities of estimators
We want our estimators to be precise and accurate.
Accurate: on average, our estimator gets towards the true value.
Precise: our estimates are close together.
The sample mean is a precise and accurate estimator of the population mean. (Sometimes, accurate and precise together is referred to as unbiased.)
Point and interval estimators
A point estimate is just that; an interval gives some idea of how sure we are. An interval estimator gives an interval (range) based on a sample statistic. This interval corresponds to a probability, and this probability is never equal to 100%.

Interval estimators for µ, σ known
We know that X̄ ~ N(µ, σ²/n), so Z = (X̄ − µ)/(σ/√n) ~ N(0,1).
We also know that, for a standard normal distribution, 95% of the area is contained between −1.96 and +1.96:
P(−1.96 < Z < 1.96) = 0.95
Put these things together and rearrange:
P(−1.96 < (X̄ − µ)/(σ/√n) < 1.96) = 0.95
P(−1.96·σ/√n < X̄ − µ < 1.96·σ/√n) = 0.95
P(X̄ − 1.96·σ/√n < µ < X̄ + 1.96·σ/√n) = 0.95
2
P(X̄ − 1.96·σ/√n < µ < X̄ + 1.96·σ/√n) = 0.95
This is called a 95% confidence interval for μ. What this means:
• In repeated sampling, 95% of the intervals created this way would contain μ and 5% would not.
Can change how confident we are by changing the 1.96:
• use 1.64 to get a 90% confidence interval
• use 2.57 to get a 99% confidence interval

Example 1
Suppose we know from experience that a random variable X ~ N(μ, 1.66), and for a sample of size 10 from this population the sample mean is 1.58. Now,
P(1.58 − 1.96·√(1.66/10) < μ < 1.58 + 1.96·√(1.66/10)) = 0.95
P(0.78 < μ < 2.38) = 0.95
Interpretation: if the experiment were carried out multiple times, 95% of the intervals created in this way would contain μ. Lower Confidence Limit: 0.78; Upper Confidence Limit: 2.38.

General notation
In general, a 100(1−α)% confidence interval estimator for μ is given by
P(X̄ − Zα/2·σ/√n < μ < X̄ + Zα/2·σ/√n) = 1 − α
Notation: the confidence level 100(1−α)% is the probability that a parameter falls into the CI;
CI: X̄ ± Zα/2·σ/√n;  LCL: X̄ − Zα/2·σ/√n;  UCL: X̄ + Zα/2·σ/√n
15
16
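The CI formula is easy to compute directly; a sketch reproducing Example 1 (variance 1.66, n = 10, x̄ = 1.58), plus a second call with n = 25, x̄ = 178cm, σ = 10cm (the men's-heights example on a later slide):

```python
from math import sqrt

def z_interval(xbar, sigma, n, z=1.96):
    """CI for mu with known sigma: xbar +/- z * sigma / sqrt(n)."""
    half = z * sigma / sqrt(n)
    return xbar - half, xbar + half

# Example 1: variance 1.66 (so sigma = sqrt(1.66)), n = 10, sample mean 1.58.
lo, hi = z_interval(1.58, sqrt(1.66), 10)
print(round(lo, 2), round(hi, 2))    # 0.78 2.38

# Heights example: n = 25, xbar = 178cm, sigma = 10cm.
lo2, hi2 = z_interval(178, 10, 25)
print(round(lo2, 2), round(hi2, 2))  # 174.08 181.92
```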
What does 100(1−α)% mean?
If we want 95% confidence, α = 0.05 (or 5%). If we want 90% confidence, α = 0.10 (or 10%). If we want 99% confidence, α = 0.01 (or 1%).

What does Zα/2 mean?
We want to find the middle 100(1−α)% area of the standard normal curve:
◦ So the area left in each tail will be α/2.
◦ Zα/2 is the point which marks off an area of α/2 in the tail.
◦ Need to look up normal tables to find this!
3
Factors influencing the width of the interval
σ is fixed; it can't be changed.
Vary the sample size: as n gets bigger, the interval gets narrower.
Vary the confidence level: if we want to be more confident, we simply change the 1.96 to another number from the standard normal – 2.33 gives 98% confidence, 2.575 gives 99% confidence; increasing confidence makes the interval wider.

IMPORTANT!
Remember that it is the INTERVAL that changes from sample to sample. µ is a fixed and constant value; it is either within the interval or not. You should interpret a 95% confidence interval as saying "In repeated sampling, 95% of such intervals created would contain the true population mean".
Example 2
The average height of a sample of 25 men is found to be 178cm. Assume that the standard deviation of male heights is known to be 10cm, and that heights follow a normal distribution. Find:
1. A 95% confidence interval for the population mean height.
2. A 90% confidence interval for the population mean height.

1. A 95% confidence interval for the population mean height:
P(178 − 1.96·10/√25 < µ < 178 + 1.96·10/√25) = 0.95
P(174.08 < µ < 181.92) = 0.95
So, in repeated sampling, we would expect 95% of the intervals created this way to contain μ.
22
2. A 90% confidence interval for the population mean height:
P(−1.645 < Z < 1.645) = 0.90, that is, Zα/2 = 1.645
P(178 − 1.645·10/√25 < µ < 178 + 1.645·10/√25) = 0.90
P(174.71 < µ < 181.29) = 0.90
So, in repeated sampling, we would expect 90% of the intervals created this way to contain μ.

Interval estimators for µ, σ unknown
We can't simply substitute s for σ, since (X̄ − µ)/(s/√n) does not then have a standard normal distribution! However, it does follow a known distribution: a t-distribution with n−1 degrees of freedom. The statistic is called the t-statistic:
t = (X̄ − µ)/(s/√n)
4
About the t-distribution
Found by Gosset, published under the pseudonym "Student" – hence called "Student's t-distribution". It is symmetric around 0 and mound-shaped (like a normal), but has a higher variance than the normal distribution. The higher the degrees of freedom, the more normal the curve looks.
[Figure: bell-shaped, symmetric standard normal curve Z, with the more spread-out t-distributions for df = 13 and df = 5.]
Degrees of freedom (df)
The number of observations whose values are free to vary after calculating the sample mean. E.g. with n = 3 and X̄ = 2:
◦ X1 = 1 (or another value)
◦ X2 = 2 (or another value)
◦ X3 = 3 (can't be changed – it is fixed by the mean)
df = n − 1 = 3 − 1 = 2

Hints for using the t-tables
The bottom row has df = ∞; this gives the standard normal probabilities.
◦ If df is very large, use Z tables even if σ is unknown – the difference between values for large df is small.
If the exact df is not in the tables, use whatever df is closest.
◦ E.g. for df = 74, use the values for df = 70 as this is closest. Then say: t(0.05,74) ≈ t(0.05,70) = 1.667
Confidence interval for µ, σ unknown
P(X̄ − tα/2·s/√n < µ < X̄ + tα/2·s/√n) = 100(1−α)%
CI: X̄ ± tα/2·s/√n
Note: (i) the population must follow a normal distribution for the t-statistic to apply; (ii) use the t-table to find the t-value.

Example 3
A random sample of size n = 25 has x̄ = 50 and s = 8. Use a 95% confidence level to estimate µ.
x̄ ± tα/2·s/√n = 50 ± 2.0639·8/√25
46.70 < µ < 53.30
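Example 3 in code; note the critical value t(0.025, df=24) = 2.0639 is hard-coded from the t-table, since Python's standard library has no t-distribution quantile function:

```python
from math import sqrt

# Example 3: n = 25, sample mean 50, sample sd 8, 95% confidence.
n, xbar, s, t = 25, 50, 8, 2.0639  # t = t(0.025, df=24) from the t-table
half = t * s / sqrt(n)
lo, hi = xbar - half, xbar + half
print(round(lo, 2), round(hi, 2))  # 46.7 53.3
```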
5
Determine the sample size
Suppose that before we gather data, we know that we want to get an average within a certain distance of the true population value. We can use the CLT to find the minimum sample size required to meet this condition, if the standard deviation of the population is known.

Example 4: Assume that the standard deviation of a population is 5. I want to estimate the true population mean to within a range of 3, with 99% certainty.
Step 1: set up the equation needed.
P(|X̄ − µ| ≤ 3) = 0.99
Step 2: standardise.
P(|Z| ≤ 3√n/5) = 0.99, and from the tables P(|Z| ≤ 2.575) = 0.99
Step 3: solve for n.
3√n/5 = 2.575
√n = 2.575·5/3
n = (2.575·5/3)² = 18.42
Therefore, I need a minimum sample size of 19 to be able to estimate the true population mean to within 3, with 99% certainty.
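The three steps above reduce to the rule n ≥ (z·σ/d)², rounded up; a sketch:

```python
from math import ceil

def min_sample_size(sigma, margin, z):
    """Smallest n with z * sigma / sqrt(n) <= margin, i.e. n >= (z*sigma/margin)^2."""
    return ceil((z * sigma / margin) ** 2)

# Example 4: sigma = 5, margin of error 3, 99% confidence (z = 2.575).
print(min_sample_size(5, 3, 2.575))  # 19
```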
Activity 1
Suppose that we know the standard deviation of men's heights is 10cm. How many men should we measure to ensure that the sample mean we obtain is no more than 2cm from the population mean with 99% confidence?
35
6
Section 4
Hypothesis Testing
Reading materials: Chap 11, 12 (Keller)
Outline
Hypothesis testing: basic concepts
Testing µ when σ is known
Testing µ when σ is unknown
Testing for the difference of two means (independent samples)
Hypothesis testing
Making decisions in the face of uncertainty. Hypothesis testing is a structure for making these decisions. We have in mind two competing ideas – call these hypotheses:
◦ First idea: null hypothesis
◦ Second idea: alternative hypothesis
The ideas must be distinct; e.g.
◦ Idea 1 (H0): it will rain today
◦ Idea 2 (HA): it will not rain today

Plan
Collect data and use this to decide which idea is most likely to be correct. Depending on the decision, we either will or will not carry an umbrella. Decision matrix – for each combination of what you decide (take umbrella / don't take umbrella) and what actually happens (it rains / it doesn't rain), think about the consequences.
3
In Statistics:

                  Truth: H0 true    Truth: HA true
 Accept H0        correct           Type 2 Error
 Accept HA        Type 1 Error      correct

α = significance level = P(type 1 error)
β = 1 − power = P(type 2 error)
Power = P(reject H0 when it is false)

An analogy for hypothesis testing – criminal law

 Criminal law                                  Hypothesis testing
 Accused is innocent                           Null hypothesis
 Accused is guilty                             Alternative hypothesis
 Gathering evidence                            Gathering data
 Building a case – presenting and              Presenting and summarising data,
 summarising evidence                          building a test statistic
1
Analogy continued – outcomes (1)

 Criminal law                   Hypothesis testing
 Accused is acquitted           Choose H0
 Accused is convicted           Choose HA
 Convict an innocent person     Type 1 Error
 Acquit a guilty person         Type 2 Error
 "Beyond reasonable doubt"      "95% certainty of making the right decision"

Analogy continued – outcomes (2)
If we say we have a 95% chance of making the right decision, it means we have a 5% chance of making an error. But what type of error do we have a 5% chance of making? A Type 1 error is considered to be more serious than a Type 2 error. Therefore, by convention, we set up testing so that the probability of making a Type 1 error, α, is small. Ideally, we would also like the probability of making a Type 2 error, β, to be small; but reducing the chance of a Type 1 error increases the chance of a Type 2 error. Therefore, we choose to set α to 5% (i.e. a 5% chance we reject H0 when it is true), or some other fixed, low probability, and ignore β.
8
Analogy continued – outcomes (3)
In hypothesis testing, we also make a "presumption of innocence". This means that, when we test a hypothesis, we start by assuming the null is true. Then we gather data, and if we find enough evidence, we reject the null hypothesis and accept the alternative hypothesis.

Steps for hypothesis tests
1. State the null and alternative hypotheses.
2. Calculate the test statistic.
3. Formulate a decision rule using either the rejection region, the p-value (found from the appropriate distribution, e.g. standard normal), or the confidence interval approach.
4. Reach a conclusion regarding whether to accept the null or alternative hypothesis.
10
Rules for hypotheses
Null hypothesis: always about a population value (Greek letter); always has an "=".
Alternative hypothesis: always about a population value (Greek letter); has one of <, > or ≠; looks like the null, but "=" has been replaced.

Testing µ when σ is known
Example 1: A store manager is considering a new billing system for credit customers. The new system will only be cost effective if the mean monthly account is more than $170. A random sample of 400 monthly accounts gives a sample average of $178. The manager knows that accounts are approximately normally distributed, with a standard deviation of $65. Can the manager conclude from this data that the new system will be cost effective? We want to find out if µ, the true mean monthly account, is bigger than $170.
Applying the rules to example 1
Null hypothesis H0: µ = 170. Alternative hypothesis HA: µ > 170.
Having done this, the question now becomes: "is $178 far enough away from $170 to conclude that µ is bigger than $170?"

Recap: The Central Limit Theorem
The central limit theorem says that a sample average has a normal distribution with centre µ and standard deviation σ/√n. So, if we calculate the test statistic below, it should follow a standard normal distribution:
Z = (X̄ − µ)/(σ/√n) ~ N(0,1) as n → ∞
Applying this to example 1
We have σ = 65. We calculate a test statistic – this measures (in standardised units) how far our sample average is from the hypothesised µ:
Z = (X̄ − µ)/(σ/√n)
Z should follow a standard normal distribution IF the true µ is equal to the one in our null hypothesis. The test statistic in this case:
Z = (178 − 170)/(65/√400) = 8/3.25 = 2.46
Decision Rule
Three methods – rejection region, p-value, or confidence interval.
Rejection region: we want to be 95% certain. This means a 5% chance of rejecting H0 when it is true. So, we find the EXTREME 5% of the standard normal (according to our alternative hypothesis) and this is our rejection region. The point that marks off the top 5% of a standard normal is 1.645, so we reject the null hypothesis if our test statistic lies above 1.645. Here, the test statistic is 2.46, so we reject the null hypothesis in favour of the alternative hypothesis. In other words, there is sufficient evidence to conclude that the mean monthly account is higher than $170.
3
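Both the test statistic and the p-value for Example 1 can be computed with the standard normal CDF, expressed via the error function; a sketch:

```python
from math import erf, sqrt

def phi(z):
    """Standard normal CDF, P(Z < z)."""
    return 0.5 * (1 + erf(z / sqrt(2)))

# Example 1: H0 mu = 170 vs HA mu > 170; sigma = 65, n = 400, xbar = 178.
z = (178 - 170) / (65 / sqrt(400))
p_value = 1 - phi(z)  # upper-tail test
print(round(z, 2))        # 2.46
print(round(p_value, 4))  # 0.0069 -> reject H0 at the 5% level
```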
P-value approach (by hand or computer)
This is the probability of getting our test statistic, or one further from the middle, if the null is true. Draw a diagram – it is the area more extreme than our test statistic; i.e. for the last example, the p-value is P(Z > 2.46). A small p-value is evidence against the null hypothesis.
Rule: if p-value < α, reject the null hypothesis; if p-value > α, do not reject the null hypothesis.

Applying this to example 1 (by hand)
From the standard normal tables:
P(Z > 2.46) = 1 − P(Z < 2.46) = 1 − 0.9931 = 0.0069
This means that the probability of observing a sample mean at least as large as 178, for a population whose mean is 170, is 0.0069 – extremely small (much smaller than 0.05). Therefore, we reject the null and conclude that the mean monthly account is higher than $170 (the same conclusion as with the rejection region approach).
Confidence interval (CI) approach
For a 5% significance level, the rejection region in terms of Z is Z < −1.96 or Z > 1.96, so the acceptance region is −1.96 < (X̄ − µ)/(σ/√n) < 1.96. Rearranging, the 95% confidence interval for µ is
X̄ − 1.96·σ/√n < µ < X̄ + 1.96·σ/√n

Applying this to example 1
178 − 1.96·65/√400 < µ < 178 + 1.96·65/√400
171.63 < µ < 184.37
Because µ = 170 does not lie within this CI, we reject the null in favour of the alternative.
One tailed vs two tailed tests
If the alternative hypothesis is "<" or ">": this is a one tailed test; the rejection region is in either the upper or lower tail; the p-value is the probability of getting a more extreme result.
If the alternative hypothesis is "≠": this is a two tailed test; the rejection region is split between both tails; the p-value involves an absolute value – i.e. it is the probability of getting further away from the hypothesised mean on either side.

So if the alternative is "≠"
Two sided or two tailed test. Rejection region: Z < −Zα/2 or Z > +Zα/2. P-value: P(Z > |T.S.|) + P(Z < −|T.S.|).

If the alternative is ">"
Right tailed test. Rejection region: Z > +Zα. P-value: P(Z > T.S.).

If the alternative is "<"
Left tailed test. Rejection region: Z < −Zα. P-value: P(Z < T.S.).
Testing µ when σ is unknown
Similar to the case of estimation, we can substitute s for σ and calculate the t-statistic. The basic process of hypothesis testing remains the same, with the following changes. The test statistic is now calculated as
t = (X̄ − µ)/(s/√n)
It follows the t-distribution with n−1 degrees of freedom (use the t-table to find the rejection region or p-value).

Example 2
Use the gssft.sav file to test the hypothesis that college graduates work a 40-hour work week.
Hypotheses, test statistic for 2-tailed test
H0: µ = 40; HA: µ ≠ 40. Here are the results from SPSS:

One-Sample Test (Test Value = 40), variable: Number of hours worked last week
 t        df    Sig. (2-tailed)   Mean Difference   95% CI of the Difference
 14.326   436   .000              6.995             Lower 6.04, Upper 7.96

CI for µ when σ is unknown
Also use the t-distribution for confidence intervals for µ when σ is unknown. If σ has been estimated from data, the confidence interval is of the form
X̄ ± t(α/2, n−1)·s/√n, i.e. X̄ − t(α/2,n−1)·s/√n < µ < X̄ + t(α/2,n−1)·s/√n
5
Conclusion
Based on the t-statistic, the p-value, or the confidence interval approach, we reject the null hypothesis. In other words, there is sufficient statistical evidence to conclude that full-time workers work more than 40 hours per week.
31
6
Section 5
Regression analysis
Reading materials: Chap 17, 18 (Keller)
Outline
Simple Regression:
Form of the general model
Procedure in SPSS
Interpretation of SPSS output
Testing significance of a slope/intercept
Assumption checking
Multiple Regression: as above
Regression analysis
Regression analysis investigates whether and how variables are related to each other. More specifically, regression analysis can be used to:
• determine whether the value of one variable has any effect on the values of another;
• determine whether, as one variable changes, another tends to increase or decrease;
• predict the values of one variable based on the values of one or more other variables.
E.g.: How is price related to product demand – if we change the price, how will product demand change? How does the salary of staff depend on their education and experience?

Types of relationships
Positive linear relationship; negative linear relationship; non-linear relationship; no relationship.
Simple linear relationship
In a simple linear relationship, we want to see whether a linear relationship exists between one dependent variable (Y) and one independent variable (X).
Example: we want to see whether the time people have lived in a city (in years) affects their attitude towards that city in a linear manner. Attitude towards the city is measured on an 11-point scale (1 = do not like, 11 = very much like).

Simple linear relationship: example

 Respondent   Duration of   Quality of       Attitude
 Number       Residence     infrastructure   Towards City
 1            10            3                6
 2            12            11               9
 3            12            4                8
 4            4             1                3
 5            12            11               10
 6            6             1                4
 7            8             7                5
 8            2             4                2
 9            18            8                11
 10           9             10               9
 11           17            8                10
 12           2             5                2
Steps in regression analysis
1. Analyse the nature of the relationship between the independent and dependent variables.
2. Make a scatterplot.
3. Formulate the mathematical model that describes the relationship between the independent and dependent variables.
4. Estimate and interpret the coefficients of the model.
5. Test the model.
6. Evaluate the strength of the relationship and prediction accuracy.

Simple linear regression: notation
Simple regression – one predictor. We have n observations.
Xi = value of the independent variable for the ith observation
Yi = value of the dependent variable for the ith observation
sx = sample standard deviation of the independent variable
sy = sample standard deviation of the dependent variable
X̄ = sample average of the independent variable
Ȳ = sample average of the dependent variable
Simple linear regression: scatterplot
Step 2: Make a scatterplot. [Figure: scatterplot of Attitude Towards City vs Duration of Residence.]

Simple linear regression: Model
Step 3: Formulate the general model. Fit a straight line to the data, fitting the following model:
Yi = β0 + β1·Xi + εi
where β0 is the intercept, β1 the slope, and εi the error terms (residuals). The slope and intercept are estimated by the ordinary least squares (OLS) method.
OLS method and assumptions
Want: Σεi² minimum, where εi are the errors between the observed values and the fitted line E(Y|X) = β0 + β1X.

Gauss-Markov assumptions
Assumption on the linear relation – A0: linear model Yi = β0 + β1·Xi + εi
Assumption on the factor – A5 (exogeneity): Cov(X, ε) = 0
Assumptions on the error terms:
A1: E(εi) = 0, i = 1,…,n
A2: normality of error terms, ε ~ N
A3: non-autocorrelation of error terms, cov(εi, εj) = 0 for i ≠ j
A4: homoskedasticity, Var(εi) = σ², i = 1,…,n
Source: Dehon's lecture
2
Estimate the parameters
Step 4: Estimate the parameters (slope and intercept): Ŷi = β̂0 + β̂1·Xi. The estimates can be calculated using formulae derived from OLS:
β̂1 = (n·ΣXiYi − ΣXi·ΣYi) / (n·ΣXi² − (ΣXi)²)
β̂0 = Ȳ − β̂1·X̄

Applying this to the example
Slope = 16.333/27.697 = 0.5897 (the sample covariance of X and Y divided by the sample variance of X)
Intercept = 6.5833 − 0.5897·9.333 = 1.0796
Fitted equation: Ŷi = 1.0796 + 0.5897·Xi
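The slope and intercept can be verified from the twelve data points in the table; a sketch using the sample covariance and variance. (With the unrounded slope the intercept comes out as 1.0793; the slides' 1.0796 uses the rounded slope.)

```python
# Duration of residence (x) and attitude towards the city (y) from the table.
x = [10, 12, 12, 4, 12, 6, 8, 2, 18, 9, 17, 2]
y = [6, 9, 8, 3, 10, 4, 5, 2, 11, 9, 10, 2]
n = len(x)

xbar, ybar = sum(x) / n, sum(y) / n
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / (n - 1)  # sample covariance
sxx = sum((xi - xbar) ** 2 for xi in x) / (n - 1)                      # sample variance of x

b1 = sxy / sxx         # slope
b0 = ybar - b1 * xbar  # intercept
print(round(sxy, 3), round(sxx, 3))  # 16.333 27.697
print(round(b1, 4), round(b0, 4))    # 0.5897 1.0793
```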
Interpreting the coefficients
β̂1 = 0.5897 means that with each additional year of staying in the city, attitude towards the city increases by an average of 0.5897 points.
β̂0 = 1.0796 is the predicted value when X = 0. This means that reasons unrelated to the duration of residence give an attitude towards the city of 1.0796 points.
Note: sometimes β̂0 makes no sense at X = 0; in that case we don't interpret the meaning of this coefficient.

Step 5: Testing for significance of estimated parameters
Can test the significance of the linear relationship: H0: β1 = 0 vs HA: β1 ≠ 0.
Test statistic: T = (β̂1 − β1)/s(β̂1), where s(β̂1) is the standard error of β̂1.
Decision rule: compare to a t-distribution with n−2 degrees of freedom.
Applying this to the example
H0: β1 = 0, HA: β1 ≠ 0. Test statistic:
t = (β̂1 − β1)/s(β̂1) = (0.5897 − 0)/0.0701 = 8.412
Compare this t-value with the t-distribution to make the decision rule.

Decision rule
The rejection region is t > 2.2281 or t < −2.2281 for 5% significance (using df = 10); OR from SPSS, p-value = 0.000. Conclusion: reject the null hypothesis. There is a significant linear relationship between duration of residence and attitude to the city.
Step 6: Determine the strength and significance of association
Measured by r² – the coefficient of determination. r² measures the proportion of the total variation in Y explained by the variation in X, i.e.
r² = explained variation / total variation = SSreg / SSy

Applying this to the example
Here is the output from SPSS: S = 1.22329, R-Sq = 87.6%, R-Sq(adj) = 86.4%.
So, 87.6% of the variation in Y is explained by the variation in X.
Step 6: Check prediction accuracy
Can use the standard error of the estimate, sε:
sε = sqrt( SSres / (n − k − 1) )
Interpretation: the average residual, i.e. the average error in predicting Y from the regression equation. It is used to construct confidence intervals:
◦ for the mean value of Y for a given X
◦ for all values of Y for a given X

Checking assumptions
Regression analysis makes several assumptions:
Error terms are normally distributed
Error terms have mean 0 and constant variance
Error terms are independent
These should be checked with plots (see the multiple regression section).
Multiple Regression
Data: one dependent variable, two or more independent variables.
Example: Are consumers' perceptions of quality determined by their perceptions of prices, brand image and brand attributes?

Example using SPSS
Use the cntry15.sav data file for SPSS practice.
Model – general form:
Y = β0 + β1X1 + β2X2 + … + βkXk + ε
which is estimated by:
Ŷ = β̂0 + β̂1X1 + β̂2X2 + … + β̂kXk
β̂0 is the estimated intercept; β̂i is the estimated partial regression coefficient of Xi.
As before, use the least squares method to estimate the parameters, minimising the error (residual) sum of squares.

Interpreting a Partial Regression Coefficient
Imagine a case with two predictors:
Y = β0 + β1X1 + β2X2 + ε
β1 represents the expected change in Y when X1 is increased by one unit but X2 is held constant or otherwise controlled.
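A minimal sketch of this estimation step, with invented data and two predictors: build the normal equations (X'X)b = X'y and solve them with a tiny Gaussian elimination. This is just the textbook least squares arithmetic, not SPSS's (more numerically stable) internal algorithm.

```python
def solve(a, b):
    """Solve a small square system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]   # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [mr - f * mc for mr, mc in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

# Invented illustration data -- NOT the slides' attitude/duration/quality data
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [2, 1, 4, 3, 6, 5, 8, 7]
y  = [5.1, 5.9, 9.2, 9.8, 13.1, 13.9, 17.2, 17.8]

rows = [[1.0, a, b] for a, b in zip(x1, x2)]   # design matrix with intercept column
xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(3)]

b0, b1, b2 = solve(xtx, xty)                   # estimated partial regression coefficients
print(b0, b1, b2)
```

Here b1 is a partial coefficient in exactly the sense above: the expected change in y per unit of x1 with x2 held fixed.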
Example 2
Attitude to the city is now explained by:
Duration of residence
Quality of infrastructure

General Model
Let Y = attitude to city, X1 = duration of residence, X2 = quality of infrastructure:
Y = β0 + β1X1 + β2X2 + ε
Estimation (SPSS)
The regression equation is:
Attitude Towards City = 0.337 + 0.481 × Duration of Residence + 0.289 × Quality of Infrastructure

Coefficients(a)
                 Unstandardized          Standardized
Model 1          B       Std. Error     Beta           t       Sig.
(Constant)       .337    .567                          .595    .567
duration         .481    .059           .764           8.160   .000
quality          .289    .086           .314           3.353   .008
a. Dependent Variable: attitude

Strength of relationship (R2)
As before, R2 is the proportion of variation explained by the model:
R2 = explained variation / total variation = SSreg / SSy
In the example, 94.5% of the variation in Y can be explained by the variation in X1 and X2.
Points about R2
Now called the coefficient of multiple determination.
R2 will go up as we add more explanatory terms to the model, whether they are "important" or not.
Often we use the "adjusted R2", which compensates for adding more variables, so it is lower than R2 when the added variables are not "important".

Significance Testing
Can test two different things:
1. Significance of the overall regression
2. Significance of specific partial regression coefficients.
1. Significance of the overall regression
H0: β1 = β2 = … = βk = 0; HA: not all slopes = 0.
Test statistic:
F = (SSreg / k) / (SSres / (n − k − 1)) = (R² / k) / ((1 − R²) / (n − k − 1))
Decision rule: compare to an F-distribution with k and (n − k − 1) degrees of freedom. If H0 is rejected, one or more slopes are not zero; additional tests are needed to determine which slopes are significant.

Applying this to example – SPSS output
This is the test done in the ANOVA section of the output. In this case, we reject the null hypothesis: at least one of the slopes is significantly different from zero.
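As a quick numeric check of the R²-based form of the F statistic: with the example's R² = 0.945 and k = 2 predictors, and assuming the same n = 12 observations as in the simple-regression example (the multiple-regression sample size is not stated on the slides):

```python
# F = (R^2 / k) / ((1 - R^2) / (n - k - 1)), computed from summary quantities only.
r2, k, n = 0.945, 2, 12          # n = 12 is an assumption, see lead-in
f_stat = (r2 / k) / ((1 - r2) / (n - k - 1))
print(round(f_stat, 1))
```

An F value this large is far beyond any conventional critical value for (2, 9) degrees of freedom, which matches the slides' conclusion of rejecting H0.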
2. Significance of specific partial regression coefficients
H0: βi = 0; HA: βi ≠ 0.
Test statistic:
t = (β̂i − βi) / s(β̂i)
Decision rule: compare to a t-distribution with (n − k − 1) degrees of freedom (i.e. the residual d.f.). If H0 is rejected, the slope of the ith variable is significantly different from zero. That is, once the other variables are considered, the ith predictor has a significant linear relationship with the response.

Applying this to example
Coefficients(a)
                 Unstandardized          Standardized
Model 1          B       Std. Error     Beta           t       Sig.
(Constant)       .337    .567                          .595    .567
duration         .481    .059           .764           8.160   .000
quality          .289    .086           .314           3.353   .008
a. Dependent Variable: attitude
Once the quality of infrastructure is considered, the duration of residence still has a significant linear relationship with the attitude to the city.
6
Check residuals
Definition: a residual (also called the error term) is the difference between the observed response value Yi and the value predicted by the regression equation, Ŷi (the vertical distance between the point and the line).
Assumptions made:
Error terms are normally distributed
Error terms have mean 0 and constant variance
Error terms are independent

Error terms normally distributed
Can be checked by looking at a histogram of the residuals (look for a bell-shaped distribution) and at a normal probability plot (look for a straight line). For preference, use standardised residuals, which have a standard deviation of 1.

Error terms have mean 0, constant variance
Checked by using plots of residuals vs predicted values and residuals vs independent variables. Look for random scatter of points around zero. If not, linear regression may not be appropriate and the data may need to be transformed.

Error terms are independent
Check in the previous plots, and also in a plot of residuals vs time/order. Look for random scatter of residuals.
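These checks are usually done graphically, but the quantities behind the plots are easy to compute. A small sketch with invented data: fit a line, then inspect the residual mean (zero by construction when an intercept is included) and the lag-1 autocorrelation of the residuals as a crude numeric stand-in for the independence plot.

```python
# Invented illustration data, not the slides' data
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [1.2, 2.1, 2.9, 4.2, 4.8, 6.1, 7.0, 7.9, 9.2, 9.8]

n = len(x)
mx, my = sum(x) / n, sum(y) / n
b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
b0 = my - b1 * mx
res = [b - (b0 + b1 * a) for a, b in zip(x, y)]   # residuals Yi - Yhat_i

mean_res = sum(res) / n                            # ~0 by construction (intercept fitted)
# Crude independence check: lag-1 autocorrelation of the residuals;
# values near 0 are consistent with independent errors.
num = sum(res[t] * res[t - 1] for t in range(1, n))
den = sum(e * e for e in res)
lag1 = num / den
print(mean_res, lag1)
```

A formal treatment would plot these residuals as in the figure below; the numbers here only summarise what the eye looks for in those plots.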
[Figure: Residual plots for Attitude Towards City — normal probability plot of the residuals, histogram of the residuals, residuals versus the fitted values, and residuals versus the order of the data (all using standardised residuals).]
8/13/2012
Section 6
Introduction to Time Series
Reading materials: Chap 20 (Keller)

Outline
Overview of time series; Autoregressive processes; Determining process order; Stationarity
Different Types of Data
Cross-sectional data: you observe each member in your sample ONCE (usually, but not necessarily, at the same time).
◦ Examples of cross-sectional data: observing the heights and weights of 1000 people; the income, education, and experience of 1000 people; the per-capita GDP, population, and real defence spending of 80 nations.

Time series: you observe each variable once per time period for a number of periods.
◦ Examples of time series: observing U.S. inflation and unemployment from 1961-1995; the profitability of one firm over 20 years; the daily closing price of gold over 30 years.

Pooled time series (= "panel data"): you observe each member in your sample once per time period for a number of periods.
◦ Examples of pooled time series: observing the output and prices of 100 industries over 12 quarters; the profitability of 20 firms over 20 years; the annual rate of return of 300 mutual funds over the 1960-1997 period.

Overview of Time Series
A time series is time-ordered data.
◦ We will assume that the observations are made at equally spaced time intervals. This assumption enables us to use the interval between two successive observations as the unit of time.
The total number of observations in a time series is called the length of the time series (or the length of the data).
◦ More examples of time series: daily closing stock prices; monthly unemployment figures.
Overview of Time Series
Univariate time series models:
◦ Model and predict financial variables using only the information contained in their own past values, and possibly current and past values of an error term.

Virtually any quantity recorded over time yields a time series. To "visualize" a time series, we plot our observations as a function of time. This is called a time plot.

Think of a time series as a random or stochastic process (stochastic ~ random).
◦ We do not know the outcome until the experiment is implemented, e.g. the closing value of the Dow Jones Index on the next trading day, or the annual output growth of Malaysia next year.
◦ When collecting a time series data set, we get one possible outcome under a certain set of conditions. Changing the conditions gives a different set of outcomes (like different cross-sectional samples from a population).
Stationary time series
Recall the Gauss-Markov assumptions for OLS estimation with cross-sectional data:
◦ Error terms are normally distributed. If not, apply the LLN and CLT.
The LLN and CLT hold for a time series if the process satisfies stationarity conditions.

Strict stationarity: a time series is stationary if the joint probability distribution of any set of times is not affected by an arbitrary shift along the time axis. More clearly: the joint distribution of (y_t1, y_t2, ..., y_tm) is the same as the joint distribution of (y_{t1+h}, y_{t2+h}, ..., y_{tm+h}).

Weak (or covariance) stationarity: the covariances between y_t and y_{t+h}, for any h, do not depend upon t.
Covariance stationary
Then:
E[Yt] = μ
V[Yt] = E(Yt − μ)² = γ0
cov(Yt, Yt−k) = E[(Yt − μ)(Yt−k − μ)] = γk, k = 1, 2, 3, ...
Autocorrelation: standardising γk gives the autocorrelation ρk as:
ρk = cov(yt, yt−k) / V(yt) = γk / γ0
which measures the dependency among observations separated by k lags.

Autoregressive Processes
An autoregressive model is one where the current value of a variable, y, depends only upon the values that the variable took in previous periods, plus an error term. For example, in a first-order process, y is influenced by one lag. This is known as an AR(1) model, an autoregressive model of order 1:
yt = φ1·yt−1 + ut
In general, an autoregressive model of order p, denoted AR(p), is expressed as:
yt = φ1·yt−1 + φ2·yt−2 + φ3·yt−3 + ... + φp·yt−p + ut
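To see what an AR(1) process "looks like", it can be simulated and φ1 recovered from the data. A sketch under the slides' zero-intercept specification, with an assumed φ1 = 0.6 and standard normal errors:

```python
import random

# Simulate y_t = phi * y_{t-1} + u_t with u_t ~ N(0, 1)
random.seed(0)            # fixed seed so the run is reproducible
phi = 0.6                 # assumed true coefficient for this illustration
y = [0.0]
for _ in range(5000):
    y.append(phi * y[-1] + random.gauss(0, 1))

# OLS through the origin: phi_hat = sum(y_t * y_{t-1}) / sum(y_{t-1}^2)
num = sum(y[t] * y[t - 1] for t in range(1, len(y)))
den = sum(y[t - 1] ** 2 for t in range(1, len(y)))
phi_hat = num / den
print(phi_hat)
```

With 5000 observations the estimate lands close to the assumed 0.6, illustrating that the LLN applies to a stationary AR(1) (|φ1| < 1).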
Autoregressive Processes
What does an AR(1) look like? What does a white noise process look like? How can the lag order be determined?
1. ACF;
2. PACF;
3. AIC and SIC criteria; and,
4. White noise residuals.

Determining Process Order
1. Autocorrelation Function (ACF)
◦ The ACF measures the correlation between the current observation and the k'th lag, i.e. the correlation between yt and yt−k.
◦ For an AR process the ACF can decay slowly or rapidly, but it will decay geometrically to zero.
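The sample ACF that packages plot can be sketched directly: estimate the lag-k autocovariance c_k and divide by the variance c_0 (so r_0 = 1 by definition). The short series below is invented purely for illustration:

```python
# Invented illustration series
y = [2.0, 3.0, 2.5, 4.0, 3.5, 5.0, 4.5, 6.0, 5.5, 7.0]

n = len(y)
mean = sum(y) / n

def acf(k):
    """Sample autocorrelation at lag k: r_k = c_k / c_0 (divisor-n estimator)."""
    c0 = sum((v - mean) ** 2 for v in y) / n
    ck = sum((y[t] - mean) * (y[t + k] - mean) for t in range(n - k)) / n
    return ck / c0

print([round(acf(k), 3) for k in range(4)])
```

Plotting acf(k) against k, with ±2/√n significance limits, reproduces the kind of ACF chart shown on the following slide.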
Determining Process Order
[Figure: Autocorrelation Function for ASX ALL ORDINARIES - PRICE IN (with 5% significance limits for the autocorrelations).]

2. Partial Autocorrelation Function (PACF)
◦ The PACF measures the correlation between the observation k periods ago and the current observation, after controlling for observations at intermediate lags (i.e. all lags < k). For example, the PACF for lag 3 would measure the correlation between yt and yt−3, after controlling for the effects of yt−1 and yt−2.
◦ Note: at lag 1, the autocorrelation and partial autocorrelation coefficients are equal, since there are no intermediate lag effects to eliminate.
Determining Process Order
[Figure: Partial Autocorrelation Function for ASX ALL ORDINARIES - PRICE IN (with 5% significance limits for the partial autocorrelations).]

3. AIC and SIC criteria
◦ Akaike information criterion (AIC) and Schwarz information criterion (SIC):
AIC = e^(2k/n) · RSS/n
SIC = n^(k/n) · RSS/n
where k = lag order, n = # obs.
◦ Technique:
1. Fit a model with k lags; calculate AIC and SIC;
2. Fit another model with k+1 or k−1 lags; and,
3. The best model will have the lowest AIC/SIC.
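The selection step can be sketched with the slides' formulas AIC = e^(2k/n)·RSS/n and SIC = n^(k/n)·RSS/n. The RSS values below are hypothetical, just to show the mechanics; in practice RSS for each lag order comes from fitting the corresponding AR(k) model:

```python
from math import exp

n = 100                                        # number of observations (assumed)
rss_by_lag = {1: 260.0, 2: 210.0, 3: 207.0}    # hypothetical residual sums of squares

def aic(k, rss):
    return exp(2 * k / n) * rss / n            # slides' AIC formula

def sic(k, rss):
    return n ** (k / n) * rss / n              # slides' SIC formula

# Best model = lowest criterion value
best_aic = min(rss_by_lag, key=lambda k: aic(k, rss_by_lag[k]))
best_sic = min(rss_by_lag, key=lambda k: sic(k, rss_by_lag[k]))
print(best_aic, best_sic)
```

Note how both criteria penalise the extra lag: going from k = 2 to k = 3 barely reduces RSS, so the penalty factor outweighs the fit improvement and k = 2 is chosen.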
Determining Process Order
4. White noise approach
◦ Recall that yt is autocorrelated. If we have fitted the correct number of lags (to take the autocorrelation into account), then there should be none left in the residuals. That is, the residuals are white noise.
◦ White noise properties:
Homoscedastic;
Constant mean; and,
No autocorrelation.