Statistics Lecture
Introduction to the Course 



Teaching staff:

Lecturer: Dr. Tran Thi Bich, Department of Statistics, NEU. Email: [email protected]

Statistics – Vietnamese-Belgium program

Schedule:
16th of Aug: 6:00-9:00pm
17th of Aug: 6:00-9:00pm
18th of Aug: 9:00-12:00; 14:00-17:00
19th of Aug: 9:00-12:00; 14:00-17:00

Tutorials: 30 minutes - 1 hour, at the end of the lecture, from lecture 2 to 6

Textbook:
- Statistics for Management and Economics. 7th Edition, Keller.

Assessment: one in-class exam at the end of the course

Outline

Section 1: Introduction to Statistics and SPSS

 Introduction to statistics
 Basic concepts: variables and data
 Getting acquainted with SPSS

Reading materials: Chap 1, 2 (Keller)

What is statistics?

 Statistics is all about collecting, organising and interpreting data
 Statistics is a way to get information from data and make decisions under uncertainty
 Statistical analysis of data uses statistical modelling and probability; our main focus is on data and techniques for analysing data

Why is statistics important? Applications include:

 Financial management (capital budgeting)
 Marketing management (pricing)
 Marketing research (consumer behaviour)
 Operations management (inventory)
 Accounting (forecasting sales)
 Human resources management (performance appraisal)
 Information systems
 Economics (summarising, predicting)

Types of statistics

 Descriptive statistics: collecting, organising, summarising, and presenting data
   E.g: graphical techniques; numerical techniques
 Inferential statistics: estimating, predicting, and making decisions about a population based on sample data
   E.g: estimation; hypothesis testing

Basic concepts: variables and data

 A variable is a characteristic of a population or sample
   Eg:
   • Height of female students
   • Occupation of students in this class
 Data are the observed values of a variable
   Eg:
   • Height of 10 female students: 1.6, 1.7, 1.55, 1.59, 1.5, 1.58, 1.64, 1.67, 1.58, 1.55
   • Occupation of 5 students: teller, accountant, IT, marketing manager, teacher

Types of data

 Data can be classified as Qualitative or Quantitative (also called Interval)

Qualitative data

 Qualitative data cannot be measured (quantified)
   Marital status: single, married, divorced, and widowed
   Study performance of students: poor, fair, good, very good, excellent
 More classification: qualitative data can be classified as Nominal and Ordinal data
   Nominal data (also called categorical data): cannot be quantified with any meaningful unit
    - Marital status: single, married, divorced, and widowed
   Ordinal data: a sort of nominal data, but their values are in order
    - Study performance of students: poor, fair, good, very good, excellent
    - Opinions of consumers: strongly disagree, somewhat disagree, neither disagree nor agree, agree, strongly agree

Summary of data types: Data → Qualitative (Nominal, Ordinal) or Quantitative/Interval (Discrete, Continuous)

Quantitative data

 Quantitative (interval) data are real numbers (can be measured)
   Eg:
    Mid-term test marks of 10 students: 7, 8, 10, 5, 5, 6, 8, 9, 9, 7
    Weights of postal packages
    Monthly salary
 More classification: quantitative data can be divided into two types: discrete or continuous
  ◦ Discrete data: take only integer values
    Eg:
     Number of children in a family: 1, 2, 4, 7, 2
     Number of owned houses
  ◦ Continuous data: can take any value
    Eg:
     Weights of postal packages
     Monthly salary

Activity 1

 For each of the following examples of data, determine the type:
  i. The number of miles joggers run per week
  ii. The starting salaries of graduates of advanced program
  iii. The months in which a firm's employees choose to take their vacations
  iv. The occupation of graduates of advanced program
  v. Teachers' ranking

Population versus sample

 Population is a set of all items or people that share some common characteristics
 A sample is a smaller group of the population
 Sampling: taking a sample from the population
 An important requirement: a sample must be representative of the population. That means the profile of the sample is the same as that of the population

Population versus sample (cont.)

 A census is obtained by collecting information about every member of a population
  - Collect the height of Vietnamese citizens
  - Verify the quality of all products that are produced by factory X
 A sample survey is obtained by collecting information from some members of the population
  - Collect the height of 1,000 Vietnamese citizens
  - Verify the quality of a proportion of products that are produced by factory X
 Parameter: a descriptive measure of a population $(\mu, \sigma^2)$
 Statistic: a descriptive measure of a sample $(\bar{x}, s^2)$

Reasons to take a sample

 A census can give accurate data, but collecting information from the entire population is sometimes impossible
 A census is time-consuming and expensive
 A sample allows us to investigate more detailed information
 A certain sample size ensures that results from the sample are as accurate as those of the population

Moving from population to sample

 Population → Sampling frame (a list of all items of the population) → Sample

Types of sample

 Random sampling => Random sample
 Quasi-random sampling => Systematic sample, Stratified sample, Multistage sample
 Non-random sampling => Quota sample, Cluster sample

Getting acquainted with SPSS

 Import the file 'assignment 1 data set.xls' into SPSS and get familiar with SPSS.

Data presentation: Tables and charts

Outline

 Frequency distribution
  - Simple frequency table
  - Grouped frequency table
 Charts
  - Bar and pie charts
  - Histograms
  - Boxplot
  - Stem-and-leaf
  - Ogive

Reading materials: Chap 2, 3 (Keller)

Why do we have to summarise data?

 Recap
  ◦ In the previous chapter you learned how to collect data. Data collected through surveys are called 'raw' data.
  ◦ Raw data may include thousands of observations and often provide too much information => need to summarise before presenting to an audience
 Requirement
  ◦ A data summary clears away details but should give the overall pattern.
  ◦ Summarised information is concise but should reflect an accurate view of the original data
 Methods to summarise and present data
  ◦ Tables
  ◦ Charts
  ◦ Numerical summaries (measures of location and dispersion)

Tables: frequency distribution

 Frequency is the number of times a certain event has happened
 A frequency distribution records the number of times each value occurs and is presented in the form of a table
 Types of frequency distribution:
  • Simple frequency distribution
  • Grouped frequency distribution
  • Cumulative, percentage, and cumulative percentage frequency distribution

Simple frequency distribution

 Applications:
  • Qualitative data
  • Discrete variable with few values

Simple frequency table: example 1

 Example of a discrete variable with few values
  • You are given raw data of midterm marks of 20 students as follows: 7, 7, 10, 8, 5, 4, 5, 6, 4, 9, 8, 7, 6, 4, 8, 5, 7, 10, 10, 9
  • Create a simple frequency table manually

Marks | Number of students (frequency)
4     | 3
5     | 3
6     | 2
7     | 4
8     | 3
9     | 2
10    | 3
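As a quick cross-check of the hand count, a few lines of Python (an illustrative sketch, not part of the original slides) tally the same frequencies with collections.Counter:

```python
from collections import Counter

# Midterm marks of 20 students (raw data from example 1)
marks = [7, 7, 10, 8, 5, 4, 5, 6, 4, 9, 8, 7, 6, 4, 8, 5, 7, 10, 10, 9]

# Count how many times each mark occurs
freq = Counter(marks)

# Print a simple frequency table, sorted by mark
for mark in sorted(freq):
    print(mark, freq[mark])
```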


Simple frequency distribution: nominal variable (example 2)

 Example 2: We have a data set of 686 international students studying at UNSW, Australia. Create a frequency table
 Large data set => can't create a frequency table manually
 Creating a simple frequency table using SPSS:
  Go to 'Analyse' => 'Tables' => 'Tables of frequency'
  When the dialog box appears, choose a variable for the box 'Frequencies for', then click OK
  Copy the table to Excel for more manipulations

Nationality       | Number of students (frequency)
Australia         | 179
New Zealand       | 1
Hong Kong         | 21
Singapore         | 48
Malaysia          | 70
Indonesia         | 76
Philippines       | 6
Thailand          | 18
China             | 99
Vietnam           | 9
India             | 11
USA, Canada       | 14
UK, Ireland       | 35
Other Europe      | 42
Rest of the world | 57
Total             | 686

Grouped frequency table: discrete variable with many values

 Example 3: the marks scored by 58 candidates seeking promotion in a personnel selection test were recorded as follows. Construct a frequency table using a class width of ten marks

Raw data:
37 49 58 59 56 79
62 82 53 58 34 45
40 43 44 50 42 61
54 30 49 54 76 47
64 53 64 54 60 39
49 44 47 44 25 38
55 57 54 55 59 40
31 41 53 47 58 55
59 64 56 42 38 37
33 33 47 50

Marks (class interval) | Number of candidates (frequency)
21 – 30                | 2
31 – 40                | 11
41 – 50                | 17
51 – 60                | 20
61 – 70                | 5
71 – 80                | 2
81 – 90                | 1
Total                  | 58

 Note: the decision on the number of classes and class intervals is subjective, but the number should be chosen carefully

Grouped frequency table: continuous variable

 Example 4: draw a frequency table of wages (in USD) paid to 30 people as follows:

Raw data:
429 216 398 282 338 209
202 277 554 145 361 457
87  94  240 144 310 391
362 437 176 325 221 374
480 120 274 153 470 303

Wages (class interval) | Number of people (frequency)
< $100                 | 2
$100 – <$200           | 5
$200 – <$300           | 8
$300 – <$400           | 9
$400 – <$500           | 5
$500 – <$600           | 1
Total                  | 30

Terminology:
 Lower value: the lowest value of one class. Upper value: the highest value of one class
 Class interval: range from lower to upper value
 Open-ended class: the first or last classes in the range may be open-ended. That means they have no lower or upper values (e.g: <$100). An open-ended class is designed for uncommon values: too low or too high
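The grouped table above can be reproduced mechanically. A minimal Python sketch (illustrative only, not part of the original slides) bins the 30 wages into the same [lower, upper) classes:

```python
# Reproducing the grouped frequency table for example 4 with pure Python.
wages = [429, 216, 398, 282, 338, 209, 202, 277, 554, 145,
         361, 457, 87, 94, 240, 144, 310, 391, 362, 437,
         176, 325, 221, 374, 480, 120, 274, 153, 470, 303]

# Class boundaries: each class is [lower, upper), matching the slide's table
edges = [0, 100, 200, 300, 400, 500, 600]
counts = [0] * (len(edges) - 1)

for w in wages:
    for i in range(len(edges) - 1):
        if edges[i] <= w < edges[i + 1]:
            counts[i] += 1
            break

for i, c in enumerate(counts):
    print(f"${edges[i]} - <${edges[i+1]}: {c}")
```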

Cumulative, percentage, and cumulative percentage frequency distribution

Wages (class interval) | Frequency | Cumulative frequency | Percentage frequency | Cumulative percentage frequency
< $100                 | 2         | 2                    | 6.7                  | 6.7
$100 – <$200           | 5         | 7                    | 16.7                 | 23.3
$200 – <$300           | 8         | 15                   | 26.7                 | 50.0
$300 – <$400           | 9         | 24                   | 30.0                 | 80.0
$400 – <$500           | 5         | 29                   | 16.7                 | 96.7
$500 – <$600           | 1         | 30                   | 3.3                  | 100.0
Total                  | 30        |                      |                      |

Frequency distribution: summary

1. Simple frequency distribution: an easy task that can be done either manually or with statistical software
2. Grouped frequency distribution: more difficult. The hardest task is to decide the number of classes and the class width (class intervals). Ideally, each class reflects differences in the nature of the data. The more you work on it, the more reasonable a number and size of classes you will decide on
3. The upper value of the previous class should not coincide with the lower value of the following class, to make sure each value belongs to only one class.
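Returning to the cumulative table above: the extra columns are simple running and relative sums over the class frequencies. A short Python sketch (illustrative only, not part of the original slides):

```python
# Deriving cumulative and percentage frequencies from the class counts of example 4.
classes = ["<$100", "$100-<$200", "$200-<$300",
           "$300-<$400", "$400-<$500", "$500-<$600"]
freq = [2, 5, 8, 9, 5, 1]
total = sum(freq)

cum = 0
for name, f in zip(classes, freq):
    cum += f
    pct = 100 * f / total          # percentage frequency
    cum_pct = 100 * cum / total    # cumulative percentage frequency
    print(f"{name}: freq={f}, cum={cum}, pct={pct:.1f}%, cum_pct={cum_pct:.1f}%")
```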

Charts

 Tools for qualitative and discrete data:
  • Simple bar charts
  • Pie charts
 Tools for continuous data:
  • Histograms
  • Stem-and-leaf plots
  • Cumulative frequency curve (ogive)
  • Boxplots (discussed in lecture 3)

Bar and pie charts

 Back to the UNSW survey example: create bar and pie charts
 Reduce the number of classes for an easier visual look

Nationality       | Number of students (frequency) | Percentage frequency
Australia & NZ    | 180                            | 26.24%
China             | 120                            | 17.49%
South East Asia   | 227                            | 33.09%
India             | 11                             | 1.60%
USA & Canada      | 14                             | 2.04%
UK & Ireland      | 35                             | 5.10%
Other Europe      | 42                             | 6.12%
Rest of the world | 57                             | 8.31%
Total             | 686                            | 100.00%

 [Bar chart: Number of international students at UNSW, frequency by region]
 [Pie chart: Percentage of international students at UNSW by region]

Histograms

 Raw data => frequency table => histogram
 A histogram looks like a bar chart, except that the bars are joined together
 All bars have the same width (the same class intervals)
 The height of each bar represents the frequency of the class interval
 Using the raw data in example 4, draw a histogram representing wages
 Two types of histograms:
  • Equal-width histogram
  • Unequal-width histogram

Shapes of histograms:
 Symmetric [Histogram of symmetric data]
 Positive skew (long tail to the right) [Histogram of positively skewed data]
 Negative skew (long tail to the left) [Histogram of negatively skewed data]
 Bimodal [Histogram of bimodal data]

Histogram terms

 Modal class – the class with the highest number of observations
 Uni-modal, bi-modal, tri-modal, multi-modal
 Skewness, symmetry
 Relative frequency histogram: replace the frequency for each class by class frequency/total number of obs.

Stem-and-leaf display

 Raw data: 24, 26, 24, 21, 27, 27, 30, 41, 32, 38
 Rearranged data: 21, 24, 24, 26, 27, 27, 30, 32, 38, 41
 Stem-and-leaf display:

  2 | 1 4 4 6 7 7
  3 | 0 2 8
  4 | 1

Ogive

 An ogive is a cumulative frequency curve which shows the number of items less than one particular value
 E.g. having a frequency table of salary => draw an ogive
 The ogive is a line chart of cumulative frequency and can be drawn in Excel using a line graph

Class    | Frequency | Cumulative frequency
<100     | 22        | 22
100-<150 | 44        | 66
150-<200 | 79        | 145
200-<250 | 96        | 241
250-<300 | 44        | 285
300-<350 | 15        | 300

 [Line chart: ogive of cumulative frequency against class value]

Numerical summaries: Central tendency and dispersion

Outline

 Measures of location:
   Mean, median, mode
   Selection of measures of location
 Measures of dispersion:
   Range, quartile range, quartile deviation, variance, standard deviation
   Chebyshev's law
   Coefficient of variation
   Coefficient of skewness

Reading materials: Chap 4 (Keller)

Measures of location (central tendency)

 A measure of location shows where the centre of the data is
 The three most useful measures of location:
   Arithmetic mean/average
   Median
   Mode

Arithmetic mean

 Arithmetic mean from a population: $\mu = \frac{\sum_{i=1}^{N} X_i}{N}$
 Arithmetic mean from a sample: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
 Where:
  $X_i$, $x_i$ – the value of each item
  $N$, $n$ – total number of items

Easy example – mean

 Data: 5, 7, 1, 2, 4
 $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i = \frac{5+7+1+2+4}{5} = \frac{19}{5} = 3.8$

Advantages and disadvantages of arithmetic mean

 Advantages:
  ◦ Easy to understand and calculate
  ◦ The values of all items are included => representative of the whole set of data
 Disadvantages:
  ◦ Sensitive to outliers:
    Sample: (43; 38; 37; ...; 27; 34) => $\bar{x} = 33.5$
    Contaminated sample: (43; 38; 37; ...; 27; 1934) => $\bar{x} = 71.5$
    (Source: Slide #23, Dehon's statistics lecture, Université libre de Bruxelles, SBS-EM)

Median

 The median is the value of the observation which is located in the middle of the data set
 Steps to find the median:
  1. Arrange the observations in order of size (normally ascending order)
  2. Find the number of observations and hence the middle observation
  3. The median is the value of the middle observation

Calculating the median from raw data

 If the data has an odd number of observations, the middle observation is the $\frac{n+1}{2}$th, so:
  $\text{Median} = x_{\left(\frac{n+1}{2}\right)}$
 If the data has an even number of observations, there are two observations located in the middle, and:
  $\text{Median} = \left(x_{\left(\frac{n}{2}\right)} + x_{\left(\frac{n}{2}+1\right)}\right)/2$

Example

 E.g 1. Raw data: 11, 11, 13, 14, 17 => find the median
 E.g 2. Raw data: 11, 11, 13, 14, 16, 17 => find the median

Advantages and disadvantages of median

 Advantages:
  ◦ Easy to understand and calculate
  ◦ Not affected by outlying values => thus can be used when the mean would be misleading
 Disadvantages:
  ◦ Being the value of one observation, it fails to reflect the whole data set
  ◦ Not easy to use in other analysis
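The contrast between the outlier-sensitive mean and the robust median is easy to demonstrate in code. A minimal Python sketch (illustrative only; the appended outlier value is hypothetical, chosen to mimic the contaminated-sample example above):

```python
import statistics

data = [5, 7, 1, 2, 4]            # example data from the mean slide
contaminated = data + [1900]      # hypothetical outlier, for illustration

# The mean shifts dramatically, while the median barely moves
print(statistics.mean(data), statistics.median(data))          # 3.8  4
print(statistics.mean(contaminated), statistics.median(contaminated))
```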

Mode

 The mode is the value which occurs most frequently in the data set
 Steps to find the mode:
  1. Draw a frequency table for the data
  2. Identify the mode as the most frequent value

Example to calculate mode

X  | Frequency
8  | 3
12 | 7
16 | 12
17 | 8
19 | 5
11 | 12

Bimodal and multimodal data

 Bimodal (two modes)
 Multimodal (several modes)

Mean, median and mode in normal and skewed distributions

Which measure of centre is best?

 The mean is generally the most commonly used, but it is sensitive to extreme values
 If data are skewed or extreme values are present, the median is better, e.g. real estate prices
 The mode is generally best for categorical (ordinal) data – e.g. restaurant service quality (below): the mode is 'very good'

Rating       | # customers
Excellent    | 20
Very good    | 50
Good         | 30
Satisfactory | 12
Poor         | 10
Very Poor    | 6

Measures of dispersion

 Measures of dispersion tell you how spread out the other values of the distribution are from the central tendency
 Measures of dispersion:
   The range, quartile range, and quartile deviation
   Variance and standard deviation

Why do we need measures of dispersion?

 Two data sets of midterm marks of 5 students:
  ◦ First set: 100, 40, 40, 35, 35 => Mean: 50 => the measure of location is less representative, and thus less reliable
  ◦ Second set: 70, 55, 50, 40, 35 => Mean: 50 => the measure of location is more representative, and thus more reliable
 We need to know the spread of the other values around the central tendency; this is especially important in analysing the stock market.

Range

 The range is the difference between the largest and smallest value => sort the data before computing the range
 Formula: Range = maximum value - minimum value
 Advantage of the range: easy to calculate for ungrouped data.
 Disadvantages:
  ◦ Takes into account only two values
  ◦ Affected by one or two extreme values
  ◦ More difficult to calculate for grouped data

Variance

 Variance from a population: $\sigma^2 = \frac{\sum (X_i - \mu)^2}{N}$
 Variance from a sample: $s^2 = \frac{\sum (x_i - \bar{x})^2}{n-1}$
 Advantages:
  • Takes into account all values
  • Easy to interpret the result
 Disadvantage: the unit of variance has no meaning

Standard deviation ( )  

Application of this in finance

Standard deviation (S.D) is the square root of variance S.D from population:



  2

 

s  s2



S D from sample: S.D



Advantages: • Overcome the disadvantage of meaningless unit of variance • The most widely used measure of dispersion (the bigger its value => the more spread out are the data)

Variance (or S.D) of an investment, can be used as a measure of risk e.g. on profits/return. Larger variance  larger risk Usually, higher rate of return, higher risk

21
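A short Python sketch (illustrative only, not part of the original slides) cross-checks the population and sample formulas above on the first midterm data set:

```python
import statistics

marks = [100, 40, 40, 35, 35]        # first data set from the dispersion example

print(statistics.mean(marks))        # 50
print(statistics.pvariance(marks))   # population variance (divides by N)
print(statistics.variance(marks))    # sample variance (divides by n-1)
print(statistics.stdev(marks))       # sample standard deviation
```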

Example – 2 funds over 10 years

 Rates of return (%):
  Fund A: 8.3, -6.2, 20.9, -2.7, 33.6, 42.9, 24.4, 5.2, 3.1, 30.5
  Fund B: 12.1, -2.8, 6.4, 12.2, 27.8, 25.3, 18.2, 10.7, -1.3, 11.4
 $\bar{x}_A = 16\%$, $s_A^2 = 280.34\,(\%)^2$
 $\bar{x}_B = 12\%$, $s_B^2 = 99.37\,(\%)^2$
 Fund A: higher risk, but also a higher average rate of return.

Chebyshev's law, or the law of 3σ

 For a normal or symmetrical distribution:
  ◦ 68.26% of all obs fall within 1 standard deviation of the mean, i.e. in the range $(\bar{x} - 1s)$ to $(\bar{x} + 1s)$
  ◦ 95.45% of all obs fall within 2 standard deviations of the mean, i.e. in the range $(\bar{x} - 2s)$ to $(\bar{x} + 2s)$
  ◦ 99.73% of all obs fall within 3 standard deviations of the mean, i.e. in the range $(\bar{x} - 3s)$ to $(\bar{x} + 3s)$

Boxplots

 Need the MEDIAN and QUARTILES to create a boxplot
 MEDIAN = middle of the observations, i.e. ½ way through the observations
 QUARTILES = mark the quarter points of the observations, i.e. ¼ (Q1) and ¾ (Q3) of the way through the data [(n+1)/4; 3(n+1)/4]
 INTERQUARTILE RANGE (IQR) = Q3 - Q1
 Whiskers: max length is 1.5*IQR; they stretch from the box to the furthest data point (within this range)
 Points further out from the box are marked with stars; these are called outliers

Boxplot example

 Here is the boxplot of the height of international students studying at UNSW
 [Boxplot of Height: box from lower quartile to upper quartile with the median line inside; whiskers extending below and above]

Shapes of boxplots

 A boxplot shows:
  Skewness/symmetry
  Modality
  Range
 [Boxplots of symmetric, positively skewed, negatively skewed, and bimodal data]

Coefficient of variation (C of V)

 Standard deviation can compare the dispersion of two distributions with similar means
 For distributions having different means, we use the coefficient of variation to compare their dispersions
 The bigger the coefficient of variation, the wider the dispersion
 Eg: two sets of data have the following information:

                   | A   | B
Mean               | 120 | 125
Standard deviation | 50  | 51

 Which one is more spread out?

Coefficient of variation (cont.)

 Formula: Coefficient of variation = standard deviation/mean = $\frac{s}{\bar{x}}$
 C of V_A = 50/120 = 0.417 and C of V_B = 51/125 = 0.408 => A is more spread out than B

Coefficient of skewness (C of S)

 This measures the shape of the distribution
 There are several measures of skewness. Below is a common one, Pearson's coefficient of skewness:
  Coefficient of skewness = 3 × (mean - median)/standard deviation
 If C of S is nearly +1 or -1, the distribution is highly skewed
 If C of S is positive => the distribution is skewed to the right (positive skew)
 If C of S is negative => the distribution is skewed to the left (negative skew)
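Both measures are one-liners in code. A minimal Python sketch (illustrative only, not part of the original slides), applied here to the Fund A returns from the earlier example:

```python
import statistics

def coeff_variation(data):
    # Coefficient of variation = s / mean
    return statistics.stdev(data) / statistics.mean(data)

def pearson_skewness(data):
    # Pearson's coefficient of skewness = 3 * (mean - median) / s
    m, med, s = statistics.mean(data), statistics.median(data), statistics.stdev(data)
    return 3 * (m - med) / s

sample = [8.3, -6.2, 20.9, -2.7, 33.6, 42.9, 24.4, 5.2, 3.1, 30.5]  # fund A
print(coeff_variation(sample), pearson_skewness(sample))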

Activity 1

 Summary statistics of two data sets are as follows:

                   | Set 1: Ages of students studying at UNSW | Set 2: Wages of staff
Mean               | 22.4839                                  | 294.3
Median             | 21                                       | 292.5
Standard deviation | 6.3756                                   | 125.93

 [Histograms: age is skewed to the right; wages are nearly normal]
 Compute Pearson's coefficient of skewness for these data sets and describe the shapes of their distributions

Measuring correlation between two variables

 If we have two measurements on one observation (e.g. height and weight of a person, or weekly income and amount spent on rent per week), we can use:
  ◦ Scatterplot (discussed in lecture 8)
  ◦ Covariance
  ◦ Correlation

Covariance

 Measures the strength of the linear relationship between X and Y.
 Calculated as:
  $\text{cov}(X,Y) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n-1} = \frac{1}{n-1}\left[\sum_{i=1}^{n} X_i Y_i - n\bar{X}\bar{Y}\right]$

Values of covariance

 If cov>0, then as X increases, Y increases; as X decreases, Y decreases (positive slope)
 If cov<0, then as X increases, Y decreases; as X decreases, Y increases (negative slope)
 [Scatterplots: a positively related Y against X, and a negatively related Y against X]

Values of covariance (cont.)

 If cov=0, then as X changes, Y doesn't change => the variables are not linearly related
 [Scatterplot: an unrelated Y against X]

Coefficient of correlation

 Also measures the strength of the linear relationship between X and Y.
 Is bounded between -1 and +1.
 Calculated as:
  $\rho = \frac{\text{cov}(X,Y)}{\sigma_X \sigma_Y}$ (population),  $r = \frac{\text{cov}(X,Y)}{s_X s_Y}$ (sample)

If correlation equals…

 If r = -1: perfect negative linear relationship
 If r = +1: perfect positive linear relationship
 If r = 0: no LINEAR relationship

Example

 Calculate the covariance and correlation for a small data set of six (x, y) pairs. Tabulating the deviations from the means gives the totals:
  $\sum (x_i - \bar{x})(y_i - \bar{y}) = -20$,  $\sum (x_i - \bar{x})^2 = 17.5$,  $\sum (y_i - \bar{y})^2 = 24$

Covariance cont'd

 $\text{cov}(X,Y) = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{n-1} = \frac{-20}{5} = -4$
 $s_x^2 = \frac{\sum (X_i - \bar{X})^2}{n-1} = \frac{17.5}{5} = 3.5$,  $s_y^2 = \frac{\sum (Y_i - \bar{Y})^2}{n-1} = \frac{24}{5} = 4.8$
 $r = \frac{\text{cov}(x,y)}{s_x s_y} = \frac{-4}{\sqrt{3.5}\sqrt{4.8}} = -0.976$
 The correlation implies a strong negative relationship.

Summary

 Techniques for summarising data:
  ◦ Bar charts, pie charts
  ◦ Histograms and boxplots – shape of distribution (centre, spread, modality, skewness)
  ◦ Cumulative relative density function (ogive)
  ◦ Numerical measures:
    Central tendency – mean, median, mode
    Dispersion – variance, standard deviation, coefficient of variation, range, interquartile range
  ◦ Two sets of data: scatterplot, covariance, correlation
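The same covariance-then-correlation computation generalises directly. A minimal Python sketch (illustrative only; the paired data below are hypothetical, not the slide's data set):

```python
import math

x = [2, 4, 6, 8, 10]
y = [9, 7, 6, 4, 1]

n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n

# Sample covariance and standard deviations (divide by n-1)
cov = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / (n - 1)
sx = math.sqrt(sum((xi - xbar) ** 2 for xi in x) / (n - 1))
sy = math.sqrt(sum((yi - ybar) ** 2 for yi in y) / (n - 1))

r = cov / (sx * sy)   # sample correlation coefficient, bounded in [-1, 1]
print(cov, r)
```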

Section 2: Probability and Random Variables

Reading materials: Chap 6, 7, 8 (Keller)

Why do we need to study probability and probability distributions?

 Probability is a crucial component in obtaining information about populations from samples
 Probability provides the link between populations and samples. Eg:
  ◦ From sample means => infer population means
  ◦ From a known population => measure the likelihood of obtaining a particular event or sample.

Terminology (1)

 A random experiment is a process that results in a number of possible outcomes, none of which can be predicted with certainty.
 Eg:
  ◦ Roll a die: outcomes 1, 2, 3, 4, 5, 6.
  ◦ Flip a coin: outcomes Heads, Tails
  ◦ Take an exam: pass or fail

Terminology (2)

 The sample space of a random experiment is a list of all possible outcomes
 Outcomes must be mutually exclusive and exhaustive:
  ◦ No two outcomes can both occur on any one trial
  ◦ All possible outcomes must be included
 E.g. roll a die: sample space S={1, 2, 3, 4, 5, 6}.

Probabilities

 An event is a collection of one or more simple (individual) outcomes or events.
  E.g. roll a die: event A = an odd number comes up. Then A={1, 3, 5}.
 In general, use the sample space S={E1, E2,…, En} where there are n possible outcomes.
 The probability of an event Ei occurring on a single trial is written as P(Ei)

 Probability of an event = Number of favorable outcomes / Total number of outcomes

 For the sample space S, P(S)=1
 E.g. roll a die, sample space S={1, 2, 3, 4, 5, 6}. Examples of events:
  Obtain the number '1': A={1} and P(A)=1/6
  Obtain an odd number: B={1, 3, 5} and P(B)=1/2
  Obtain a number larger than 6: C={} and P(C)=0
  Obtain a number smaller than 7: D={1, 2, 3, 4, 5, 6} and P(D)=1

Two rules about probabilities

 The probability assigned to each simple event Ei must satisfy:
  1. $0 \le P(E_i) \le 1$ for all i
  2. $\sum_{i=1}^{n} P(E_i) = 1$

Probabilities of combined events

 Consider two events, A and B.
  P(A or B) = P(A U B) = P(A union with B) = P(A occurs, or B occurs, or both occur)
  P(A and B) = P(A ∩ B) = P(A intersection with B) = P(A and B both occur)
  P(Ā) = P(Ac) = P(A complement) = P(A does not occur)
  P(A|B) = P(A occurs given that B has occurred)

Joint probabilities

 Eg: mutual funds (http://www.howtosavemoney.com/how-do-mutual-funds-work/)

Probabilities                | B1 = Mutual fund outperforms market | B2 = Mutual fund does not outperform market
A1 = Top-20 MBA program      | 0.11                                | 0.29
A2 = Not top-20 MBA program  | 0.06                                | 0.54

 Joint probabilities = P(A ∩ B):
  P(Mutual fund outperforms AND top-20 MBA) = 0.11
  P(Mutual fund outperforms AND not top-20) = 0.06
  P(Mutual fund not outperform AND top-20) = 0.29
  P(Mutual fund not outperform AND not top-20) = 0.54

Marginal probabilities (1)

 Marginal probabilities:
  ◦ Computed by adding across rows or down columns
  ◦ Named because they are calculated in the margins of the table

Marginal probabilities (2)

Probabilities | B1   | B2   | Totals
A1            | 0.11 | 0.29 | 0.40
A2            | 0.06 | 0.54 | 0.60
Totals        | 0.17 | 0.83 | 1.00

 P(A1) = P(A1 and B1) + P(A1 and B2) = 0.11 + 0.29 = 0.40
 P(A2) = P(A2 and B1) + P(A2 and B2) = 0.06 + 0.54 = 0.60
 P(B1) = P(B1 and A1) + P(B1 and A2) = 0.11 + 0.06 = 0.17
 P(B2) = P(B2 and A1) + P(B2 and A2) = 0.29 + 0.54 = 0.83

Conditional probability

 The conditional probability that A occurs, given that B has occurred:
  $P(A|B) = \frac{P(A \text{ and } B)}{P(B)}$
 We want to see whether a fund managed by a graduate of a top-20 MBA program will outperform the market:
  $P(B1|A1) = \frac{P(B1 \text{ and } A1)}{P(A1)} = \frac{0.11}{0.40} = 0.275$
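A minimal Python sketch (illustrative only, not part of the original slides) reproduces the marginal and conditional probabilities from the joint table:

```python
# Joint probabilities from the mutual-fund table, keyed by (MBA, market) cell
joint = {("A1", "B1"): 0.11, ("A1", "B2"): 0.29,
         ("A2", "B1"): 0.06, ("A2", "B2"): 0.54}

def marginal(event):
    # Sum joint probabilities over every cell that involves the event
    return sum(p for cell, p in joint.items() if event in cell)

p_A1 = marginal("A1")                        # 0.11 + 0.29 = 0.40
p_B1_given_A1 = joint[("A1", "B1")] / p_A1   # 0.11 / 0.40 = 0.275
print(p_A1, p_B1_given_A1)
```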

Some rules of probability

 Additive rule, for the union of two events:
  P(A or B) = P(A) + P(B) - P(A and B)
 Multiplicative rule, for the joint probability of two events:
  P(A and B) = P(A|B) P(B) = P(B|A) P(A)
 Complement rule: for A and its complement Ā, P(A) + P(Ā) = 1; therefore P(Ā) = 1 - P(A)

Independence

 Two events are independent if P(A|B)=P(A) or P(B|A)=P(B)
 Note: if A and B are independent, then P(A and B) = P(A)*P(B). Only if independent! Then
  P(A|B) = P(A and B)/P(B) = [P(A)*P(B)]/P(B) = P(A)

Activity 1

 Check whether the event that a manager graduated from a top-20 MBA program is independent of the event that the fund outperforms the market.

Random variables

 Imagine tossing three unbiased coins. S = {HHH, HHT, HTH, THH, HTT, THT, TTH, TTT}: 8 equally likely outcomes.
 Let X = number of heads that occur. X can take values 0, 1, 2, 3.
 The actual value of X depends on chance – call it a random variable (r.v.)
 Definition: a random variable is a function that assigns a numeric value to each simple event in a sample space.

Notation

 Denote random variables (X, Y, ...) in upper case
 Denote actual realised values (x, y, ...) in lower case

Discrete vs continuous r.v.s

 A discrete random variable has a countable number of possible values, e.g. number of heads, number of sales etc.
 A continuous random variable has an infinite number of possible values – the number of elements in the sample space is infinite as a result of continuous variation, e.g. height, weight etc.

Discrete probability distributions

 Definition: a table or formula listing all possible values that a discrete r.v. can take, together with the associated probabilities.
 E.g. for our toss-three-coins example:

x      | 0   | 1   | 2   | 3
P(X=x) | 1/8 | 3/8 | 3/8 | 1/8

 (Check the probabilities in the table)

More on discrete probability distributions

 If x is a value taken by a r.v. X, then p(x) = P(X=x) = the sum of all the probabilities associated with the simple events for which X=x.
 If a r.v. X can take values xi, then:
  1. $0 \le p(x_i) \le 1$ for all $x_i$
  2. $\sum_{x_i} p(x_i) = 1$

Activity 2

 What is the probability of at most one head?
 What is the probability of at least one head?

Describing the probability distribution

 The expected value or mean of a discrete random variable X, which takes on values xi with probability p(xi), is:
  $\mu = E(X) = \sum_{\text{all } x_i} x_i \, p(x_i)$

Back to the coin tossing

x      | 0   | 1   | 2   | 3
P(X=x) | 1/8 | 3/8 | 3/8 | 1/8

 $\mu = E(X) = \sum_{\text{all } x_i} x_i \, p(x_i) = 0\cdot\frac{1}{8} + 1\cdot\frac{3}{8} + 2\cdot\frac{3}{8} + 3\cdot\frac{1}{8} = \frac{12}{8} = 1.5$

Rules for expectations

 If X and Y are random variables, and c is any constant, then the following hold:
  E(c) = c
  E(cX) = cE(X)
  E(X-Y) = E(X) - E(Y)
  E(X+Y) = E(X) + E(Y)
  E(XY) = E(X)*E(Y) only if X and Y are independent

Variance

 Measures the spread/dispersion of a distribution
 Let X be a discrete random variable with values xi that occur with probability p(xi), and E(X) = μ.
 The variance of X is defined as:
  $\sigma^2 = E[(X-\mu)^2] = \sum_{\text{all } x_i} (x_i - \mu)^2 \, p(x_i)$

Variance continued

 A shortcut form:
  $\sigma^2 = E(X^2) - \mu^2 = \sum_{\text{all } x_i} x_i^2 \, p(x_i) - \mu^2$

Tossing three coins – again…

x      | 0   | 1   | 2   | 3
P(X=x) | 1/8 | 3/8 | 3/8 | 1/8

 $V(X) = \sum_{\text{all } x_i} x_i^2\,p(x_i) - \mu^2 = 0^2\cdot\frac{1}{8} + 1^2\cdot\frac{3}{8} + 2^2\cdot\frac{3}{8} + 3^2\cdot\frac{1}{8} - 1.5^2 = 0.75$
 $\text{Std Dev}(X) = \sqrt{0.75} = 0.866$ (to 3dp)

Laws for variances

 If X and Y are r.v.s and c is a constant:
  1. V(c) = 0
  2. V(cX) = c²V(X)
  3. V(X+c) = V(X)
  4. V(X+Y) = V(X) + V(Y) if X and Y are independent
  5. V(X-Y) = V(X) + V(Y) if X and Y are independent
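The coin-tossing mean and variance can be checked exactly with fractions. A minimal Python sketch (illustrative only, not part of the original slides):

```python
from fractions import Fraction

# Probability distribution of the number of heads in three coin tosses
dist = {0: Fraction(1, 8), 1: Fraction(3, 8),
        2: Fraction(3, 8), 3: Fraction(1, 8)}

mu = sum(x * p for x, p in dist.items())              # E(X) = 3/2
var = sum(x**2 * p for x, p in dist.items()) - mu**2  # E(X^2) - mu^2 = 3/4
print(mu, var)
```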

Bivariate distributions

 Distribution of a single variable – univariate
 Distribution of two variables together – bivariate
 So, if X and Y are discrete random variables, then we say p(x,y) = P(X=x and Y=y) is the joint probability that X=x and Y=y.

Example

 Toss three coins. Let X be the number of heads.
 Let Y be the number of changes of sequence, i.e. the number of times we change from H→T or T→H.
  HHH: x=3, y=0    TTT: x=0, y=0
  HHT: x=2, y=1    TTH: x=1, y=1
  HTH: x=2, y=2    THT: x=1, y=2
  THH: x=2, y=1    HTT: x=1, y=1

Example continued

Outcome (S) | x | y
HHH         | 3 | 0
HHT         | 2 | 1
HTH         | 2 | 2
THH         | 2 | 1
TTH         | 1 | 1
THT         | 1 | 2
HTT         | 1 | 1
TTT         | 0 | 0

Bivariate probability distribution

x \ y | 0   | 1   | 2   | px(x)
0     | 1/8 | 0   | 0   | 1/8
1     | 0   | 2/8 | 1/8 | 3/8
2     | 0   | 2/8 | 1/8 | 3/8
3     | 1/8 | 0   | 0   | 1/8
py(y) | 2/8 | 4/8 | 2/8 | 1

Covariance (bivariate)

 Consider the r.v.s X and Y with joint pdf p(x,y); x=x1,…,xm; y=y1,…,yn.
 If E(X) = µx and E(Y) = µy, then the covariance between X and Y is given by:
  $\sigma_{xy} = \text{cov}(X,Y) = E[(X-\mu_x)(Y-\mu_y)] = \sum_{i=1}^{m}\sum_{j=1}^{n} x_i y_j \, p(x_i, y_j) - \mu_x \mu_y$

Independence of random variables

 If the random variables X and Y are independent, then P(X=x and Y=y) = P(X=x) · P(Y=y), i.e. p(x,y) = px(x) · py(y)
 In the previous example, X and Y are clearly not independent:
  p(0, 0) = 1/8
  px(0) · py(0) = 1/8 * 2/8 = 1/32
  p(0, 0) ≠ px(0) · py(0)

Correlation coefficient

 Associated with covariance:
  $\rho = \frac{\text{cov}(X,Y)}{\sigma_x \sigma_y}$, with $-1 \le \rho \le 1$

Return to the 3 coins tossed

 X = number of heads, Y = number of sequence changes.
 Check for yourself:
  $\mu_x = \frac{3}{2}$, $\sigma_x^2 = \frac{3}{4}$;  $\mu_y = 1$, $\sigma_y^2 = \frac{1}{2}$

Covariance for the example

 $\sigma_{xy} = \sum_i \sum_j x_i y_j \, p(x_i, y_j) - \mu_x \mu_y$
 $= 0\cdot0\cdot\frac{1}{8} + 1\cdot1\cdot\frac{2}{8} + 1\cdot2\cdot\frac{1}{8} + 2\cdot1\cdot\frac{2}{8} + 2\cdot2\cdot\frac{1}{8} + 3\cdot0\cdot\frac{1}{8} - \frac{3}{2}\cdot1 = \frac{12}{8} - \frac{3}{2} = 0$
 cov(x,y) = 0 and therefore $\rho = \frac{\sigma_{xy}}{\sigma_x \sigma_y} = 0$: X and Y are uncorrelated.

The sum of two random variables

 Consider two real estate agents:
  X = the number of houses sold by Albert in a week
  Y = the number of houses sold by Beatrice in a week
 The bivariate distribution of X and Y is shown on the next slide

Bivariate distribution of X and Y

Y \ X | 0    | 1    | 2    | py(y)
0     | 0.12 | 0.42 | 0.06 | 0.60
1     | 0.21 | 0.06 | 0.03 | 0.30
2     | 0.07 | 0.02 | 0.01 | 0.10
px(x) | 0.40 | 0.50 | 0.10 | 1

We can show (check these at home!):
 E(X) = 0.7, V(X) = 0.41
 E(Y) = 0.5, V(Y) = 0.45

Suppose interest is in X+Y

 That is, the total number of houses Albert and Beatrice sell in a week.
 Possible values of X+Y: 0, 1, 2, 3, 4.
 Then, P(X+Y=2) = the sum of all joint probabilities for which x+y=2;
 That is, P(X+Y=2) = p(0,2) + p(1,1) + p(2,0) = 0.07 + 0.06 + 0.06 = 0.19

Repeat this for 0, 1, 2, 3, 4…

x+y    | 0    | 1    | 2    | 3    | 4
p(x+y) | 0.12 | 0.63 | 0.19 | 0.05 | 0.01

 Can evaluate the mean and variance of (X+Y):
  E(X+Y) = 1.2
  V(X+Y) = 0.56
 (check these at home!)
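The "check these at home" values follow mechanically from the joint table. A minimal Python sketch (illustrative only, not part of the original slides):

```python
# Joint distribution of (X, Y) = houses sold by Albert and Beatrice
joint = {(0, 0): 0.12, (1, 0): 0.42, (2, 0): 0.06,
         (0, 1): 0.21, (1, 1): 0.06, (2, 1): 0.03,
         (0, 2): 0.07, (1, 2): 0.02, (2, 2): 0.01}

# Distribution of the sum S = X + Y
dist_s = {}
for (x, y), p in joint.items():
    dist_s[x + y] = dist_s.get(x + y, 0) + p

e_s = sum(s * p for s, p in dist_s.items())
v_s = sum(s**2 * p for s, p in dist_s.items()) - e_s**2
print(dist_s)     # {0: 0.12, 1: 0.63, 2: 0.19, 3: 0.05, 4: 0.01}
print(e_s, v_s)   # 1.2, 0.56
```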

Laws of expected value and variance of the sum of two variables

 If a and b are constants, and X and Y are random variables, then:
  $E(aX + bY) = aE(X) + bE(Y)$
  $V(aX + bY) = a^2V(X) + b^2V(Y) + 2ab\,\text{cov}(X,Y)$

Application of this – portfolio diversification and asset allocation

 See Keller, pages 210-214 (7th edition)
 In finance, use variance and standard deviation to assess the risk of an investment.
 Analysts reduce risk by diversifying their investments – that is, combining investments where the correlation is small.

Continuous probability distributions

 Remember: discrete data has a limited (finite) number of possible values => discrete probability distributions can be put in tables
 Continuous data have an infinite number of possible values => we use a smooth function, f(x), to describe the probabilities

About the function

 f(x) must satisfy the following:
  1. f(x) ≥ 0 for all x, that is, it must be non-negative.
  2. The total area underneath the curve representing f(x) = 1.

Notes about continuous pdfs

 1) Probabilities are areas under the curve:
  $P(a < X < b) = \int_a^b f(x)\,dx$
 2) For a continuous pdf, the probability that X will take any specific value is zero: let a → b and see that the area → 0.
 3) A continuous random variable has a mean and a variance! The mean measures the location of the distribution; the variance measures the spread of the distribution.

The Normal Distribution

 Bell-shaped, symmetric about µ, reaches its highest point at x=µ, tends to zero as x→±∞.

Notes about the Normal Distribution

 1. E(X) = µ; V(X) = σ².
 2. Area under the curve = 1
 3. Different means – shift the curve along the x-axis [Figure: normal curves with different means]
 4. Different variances – the curve becomes more or less peaked [Figure: normal curves with σ=0.5, σ=1, σ=2]
 5. Shorthand notation: X~N(µ, σ²).

Probabilities from the Normal Distribution

 Generally, we require probabilities such as P(X < a), P(X > a), or P(a < X < b); that is, areas under the density curve.

So, we need to find the area under the curve

 P(a < X < b) is the area between a and b under the density curve.
 That is, we need to integrate as follows:
  $\text{Area} = \int_a^b f(x)\,dx = \int_a^b \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2} dx$
 Not easy to do!

Tabulated values

 Tables were made to provide probabilities.
 However, different values are needed for each different μ and σ² – infinitely many possible values, so it is impossible to have all the tables needed!
 So we select one particular normal distribution – μ=0, σ²=1 – call this the Standard Normal Distribution, and tabulate all the probabilities for it.
 Call a r.v. from this a Standard Normal r.v.; use the notation Z~N(0,1)
 Now we just need a way to convert any other normal distribution to the standard normal – then we can use the existing tables

Standardising

 The process of converting any normal random variable to a Standard Normal random variable.
 If X~N(μ,σ²), then use the linear transformation below:
  $Z = \frac{X - \mu}{\sigma} \sim N(0,1)$

Standardising (cont.)

 So, for ANY random variable that comes from a normal distribution, if we subtract the mean and divide by the standard deviation, we get a r.v.~N(0,1).
 See the Z-table in Appendix B-8. This table provides P(Z < z) for various values of z.

Rules to find probabilities from normal tables

 Symmetry: P(Z < -a) = P(Z > a) = 1 – P(Z < a)
 The total area under the curve is 1; the total area under each half of the curve is 0.5, i.e. P(Z<0) = P(Z>0) = 0.5
 Draw the curve, shade the area, and break it up into areas you can find (differences or sums)

Examples using tables

 1) P(Z<1.5) = 0.9332 (from table)
 2) P(Z>1) = 1 – P(Z<1) = 1 – 0.8413 (from tables) = 0.1587
 3) P(Z<-1) = P(Z>1) by symmetry = 0.1587 (from (2))
 4) P(1<Z<1.5) = P(Z<1.5) – P(Z<1) = 0.9332 – 0.8413 = 0.0919

In general

 Given X~N(μ,σ²), suppose we require P(X < a).
 We know that $Z = \frac{X-\mu}{\sigma} \sim N(0,1)$.
 So, $P(X < a) = P\left(\frac{X-\mu}{\sigma} < \frac{a-\mu}{\sigma}\right) = P\left(Z < \frac{a-\mu}{\sigma}\right)$, where Z ~ N(0,1).
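The table look-ups above can be replicated without printed tables. A minimal Python sketch (illustrative only, not part of the original slides) using the standard library's NormalDist:

```python
from statistics import NormalDist

z = NormalDist(mu=0, sigma=1)   # standard normal

print(z.cdf(1.5))               # P(Z < 1.5)      ~ 0.9332
print(1 - z.cdf(1))             # P(Z > 1)        ~ 0.1587
print(z.cdf(1.5) - z.cdf(1))    # P(1 < Z < 1.5)  ~ 0.0919

# Standardising: for X ~ N(mu, sigma^2), P(X < a) = P(Z < (a - mu)/sigma)
mu, sigma, a = 170, 65, 178
print(z.cdf((a - mu) / sigma))
```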

Section 3: Sampling Distribution

Reading materials: Chap 9 (Keller)

Outline

 Distribution of sample means
 The central limit theorem

Distribution of sample means: example

 Data were collected on the time taken for a pizza order to be completed, in minutes (from order taken to pizza handed over to the customer). Histograms and summary statistics were produced for samples of increasing size:

 [Histogram: 50 observations of pizza time]
 Variable: Pizza time, N=50, Mean=17.256, Median=17.041, StDev=3.743

 [Histogram: another 50 observations of pizza time]
 Variable: Pizza time, N=50, Mean=17.585, Median=17.374, StDev=3.872

 [Histogram: 1000 observations of pizza time]
 Variable: Pizza time, N=1000, Mean=17.934, Median=17.627, StDev=4.009
10,000 observations on the time to complete a pizza order

 [Histogram: 10,000 observations of pizza time]
 Variable: Pizza time, N=10000, Mean=18.046, Median=17.744, StDev=4.006

In general

 One thousand datasets, each with 10 observations in it (that is, 1000 samples of size 10), are generated (simulated data) from this model, and for each sample the average (sample mean), median (sample median) and sample standard deviation are calculated and recorded.

 [Histograms of the 1000 sample averages and 1000 sample medians]
 Variable: average, N=1000, Mean=18.007, Median=18.020, StDev=1.231
 Variable: median, N=1000, Mean=17.757, Median=17.804, StDev=1.433

S.D. for the 1000 random samples of size 10

 [Histogram of the 1000 sample standard deviations]
 Variable: stdev, N=1000, Mean=3.8183, Median=3.7282, StDev=0.9505

More random numbers

 Another thousand datasets are generated from the same model, but this time each dataset has 25 observations.

 [Histograms of the 1000 sample averages and medians, samples of size 25]
 Variable: average, N=1000, Mean=17.991, Median=17.982, StDev=0.814
 Variable: median, N=1000, Mean=17.711, Median=17.675, StDev=1.017

S.D. for samples of size 25

 [Histogram of the 1000 sample standard deviations, samples of size 25]
 Variable: stdev, N=1000, Mean=3.9637, Median=3.9391, StDev=0.6048

Notice as we take larger samples…

 The histograms for all three statistics (sample mean, sample median and sample standard deviation) become more and more symmetric and bell-shaped, and less variable, particularly those for the sample mean.
 Also notice that the estimated standard deviation of the sample mean is not only decreasing as the sample size increases, but is also approximately the same for the same sample sizes.

A general result of great importance

The Central Limit Theorem

 No matter what model a random sample is taken from, as the sample size (the number of random observations) increases, the distribution of the sample mean becomes closer and closer to the normal distribution, and
 No matter what model a random sample is taken from, and for any sample size n, the standard deviation of the sample mean is the model (theoretical) standard deviation, σ, divided by √n, that is, σ/√n. This is called the standard error of the mean (SE).
 Whatever the population distribution looks like (normal or not), when the sample size is large enough, the distribution of sample means will be normal, and we can use the Z-statistic to calculate the probability of any mean value

This is the Central Limit Theorem

 If X is a random variable with mean µ and variance σ², then in general,
  $\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$, and $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0,1)$ as $n \to \infty$.

So, how large does n need to be?

 Generally, it depends on the original distribution of X:
  ◦ If X has a normal distribution, then the sample mean has a normal distribution for all sample sizes.
  ◦ If X has a distribution that is close to normal, the approximation is good for small sample sizes (e.g. n=20).
  ◦ If X has a distribution that is far from normal, the approximation requires larger sample sizes (e.g. n=50).

Activity 1

 The average height of Vietnamese women is 1.6m, with a standard deviation of 0.2m. If I choose 25 women at random, what is the probability that their average height is less than 1.53m?
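A worked version of the activity, as a minimal Python sketch (illustrative only, not part of the original slides), applying the CLT and standardising:

```python
from statistics import NormalDist
import math

mu, sigma, n = 1.6, 0.2, 25
se = sigma / math.sqrt(n)        # standard error of the mean = 0.04

# P(sample mean < 1.53) = P(Z < (1.53 - mu)/se)
z = (1.53 - mu) / se             # = -1.75
print(NormalDist().cdf(z))       # ~ 0.0401
```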

Estimation

Reading materials: Chap 10 (Keller)

Outline

 Concepts of estimation – point and interval estimators; unbiasedness and consistency
 Estimating the population mean when the population variance is known
 Estimating the population mean when the population variance is unknown
 Selecting the sample size

Recap: The Central Limit Theorem

 As n→∞, the distribution of the sample mean becomes normal, with centre µ and standard deviation σ/√n. This happens regardless of the shape of the original population.
 i.e. $\bar{X}$ follows a normal distribution with $E(\bar{X}) = \mu$ and $\text{var}(\bar{X}) = \frac{\sigma^2}{n}$

Recap: What size n?

 If the distribution of X is normal, then for all n the sample mean will follow a normal distribution.
 If the distribution of X is very far from normal, then we will need a large n to see the normality of the distribution of the sample mean.
 In all cases, as n gets larger, the distribution of the mean gets more normal.

How does this help?

 This means that if we have a large enough sample, we can always find probabilities to do with the mean, since it will have a normal distribution no matter what the original distribution.

Estimation

 The aim of estimation is to determine the approximate value of a parameter of the population using statistics calculated in respect of a sample drawn from that population.
  • As an example, we estimate the mean of a population using the mean of a sample drawn from that population. That is, the sample mean is an estimator of the population mean.
  • The actual statistic we calculate in respect of the sample is called an estimate of the population parameter. For example, a calculated sample mean is an estimate of the population mean.

Estimators

 There are two types of estimators:
  Point estimator: a single value or point, i.e. sample mean = 4 is a point estimate of the population mean, µ.
  Interval estimator: draws inferences about a population by estimating a parameter using an interval (range).
   • E.g. we are 95% confident that the unknown mean score lies between 56 and 78.

Desirable qualities of estimators

 We want our estimators to be precise and accurate:
  Accurate: on average, our estimator is getting towards the true value
  Precise: our estimates are close together
 The sample mean is a precise and accurate estimator of the population mean. (Sometimes, accurate and precise together is referred to as unbiased.)

Interval estimators for µ, σ known

 A point estimate is just that; an interval gives some idea of how sure we are.
 Interval estimator:
  Gives an interval (range) based on a sample statistic
  This interval corresponds to a probability, and this probability is never equal to 100%
 We know that $\bar{x} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$, so $Z = \frac{\bar{x}-\mu}{\sigma/\sqrt{n}} \sim N(0,1)$.
 We also know that, for a standard normal distribution, 95% of the area is contained between -1.96 and +1.96: $P(-1.96 < Z < 1.96) = 0.95$

Put these things together… and rearrange

 $P\left(-1.96 < \frac{\bar{x}-\mu}{\sigma/\sqrt{n}} < 1.96\right) = 0.95$
 $P\left(\bar{x} - 1.96\,\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + 1.96\,\frac{\sigma}{\sqrt{n}}\right) = 0.95$
 This is called a 95% confidence interval for μ.
 What this means: in repeated sampling, 95% of the intervals created this way would contain μ, and 5% would not.
 We can change how confident we are by changing the 1.96:
  • Use 1.645 to get a 90% confidence interval
  • Use 2.575 to get a 99% confidence interval

Example 1

 Suppose we know from experience that a random variable X~N(μ, 1.66), and for a sample of size 10 from this population the sample mean is 1.58. Then
  $P\left(1.58 - 1.96\sqrt{\tfrac{1.66}{10}} < \mu < 1.58 + 1.96\sqrt{\tfrac{1.66}{10}}\right) = 0.95$, i.e. $P(0.78 < \mu < 2.38) = 0.95$
 Interpretation: if the experiment were carried out multiple times, 95% of the intervals created in this way would contain μ.
 Lower Confidence Limit: 0.78; Upper Confidence Limit: 2.38

General notation

 In general, a 100(1-α)% confidence interval estimator for μ is given by $\bar{x} \pm Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$
 Confidence level: 100(1-α)%, the probability that the interval contains the parameter
 LCL: $\bar{x} - Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$;  UCL: $\bar{x} + Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$
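A minimal Python sketch (illustrative only, not part of the original slides) wraps the general formula in a function and reproduces example 1 (note the variance 1.66 must be converted to a standard deviation):

```python
from statistics import NormalDist
import math

def z_interval(xbar, sigma, n, level=0.95):
    # 100*level % confidence interval for mu when sigma is known
    alpha = 1 - level
    z = NormalDist().inv_cdf(1 - alpha / 2)   # e.g. ~1.96 for 95%
    half = z * sigma / math.sqrt(n)
    return xbar - half, xbar + half

print(z_interval(1.58, math.sqrt(1.66), 10))  # example 1: ~ (0.78, 2.38)
```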

What does 100(1-α)% mean?

 If we want 95% confidence, α=0.05 (or 5%).
 If we want 90% confidence, α=0.10 (or 10%).
 If we want 99% confidence, α=0.01 (or 1%).

What does Zα/2 mean?

 We want to find the middle 100(1-α)% area of the standard normal curve:
  ◦ So the area left in each tail will be α/2.
  ◦ Zα/2 is the point which marks off an area of α/2 in the tail
  ◦ Need to look up normal tables to find this!

Factors influencing the width of the interval

 σ is fixed; it can't be changed
 Vary the sample size: as n gets bigger, the interval gets narrower.
 Vary the confidence level: if we want to be more confident, then we simply change the 1.96 to another number from the standard normal; 2.33 will give 98% confidence, 2.575 will give 99% confidence. Increasing confidence makes the interval wider.

IMPORTANT!

 Remember that it is the INTERVAL that changes from sample to sample.
 µ is a fixed and constant value. It is either within the interval or not.
 You should interpret a 95% confidence interval as saying "In repeated sampling, 95% of such intervals created would contain the true population mean".

Example 2

 The average height of a sample of 25 men is found to be 178cm. Assume that the standard deviation of male heights is known to be 10cm, and that heights follow a normal distribution. Find:
  1. A 95% confidence interval for the population mean height.
  2. A 90% confidence interval for the population mean height.

1. A 95% confidence interval for the population mean height:
  $178 \pm 1.96\,\frac{10}{\sqrt{25}} \Rightarrow P(174.08 < \mu < 181.92) = 0.95$
 So, in repeated sampling, we would expect 95% of the intervals created this way to contain μ.

2. A 90% confidence interval for the population mean height:
  $P(-1.645 < Z < 1.645) = 0.90$, that is, $Z_{\alpha/2} = 1.645$
  $178 \pm 1.645\,\frac{10}{\sqrt{25}} \Rightarrow P(174.71 < \mu < 181.29) = 0.90$
 So, in repeated sampling, we would expect 90% of the intervals created this way to contain μ.

Interval estimators for µ, σ unknown

 We can't simply substitute s in for σ, since $\frac{\bar{X}-\mu}{s/\sqrt{n}}$ does not have a standard normal distribution!
 However, it does follow a known distribution: it follows a t-distribution with n-1 degrees of freedom. The statistic is called the t-statistic:
  $t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$

About the t-distribution

 Found by Gossett, published under the pseudonym "Student"; called "Student's t-distribution".
 It is symmetric around 0 and mound-shaped (like a normal), but has a higher variance than a normal distribution (more spread out).
 The higher the degrees of freedom, the more normal the curve looks.
 [Figure: standard normal curve compared with t-distributions (df = 13 and df = 5)]

Degrees of freedom (df)

 The number of obs whose values are free to vary after calculating the sample mean
 E.g. for n = 3 with a fixed sample mean:
  X1 = 1 (or another value)
  X2 = 2 (or another value)
  X3 = 3 (can't be changed)
  df = n - 1 = 3 - 1 = 2

Hints for using the t-tables

 The bottom row has df=∞; these are the standard normal probabilities.
  ◦ If df is very large, use Z tables even if σ is unknown
 If df is not on the tables exactly, use whatever df is closest
  ◦ The difference between values for large df is small
  ◦ E.g. df=74: would use the values for df=70 as this is closest. Then say:
   $t_{0.05,74} \approx t_{0.05,70} = 1.667$

Confidence interval for µ, σ unknown

 $\text{CI: } \bar{x} - t_{\alpha/2}\,\frac{s}{\sqrt{n}} < \mu < \bar{x} + t_{\alpha/2}\,\frac{s}{\sqrt{n}}$, with $100(1-\alpha)\%$ confidence
 Note:
  (i) The population must follow a normal distribution to use the t-statistic
  (ii) Use the t-table to find the t-value

Example 3

 A random sample, size n = 25, $\bar{x}$ = 50, s = 8. Use a 95% confidence level to estimate µ.
  $\bar{x} \pm t_{\alpha/2}\,\frac{s}{\sqrt{n}} = 50 \pm 2.0639\,\frac{8}{\sqrt{25}}$
  $46.69 < \mu < 53.30$
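A minimal Python sketch of the t-based interval (illustrative only, not part of the original slides; it assumes SciPy is available for the t critical value):

```python
import math
from scipy import stats

def t_interval(xbar, s, n, level=0.95):
    # 100*level % confidence interval for mu when sigma is unknown
    alpha = 1 - level
    t_crit = stats.t.ppf(1 - alpha / 2, df=n - 1)  # e.g. ~2.0639 for df=24
    half = t_crit * s / math.sqrt(n)
    return xbar - half, xbar + half

print(t_interval(50, 8, 25))   # example 3: ~ (46.70, 53.30)
```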

Determining the sample size

 Suppose that before we gather data, we know that we want to get an average within a certain distance of the true population value.
 We can use the CLT to find the minimum sample size required to meet this condition, if the standard deviation of the population is known.

Sample size required

 Example 4: Assume that the standard deviation of a population is 5. I want to estimate the true population mean to within ±3, with 99% certainty.
 Step 1: set up the equation needed.
  $P(|\bar{X} - \mu| \le 3) = 0.99$
 Step 2: standardise.
  $P\left(|Z| \le \frac{3}{5/\sqrt{n}}\right) = P\left(|Z| \le \frac{3\sqrt{n}}{5}\right) = 0.99$
 Step 3: solve for n. Since $P(|Z| \le 2.575) = 0.99$:
  $\frac{3\sqrt{n}}{5} \ge 2.575 \Rightarrow \sqrt{n} \ge \frac{2.575 \times 5}{3} \Rightarrow n \ge 18.42$
 Therefore, I need a minimum sample size of 19 to be able to estimate the true population mean to within ±3, with 99% certainty.
Therefore, I need a minimum sample size of 19 to be able to estimate the true population mean lying in CI of 3, with 99% certainty 34

Activity 1

 Suppose that we know the standard deviation of men's heights is 10cm. How many men should we measure to ensure that the sample mean we obtain is no more than 2cm from the population mean with 99% confidence?
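The sample-size rule is just n ≥ (z·σ/margin)², rounded up. A minimal Python sketch (illustrative only, not part of the original slides) that reproduces example 4 and also evaluates the activity above:

```python
from statistics import NormalDist
import math

def min_sample_size(sigma, margin, level=0.99):
    alpha = 1 - level
    z = NormalDist().inv_cdf(1 - alpha / 2)     # ~2.576 for 99%
    return math.ceil((z * sigma / margin) ** 2)  # round up to a whole sample

print(min_sample_size(sigma=5, margin=3))   # example 4: 19
print(min_sample_size(sigma=10, margin=2))  # activity 1: 166
```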

Section 4: Hypothesis Testing

Reading materials: Chap 11, 12 (Keller)

Outline

 Hypothesis testing: basic concepts
 Testing µ when σ is known
 Testing µ when σ is unknown
 Testing for the difference of two means (independent samples)

Hypothesis testing

 Making decisions in the face of uncertainty
 Hypothesis testing is a structure for making these decisions
 We have in mind two competing ideas – call these hypotheses:
  ◦ First idea: null hypothesis
  ◦ Second idea: alternative hypothesis
 The ideas must be distinct; e.g.
  ◦ Idea 1 (H0): it will rain today
  ◦ Idea 2 (HA): it will not rain today

Plan

 Collect data and use this to decide which idea is most likely to be correct
 Depending on the decision, we either will or will not carry an umbrella
 Decision matrix – thinking about consequences (what actually happens is the truth):

What you decide     | It rains       | It doesn't rain
Take umbrella       | right decision | wrong decision
Don't take umbrella | wrong decision | right decision

In Statistics:

Decision \ Truth | H0 true      | HA true
Accept H0        | correct      | Type 2 Error
Accept HA        | Type 1 Error | correct

 α = significance level = P(Type 1 error)
 β = 1 – power = P(Type 2 error)
 Power = P(reject H0 when it is false)

An analogy for hypothesis testing – criminal law

Criminal law                                     | Hypothesis testing
Accused is innocent                              | Null hypothesis
Accused is guilty                                | Alternative hypothesis
Gathering evidence                               | Gathering data
Build case – presenting and summarising evidence | Presenting and summarising data, building a test statistic

Analogy continued – outcomes (1)

Criminal law               | Hypothesis testing
Accused is acquitted       | Choose H0
Accused is convicted       | Choose HA
Convict an innocent person | Type 1 Error
Acquit a guilty person     | Type 2 Error
"Beyond reasonable doubt"  | "95% certainty of making the right decision"

Analogy continued – outcomes (2)

 If we say we have a 95% chance of making the right decision, it means we have a 5% chance of making an error. But what type of error do we have a 5% chance of making?
 A Type 1 error is considered to be more serious than a Type 2 error. Therefore, by convention, we set up testing so that the probability of making a Type 1 error, α, is small.
 Ideally, we would also like the probability of making a Type 2 error, β, to be small. But reducing the chance of a Type 1 error increases the chance of a Type 2 error.
 Therefore, we choose to set α to 5% (i.e. a 5% chance we reject H0 when it is true), or some other fixed, low probability, and ignore β.

Analogy continued – outcomes (3)

 In hypothesis testing, we also make a "presumption of innocence". This means that, when we test a hypothesis, we start by assuming the null is true.
 Then, we gather data, and if we find enough evidence, we will reject the null hypothesis and accept the alternative hypothesis.

Steps for hypothesis tests

 1. State the null and alternative hypotheses
 2. Calculate the test statistic
 3. Formulate a decision rule using either the rejection region, the p-value (found from the appropriate distribution, e.g. the standard normal), or the confidence interval approach
 4. Reach a conclusion regarding whether to accept the null or the alternative hypothesis.

Rules for hypotheses

 Null hypothesis:
  ◦ Always about a population value (Greek letter)
  ◦ Always has an "="
 Alternative hypothesis:
  ◦ Always about a population value (Greek letter)
  ◦ Has one of <, > or ≠
  ◦ Looks like the null, but "=" has been replaced

Testing µ when σ is known

 Example 1: A store manager is considering a new billing system for credit customers. The new system will only be cost effective if the mean monthly account is more than $170. A random sample of 400 monthly accounts gives a sample average of $178. The manager knows that accounts are approximately normally distributed, with a standard deviation of $65. Can the manager conclude from this data that the new system will be cost effective?
 We want to find out if µ, the true mean monthly account, is bigger than $170.

Applying the rules to example 1

 Null hypothesis H0: µ = 170
 Alternative hypothesis HA: µ > 170
 Having done this, the question now becomes: "is $178 far enough away from $170 to conclude that µ is bigger than $170?"

Recap: The Central Limit Theorem

 The central limit theorem says that a sample average has a normal distribution centred at µ with a standard deviation of σ/√n. So, if we calculate the test statistic below, it should follow a standard normal distribution:

    Z = (X̄ − µ) / (σ/√n) ~ N(0, 1) as n → ∞

Applying this to example 1 (1)

 We have σ = 65. We calculate a test statistic – this measures (in standardised units) how far from the hypothesised µ our sample average is. Formula:

    Z = (X̄ − µ) / (σ/√n)

 Z should follow a standard normal distribution IF the true µ is equal to the one in our null hypothesis.

Applying this to example 1 (2)

 Test statistic in this case:

    Z = (X̄ − µ) / (σ/√n) = (178 − 170) / (65/√400) = 8/3.25 = 2.46

Decision Rule

 Three methods – rejection region, p-value, or confidence interval.
 Rejection region:
  ◦ We want to be 95% certain. This means a 5% chance of rejecting H0 when it is true.
  ◦ So, we find the EXTREME 5% of the standard normal (according to our alternative hypothesis) and this will be our rejection region.

Applying this to example 1

 The point that marks off the top 5% of a standard normal is 1.645. So, we will reject the null hypothesis if our test statistic lies above 1.645.
 Here, the test statistic = 2.46, so we reject the null hypothesis in favor of the alternative hypothesis. In other words, there is sufficient evidence to conclude that the mean monthly account is higher than $170.
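To make the arithmetic concrete, here is a minimal Python sketch of this one-sided z-test (not part of the original slides; it assumes scipy is installed):

```python
import math
from scipy.stats import norm

# Example 1: H0: mu = 170 vs HA: mu > 170, with sigma known
xbar, mu0, sigma, n = 178, 170, 65, 400

z = (xbar - mu0) / (sigma / math.sqrt(n))  # standardised distance: 2.46
critical = norm.ppf(0.95)                  # top-5% cutoff: 1.645
p_value = 1 - norm.cdf(z)                  # upper-tail area: 0.0069

print(f"z = {z:.2f}, critical = {critical:.3f}, p-value = {p_value:.4f}")
# z > critical (and p < 0.05), so reject H0
```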

P-value approach (by hand or computer)

 This is the probability of getting our test statistic, or one further away from the middle, if the null is true.
 Draw a diagram – it is the area more extreme than our test statistic; for the last example, the p-value is P(Z > 2.46).
 A small p-value is evidence against the null hypothesis.
 Rule:
  ◦ If p-value < α => reject the null hypothesis;
  ◦ If p-value > α => do not reject the null hypothesis.

Applying this to example 1 (by hand)

 From the standard normal tables: P(Z > 2.46) = 1 – P(Z < 2.46) = 1 – 0.9931 = 0.0069
 This means that the probability of observing a sample mean at least as large as 178, for a population whose mean is 170, is 0.0069 – extremely small (much smaller than 0.05). Therefore, we reject the null and conclude that the mean monthly account is higher than $170 (the same conclusion as with the rejection region approach).

Confidence interval (CI) approach

 For a 5% significance level, we set up a rejection region:

    (X̄ − µ)/(σ/√n) < −1.96  or  (X̄ − µ)/(σ/√n) > 1.96

 The acceptance region is:

    −1.96 < (X̄ − µ)/(σ/√n) < 1.96

 Then the 95% CI for µ is:

    X̄ − 1.96·σ/√n < µ < X̄ + 1.96·σ/√n

Applying this to example 1

 The 95% confidence interval for µ is:

    178 − 1.96·65/√400 < µ < 178 + 1.96·65/√400
    171.63 < µ < 184.37

 Because the hypothesised µ (170) does not lie within this CI, we reject the null in favor of the alternative.

One tailed vs two tailed tests

 If the alternative hypothesis is "<" or ">":
  ◦ This is a one tailed test
  ◦ The rejection region will be in either the upper or the lower tail
  ◦ The p-value is the probability of getting a more extreme result
 If the alternative hypothesis is "≠":
  ◦ This is a two tailed test
  ◦ The rejection region needs to be split between both tails
  ◦ The p-value will include an absolute value – i.e. it will be the probability of getting further away from the hypothesised mean on either side

So if the alternative is "≠"

 Two sided or two tailed test
 Rejection region: Z < −Zα/2 or Z > +Zα/2
 P-value: P(Z > |T.S.|) + P(Z < −|T.S.|)

So if the alternative is ">"

 Right tailed test
 Rejection region: Z > +Zα
 P-value: P(Z > T.S.)

So if the alternative is "<"

 Left tailed test
 Rejection region: Z < −Zα
 P-value: P(Z < T.S.)

Testing µ when σ is unknown

 Similar to the case of estimation, we can substitute s for σ and calculate a t-statistic. The basic process of hypothesis testing remains the same, with the following changes:
 The test statistic is now calculated as

    t = (X̄ − µ) / (s/√n)

 It follows the t-distribution with n − 1 degrees of freedom (use the t-table to find the rejection region or p-value).

Example 2

 Use the gssft.sav file to test the hypothesis that college graduates work a 40-hour work week.
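Outside SPSS, the same one-sample t-test can be sketched in Python; `hours` below is a small hypothetical array standing in for the hours-worked variable in gssft.sav:

```python
import numpy as np
from scipy.stats import ttest_1samp

# Hypothetical stand-in for "Number of hours worked last week"
hours = np.array([45, 40, 50, 38, 60, 42, 48, 40, 55, 44])

t_stat, p_two_sided = ttest_1samp(hours, popmean=40)  # H0: mu = 40
print(f"t = {t_stat:.3f}, two-sided p = {p_two_sided:.4f}")
```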

Hypotheses, test statistic for 2-tailed test

 H0: µ = 40
 HA: µ ≠ 40
 Here are the results from SPSS:

One-Sample Test (Test Value = 40)

                                     t        df    Sig.        Mean         95% CI of the Difference
                                                    (2-tailed)  Difference   Lower    Upper
Number of hours worked last week   14.326    436      .000        6.995       6.04     7.96

CI for µ when σ is unknown

 Also use the t-distribution for confidence intervals for µ when σ is unknown. If σ has been estimated from the data, the confidence interval will be of the form

    X̄ ± t(α/2, n−1) · s/√n

 i.e.  X̄ − t(α/2, n−1)·s/√n < µ < X̄ + t(α/2, n−1)·s/√n
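A short Python sketch of this t-based interval, reusing the hypothetical `hours` sample from the earlier sketch:

```python
import numpy as np
from scipy.stats import t

xbar, s, n = hours.mean(), hours.std(ddof=1), len(hours)
tcrit = t.ppf(0.975, df=n - 1)   # t(alpha/2, n-1) for a 95% CI
half_width = tcrit * s / np.sqrt(n)
print(f"95% CI: ({xbar - half_width:.2f}, {xbar + half_width:.2f})")
```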

Conclusion

 Based on the t-statistic, the p-value, or the confidence interval approach, we reject the null hypothesis. In other words, there is sufficient statistical evidence to conclude that full-time workers work more than 40 hours per week.

Section 5
Regression analysis
Reading materials: Chap 17, 18 (Keller)

Outline

 Simple Regression:
  ◦ Form of the general model
  ◦ Procedure in SPSS
  ◦ Interpretation of SPSS output
  ◦ Testing significance of a slope/intercept
  ◦ Assumption checking
 Multiple Regression:
  ◦ As above

Regression analysis

 Regression analysis investigates whether and how variables are related to each other. More specifically, regression analysis can be used to:
  • Determine whether the value of one variable has any effect on the values of another;
  • Determine whether, as one variable changes, another tends to increase or decrease;
  • Predict the values of one variable based on the values of one or more other variables.
 E.g.:
  • How price is related to product demand => if we change the price, how will product demand change?
  • How does the salary of staff depend on their education and experience?

Types of relationships

 [Scatterplot panels: positive linear relationship, negative linear relationship, non-linear relationship, no relationship]

Simple linear relationship

 In a simple linear relationship, we want to see whether a linear relationship exists b/w one dependent variable (Y) and one independent variable (X).
 Example: we want to see whether the time people have lived in a city (in years) affects their attitude towards that city in a linear manner. Attitude towards the city is measured on an 11-point scale (1 = do not like, 11 = very much like).

Simple linear relationship: example

Respondent   Duration of   Quality of        Attitude
Number       Residence     infrastructure    Towards City
 1               10              3                6
 2               12             11                9
 3               12              4                8
 4                4              1                3
 5               12             11               10
 6                6              1                4
 7                8              7                5
 8                2              4                2
 9               18              8               11
10                9             10                9
11               17              8               10
12                2              5                2

Steps in regression analysis

1. Analyse the nature of the relationship b/w the independent and dependent variables
2. Make a scatterplot
3. Formulate the mathematical model that describes the relationship b/w the independent and dependent variables
4. Estimate and interpret the coefficients of the model
5. Test the model
6. Evaluate the strength of the relationship and prediction accuracy

Simple linear regression: notation

 Simple regression – one predictor. We have n observations.
 Xi = value of the independent variable on the ith obs
 Yi = value of the dependent variable on the ith obs
 sx = sample standard deviation of the independent variable
 sy = sample standard deviation of the dependent variable
 X̄ is the sample average of the independent variable
 Ȳ is the sample average of the dependent variable

Simple linear regression: scatterplot

 Step 2: Make a Scatterplot
 Example – city attitudes vs duration of residence
 [Scatterplot of Attitude Towards City vs Duration of Residence]

Simple linear regression: Model

 Step 3: Formulate the General Model
 Fit a straight line to the data, fitting the following model:

    Yi = β0 + β1·Xi + εi

 where β0 is the intercept, β1 is the slope, and εi is the error term (residual).
 Slope and intercept are estimated by the ordinary least squares (OLS) method.

OLS method and assumptions

 OLS: want Σεi² to be a minimum
 [Diagram: observed values Yi = β0 + β1·Xi + εi scattered around the fitted line E(Y|X) = β0 + β1·X; the εi are the error terms (vertical distances)]

Gauss-Markov assumptions

 Assumption on the linear relation
  ◦ A0: linear model Yi = β0 + β1·Xi + εi
 Assumptions on the error terms:
  ◦ A1: E(εi) = 0, i = 1, ..., n
  ◦ A2: Normality of error terms: ε ~ N
  ◦ A3: Non-autocorrelation of error terms: cov(εi, εj) = 0 for i ≠ j
  ◦ A4: Homoskedasticity: Var(εi) = σ², i = 1, ..., n
 Assumption on the factor
  ◦ A5: Exogeneity assumption: Cov(X, ε) = 0

 Source: Dehon's lecture

Estimate the parameters

 Step 4: Estimate the parameters (slope and intercept):

    Ŷi = β̂0 + β̂1·Xi

 Can calculate estimates of the slope and intercept using formulae, which are derived from OLS:

    β̂1 = (n·ΣXiYi − ΣXi·ΣYi) / (n·ΣXi² − (ΣXi)²)
    β̂0 = Ȳ − β̂1·X̄

Applying this to example

 Slope = 16.333/27.697 = 0.5897
 Intercept = 6.5833 − 0.5897 × 9.333 = 1.0796
 Fitted Equation: Ŷi = 1.0796 + 0.5897·Xi
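These hand calculations can be checked in Python with the data from the table above (a sketch, not part of the original slides):

```python
import numpy as np

duration = np.array([10, 12, 12, 4, 12, 6, 8, 2, 18, 9, 17, 2])  # X
attitude = np.array([6, 9, 8, 3, 10, 4, 5, 2, 11, 9, 10, 2])     # Y

# Sample covariance / sample variance reproduces 16.333 / 27.697
b1 = np.cov(duration, attitude, ddof=1)[0, 1] / np.var(duration, ddof=1)
b0 = attitude.mean() - b1 * duration.mean()
print(f"slope = {b1:.4f}, intercept = {b0:.4f}")  # 0.5897, 1.0796
```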

Interpreting the coefficients

 β̂1 = 0.5897 means that with each additional year of staying in the city, attitude towards the city increases by an average of 0.5897 points.
 β̂0 = 1.0796 is the value of Y when X = 0. This means that reasons unrelated to the duration of residence give an attitude towards the city of 1.0796 points.
 Note: sometimes β̂0 makes no sense at X = 0; in that case we don't interpret the meaning of this coefficient.

Step 5: Testing for significance of estimated parameters

 Can test the significance of the linear relationship:
 H0: β1 = 0
 HA: β1 ≠ 0
 Test statistic:

    T = (β̂1 − β1) / s(β̂1),  where s(β̂1) is the standard error of β̂1

 Decision Rule: Compare to a t-distribution with n − 2 degrees of freedom.

Applying this to example

 H0: β1 = 0
 HA: β1 ≠ 0
 Test statistic:

    t = (β̂1 − β1) / s(β̂1) = (0.5897 − 0) / 0.0701 = 8.412

 Compare this t-value with the t-distribution to make the decision rule.

Decision rule

 So, the rejection region will be t > 2.2281 or t < −2.2281 at 5% significance (use df = 10).
 OR from SPSS, p-value = 0.000.
 Conclusion: Reject the null hypothesis. There is a significant linear relationship between duration of residence and attitude to the city.
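The standard error, t-value and p-value come out directly from a fitted model; a minimal sketch (assuming statsmodels is installed), reusing the arrays defined above:

```python
import numpy as np
import statsmodels.api as sm

duration = np.array([10, 12, 12, 4, 12, 6, 8, 2, 18, 9, 17, 2])
attitude = np.array([6, 9, 8, 3, 10, 4, 5, 2, 11, 9, 10, 2])

fit = sm.OLS(attitude, sm.add_constant(duration)).fit()
print(fit.params)   # [intercept, slope] ≈ [1.0796, 0.5897]
print(fit.tvalues)  # slope t ≈ 8.41, as above
print(fit.pvalues)  # slope p ≈ 0.000
```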

Step 6: Determine the strength and significance of association

 Measured by r² – the coefficient of determination.
 r² measures the proportion of total variation (in Y) explained by the variation in X, i.e.

    r² = explained variation / total variation = SSreg / SSy

Applying this to example

 Here is the output from SPSS:

    S = 1.22329   R-Sq = 87.6%   R-Sq(adj) = 86.4%

 So, 87.6% of the variation in Y is explained by the variation in X.

Step 6: Check prediction accuracy

 Can use the standard error of the estimate, sε:

    sε = √( SSres / (n − k − 1) )

 Interpretation: the average residual; the average error in predicting Y from the regression equation.
 Used to construct confidence intervals:
  ◦ for the mean value of Y for a given X
  ◦ for all values of Y for a given X

Checking assumptions

 Regression analysis makes several assumptions:
  ◦ Error terms are normally distributed
  ◦ Error terms have mean 0 and constant variance
  ◦ Error terms are independent
 These should be checked with plots (see the multiple regression section).

Multiple Regression

 Data:
  ◦ one dependent variable
  ◦ two or more independent variables
 Example: Are consumers' perceptions of quality determined by their perceptions of prices, brand image and brand attributes?

Example using SPSS

 Use the cntry15.sav data file for SPSS practice.

Model – general form

    Y = β0 + β1X1 + β2X2 + ... + βkXk + ε

 which is estimated by

    Ŷ = β̂0 + β̂1X1 + β̂2X2 + ... + β̂kXk

 β̂0 = estimated intercept
 β̂i = estimated partial regression coefficient
 As before, use the least squares method to estimate the parameters, minimising the error (residual) sum of squares.

Interpreting a Partial Regression Coefficient

 Imagine a case with two predictors:

    Y = β0 + β1X1 + β2X2 + ε

 β1 represents the expected change in Y when X1 is increased by one unit, but X2 is held constant or otherwise controlled.

Example 2

 Attitude to city now being explained by:
  ◦ Duration of residence
  ◦ Quality of infrastructure

General Model

 Let
  ◦ Y = attitude to city
  ◦ X1 = duration of residence
  ◦ X2 = quality of infrastructure

    Y = β0 + β1X1 + β2X2 + ε

Estimation (SPSS)

 The regression equation is:

    Attitude Towards City = 0.337 + 0.481 × Duration of Residence + 0.289 × Quality of infrastructure

 Coefficients (Dependent Variable: attitude)

 Model 1        Unstandardized Coefficients    Standardized Coefficients
                B        Std. Error            Beta          t        Sig.
 (Constant)     .337     .567                                .595     .567
 duration       .481     .059                  .764          8.160    .000
 quality        .289     .086                  .314          3.353    .008

Strength of relationship (R²)

 As before, R² is the proportion of variation explained by the model:

    R² = explained variation / total variation = SSreg / SSy

 In the example, 94.5% of the variation in Y can be explained by the variation in X1 and X2.
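Since this output comes from the same 12-respondent dataset tabulated earlier, the fit can be reproduced in Python (a sketch assuming statsmodels):

```python
import numpy as np
import statsmodels.api as sm

duration = np.array([10, 12, 12, 4, 12, 6, 8, 2, 18, 9, 17, 2])
quality  = np.array([3, 11, 4, 1, 11, 1, 7, 4, 8, 10, 8, 5])
attitude = np.array([6, 9, 8, 3, 10, 4, 5, 2, 11, 9, 10, 2])

X = sm.add_constant(np.column_stack([duration, quality]))
fit = sm.OLS(attitude, X).fit()
print(fit.params)    # ≈ [0.337, 0.481, 0.289], matching the table
print(fit.rsquared)  # ≈ 0.945
```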

Points about R²

 Now called the coefficient of multiple determination.
 Will go up as we add more explanatory terms to the model, whether they are "important" or not.
 Often we use "adjusted R²" – it compensates for adding more variables, so it is lower than R² when variables are not "important".

Significance Testing

 Can test two different things:
 1. Significance of the overall regression
 2. Significance of specific partial regression coefficients

1. Significance of the overall regression

 H0: β1 = β2 = ... = βk = 0
 HA: not all slopes = 0
 Test statistic:

    F = (SSreg / k) / (SSres / (n − k − 1)) = (R² / k) / ((1 − R²) / (n − k − 1))

 Decision Rule: Compare to an F-distribution with k and (n − k − 1) degrees of freedom.
 If H0 is rejected, one or more slopes are not zero. Additional tests are needed to determine which slopes are significant.

Applying this to example – SPSS output

 This is the test done in the ANOVA section of the output. In this case, we reject the null hypothesis – at least one of the slopes is significantly different from zero.

2. Significance of specific partial regression coefficients

 H0: βi = 0
 HA: βi ≠ 0
 Test statistic:

    t = (β̂i − βi) / s(β̂i)

 Decision Rule: Compare to a t-distribution with (n − k − 1) degrees of freedom (i.e. the residual d.f.).
 If H0 is rejected, the slope of the ith variable is significantly different from zero. That is, once the other variables are considered, the ith predictor has a significant linear relationship with the response.

Applying this to example

 Coefficients (Dependent Variable: attitude)

 Model 1        Unstandardized Coefficients    Standardized Coefficients
                B        Std. Error            Beta          t        Sig.
 (Constant)     .337     .567                                .595     .567
 duration       .481     .059                  .764          8.160    .000
 quality        .289     .086                  .314          3.353    .008

 Once the quality of infrastructure is considered, the duration of residence still has a significant linear relationship with the attitude to the city.

Check residuals

 Assumptions made:
  ◦ Error terms are normally distributed
  ◦ Error terms have mean 0 and constant variance
  ◦ Error terms are independent
 Definition: A residual (also called an error term) is the difference between the observed response value Yi and the value predicted by the regression equation, Ŷi (the vertical distance between the point and the line).

Error terms normally distributed

 Can be checked by looking at a histogram of the residuals – look for a bell-shaped distribution.
 Also a normal probability plot – look for a straight line.
 For preference, use standardised residuals – these have a std dev of 1.

Error terms have mean 0, constant variance

 Checked by using plots of residuals vs predicted values, and residuals vs independent variables.
 Look for a random scatter of points around zero.
 If not, this may indicate linear regression is not appropriate – may need to transform the data.

Error terms are independent

 Check in the previous plots; also in residuals vs time/order.
 Look for a random scatter of residuals.
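A sketch of these diagnostic plots in Python, reusing the fitted `fit` object from the earlier regression sketches (assumes matplotlib, scipy and statsmodels):

```python
import matplotlib.pyplot as plt
from scipy import stats

resid = fit.get_influence().resid_studentized_internal  # standardised residuals

fig, axes = plt.subplots(1, 3, figsize=(12, 3))
axes[0].hist(resid)                               # bell-shaped?
stats.probplot(resid, dist="norm", plot=axes[1])  # straight line?
axes[2].scatter(fit.fittedvalues, resid)          # random scatter around 0?
axes[2].axhline(0, color="grey")
plt.show()
```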

Example

 [Residual plots for Attitude Towards City: normal probability plot of the residuals, histogram of the residuals, residuals versus the fitted values, and residuals versus the order of the data (all using standardised residuals)]

Section 6
Introduction to Time Series
Reading materials: Chap 20 (Keller)

Outline

 Overview of time series
 Stationarity
 Autoregressive processes
 Determining process order

Different Types of Data

 Cross sectional data: You observe each member in your sample ONCE (usually but not necessarily at the same time).
  ◦ Examples of Cross Sectional Data:
     Observing the heights and weights of 1000 people.
     Observing the income, education, and experience of 1000 people.
     Observing the per-capita GDP, population, and real defence spending of 80 nations.
 Time series: You observe each variable once per time period for a number of periods.
  ◦ Examples of Time Series:
     Observing U.S. inflation and unemployment from 1961-1995.
     Observing the profitability of one firm over 20 years.
     Observing the daily closing price of gold over 30 years.
 Pooled time series (= "panel data"): You observe each member in your sample once per time period for a number of periods.
  ◦ Examples of Pooled Time Series:
     Observing the output and prices of 100 industries over 12 quarters.
     Observing the profitability of 20 firms over 20 years.
     Observing the annual rate of return of 300 mutual funds over the 1960-1997 period.

Overview of Time Series

 Time series is time-ordered data.
 ◦ We will assume that the observations are made at equally spaced time intervals. This assumption enables us to use the interval between two successive observations as the unit of time.
 The total number of observations in a time series is called the length of the time series (or the length of the data).
 ◦ More Examples of Time Series:
    Daily closing stock prices; and,
    Monthly unemployment figures.

Overview of Time Series

 Univariate time series models:
 ◦ Model and predict financial variables using only information contained in their own past values, and possibly current and past values of an error term.
 Virtually any quantity recorded over time yields a time series. To "visualize" a time series we plot our observations as a function of time. This is called a time plot.
 Think of a time series as a random or stochastic process.
 ◦ We do not know the outcome until the experiment is implemented:
    The closing value of the next trading day of the Dow Jones Index.
    The annual output growth of Malaysia next year.
 ◦ When collecting a time series data set, we get one possible outcome under a certain number of conditions. Changing the conditions => we get a different set of outcomes (like different cross-sectional samples from a population).

Stationary time series

 Recall the Gauss-Markov assumptions for OLS estimation of cross sectional data:
 ◦ Error terms are normally distributed. If not, apply the LLN and CLT.
 The LLN and CLT hold for a time series if the process satisfies stationarity conditions.
 Strict stationarity: A time series is stationary if the joint probability distribution of any set of times is not affected by an arbitrary shift along the time axis. More clearly: the joint distribution of (y_t1, y_t2, ..., y_tm) is the same as the joint distribution of (y_{t1+h}, y_{t2+h}, ..., y_{tm+h}).
 Weak or covariance stationarity: the covariances b/w y_t and y_{t+h} for any h do not depend upon t.

Covariance stationary

 Then:

    E(Yt) = µ
    V(Yt) = E(Yt − µ)² = γ0
    cov(Yt, Yt−k) = E[(Yt − µ)(Yt−k − µ)] = γk,  k = 1, 2, 3, ...

 Autocorrelation: standardising γk gives the autocorrelation ρk as:

    ρk = cov(yt, yt−k) / V(yt) = γk / γ0

 which measures dependency among observations k lags apart.

Autoregressive Processes

 An autoregressive model is one where the current value of a variable, y, depends only upon the values that the variable took in previous periods, plus an error term.
 For example, a first-order process is one where y is influenced by 1 lag. This is known as an AR(1) model, or an autoregressive model of order 1. This is formalised below:

    yt = µ + φ1·yt−1 + ut

 In general, an autoregressive model of order p, denoted AR(p), is expressed as:

    yt = µ + φ1·yt−1 + φ2·yt−2 + φ3·yt−3 + ... + φp·yt−p + ut
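To see what an AR(1) looks like, the recursion can be simulated directly; a minimal Python sketch with arbitrary illustrative parameter values:

```python
import numpy as np

rng = np.random.default_rng(seed=42)

mu, phi1, n = 0.5, 0.8, 200  # arbitrary illustrative values
y = np.zeros(n)
for t in range(1, n):
    y[t] = mu + phi1 * y[t - 1] + rng.normal()  # y_t = mu + phi1*y_{t-1} + u_t
```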

Autoregressive Processes

 What does an AR(1) look like? What does a white noise process look like? How can the lag order be determined?
 1. ACF;
 2. PACF;
 3. AIC and SIC criteria; and,
 4. White noise residuals.

Determining Process Order

 1. Autocorrelation Function (ACF)
 ◦ ACF measures the correlation between the current observation and the k'th lag, i.e. the correlation between yt and yt−k.
 ◦ For an AR process the ACF can decay slowly or rapidly, but it will decay geometrically to zero.

Determining Process Order

 [Figure: Autocorrelation Function for ASX ALL ORDINARIES - PRICE IN (with 5% significance limits for the autocorrelations); autocorrelations plotted against lags 1-80]

Determining Process Order

 2. Partial Autocorrelation Function (PACF)
 ◦ PACF measures the correlation between the observation k periods ago and the current observation, after controlling for observations at intermediate lags (i.e. all lags < k).
 ◦ For example, the PACF for lag 3 would measure the correlation between yt and yt−3, after controlling for the effects of yt−1 and yt−2.
 ◦ Note: at lag 1, the autocorrelation and partial autocorrelation coefficients are equal, since there are no intermediate lag effects to eliminate.

Determining Process Order

 [Figure: Partial Autocorrelation Function for ASX ALL ORDINARIES - PRICE IN (with 5% significance limits for the partial autocorrelations); partial autocorrelations plotted against lags 1-80]

Determining Process Order

 3. AIC and SIC criteria
 ◦ Akaike (AIC) and Schwarz (SIC) information criteria:

    AIC = e^(2k/n) · RSS/n
    SIC = n^(k/n) · RSS/n

    where k = lag order and n = # obs

 ◦ Technique:
   1. Fit a model with k lags. Calculate AIC and SIC;
   2. Fit another model with k+1 or k-1 lags; and,
   3. The best model will have the lowest AIC/SIC.
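A sketch of this technique with statsmodels, whose `AutoReg` reports AIC and BIC (BIC is the Schwarz criterion the slides call SIC, though the exact formula differs from the one above):

```python
from statsmodels.tsa.ar_model import AutoReg

# y: the simulated AR(1) series from the earlier sketch
for k in range(1, 6):
    res = AutoReg(y, lags=k).fit()
    print(k, round(res.aic, 2), round(res.bic, 2))
# choose the lag order with the lowest AIC/BIC (here, k = 1 should win)
```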

Determining Process Order

 4. White noise approach
 ◦ Recall that yt is autocorrelated. If we have fitted the correct number of lags (to take into account the autocorrelation), then there should be none left in the residuals. That is, the residuals are white noise.
 ◦ White noise properties:
    Homoscedastic;
    Constant mean; and,
    No autocorrelation.
