Lessons 2(b)-(d).pdf

  • Uploaded by: Francis Cayanan
  • November 2019
Describing a Set of Data with Numerical Measures

Lesson 2(b)



To present the important measures of center and to show how to compute the following:

▪ Mean
▪ Median
▪ Mode



A measure of center is a value at the center or middle of a data set.

▪ Mean
▪ Median
▪ Mode



The (arithmetic) mean is generally the most important of all numerical descriptive measurements, and it is what most people call an average.

The arithmetic mean of a set of values is the number obtained by adding the values and dividing the total by the number of values. It is also referred to simply as the mean, a term that will be used often throughout the remainder of the course.



The mean is denoted by $\bar{x}$ (pronounced “x-bar”) if the data set is a sample from a larger population.

The mean is denoted by $\mu$ (lowercase Greek mu) if all values of the population are used.

The Greek letter $\Sigma$ (uppercase Greek sigma) indicates that the data values should be added.

Formula

Mean of a set of sample values (ungrouped):

$$\bar{x} = \frac{\sum x}{n}$$

Notation

$\bar{x}$ denotes the mean of a set of sample values
$\Sigma$ denotes the addition of a set of values
$x$ is the variable usually used to represent the individual data values
$n$ represents the number of values in a sample

Mean of all values in a population:

$$\mu = \frac{\sum x}{N}$$

Notation

$\mu$ denotes the mean of all values in a population
$\Sigma$ denotes the addition of a set of values
$x$ is the variable usually used to represent the individual data values
$N$ represents the number of values in a population



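Listed below are the volumes (in ounces) of the Coke in five different cans. Find the mean for this example.

12.3 12.1 12.2 12.3 12.2

Solution:

$$\bar{x} = \frac{\sum x}{n} = \frac{12.3 + 12.1 + 12.2 + 12.3 + 12.2}{5} = \frac{61.1}{5} = 12.22 \text{ ounces}$$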
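The sample mean, the sum of the values divided by the number of values, can be sketched in a few lines of Python. The function name `sample_mean` is an illustrative choice and does not come from the lesson.

```python
def sample_mean(values):
    """Return the arithmetic mean: the sum of the values divided by their count."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


# Volumes (in ounces) of the Coke in the five cans from the example.
volumes = [12.3, 12.1, 12.2, 12.3, 12.2]
print(round(sample_mean(volumes), 2))
```

Rounding to two decimals only hides floating-point noise; the calculation itself is the plain sum divided by n.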



The mean is sensitive to every value, so one exceptional value can affect it dramatically. The median largely overcomes this disadvantage.



The median of a data set is the middle value when the original data values are arranged in order of increasing (or decreasing) magnitude.

The median is often denoted by $\tilde{x}$ (pronounced “x-tilde”, or “x-curl”).









To find the median, first sort the values (arrange them in order), then follow one of these two procedures:

▪ If the number of values is odd, the median is the number located in the exact middle of the list.
▪ If the number of values is even, the median is found by computing the mean of the two middle numbers.



Example 1. Find the median of the following salaries (in millions of dollars) paid to female executives (based on data from Working Woman magazine):

6.72 3.46 3.60 6.44

Solution: Arranging the values in order gives

3.46 3.60 6.44 6.72

Since the number of values is even, the median is the mean of the two middle values: (3.60 + 6.44)/2 = 5.02. The median is $5.02 million.



Example 2. Repeat Example 1, this time including another salary of $26.70 million. That is, find the median of the following salaries (in millions of dollars):

6.72 3.46 3.60 6.44 26.70

Solution: Arranging the values in order gives

3.46 3.60 6.44 6.72 26.70

Since the number of values is odd, the median is the value in the exact middle of the list. The median is $6.44 million.
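The sort-then-pick-the-middle procedure can be sketched as follows. This is a minimal illustration; the function name `median` is an assumption of this sketch, and the two calls reuse the salary data from the two examples.

```python
def median(values):
    """Return the median: sort, then take the middle value (odd count)
    or the mean of the two middle values (even count)."""
    if not values:
        raise ValueError("the median of an empty data set is undefined")
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


salaries = [6.72, 3.46, 3.60, 6.44]           # millions of dollars
print(round(median(salaries), 2))             # even number of values
print(round(median(salaries + [26.70]), 2))   # odd number of values
```

Adding the very large salary of 26.70 moves the median only from one middle position to the next, which illustrates why the median resists exceptional values.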



The mode of a data set is the value that occurs most frequently.



The mode is often denoted by M.







▪ When two values occur with the same greatest frequency, each one is a mode and the data set is said to be bimodal.
▪ When more than two values occur with the same greatest frequency, each is a mode and the data set is said to be multimodal.
▪ When no value is repeated, we say that there is no mode.

Find the modes of the following data sets.

1. 5 5 5 3 1 5 1 4 3 5
2. 1 2 2 2 3 4 5 6 6 6 7 9
3. 1 2 3 6 7 8 9 10

Solution:

1. The number 5 is the mode because it is the value that occurs most often.
2. The numbers 2 and 6 are both modes because they occur with the same greatest frequency. This data set is bimodal.
3. There is no mode because no value is repeated.
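The three cases (one mode, several modes, no mode) can be sketched with a frequency count. The function name `modes` and the choice to return an empty list for "no mode" are assumptions of this sketch, not part of the lesson.

```python
from collections import Counter


def modes(values):
    """Return every value that occurs with the greatest frequency.

    An empty list means there is no mode because no value is repeated.
    """
    if not values:
        return []
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []  # no value is repeated
    return sorted(value for value, count in counts.items() if count == highest)


print(modes([5, 5, 5, 3, 1, 5, 1, 4, 3, 5]))        # one mode
print(modes([1, 2, 2, 2, 3, 4, 5, 6, 6, 6, 7, 9]))  # bimodal
print(modes([1, 2, 3, 6, 7, 8, 9, 10]))             # no mode
```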



The midrange is the value midway between the highest and the lowest values in the original data set. It is found using the formula

$$\text{midrange} = \frac{\text{highest value} + \text{lowest value}}{2}$$

Find the midrange of the ages of people arrested on theft charges at the Dutchess County jail.

18 16 23 25 19 18 20 38

Solution: midrange = (38 + 16)/2 = 27

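Comparing the measures of center for a data set with an outlier:

2 2 2 20 34 45 210 (210 is an outlier)

▪ Mean = 45; the average value
▪ Median = 20; the middle value
▪ Mode = 2; the value that occurs most often
▪ Midrange = (2 + 210)/2 = 106; the value midway between the lowest and highest values that occur in the data set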
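To see how strongly the outlier 210 pulls each measure of center, the four measures can be computed with and without it. This is a sketch using Python's standard `statistics` module; the helper names `midrange` and `center_summary` are illustrative assumptions.

```python
import statistics


def midrange(values):
    """Return the value midway between the lowest and highest values."""
    return (min(values) + max(values)) / 2


def center_summary(values):
    """Collect the four measures of center for one data set."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "midrange": midrange(values),
    }


data = [2, 2, 2, 20, 34, 45, 210]
with_outlier = center_summary(data)
without_outlier = center_summary(data[:-1])  # drop the outlier 210

for name in with_outlier:
    print(name, with_outlier[name], without_outlier[name])
```

Dropping the single value 210 changes the mean and the midrange far more than the median, and leaves the mode unchanged.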

A distribution of data is skewed if it is not symmetric and if it extends more to one side than the other. A distribution of data is symmetric if the left half of its histogram is roughly a mirror image of its right half.

Lopsided to the right = skewed to the left = negatively skewed. The mean and median are to the left of the mode. Although not always predictable, data with this type of distribution generally have the mean to the left of the median.

Lopsided to the left = skewed to the right = positively skewed. The mean and median are to the right of the mode. Although not always predictable, data with this type of distribution generally have the mean to the right of the median.

Grouped Data



When data are summarized in a frequency table, we do not know the exact values falling in a particular class. To make calculations possible, we pretend that within each class, all sample values are equal to the class midpoint.





Since each class midpoint is repeated a number of times equal to the class frequency, the sum of all sample values becomes $\sum (f \cdot x)$, where $f$ denotes the frequency and $x$ represents the class midpoint. The total number of sample values is the sum of the frequencies, $\sum f$.

$$\bar{x} = \frac{\sum (f \cdot x)}{\sum f}$$

Example from Lesson 2(a): Frequency Distribution

CI      Class limits    f     x     fx
1       28 - 34         1     31    31
2       35 - 41         4     38    152
3       42 - 48         10    45    450
4       49 - 55         9     52    468
5       56 - 62         9     59    531
6       63 - 69         4     66    264
7       70 - 76         1     73    73
8       77 - 83         2     80    160
Total                   40          2129
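The grouped-data mean can be sketched by treating every value in a class as equal to the class midpoint. The class limits and frequencies are those of the frequency distribution above; the function name `grouped_mean` and the tuple layout are assumptions of this sketch.

```python
# (lower class limit, upper class limit, frequency) for each class interval
classes = [
    (28, 34, 1), (35, 41, 4), (42, 48, 10), (49, 55, 9),
    (56, 62, 9), (63, 69, 4), (70, 76, 1), (77, 83, 2),
]


def grouped_mean(class_table):
    """Return sum(f * x) / sum(f), where x is the midpoint of each class."""
    total_fx = 0.0
    total_f = 0
    for lower, upper, frequency in class_table:
        midpoint = (lower + upper) / 2
        total_fx += frequency * midpoint
        total_f += frequency
    return total_fx / total_f


print(grouped_mean(classes))
```

With the column totals 2129 and 40, the result is 2129/40 = 53.225. Because the midpoints stand in for the real values, this is an approximation of the mean of the raw data.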

Lesson 2(c)

 



To discuss the following key concepts:

▪ Variation refers to the amount that values vary among themselves, and it can be measured with specific numbers.
▪ Values that are relatively close together have lower measures of variation, and values that are spread farther apart have larger measures of variation.
▪ The standard deviation, which is a particularly important measure of variation, can be computed.
▪ The values of the standard deviation must be interpreted correctly.

Data sets may have the same center but look different because of the way the numbers spread out from the center (different range and unequal variability).

▪ Range
▪ Variance
▪ Standard Deviation
▪ Coefficient of Variation



The range is the difference between the largest observation and the smallest observation.

Its advantage is also its disadvantage: its simplicity. Because it is calculated from only two observations, it tells nothing about the other observations.



Population variance

$$\sigma^2 = \frac{\sum (x - \mu)^2}{N}$$

Sample variance

$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$

The population variance is represented by $\sigma^2$ (Greek letter sigma squared).

To compute the sample variance $s^2$, begin by calculating the sample mean $\bar{x}$, then compute the difference (also known as the deviation) between each observation and the mean. Square the deviations and sum them; finally, divide the sum of squared deviations by $(n - 1)$.

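Example: consider the sample

8 4 9 11 3

The mean is $\bar{x} = (8 + 4 + 9 + 11 + 3)/5 = 35/5 = 7$.

From each observation we determine the deviation from the mean:

8 - 7 = 1
4 - 7 = -3
9 - 7 = 2
11 - 7 = 4
3 - 7 = -4

Squaring the deviations yields

(1)^2 = 1
(-3)^2 = 9
(2)^2 = 4
(4)^2 = 16
(-4)^2 = 16

Summing and dividing by (n - 1):

$$s^2 = \frac{1 + 9 + 4 + 16 + 16}{5 - 1} = \frac{46}{4} = 11.5$$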
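The same deviation-by-deviation calculation can be sketched in Python. The function name `sample_variance` is an illustrative assumption, and the data are the five observations whose deviations are listed above.

```python
def sample_variance(values):
    """Return the sample variance: squared deviations from the mean,
    summed and divided by (n - 1)."""
    n = len(values)
    if n < 2:
        raise ValueError("the sample variance needs at least two values")
    mean = sum(values) / n
    squared_deviations = [(x - mean) ** 2 for x in values]
    return sum(squared_deviations) / (n - 1)


observations = [8, 4, 9, 11, 3]
print(sample_variance(observations))
```

The squared deviations are 1, 9, 4, 16 and 16, so the function returns 46/4 = 11.5.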



the difference between the value and the mean



Formula



This is seldom used because of limited utility

  

▪ The variance provides only a rough idea about the amount of variation in the data.
▪ It is useful when comparing two or more sets of data.
▪ Because the deviations from the mean are squared, the unit attached to the variance is also squared. This contributes to the problem of interpretation. The solution is the standard deviation.

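The following are the numbers of summer jobs a sample of six students applied for. Find the mean and variance of these data.

17 15 23 7 9 13

The mean is

$$\bar{x} = \frac{17 + 15 + 23 + 7 + 9 + 13}{6} = \frac{84}{6} = 14 \text{ jobs}$$

The sample variance is

$$s^2 = \frac{(3)^2 + (1)^2 + (9)^2 + (-7)^2 + (-5)^2 + (-1)^2}{6 - 1} = \frac{166}{5} = 33.2 \text{ jobs}^2$$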



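Standard deviation

Population: $\sigma = \sqrt{\sigma^2}$

Sample: $s = \sqrt{s^2}$

The standard deviation of a set of sample values is a measure of variation of values about the mean.

Formula

(a) Sample standard deviation

$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$

(b) Shortcut formula for standard deviation

$$s = \sqrt{\frac{n \sum x^2 - (\sum x)^2}{n(n - 1)}}$$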



Step 1: Find the mean of the values, $\bar{x}$.

Step 2: Subtract the mean from each individual value to get a list of deviations of the form $(x - \bar{x})$.

Step 3: Square each of the differences obtained from Step 2.

Step 4: Add all the squares obtained from Step 3 to get $\sum (x - \bar{x})^2$.

Step 5: Divide the total from Step 4 by the number $(n - 1)$.

Step 6: Find the square root of the result of Step 5.

Knowing the mean and standard deviation allows the statistician to extract useful bits of information. The information depends on the shape of the histogram. If the histogram is bell-shaped, the Empirical Rule is used:

▪ Approximately 68% of all observations fall within one standard deviation of the mean.
▪ Approximately 95% of all observations fall within two standard deviations of the mean.
▪ Approximately 99.7% of all observations fall within three standard deviations of the mean.
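The Empirical Rule turns a mean and a standard deviation into three intervals. A minimal sketch follows; the function name `empirical_rule_intervals` and the example values (mean 100, standard deviation 15) are hypothetical and not taken from the lesson.

```python
def empirical_rule_intervals(mean, std_dev):
    """Return the intervals expected to hold about 68%, 95% and 99.7%
    of the observations when the histogram is bell-shaped."""
    coverage = {1: "68%", 2: "95%", 3: "99.7%"}
    return {
        label: (mean - k * std_dev, mean + k * std_dev)
        for k, label in coverage.items()
    }


# Hypothetical bell-shaped data with mean 100 and standard deviation 15.
intervals = empirical_rule_intervals(mean=100, std_dev=15)
for label, (low, high) in intervals.items():
    print(f"about {label} of observations fall between {low} and {high}")
```

The percentages are approximations that apply only to bell-shaped histograms; for other shapes the intervals can hold noticeably different shares of the data.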



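Calculate the variance and standard deviation for the five measurements given below. Use both the shortcut formula and the computation using deviations from the mean.

5 7 1 2 4

Solution using the shortcut formula

Table for simplified calculation of s^2 and s

x_i      (x_i)^2
5        25
7        49
1        1
2        4
4        16
Total    19       95

$$s^2 = \frac{n \sum x^2 - (\sum x)^2}{n(n - 1)} = \frac{5(95) - (19)^2}{5(4)} = \frac{114}{20} = 5.7$$

$$s = \sqrt{5.7} \approx 2.39$$

Solution using deviations from the mean, with $\bar{x} = 19/5 = 3.8$

x        x - mean    (x - mean)^2
5        1.2         1.44
7        3.2         10.24
1        -2.8        7.84
2        -1.8        3.24
4        0.2         0.04
Total    19          0.0         22.80

$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{22.80}{4} = 5.7$$

$$s = \sqrt{5.7} \approx 2.39$$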
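Both routes to the variance of the five measurements 5, 7, 1, 2, 4 can be checked side by side. This is a sketch; the function names are illustrative assumptions, and the shortcut is written in the form (n Σx² - (Σx)²) / (n(n - 1)), which is algebraically equivalent to the deviation form.

```python
import math


def variance_by_deviations(values):
    """Sample variance from squared deviations about the mean."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


def variance_by_shortcut(values):
    """Sample variance from the sums of x and x squared only."""
    n = len(values)
    sum_x = sum(values)
    sum_x_squared = sum(x * x for x in values)
    return (n * sum_x_squared - sum_x ** 2) / (n * (n - 1))


measurements = [5, 7, 1, 2, 4]
s2_deviations = variance_by_deviations(measurements)
s2_shortcut = variance_by_shortcut(measurements)
s = math.sqrt(s2_shortcut)

print(round(s2_deviations, 2), round(s2_shortcut, 2), round(s, 2))
```

The shortcut needs only the two column totals (19 and 95), which is why it is convenient for hand calculation.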



The coefficient of variation of a set of observations is the standard deviation of the observations divided by the mean.

Population: $CV = \frac{\sigma}{\mu}$

Sample: $cv = \frac{s}{\bar{x}}$
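As an illustration, the sample coefficient of variation can be sketched as the sample standard deviation divided by the sample mean. The function name `coefficient_of_variation` is an assumption, and the data reuse the five measurements 5, 7, 1, 2, 4 from the earlier example.

```python
import math


def coefficient_of_variation(values):
    """Return s / mean for a sample, using the (n - 1) divisor for s."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two values are needed")
    mean = sum(values) / n
    if mean == 0:
        raise ValueError("the coefficient of variation is undefined for a zero mean")
    variance = sum((x - mean) ** 2 for x in values) / (n - 1)
    return math.sqrt(variance) / mean


print(round(coefficient_of_variation([5, 7, 1, 2, 4]), 3))
```

Because the standard deviation and the mean share the same unit, the ratio is unit-free, which makes it suitable for comparing variation across data sets measured on different scales.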





Calculate the variance of the following samples 9 3 7 4 7 5 4



Determine the variance and standard deviation of the following samples 12 6 22 31 23 15 13 15 17 21

Calculate the variance and standard deviation of the following samples 

6.5 6.6 6.7 6.8 7.1 7.4 7.7 7.7 7.7 7.3

Lesson 2(d)





Measures of position provide information about the position of particular values relative to the entire data set.

Types

▪ Median
▪ Centiles (Percentile, Quartile, Decile)
▪ Z-score



A centile or centile point is defined as a specific point in a distribution which has a given percentage of the cases below it.

Centiles are widely used in educational circles in reporting the results of standardized tests.



Any Centile Point

$$C_p = LL + \left( \frac{pN - cf}{f_i} \right) i$$

where

LL = lower exact limit of the interval in which we are interpolating
N = number of cases
p = proportion corresponding to the desired centile
cf = cumulative frequency of cases below the interval in which we are interpolating
f_i = frequency of the interval in which we are interpolating
i = size of the class interval

Cumulative Relative Frequency Distribution

CI      LL - UL     f     Cum Rel Freq
1       28 - 34     1     0.025
2       35 - 41     4     0.125
3       42 - 48     10    0.375
4       49 - 55     9     0.600
5       56 - 62     9     0.825
6       63 - 69     4     0.925
7       70 - 76     1     0.950
8       77 - 83     2     1.000
Total               40
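The cumulative relative frequency column can be reproduced from the frequencies alone: keep a running total and divide it by the number of cases. A short sketch follows; the function name is an illustrative assumption.

```python
from itertools import accumulate

# Frequencies of the eight class intervals, from 28-34 up to 77-83.
frequencies = [1, 4, 10, 9, 9, 4, 1, 2]


def cumulative_relative_frequencies(freqs):
    """Return the running total of the frequencies divided by the grand total."""
    total = sum(freqs)
    return [running / total for running in accumulate(freqs)]


for value in cumulative_relative_frequencies(frequencies):
    print(f"{value:.3f}")
```

Plotting these values against the class intervals gives the ogive.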

[Figure: Ogive of the cumulative relative frequencies for class intervals 1 to 8: 2.50%, 12.50%, 37.50%, 60.00%, 82.50%, 92.50%, 95.00%, 100%. The curve shows that 60% of the students who took the Geography Test got a score below 56 points; this point is marked C60.]

For example, the 60th centile (C60) is that point in a distribution which has 60% of the cases below it.



A frequency distribution of the scores of 376 boys on a test of mechanical ability is presented in the table below.

Cumulative Frequency and Percentage

CI       f      cf     cP
60-64    2      376    100
55-59    12     374    99.5
50-54    20     362    96.3
45-49    32     342    91.0
40-44    46     310    82.4
35-39    58     264    70.2
30-34    64     206    54.8
25-29    58     142    37.7
20-24    42     84     22.3
15-19    23     42     11.2
10-14    15     19     5.0
5-9      4      4      1.1
Total    376



To illustrate, consider C50. By definition, C50 is the centile point that has 50% of the cases above it and 50% below it.


C50 is the midpoint of the distribution and is known as the median.


Hence we are interested in finding the point in the distribution with 188 cases above it and 188 cases below it.


Beginning from the bottom, we count up the cumulative frequencies until we come as close to 188 cases as possible without exceeding it.


The point with 188 cases below it lies above class interval 25-29, within class interval 30-34. The exact lower limit of that interval, 29.5, has 142 cases below it. We need 46 more cases to reach 188.


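We need, therefore, to interpolate. The interval 30-34 contains 64 cases spread over 5 score points, so the 46 additional cases take us 46/64 of the way into the interval: C50 = 29.5 + (46/64)(5) ≈ 33.09.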


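We verify by coming down from the top. There are 376 - 206 = 170 cases above 34.5, the exact upper limit of the interval 30-34, so 18 more cases are needed: 34.5 - (18/64)(5) ≈ 33.09, which has 188 cases above it and 188 cases below it.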

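The interpolation for any centile point can be sketched as a small function over the frequency table. The function name `centile` and the tuple layout are assumptions of this sketch; it also assumes whole-number score limits, so that the exact lower limit of a class is 0.5 below its stated lower limit.

```python
# (lower limit, upper limit, frequency), listed from the lowest class upward
scores = [
    (5, 9, 4), (10, 14, 15), (15, 19, 23), (20, 24, 42),
    (25, 29, 58), (30, 34, 64), (35, 39, 58), (40, 44, 46),
    (45, 49, 32), (50, 54, 20), (55, 59, 12), (60, 64, 2),
]


def centile(class_table, p):
    """Return the point with proportion p of the cases below it:
    LL + ((p * N - cf) / f_i) * i."""
    if not 0 < p <= 1:
        raise ValueError("p must be a proportion in (0, 1]")
    n_cases = sum(f for _, _, f in class_table)
    target = p * n_cases          # number of cases that must lie below the point
    cum_below = 0                 # cf: cases below the current interval
    for lower, upper, f in class_table:
        if cum_below + f >= target:
            lower_exact = lower - 0.5    # LL, assuming whole-number scores
            size = upper - lower + 1     # i, size of the class interval
            return lower_exact + (target - cum_below) / f * size
        cum_below += f
    raise ValueError("the table holds no cases")


print(round(centile(scores, 0.50), 2))
```

For p = 0.50 the loop stops at the interval 30-34 with 142 cases below it, which reproduces the hand calculation 29.5 + (46/64)(5).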



Several of the centile points have special names:

▪ C10 – 1st Decile (D1)
▪ C20 – 2nd Decile (D2)
▪ C25 – 1st Quartile (Q1)
▪ C50 – Median
▪ C75 – 3rd Quartile (Q3)

[Figure: the lower quartile Q1, the median M, and the upper quartile Q3 divide the ordered data into four parts, each containing 25% of the values.]



Suppose you have been notified that your score of 610 on the Verbal Graduate Record Examination placed you at the 60th percentile in the distribution of scores. Where does your score of 610 stand in relation to the scores of others who took the examination?



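Scoring at the 60th percentile means that 60% of all examination scores were lower than your score and 40% were higher.

[Figure: the 60th percentile marked on the distribution of scores, with 60% of the scores below it and 40% above it, shown against the four 25% quartile segments.]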



Sample z-score

$$z = \frac{x - \bar{x}}{s}$$

A z-score measures the distance between an observation and the mean, measured in units of standard deviation. It is a valuable tool for determining whether the observation under consideration is likely to occur quite frequently or is somewhat unusual.



Consider the sample of 10 measurements:

1 1 0 15 2 3 4 0 1 3

The measurement x = 15 appears to be unusually large. Calculate the z-score for this observation and state your conclusions.

x        x^2
1        1
1        1
0        0
15       225
2        4
3        9
4        16
0        0
1        1
3        9
∑x = 30  ∑x^2 = 266

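The z-score for the suspected outlier is calculated as follows:

$$\bar{x} = \frac{\sum x}{n} = \frac{30}{10} = 3$$

$$s = \sqrt{\frac{n \sum x^2 - (\sum x)^2}{n(n - 1)}} = \sqrt{\frac{10(266) - (30)^2}{10(9)}} = \sqrt{\frac{1760}{90}} \approx 4.42$$

$$z = \frac{x - \bar{x}}{s} = \frac{15 - 3}{4.42} \approx 2.71$$

The measurement x = 15 lies 2.71 standard deviations above the sample mean. Although the z-score does not exceed 3, it is close enough that you might suspect that x = 15 is an outlier. In that case, examine the sampling procedure to see whether x = 15 is a faulty observation.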
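The z-score of the suspected outlier can be checked with a short sketch. The function name `z_score` is an illustrative assumption; it uses the sample standard deviation with the (n - 1) divisor.

```python
import math


def z_score(x, values):
    """Return (x - mean) / s, where s is the sample standard deviation."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return (x - mean) / s


measurements = [1, 1, 0, 15, 2, 3, 4, 0, 1, 3]
print(round(z_score(15, measurements), 2))
```

The outlier itself is included in the mean and standard deviation, which inflates both and pulls its own z-score toward zero; this is one reason a value just under 3 can still deserve a closer look.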
