Nota Pengantar Statistik Bab 2

  • May 2020
  • PDF

QQS1013 Elementary Statistics

DESCRIPTIVE STATISTICS

2.1 INTRODUCTION

Raw data

- Data recorded in the sequence in which they are collected, before they are processed or ranked.

Array data

- Raw data arranged in ascending or descending order.

Example 1

Here is a list of questions asked in a large statistics class and the "raw data" given by one of the students:

1. What is your sex (m = male, f = female)? Answer: m

2. How many hours did you sleep last night? Answer: 5 hours

3. Randomly pick a letter, S or Q. Answer: S

4. What is your height in inches? Answer: 67 inches

5. What's the fastest you've ever driven a car (mph)? Answer: 110 mph

Example 2

[Tables: quantitative raw data; qualitative raw data]

These data are also called ungrouped data.

2.2 ORGANIZING AND GRAPHING QUALITATIVE DATA

2.2.1 Frequency Distribution Table

A frequency distribution for qualitative data lists all categories and the number of elements that belong to each category. It shows how the frequencies are distributed over the various categories, and is also called a frequency distribution table or simply a frequency table. The number of elements (e.g. students) that belong to a certain category is called the frequency of that category.

2.2.2 Relative Frequency and Percentage Distribution



A relative frequency distribution is a listing of all categories along with their relative frequencies (given as proportions or percentages).



It is commonplace to give the frequency and relative frequency distribution together.



Calculating the relative frequency and percentage of a category

FORMULA

$$\text{Relative frequency of a category} = \frac{\text{Frequency of that category}}{\text{Sum of all frequencies}}$$

$$\text{Percentage (\%)} = (\text{Relative frequency}) \times 100$$

Example 3

A sample of UUM staff-owned vehicles produced by Proton was identified and the model of each was noted. The resulting sample follows (W = Wira, Is = Iswara, Wj = Waja, St = Satria, P = Perdana, Sv = Savvy). Construct a frequency distribution table for these data, including the relative frequencies and percentages.

W Is Wj Wj St

W W Is Sv W

P W Wj W W

Is Wj Sv Is W

Is Is W P W

P W W Sv St

Is W W Wj St

W Is Wj Wj P

St W St W Wj

Wj Wj W W Sv

Solution:

Category   Frequency   Relative Frequency   Percentage (%)
Wira       19          19/50 = 0.38         0.38*100 = 38
Iswara     8           0.16                 16
Perdana    4           0.08                 8
Waja       10          0.20                 20
Satria     5           0.10                 10
Savvy      4           0.08                 8
Total      50          1.00                 100
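The table in Example 3 can be checked with a short Python sketch. The function name `frequency_table` and the layout of its result are illustrative choices, not part of the notes; the sample is copied row by row from the example.

```python
from collections import Counter

# Sample from Example 3, read row by row (abbreviations as defined in the example).
sample = """
W Is Wj Wj St
W W Is Sv W
P W Wj W W
Is Wj Sv Is W
Is Is W P W
P W W Sv St
Is W W Wj St
W Is Wj Wj P
St W St W Wj
Wj Wj W W Sv
""".split()

def frequency_table(values):
    """Return {category: (frequency, relative frequency, percentage)}."""
    counts = Counter(values)
    total = sum(counts.values())
    return {cat: (f, f / total, 100 * f / total) for cat, f in counts.items()}

table = frequency_table(sample)
# Print the categories from most to least frequent.
for cat, (f, rel, pct) in sorted(table.items(), key=lambda kv: -kv[1][0]):
    print(f"{cat:3} {f:3d} {rel:6.2f} {pct:5.0f}%")
```

The relative frequencies always sum to 1 and the percentages to 100, which is a quick way to catch a tallying mistake.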

2.2.3 Graphical Presentation of Qualitative Data

a) Bar Graphs

A bar graph is a graph made of bars whose heights represent the frequencies of the respective categories.

• Such a graph is most helpful when you have many categories to represent.
• Notice that a gap is inserted between each of the bars.

There are four types of bar chart:

• simple/vertical bar chart
• horizontal bar chart
• component bar chart
• multiple bar chart

Simple/Vertical Bar Chart

To construct a vertical bar chart, mark the various categories on the horizontal axis and the frequencies on the vertical axis.

Horizontal Bar Chart

To construct a horizontal bar chart, mark the various categories on the vertical axis and the frequencies on the horizontal axis.



[Figure: horizontal bar chart "UUM Staff-owned Vehicles Produced By Proton". Vertical axis: Types of Vehicle; horizontal axis: Frequency (0 to 20).]

Component Bar Chart

• To construct a component bar chart, draw one bar for each group and divide every bar into components, one per category.
• The height of each component should tally with the frequency it represents.

Example 4

Suppose we want to illustrate the information below, representing the number of people participating in the activities offered by an outdoor pursuits centre during June of three consecutive years.

Activity   2004   2005   2006
Climbing   21     34     36
Caving     10     12     21
Walking    75     85     100
Sailing    36     36     40
Total      142    167    197

Solution:

[Figure: component bar chart "Activities Breakdown (Jun)". x-axis: Year (2004, 2005, 2006); y-axis: Number of participants (0 to 200); components: Climbing, Caving, Walking, Sailing.]



Multiple Bar Chart

• To construct a multiple bar chart, draw a separate bar for each category and gather the bars into groups.
• The height of each bar represents the frequency of its category.
• Useful for making comparisons between two or more values.

[Figure: multiple bar chart "Activities Breakdown (Jun)". x-axis: Year (2004, 2005, 2006); y-axis: Number of participants (0 to 120); bars: Climbing, Caving, Walking, Sailing.]

• The bar graphs for relative frequency and percentage distributions can be drawn simply by marking the relative frequencies or percentages, instead of the frequencies, on the frequency axis.

b) Pie Chart

A pie chart is a circle divided into portions that represent the relative frequencies or percentages of a population or a sample belonging to different categories.

• It is an alternative to the bar chart and is useful for summarizing a single categorical variable if there are not too many categories.
• The chart makes it easy to compare the relative sizes of the classes/categories.

The whole pie represents the total sample or population. The pie is divided into portions that represent the different categories. To construct a pie chart, multiply 360° by the relative frequency of each category to obtain the size of the angle for that category.

Example 5

Movie Genres      Frequency   Relative Frequency   Angle Size
Comedy            54          0.27                 360*0.27 = 97.2°
Action            36          0.18                 360*0.18 = 64.8°
Romance           28          0.14                 360*0.14 = 50.4°
Drama             28          0.14                 360*0.14 = 50.4°
Horror            22          0.11                 360*0.11 = 39.6°
Foreign           16          0.08                 360*0.08 = 28.8°
Science Fiction   16          0.08                 360*0.08 = 28.8°
Total             200         1.00                 360°
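The angle calculation in Example 5 can be sketched in Python as follows. The helper name `pie_angles` is an illustrative assumption; the frequencies are those of the example.

```python
# Frequencies of the movie genres from Example 5.
genres = {
    "Comedy": 54, "Action": 36, "Romance": 28, "Drama": 28,
    "Horror": 22, "Foreign": 16, "Science Fiction": 16,
}

def pie_angles(freqs):
    """Angle in degrees for each category: 360 times its relative frequency."""
    total = sum(freqs.values())
    return {cat: 360 * f / total for cat, f in freqs.items()}

angles = pie_angles(genres)
for cat, angle in angles.items():
    print(f"{cat:16} {angle:5.1f} degrees")
```

Because the relative frequencies sum to 1, the angles should always sum to 360°.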

c) Line Graph/Time Series Graph

A line graph represents data that occur over a specific period of time.

• Line graphs are widely used because their visual characteristics reveal data trends clearly and they are easy to create.

When analyzing the graph, look for a trend or pattern that occurs over the time period.

• For example, is the line ascending (indicating an increase over time) or descending (indicating a decrease over time)?
• Another thing to look for is the slope, or steepness, of the line. A line that is steep over a specific time period indicates a rapid increase or decrease over that period.

Two data sets can be compared on the same graph (called a compound time series graph) if two lines are used. Data collected on the same element for the same variable at different points in time or for different periods of time are called time series data.

• A line graph is a visual comparison of how two variables, shown on the x- and y-axes, are related or vary with each other. It shows related information by drawing a continuous line between all the points on a grid.
• Line graphs compare two variables: one is plotted along the x-axis (horizontal) and the other along the y-axis (vertical).
• The y-axis in a line graph usually indicates quantity (e.g., RM, number of sales, litres) or percentage, while the horizontal x-axis often measures units of time. As a result, the line graph is often viewed as a time series graph.

Example 6

A transit manager wishes to use the following data for a presentation showing how Port Authority Transit ridership has changed over the years. Draw a time series graph for the data and summarize the findings.

Year   Ridership (in millions)
1990   88.0
1991   85.0
1992   75.7
1993   76.6
1994   75.4

Solution:

[Figure: time series graph. x-axis: Year (1990 to 1994); y-axis: Ridership (in millions), 75 to 89.]

The graph shows a decline in ridership through 1992 and then leveling off for the years 1993 and 1994.

EXERCISE 1

1. The following data show the method of payment by 16 customers in a supermarket checkout line (C = cash, CK = check, CC = credit card, D = debit and O = other).

C   CK  CK  C   CC  D   O   C
CK  CC  D   CC  C   CK  CK  CC

a. Construct a frequency distribution table.
b. Calculate the relative frequencies and percentages for all categories.
c. Draw a pie chart for the percentage distribution.

2. The frequency distribution table below represents the sales of certain products in ZeeZee Company. Each product is given with the frequency of its sales in a certain period. Find the relative frequency and the percentage of each product. Then, construct a pie chart using the obtained information.

Type of Product   Frequency   Relative Frequency   Percentage   Angle Size
A                 13
B                 12
C                 5
D                 9
E                 11

3. Draw a time series graph to represent the data for the number of worldwide airline fatalities for the given years.

Year                1990   1991   1992   1993   1994   1995   1996
No. of fatalities   440    510    990    801    732    557    1132

4. A questionnaire about how people get news resulted in the following information from 25 respondents (N = newspaper, T = television, R = radio, M = magazine).

N R M T T
N N M R R
R T N M R
T M R N N
T R N M N

a. Construct a frequency distribution for the data.
b. Construct a bar graph for the data.

5. The given information shows the export and import trade, in million RM, for four months of sales in a certain year. Using the provided information, present the data in a component bar graph.

Month       Export   Import
September   28       20
October     30       28
November    32       17
December    24       14

6. The following information represents the maximum rainfall in millimetres (mm) in each state in Malaysia. You are asked to help a local meteorologist make an analysis. Based on your knowledge, present this information using the most appropriate chart and give your comment.

State                                  Quantity (mm)
Perlis                                 435
Kedah                                  512
Pulau Pinang                           163
Perak                                  721
Selangor                               664
Wilayah Persekutuan Kuala Lumpur       1003
Negeri Sembilan                        390
Melaka                                 223
Johor                                  876
Pahang                                 1050
Terengganu                             1255
Kelantan                               986
Sarawak                                878
Sabah                                  456

2.3 ORGANIZING AND GRAPHING QUANTITATIVE DATA

2.3.1 Stem-and-Leaf Display

In a stem-and-leaf display of quantitative data, each value is divided into two portions, a stem and a leaf. The leaves for each stem are then shown separately in a display. The display:

• gives information about the pattern of the data;
• can show which values are repeated frequently.

Example 7

13 41

25 12 9 10 5 12 11 12 31 28 37 6 38 44 13 22 18 19

23

7

Solution:

0 | 9 5 7 6
1 | 2 0 2 3 1 2 4 3 8 9
2 | 5 3 8 2
3 | 6 1 7 8
4 | 1 4

2.3.2 Frequency Distributions

A frequency distribution for quantitative data lists all the classes and the number of values that belong to each class. Data presented in the form of a frequency distribution are called grouped data.

Class Boundary

The class boundary is the midpoint between the upper limit of one class and the lower limit of the next class. It is also called the real class limit. To find the boundary between the first and second classes, divide the sum of the upper limit of the first class and the lower limit of the second class by 2.

e.g.: $$\text{class boundary} = \frac{400 + 401}{2} = 400.5$$

Class Width (class size)

FORMULA

$$\text{Class width} = \text{Upper boundary} - \text{Lower boundary}$$

e.g.: Width of the first class = 600.5 – 400.5 = 200

Class Midpoint or Mark

FORMULA

$$\text{Class midpoint or mark} = \frac{\text{Lower limit} + \text{Upper limit}}{2}$$

e.g.: $$\text{Midpoint of the 1st class} = \frac{401 + 600}{2} = 500.5$$

Constructing Frequency Distribution Tables

1. To decide the number of classes, we use Sturges' formula:

FORMULA

$$c = 1 + 3.3 \log n$$

where c is the number of classes and n is the number of observations in the data set.

2. Class width:

FORMULA

$$i > \frac{\text{Largest value} - \text{Smallest value}}{\text{Number of classes}} = \frac{\text{Range}}{c}$$

This class width is rounded up to a convenient number.

3. Lower limit of the first class (the starting point): use the smallest value in the data set.

Example 8

The following data give the total home runs hit by all players of each of the 30 Major League Baseball teams during the 2004 season.

i) Number of classes:

$$c = 1 + 3.3 \log 30 = 1 + 3.3(1.48) = 5.88 \approx 6 \text{ classes}$$

ii) Class width:

$$i > \frac{242 - 135}{6} = 17.8, \quad \text{so } i = 18$$

iii) Starting point = 135

Table 2.10: Frequency Distribution for Data of Table 2.9

Total Home Runs   Tally        f
135 – 152         |||| ||||    10
153 – 170         ||           2
171 – 188         ||||         5
189 – 206         |||| |       6
207 – 224         |||          3
225 – 242         ||||         4
                               ∑ f = 30
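The three construction steps can be sketched in Python using the figures of Example 8 (n = 30, smallest value 135, largest value 242). The function names and the rounding conventions are assumptions made for illustration: the number of classes is rounded to the nearest integer, and the width is taken as the smallest whole number strictly greater than range / c.

```python
import math

def sturges_classes(n):
    """Number of classes from c = 1 + 3.3 log10(n), rounded to the nearest integer."""
    return round(1 + 3.3 * math.log10(n))

def class_width(largest, smallest, classes):
    """Smallest whole-number width strictly greater than range / classes."""
    return math.floor((largest - smallest) / classes) + 1

c = sturges_classes(30)          # number of classes
i = class_width(242, 135, c)     # class width
# Class limits, starting from the smallest value in the data set.
limits = [(135 + k * i, 135 + (k + 1) * i - 1) for k in range(c)]

print(c, i)
print(limits)
```

With these conventions the sketch gives 6 classes of width 18, and the last class ends exactly at the largest value, 242.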

2.3.3 Relative Frequency and Percentage Distributions

FORMULA

$$\text{Relative frequency of a class} = \frac{\text{Frequency of that class}}{\text{Sum of all frequencies}} = \frac{f}{\sum f}$$

$$\text{Percentage} = (\text{Relative frequency}) \times 100$$

Example 9

(Refer to Example 8.)

Table 2.11: Relative Frequency and Percentage Distributions

Total Home Runs   Class Boundaries            Relative Frequency   %
135 – 152         134.5 to less than 152.5    0.3333               33.33
153 – 170         152.5 to less than 170.5    0.0667               6.67
171 – 188         170.5 to less than 188.5    0.1667               16.67
189 – 206         188.5 to less than 206.5    0.2000               20.00
207 – 224         206.5 to less than 224.5    0.1000               10.00
225 – 242         224.5 to less than 242.5    0.1333               13.33
Total                                         1.0                  100

2.3.4 Graphing Grouped Data

a) Histograms

A histogram is a graph in which the class boundaries are marked on the horizontal axis and either the frequencies, relative frequencies, or percentages are marked on the vertical axis. The frequencies, relative frequencies or percentages are represented by the heights of the bars. In a histogram, the bars are drawn adjacent to each other, and there is a space between the y-axis and the first bar.

Example 10

(Refer to Example 8.)

[Figure: frequency histogram for Table 2.9. x-axis: Total home runs, class boundaries 134.5 to 242.5; y-axis: Frequency (0 to 12).]

b) Polygon

A graph formed by joining the midpoints of the tops of successive bars in a histogram with straight lines is called a polygon.

Example 11

[Figure: frequency polygon for Table 2.11. x-axis: Total home runs, class boundaries 134.5 to 242.5; y-axis: Frequency (0 to 12).]

For a very large data set, as the number of classes is increased (and the width of classes is decreased), the frequency polygon eventually becomes a smooth curve called a frequency distribution curve or simply a frequency curve.

[Figure: frequency distribution curve]

Shape of Histogram

The shape of a histogram is described in the same way as that of a polygon. The most common shapes are:

(i) Symmetric
(ii) Right skewed
(iii) Left skewed

[Figures: symmetric histograms; right-skewed and left-skewed histograms]

• Describing data using graphs helps us gain insight into the main characteristics of the data.
• When interpreting a graph, we should be very cautious. We should observe carefully whether the frequency axis has been truncated or whether any axis has been unnecessarily shortened or stretched.

2.3.5 Cumulative Frequency Distributions

• A cumulative frequency distribution gives the total number of values that fall below the upper boundary of each class.

Example 12

Using the frequency distribution of Table 2.11:

Total Home Runs   Class Boundaries            f    Cumulative Frequency
135 – 152         134.5 to less than 152.5    10   10
153 – 170         152.5 to less than 170.5    2    10+2 = 12
171 – 188         170.5 to less than 188.5    5    10+2+5 = 17
189 – 206         188.5 to less than 206.5    6    10+2+5+6 = 23
207 – 224         206.5 to less than 224.5    3    10+2+5+6+3 = 26
225 – 242         224.5 to less than 242.5    4    10+2+5+6+3+4 = 30
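The running totals in Example 12 can be reproduced with a few lines of Python. This is a minimal sketch; the variable names are illustrative.

```python
from itertools import accumulate

# Class frequencies and upper class boundaries from Example 12.
freqs = [10, 2, 5, 6, 3, 4]
upper_boundaries = [152.5, 170.5, 188.5, 206.5, 224.5, 242.5]

# Each cumulative frequency is the sum of all frequencies up to and including that class.
cumulative = list(accumulate(freqs))

for boundary, cf in zip(upper_boundaries, cumulative):
    print(f"less than {boundary}: {cf}")
```

The last cumulative frequency must equal the total number of observations, here 30.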

Ogive

An ogive is a curve drawn for the cumulative frequency distribution by joining with straight lines the dots marked above the upper boundaries of classes at heights equal to the cumulative frequencies of the respective classes.

There are two types of ogive:

(i) "less than" ogive
(ii) "more than" ogive

First, build a table of cumulative frequencies.

Example 13 (Ogive Less Than)

Earnings (RM)   Number of students (f)
30 – 39         5
40 – 49         6
50 – 59         6
60 – 69         3
70 – 79         3
80 – 89         7
Total           30

Earnings (RM)    Cumulative Frequency (F)
Less than 29.5   0
Less than 39.5   5
Less than 49.5   11
Less than 59.5   17
Less than 69.5   20
Less than 79.5   23
Less than 89.5   30

[Figure: "less than" ogive. x-axis: Earnings (29.5 to 89.5); y-axis: Cumulative Frequency (0 to 35).]

Example 14 (Ogive More Than)

Earnings (RM)   Number of students (f)
30 – 39         5
40 – 49         6
50 – 59         6
60 – 69         3
70 – 79         3
80 – 89         7
Total           30

Earnings (RM)    Cumulative Frequency (F)
More than 29.5   30
More than 39.5   25
More than 49.5   19
More than 59.5   13
More than 69.5   10
More than 79.5   7
More than 89.5   0

[Figure: "more than" ogive. x-axis: Earnings (29.5 to 89.5); y-axis: Cumulative Frequency (0 to 35).]

2.3.6 Box-Plot

A box-plot describes the data graphically using five measurements: the smallest value, the first quartile (K1), the second quartile (median or K2), the third quartile (K3) and the largest value.

[Figures: box-plots for symmetric data, left-skewed data and right-skewed data, each marked with the smallest value, K1, median, K3 and largest value.]

2.4 MEASURES OF CENTRAL TENDENCY

2.4.1 Ungrouped Data Measurement

Mean

FORMULA

Mean for population data:

$$\mu = \frac{\sum x}{N}$$

Mean for sample data:

$$\bar{x} = \frac{\sum x}{n}$$

where:

$\sum x$ = the sum of all values
N = the population size
n = the sample size
$\mu$ = the population mean
$\bar{x}$ = the sample mean

Example 15

The following data give the prices (rounded to the nearest thousand RM) of five homes sold recently in Sekayang.

158   189   265   127   191

Find the mean sale price for these homes.

Solution:

$$\bar{x} = \frac{\sum x}{n} = \frac{158 + 189 + 265 + 127 + 191}{5} = \frac{930}{5} = 186$$

Thus, these five homes were sold for an average price of RM186 thousand, i.e. RM186 000.

• The mean has the advantage that its calculation includes each value of the data set.

Weighted Mean

• Used when the values do not all carry the same importance (weight).

FORMULA

$$\bar{x}_w = \frac{\sum wx}{\sum w}$$

where w is a weight.

Example 16

Consider the data on electrical components purchased from a factory in the table below:

Type    Number of components (w)   Cost/unit (x)
1       1200                       RM3.00
2       500                        RM3.40
3       2500                       RM2.80
4       1000                       RM2.90
5       800                        RM3.25
Total   6000

Solution:

$$\bar{x}_w = \frac{\sum wx}{\sum w} = \frac{1200(3) + 500(3.4) + 2500(2.8) + 1000(2.9) + 800(3.25)}{1200 + 500 + 2500 + 1000 + 800} = \frac{17800}{6000} = 2.967$$

The mean cost of a unit of the component is RM2.97.
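The weighted mean of Example 16 can be sketched in Python as follows. The function name `weighted_mean` is an illustrative choice; the data are those of the example.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum of w*x divided by the sum of the weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

cost = [3.00, 3.40, 2.80, 2.90, 3.25]   # cost per unit (x), in RM
units = [1200, 500, 2500, 1000, 800]    # number of components (w)

xw = weighted_mean(cost, units)
print(f"Weighted mean cost: RM{xw:.2f}")
```

An unweighted mean of the five unit costs would give RM3.07, which overstates the cost because the cheapest component type is also the one bought in the largest quantity.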

Median

Median is the value of the middle term in a data set that has been ranked in increasing order.

Procedure for finding the median:

Step 1: Rank the data set in increasing order.
Step 2: Determine the depth (position or location) of the median.

FORMULA

$$\text{Depth of median} = \frac{n + 1}{2}$$

Step 3: Determine the value of the median.

Example 17

Find the median for the following data:

10   5   19   8   3

Solution:

(1) Rank the data in increasing order:

3   5   8   10   19

(2) Determine the depth of the median:

$$\text{Depth of median} = \frac{n + 1}{2} = \frac{5 + 1}{2} = 3$$

(3) Determine the value of the median. The median is located in the third position of the ranked data set:

3   5   [8]   10   19

Hence, the median for the above data = 8

Example 18

Find the median for the following data:

10   5   19   8   3   15

Solution:

(1) Rank the data in increasing order:

3   5   8   10   15   19

(2) Determine the depth of the median:

$$\text{Depth of median} = \frac{n + 1}{2} = \frac{6 + 1}{2} = 3.5$$

(3) Determine the value of the median. The median is located midway between the 3rd and 4th positions of the ranked data set:

$$\text{Median} = \frac{8 + 10}{2} = 9$$

Hence, the median for the above data = 9
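The three-step procedure can be sketched in Python and checked against Examples 17 and 18. The function name `median_by_depth` is an illustrative assumption.

```python
def median_by_depth(data):
    """Median via the depth (n + 1) / 2 of the ranked data."""
    ranked = sorted(data)                  # Step 1: rank the data
    depth = (len(ranked) + 1) / 2          # Step 2: depth of the median (1-based)
    lower = ranked[int(depth) - 1]         # value at the whole-number position
    if depth == int(depth):                # odd n: the depth is a whole number
        return lower
    # even n: average the two values on either side of the depth
    return (lower + ranked[int(depth)]) / 2

print(median_by_depth([10, 5, 19, 8, 3]))
print(median_by_depth([10, 5, 19, 8, 3, 15]))
```

For an odd number of values the depth is a whole number and the median is a data value; for an even number it falls halfway between two positions, so the two neighbouring values are averaged.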

• The median gives the center of a histogram, with half of the data values to the left of (or less than) the median and half to the right of (or more than) the median.
• The advantage of using the median is that it is not influenced by outliers.

Mode

Mode is the value that occurs with the highest frequency in a data set.

Example 19

1. What is the mode for the given data?

   77   69   74   81   71   68   74   73

2. What is the mode for the given data?

   77   69   68   74   81   71   68   74   73

Solution:

1. Mode = 74 (this number occurs twice): unimodal
2. Mode = 68 and 74: bimodal

• A major shortcoming of the mode is that a data set may have none or may have more than one mode.
• One advantage of the mode is that it can be calculated for both kinds of data, quantitative and qualitative.

2.4.2 Grouped Data Measurement

Mean

FORMULA

Mean for population data:

$$\mu = \frac{\sum fx}{N}$$

Mean for sample data:

$$\bar{x} = \frac{\sum fx}{n}$$

where x is the midpoint and f is the frequency of a class.

Example 20

The following table gives the frequency distribution of the number of orders received each day during the past 50 days at the office of a mail-order company. Calculate the mean.

Number of orders   f
10 – 12            4
13 – 15            12
16 – 18            20
19 – 21            14
                   n = 50

Solution:

Because the data set includes only 50 days, it represents a sample. The value of ∑fx is calculated in the following table:

Number of orders   f        x    fx
10 – 12            4        11   44
13 – 15            12       14   168
16 – 18            20       17   340
19 – 21            14       20   280
                   n = 50        ∑fx = 832

The value of the sample mean is:

$$\bar{x} = \frac{\sum fx}{n} = \frac{832}{50} = 16.64$$

Thus, this mail-order company received an average of 16.64 orders per day during these 50 days.
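The grouped mean of Example 20 can be sketched in Python. The function name `grouped_mean` and the representation of classes as (lower limit, upper limit) pairs are illustrative assumptions.

```python
# Class limits and frequencies from Example 20.
classes = [(10, 12), (13, 15), (16, 18), (19, 21)]
freqs = [4, 12, 20, 14]

def grouped_mean(classes, freqs):
    """Mean of grouped data: sum of f*x over sum of f, with x the class midpoint."""
    midpoints = [(lo + hi) / 2 for lo, hi in classes]
    return sum(f * x for f, x in zip(freqs, midpoints)) / sum(freqs)

mean_orders = grouped_mean(classes, freqs)
print(f"Mean orders per day: {mean_orders:.2f}")
```

The result is an approximation of the mean of the underlying raw data, because every value in a class is treated as if it were equal to the class midpoint.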

Median

Step 1: Construct the cumulative frequency distribution.
Step 2: Decide which class contains the median. The median class is the first class whose cumulative frequency is at least n/2.
Step 3: Find the median by using the following formula:

FORMULA

$$\text{Median} = L_m + \left( \frac{\frac{n}{2} - F}{f_m} \right) i$$

where:

n = the total frequency
F = the cumulative frequency before the median class
i = the class width
$L_m$ = the lower boundary of the median class
$f_m$ = the frequency of the median class

Example 21

Based on the grouped data below, find the median:

Time to travel to work   Frequency
1 – 10                   8
11 – 20                  14
21 – 30                  12
31 – 40                  9
41 – 50                  7

Solution:

1st Step: Construct the cumulative frequency distribution

Time to travel to work   Frequency   Cumulative Frequency
1 – 10                   8           8
11 – 20                  14          22
21 – 30                  12          34
31 – 40                  9           43
41 – 50                  7           50

$$\frac{n}{2} = \frac{50}{2} = 25$$

So the median class is the 3rd class (21 – 30), with F = 22, $f_m$ = 12, $L_m$ = 20.5 and i = 10.

Therefore,

$$\text{Median} = L_m + \left( \frac{\frac{n}{2} - F}{f_m} \right) i = 20.5 + \left( \frac{25 - 22}{12} \right) 10 = 23$$

Thus, 25 persons take less than 23 minutes to travel to work and another 25 persons take more than 23 minutes to travel to work.
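The grouped median steps can be sketched in Python using the data of Example 21. The function name `grouped_median` is illustrative, and the sketch assumes whole-number class limits with no gaps other than one unit between classes, so that the lower boundary of a class is its lower limit minus 0.5 (for the class 21 – 30 this gives 20.5).

```python
def grouped_median(classes, freqs):
    """Median of grouped data; classes are (lower limit, upper limit) pairs.

    Assumes whole-number limits, so lower boundary = lower limit - 0.5
    and class width = upper limit - lower limit + 1.
    """
    n = sum(freqs)
    cum = 0  # cumulative frequency before the current class (F)
    for (lo, hi), f in zip(classes, freqs):
        if f and cum + f >= n / 2:          # first class reaching n/2
            lower_boundary = lo - 0.5
            width = hi - lo + 1
            return lower_boundary + (n / 2 - cum) / f * width
        cum += f

classes = [(1, 10), (11, 20), (21, 30), (31, 40), (41, 50)]
freqs = [8, 14, 12, 9, 7]
median_time = grouped_median(classes, freqs)
print(median_time)
```

The median class is found by scanning the cumulative frequencies, exactly as in Step 2, and the interpolation inside the class is the formula of Step 3.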

Mode

Mode is the value that has the highest frequency in a data set. For grouped data, the class mode (or modal class) is the class with the highest frequency.

Formula of mode for grouped data:

FORMULA

$$\text{Mode} = L_{mo} + \left( \frac{\Delta_1}{\Delta_1 + \Delta_2} \right) i$$

where:

$L_{mo}$ is the lower boundary of the class mode
$\Delta_1$ is the difference between the frequency of the class mode and the frequency of the class before the class mode
$\Delta_2$ is the difference between the frequency of the class mode and the frequency of the class after the class mode
i is the class width

Example 22

Based on the grouped data below, find the mode.

Time to travel to work   Frequency
1 – 10                   8
11 – 20                  14
21 – 30                  12
31 – 40                  9
41 – 50                  7

Solution:

Based on the table, the class mode is 11 – 20, so

$L_{mo}$ = 10.5, $\Delta_1$ = (14 – 8) = 6, $\Delta_2$ = (14 – 12) = 2 and i = 10

$$\text{Mode} = 10.5 + \left( \frac{6}{6 + 2} \right) 10 = 18.0$$

We can also obtain the mode by using the histogram.
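The grouped mode formula can be sketched in Python with the data of Example 22. The function name `grouped_mode` is illustrative. Two assumptions are made that the notes do not state: class limits are whole numbers (lower boundary = lower limit minus 0.5), and a missing neighbouring class (when the class mode is the first or last class) is treated as having frequency 0.

```python
def grouped_mode(classes, freqs):
    """Mode of grouped data from the class with the highest frequency."""
    k = max(range(len(freqs)), key=freqs.__getitem__)       # index of the class mode
    before = freqs[k - 1] if k > 0 else 0                   # assumption: 0 if no class before
    after = freqs[k + 1] if k != len(freqs) - 1 else 0      # assumption: 0 if no class after
    d1 = freqs[k] - before                                  # delta 1
    d2 = freqs[k] - after                                   # delta 2
    lo, hi = classes[k]
    return (lo - 0.5) + d1 / (d1 + d2) * (hi - lo + 1)

classes = [(1, 10), (11, 20), (21, 30), (31, 40), (41, 50)]
freqs = [8, 14, 12, 9, 7]
mode_time = grouped_mode(classes, freqs)
print(mode_time)
```

If two classes share the highest frequency, this sketch returns the mode of the first one only; a bimodal distribution would need each modal class handled separately.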

2.4.3 Relationship among Mean, Median & Mode

As discussed in the previous topic, a histogram or a frequency distribution curve can assume either a skewed shape or a symmetrical shape. Knowing the values of the mean, median and mode can give us some idea about the shape of the frequency curve.

For a symmetrical histogram and frequency curve with one peak, the values of the mean, median and mode are identical and they lie at the center of the distribution.

[Figure: mean, median, and mode for a symmetric histogram and frequency distribution curve]

For a histogram and a frequency curve skewed to the right, the value of the mean is the largest, that of the mode is the smallest, and the value of the median lies between these two.

[Figure: mean, median, and mode for a histogram and frequency distribution curve skewed to the right]

For a histogram and a frequency curve skewed to the left, the value of the mean is the smallest, that of the mode is the largest, and the value of the median lies between these two.

[Figure: mean, median, and mode for a histogram and frequency distribution curve skewed to the left]

2.5 DISPERSION MEASUREMENT

The measures of central tendency such as the mean, median and mode do not reveal the whole picture of the distribution of a data set. Two data sets with the same mean may have completely different spreads.

• The variation among the values of observations for one data set may be much larger or smaller than for the other data set.

2.5.1 Ungrouped Data Measurement

Range

FORMULA

$$\text{Range} = \text{Largest value} - \text{Smallest value}$$

Example 23

Find the range of production for this data set.

Solution:

Range = Largest value – Smallest value = 267 277 – 49 651 = 217 626

Disadvantages:

o The range is influenced by outliers.
o It is based on two values only. All other values in the data set are ignored.

Variance and Standard Deviation

Standard deviation is the most commonly used measure of dispersion. The value of the standard deviation tells how closely the values of a data set are clustered around the mean.

• A lower value of the standard deviation indicates that the values of the data set are spread over a relatively smaller range around the mean.
• A larger value of the standard deviation indicates that the values of the data set are spread over a relatively larger range around the mean (far from the mean).

The standard deviation is obtained by taking the positive square root of the variance.

FORMULA

Variance for population:

$$\sigma^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{N}}{N}$$

Variance for sample:

$$s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}$$

FORMULA

Standard deviation for population:

$$\sigma = \sqrt{\sigma^2}$$

Standard deviation for sample:

$$s = \sqrt{s^2}$$

Example 24

Let x denote the total production (in units) of each company.

Company   Production
A         62
B         93
C         126
D         75
E         34

Find the variance and standard deviation.

Solution:

Company   Production (x)   x²
A         62               3 844
B         93               8 649
C         126              15 876
D         75               5 625
E         34               1 156
          ∑x = 390         ∑x² = 35 150

$$s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1} = \frac{35150 - \frac{(390)^2}{5}}{5 - 1} = 1182.50$$

Since s² = 1182.50, s = √1182.50 = 34.3875.

The properties of variance and standard deviation:

• The standard deviation is a measure of variation of all values from the mean.
• The values of the variance and the standard deviation are never negative. Larger values of variance or standard deviation indicate greater amounts of variation.
• The value of s can increase dramatically with the inclusion of one or more outliers.
• The measurement units of variance are always the square of the measurement units of the original data, while the units of standard deviation are the same as the units of the original data values.
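The sample variance formula can be sketched in Python and checked against Example 24. The function name `sample_variance` is illustrative; `statistics.variance` from the Python standard library is used only as an independent cross-check.

```python
import math
import statistics

def sample_variance(xs):
    """Sample variance via the computational formula:
    (sum of x^2 - (sum of x)^2 / n) / (n - 1)."""
    n = len(xs)
    return (sum(x * x for x in xs) - sum(xs) ** 2 / n) / (n - 1)

production = [62, 93, 126, 75, 34]   # companies A to E
s2 = sample_variance(production)
s = math.sqrt(s2)

print(f"s^2 = {s2:.2f}, s = {s:.4f}")
print(statistics.variance(production))  # cross-check with the standard library
```

Dividing by N instead of n − 1 in the last step would give the population variance instead.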

2.5.2 Grouped Data Measurement

Range

FORMULA

$$\text{Range} = \text{Upper boundary of last class} - \text{Lower boundary of first class}$$

Class      Frequency
41 – 50    1
51 – 60    3
61 – 70    7
71 – 80    13
81 – 90    10
91 – 100   6
Total      40

Upper boundary of last class = 100.5
Lower boundary of first class = 40.5
Range = 100.5 – 40.5 = 60

Variance and Standard Deviation

FORMULA

Variance for population:

$$\sigma^2 = \frac{\sum fx^2 - \frac{(\sum fx)^2}{N}}{N}$$

Variance for sample:

$$s^2 = \frac{\sum fx^2 - \frac{(\sum fx)^2}{n}}{n - 1}$$

FORMULA

Standard deviation:

Population: $$\sigma = \sqrt{\sigma^2}$$

Sample: $$s = \sqrt{s^2}$$

Example 25

Find the variance and standard deviation for the following data:

No. of orders   f
10 – 12         4
13 – 15         12
16 – 18         20
19 – 21         14
Total           n = 50

Solution:

No. of orders   f        x    fx    fx²
10 – 12         4        11   44    484
13 – 15         12       14   168   2352
16 – 18         20       17   340   5780
19 – 21         14       20   280   5600
Total           n = 50        832   14216

Variance,

$$s^2 = \frac{\sum fx^2 - \frac{(\sum fx)^2}{n}}{n - 1} = \frac{14216 - \frac{(832)^2}{50}}{50 - 1} = 7.5820$$

Standard deviation,

$$s = \sqrt{s^2} = \sqrt{7.5820} = 2.75$$

Thus, the standard deviation of the number of orders received at the office of this mail-order company during the past 50 days is 2.75.
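The grouped variance calculation of Example 25 can be sketched in Python. The function name `grouped_sample_variance` is an illustrative assumption; the midpoints and frequencies are those of the example.

```python
import math

def grouped_sample_variance(midpoints, freqs):
    """Sample variance of grouped data:
    (sum of f*x^2 - (sum of f*x)^2 / n) / (n - 1), with x the class midpoint."""
    n = sum(freqs)
    sum_fx = sum(f * x for f, x in zip(freqs, midpoints))
    sum_fx2 = sum(f * x * x for f, x in zip(freqs, midpoints))
    return (sum_fx2 - sum_fx ** 2 / n) / (n - 1)

midpoints = [11, 14, 17, 20]
freqs = [4, 12, 20, 14]

s2 = grouped_sample_variance(midpoints, freqs)
s = math.sqrt(s2)
print(f"s^2 = {s2:.4f}, s = {s:.2f}")
```

The intermediate sums are ∑fx = 832 and ∑fx² = 14216, matching the totals row of the solution table.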

2.5.3 Relative Dispersion Measurement

Relative dispersion is used:

• to compare two or more distributions that have different units, based on their dispersion; or
• to compare two or more distributions that have the same unit but a big difference in their means.

The measure is also called the modified coefficient or coefficient of variation, CV.

FORMULA

$$CV = \frac{s}{\bar{x}} \times 100\% \quad \text{(sample)}$$

$$CV = \frac{\sigma}{\mu} \times 100\% \quad \text{(population)}$$

Example 26

The mean and standard deviation of monthly salary for two groups of workers at ABC company are as follows. Group 1: 700 and 20; Group 2: 1070 and 20. Find the CV for each group and determine which group is more dispersed.

Solution:

$$CV_1 = \frac{20}{700} \times 100\% = 2.86\%$$

$$CV_2 = \frac{20}{1070} \times 100\% = 1.87\%$$

The monthly salary of the group 1 workers is more dispersed than that of group 2.
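The comparison in Example 26 can be sketched in Python. The function name `coefficient_of_variation` is an illustrative choice.

```python
def coefficient_of_variation(std_dev, mean):
    """Coefficient of variation as a percentage: (standard deviation / mean) * 100."""
    return std_dev / mean * 100

cv1 = coefficient_of_variation(20, 700)    # Group 1
cv2 = coefficient_of_variation(20, 1070)   # Group 2

print(f"CV1 = {cv1:.2f}%, CV2 = {cv2:.2f}%")
print("Group 1 is more dispersed" if cv1 > cv2 else "Group 2 is more dispersed")
```

Both groups have the same standard deviation, so the group with the smaller mean has the larger relative dispersion.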

2.6 MEASURE OF POSITION

A measure of position determines the position of a single value in relation to other values in a sample or a population data set.

2.6.1 Ungrouped Data Measurement

• Quartiles

Quartiles are three summary measures that divide a ranked data set into four equal parts.

The 1st quartile, denoted Q1:

FORMULA

$$\text{Depth of } Q_1 = \frac{n + 1}{4}$$

The 2nd quartile is the median of the data set, denoted Q2.

The 3rd quartile, denoted Q3:

FORMULA

$$\text{Depth of } Q_3 = \frac{3(n + 1)}{4}$$

Exampl e 27

Table below lists the total revenue for the 11 top tourism company in Malaysia

109.7

79.9

21.2

76.4

80.2

82.1

79.4

89.3

98.0

103.5

86.8 Solution: Chapter 2: Descriptive Statistics

34

QQS1013 Elementary Statistics

Step 1: Arrange the data in increasing order 76.4

79.4

79.9

80.2

82.1

86.8

89.3

98.0

103.5

109.7

121.2 Step 2: Determine the depth for Q1 and Q3

Depth of Q1 =

n + 1 11 + 1 = =3 4 4

Depth of Q 3 =

3 ( 11 + 1) 3( n + 1) = = 9 4 4

Step 3: Determine the Q1 and Q3 76.4

79.4

79.9

80.2

82.1

86.8

89.3

98.0 103.5

109.7

121.2

Exampl

Q1 = 79.9 ; Q3 = 103.5

e 28 Table below lists the total revenue for the 12 top tourism company in Malaysia

109.7

79.9

74.1

98.0

103.5

86.8

121.2

76.4

80.2

82.1

79.4

89.3

Solution:

Step 1: Arrange the data in increasing order

74.1   76.4   79.4   79.9   80.2   82.1   86.8   89.3   98.0   103.5   109.7   121.2

Step 2: Determine the depth for Q1 and Q3

$$\text{Depth of } Q_1 = \frac{n + 1}{4} = \frac{12 + 1}{4} = 3.25$$

$$\text{Depth of } Q_3 = \frac{3(n + 1)}{4} = \frac{3(12 + 1)}{4} = 9.75$$

Step 3: Determine Q1 and Q3

A depth of 3.25 lies a quarter of the way from the 3rd value (79.4) to the 4th value (79.9), and a depth of 9.75 lies three quarters of the way from the 9th value (98.0) to the 10th value (103.5).

Q1 = 79.4 + 0.25 (79.9 – 79.4) = 79.525
Q3 = 98.0 + 0.75 (103.5 – 98.0) = 102.125
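The depth method of Examples 27 and 28 can be sketched in Python as follows. The helper names `value_at_depth` and `quartiles` are illustrative, and the sketch assumes at least three observations so that both depths fall inside the ranked data.

```python
import math


def value_at_depth(ranked, depth):
    """Return the value at a 1-based, possibly fractional, depth in ranked data."""
    lower = math.floor(depth)      # position of the value just below the depth
    fraction = depth - lower       # how far to move towards the next value
    value = ranked[lower - 1]
    if fraction > 0:
        value += fraction * (ranked[lower] - ranked[lower - 1])
    return value


def quartiles(data):
    """Return (Q1, Q3) using depths (n + 1)/4 and 3(n + 1)/4."""
    ranked = sorted(data)
    n = len(ranked)
    if n < 3:
        raise ValueError("this sketch assumes at least three observations")
    return value_at_depth(ranked, (n + 1) / 4), value_at_depth(ranked, 3 * (n + 1) / 4)


example_27 = [109.7, 79.9, 121.2, 76.4, 80.2, 82.1, 79.4, 89.3, 98.0, 103.5, 86.8]
example_28 = example_27 + [74.1]

print(quartiles(example_27))
print(quartiles(example_28))
```

Statistical packages often use other interpolation rules by default, so their quartiles may differ slightly from the values obtained with this depth formula.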



Interquartile Range
The difference between the third quartile and the first quartile of a data set.
FORMULA

$$IQR = Q_3 - Q_1$$

Example 29
By referring to Example 28, calculate the IQR.

Solution:
IQR = Q3 – Q1 = 102.125 – 79.525 = 22.6

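2.6.2 Grouped Data Measurement

• Quartiles

From the median formula, we can obtain the equations for Q1 and Q3 as follows:
FORMULA

$$Q_1 = L_{Q_1} + \left( \frac{\frac{n}{4} - F}{f_{Q_1}} \right) i$$

$$Q_3 = L_{Q_3} + \left( \frac{\frac{3n}{4} - F}{f_{Q_3}} \right) i$$

Here L is the lower boundary of the quartile class, F is the cumulative frequency before the quartile class, f is the frequency of the quartile class, and i is the class width.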

Example 30
Refer to Example 22. Find Q1 and Q3.

Solution:
1st Step: Construct the cumulative frequency distribution

Time to travel to work    Frequency    Cumulative Frequency
1 – 10                    8            8
11 – 20                   14           22
21 – 30                   12           34
31 – 40                   9            43
41 – 50                   7            50

2nd Step: Determine Q1 and Q3

$$\text{Class } Q_1 = \frac{n}{4} = \frac{50}{4} = 12.5$$

Class Q1 is the 2nd class (11 – 20).

Therefore,

$$Q_1 = L_{Q_1} + \left( \frac{\frac{n}{4} - F}{f_{Q_1}} \right) i = 10.5 + \left( \frac{12.5 - 8}{14} \right) 10 = 13.7143$$

$$\text{Class } Q_3 = \frac{3n}{4} = \frac{3(50)}{4} = 37.5$$

Class Q3 is the 4th class (31 – 40).

Therefore,

$$Q_3 = L_{Q_3} + \left( \frac{\frac{3n}{4} - F}{f_{Q_3}} \right) i = 30.5 + \left( \frac{37.5 - 34}{9} \right) 10 = 34.3889$$

• Interquartile Range
FORMULA

$$IQR = Q_3 - Q_1$$

Example 31
Refer to Example 30. Calculate the IQR.

Solution:
IQR = Q3 – Q1 = 34.3889 – 13.7143 = 20.6746
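The grouped-data quartile formula can be sketched in Python as shown here. The function `grouped_quartile` and its input layout (lower class boundary, class width, frequency) are assumptions made for this illustration; the class boundaries 0.5, 10.5, ... follow the values used in Example 30.

```python
def grouped_quartile(classes, k):
    """Return the k-th quartile (k = 1, 2 or 3) of grouped data.

    classes: list of (lower_boundary, class_width, frequency) tuples in order.
    Implements Q_k = L + ((k*n/4 - F) / f) * i.
    """
    n = sum(freq for _, _, freq in classes)
    target = k * n / 4
    cumulative = 0                      # F: cumulative frequency before the class
    for lower, width, freq in classes:
        if cumulative + freq >= target:
            return lower + (target - cumulative) / freq * width
        cumulative += freq
    raise ValueError("quartile class not found; check k and the frequencies")


# Time to travel to work (Example 30)
travel_time = [(0.5, 10, 8), (10.5, 10, 14), (20.5, 10, 12), (30.5, 10, 9), (40.5, 10, 7)]

q1 = grouped_quartile(travel_time, 1)
q3 = grouped_quartile(travel_time, 3)
print(f"Q1 = {q1:.4f}, Q3 = {q3:.4f}, IQR = {q3 - q1:.4f}")
```

With `k = 2` the same function gives the grouped median, which matches the remark that the quartile equations are derived from the median formula.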

2.7 MEASURE OF SKEWNESS

Used to determine the skewness of data (symmetric, left skewed, right skewed). Also called the Skewness Coefficient or Pearson Coefficient of Skewness.
FORMULA

$$S_k = \frac{\text{Mean} - \text{Mode}}{s} \quad \text{or} \quad S_k = \frac{3(\text{Mean} - \text{Median})}{s}$$

If Sk is positive, the distribution is right skewed.
If Sk is negative, the distribution is left skewed.
If Sk = 0, the distribution is symmetric.

If Sk takes a value in (-0.9999, -0.0001) or (0.0001, 0.9999), the distribution is approximately symmetric.

Example 32
The length of stay of cancer patients warded in Hospital Seberang Jaya is recorded in a frequency distribution. From the record, the mean is 28 days, the median is 25 days and the mode is 23 days. The standard deviation is 4.2 days. What is the type of distribution? Find the skewness coefficient.

Solution:
This distribution is right skewed because the mean is the largest of the three measures (mean > median > mode).

$$S_k = \frac{\text{Mean} - \text{Mode}}{s} = \frac{28 - 23}{4.2} = 1.1905$$

OR

$$S_k = \frac{3(\text{Mean} - \text{Median})}{s} = \frac{3(28 - 25)}{4.2} = 2.1429$$

So, from the Sk value, this distribution is right skewed.
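Both skewness coefficients from Example 32 can be reproduced with a small Python sketch. The function names are illustrative and are not taken from the notes.

```python
def skewness_from_mode(mean, mode, sd):
    """Skewness coefficient based on the mode: (mean - mode) / s."""
    return (mean - mode) / sd


def skewness_from_median(mean, median, sd):
    """Skewness coefficient based on the median: 3 * (mean - median) / s."""
    return 3 * (mean - median) / sd


def describe(sk):
    """Classify the sign of the coefficient as in the rules of this section."""
    if sk > 0:
        return "right skewed"
    if sk < 0:
        return "left skewed"
    return "symmetric"


# Example 32: mean 28, median 25, mode 23, standard deviation 4.2 (days)
sk_mode = skewness_from_mode(28, 23, 4.2)
sk_median = skewness_from_median(28, 25, 4.2)
print(round(sk_mode, 4), describe(sk_mode))
print(round(sk_median, 4), describe(sk_median))
```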

ADDITIONAL INFORMATION

Use of Standard Deviation

1. Chebyshev's Theorem
According to Chebyshev's Theorem, for any number k greater than 1, at least $1 - 1/k^2$ of the data values lie within k standard deviations of the mean.

Thus, for example, if k = 2, then

$$1 - \frac{1}{k^2} = 1 - \frac{1}{(2)^2} = 0.75 \text{ or } 75\%$$

Therefore, according to Chebyshev's Theorem, at least 75% of the values of a data set lie within two standard deviations of the mean.
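As a quick check of the bound for other values of k, the expression can be evaluated in Python. The function name `chebyshev_lower_bound` is chosen for this sketch only.

```python
def chebyshev_lower_bound(k):
    """Minimum proportion of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's Theorem requires k > 1")
    return 1 - 1 / k ** 2


for k in (1.5, 2, 3):
    print(f"k = {k}: at least {chebyshev_lower_bound(k):.2%} of the values")
```

The bound holds for any distribution, which is why it is weaker than the percentages quoted for bell-shaped data.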

Empirical Rule

• For a bell-shaped distribution, approximately

1. 68% of the observations lie within one standard deviation of the mean.

2. 95% of the observations lie within two standard deviations of the mean.

3. 99.7% of the observations lie within three standard deviations of the mean.
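The three percentages can be compared with the exact proportions for a normal distribution, which is the usual model for bell-shaped data. This sketch assumes normality and uses the standard identity that the proportion within k standard deviations equals erf(k / sqrt(2)); the function name is illustrative.

```python
import math


def normal_proportion_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {normal_proportion_within(k):.2%}")
```

The exact figures are close to 68%, 95% and 99.7%, so the Empirical Rule is a rounded version of these values.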

Measure of Position

1. Ungrouped Data - Quartile Deviation
The quartile deviation (QD) is half of the interquartile range. It is used to compare the dispersion of two data sets. A higher QD value means that the data are more dispersed.

Quartile Deviation = Interquartile Range / 2 = (Q3 - Q1) / 2

2. Ungrouped Data – Percentile

$$P_k = \text{value of the } \left( \frac{kn}{100} \right)\text{th term in a ranked data set}$$

Where:
k = the number of the percentile
n = the sample size

$$\text{Percentile rank of } x_i = \frac{\text{Number of values less than } x_i}{\text{Total number of values in the data set}} \times 100$$
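A minimal Python sketch of both percentile formulas is given here, using the ski-area ages from Exercise 2, question 6 as sample data. The notes do not say what to do when kn/100 is not a whole number; rounding the position up to the next whole term is an assumption of this sketch, and the function names are illustrative.

```python
import math


def percentile_value(data, k):
    """Return P_k as the (k*n/100)th term of the ranked data.

    Assumption: a fractional position is rounded up to the next whole term.
    """
    ranked = sorted(data)
    position = max(1, math.ceil(k * len(ranked) / 100))
    return ranked[position - 1]


def percentile_rank(data, x):
    """Return the percentage of values in the data set that are less than x."""
    below = sum(1 for value in data if value < x)
    return below / len(data) * 100


ages = [15, 25, 26, 17, 38, 16, 60, 21, 30, 53, 28, 40, 20, 35, 31]

print(percentile_value(ages, 50))
print(round(percentile_rank(ages, 30), 2))
```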


EXERCISE 2

1. A survey research company asks 100 people how many times they have been to the dentist in the last five years. Their grouped responses appear below.

Number of Visits    Number of Responses
0 – 4               16
5 – 9               25
10 – 14             48
15 – 19             11

What are the mean and variance of the data?

2. A researcher asked 25 consumers: "How much would you pay for a television adapter that provides Internet access?" Their grouped responses are as follows:

Amount ($)    Number of Responses
0 – 99        2
100 – 199     2
200 – 249     3
250 – 299     3
300 – 349     6
350 – 399     3
400 – 499     4
500 – 999     2

Calculate the mean, variance, and standard deviation.

3. The following data give the pairs of shoes sold per day by a particular shoe store in the last 20 days.

85   90   89   70   79   80   83   83   75   76
89   86   71   76   77   89   70   65   90   86

Calculate the
a. mean and interpret the value.
b. median and interpret the value.
c. mode and interpret the value.
d. standard deviation.

4. The following data show the serving time (in minutes) for 40 customers in a post office:

2.0   4.5   2.5   2.9   4.2   2.9   3.5   2.8
3.2   2.9   4.0   3.0   3.8   2.5   2.3   3.5
2.1   3.1   3.6   4.3   4.7   2.6   4.1   3.1
4.6   2.8   5.1   2.7   2.6   4.4   3.5   3.0
2.7   3.9   2.9   2.9   2.5   3.7   3.3   2.4

a. Construct a frequency distribution table with a class width of 0.5.
b. Construct a histogram.
c. Calculate the mode and median of the data.
d. Find the mean of serving time.
e. Determine the skewness of the data.
f. Find the first and third quartile values of the data.
g. Determine the value of the interquartile range.

5. In a survey of a class of final semester students, the following data were obtained for the number of text books owned.

Number of students            12   9   11   15   10   8
Number of text books owned    5    5   3    2    1    0

Find the average number of text books for the class. Use the weighted mean.

6. The following data represent the ages of 15 people buying lift tickets at a ski area.

15   25   26   17   38   16   60   21
30   53   28   40   20   35   31

Calculate the quartiles and the interquartile range.

7. A student scores 60 on a mathematics test that has a mean of 54 and a standard deviation of 3, and she scores 80 on a history test with a mean of 75 and a standard deviation of 2. On which test did she perform better?

8. The following table gives the distribution of the share price for ABC Company, which was listed in BSKL in 2005.

Price (RM)    Frequency
12 – 14       5
15 – 17       14
18 – 20       25
21 – 23       7
24 – 26       6
27 – 29       3

Find the mean, median and mode for this data.

ANSWER EXERCISE 1

1. a) Frequency distribution table, relative frequencies, percentages and angle sizes of all categories.

Method of payment    Frequency, f    Relative frequency    Percentage (%)    Angle Size (°)
Cash                 4               0.2500                25                90
Check                5               0.3125                31.25             112.5
Credit Card          4               0.2500                25                90
Debit                2               0.1250                12.50             45
Other                1               0.0625                6.25              22.5
Total                16              1.0                   100               360

b) Pie chart: Cash 25%, Check 31%, Credit Card 25%, Debit 13%, Other 6%.

2. a) Frequency distribution table, relative frequencies, percentages and angle sizes of all categories.

Type of product    Frequency    Relative Frequency    Percentage (%)    Angle Size (°)
A                  13           0.26                  26                93.6
B                  12           0.24                  24                86.4
C                  5            0.10                  10                36
D                  9            0.18                  18                64.8
E                  11           0.22                  22                79.2
Total              50           1                     100               360

b) Pie chart: A, 13; B, 12; C, 5; D, 9; E, 11.

3. Time series graph: No. of Fatalities (vertical axis, 0 to 1200) against Time (horizontal axis, 1 to 7).

4. a) Frequency Distribution Table

Source of news    Frequency, f
Newspaper         8
Television        5
Radio             7
Magazine          5
Total             25

b) Bar graph: Frequency (vertical axis, 0 to 9) against Source of news (Newspaper, Television, Radio, Magazine).

5. Component bar graph: Frequency (vertical axis, 0 to 70) against Month (September, October, November, December), with Import and Export components.

6. Bar graph: Quantity (mm) of rainfall (vertical axis, 0 to 1400) by state (Perlis, Kedah, Pulau Pinang, Perak, Selangor, KL, Negeri Sembilan, Melaka, Johor, Pahang, Terengganu, Kelantan, Sarawak, Sabah).

The highest quantity of rainfall is in Terengganu, followed by Pahang and then Kuala Lumpur. The lowest rainfall is in Pulau Pinang. The rainfall is not equally distributed across the states.

ANSWER EXERCISE 2

1.
Class      f      x     fx     fx^2
0 - 4      16     2     32     64
5 - 9      25     7     175    1225
10 - 14    48     12    576    6912
15 - 19    11     17    187    3179
Total      100          970    11380

Mean = 970/100 = 9.7
Variance = 19.90909
Standard Deviation = 4.46196
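The table for question 1 can be verified with a short Python sketch that uses the computational formula for the sample variance of grouped data, s² = (Σfx² − (Σfx)²/n) / (n − 1). The function name is illustrative.

```python
import math


def grouped_mean_variance(midpoints, frequencies):
    """Return (mean, sample variance) of grouped data from class midpoints."""
    n = sum(frequencies)
    sum_fx = sum(f * x for x, f in zip(midpoints, frequencies))
    sum_fx2 = sum(f * x * x for x, f in zip(midpoints, frequencies))
    mean = sum_fx / n
    variance = (sum_fx2 - sum_fx ** 2 / n) / (n - 1)
    return mean, variance


# Dentist visits: class midpoints and frequencies from the table
mean, variance = grouped_mean_variance([2, 7, 12, 17], [16, 25, 48, 11])
print(f"mean = {mean}, variance = {variance:.5f}, sd = {math.sqrt(variance):.5f}")
```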

2.
Class      f     x        fx        fx^2
0-99       2     49.5     99        4900.5
100-199    2     149.5    299       44700.5
200-249    3     224.5    673.5     151200.75
250-299    3     274.5    823.5     226050.75
300-349    6     324.5    1947      631801.5
350-399    3     374.5    1123.5    420750.75
400-499    4     449.5    1798      808201
500-999    2     749.5    1499      1123500.5
Total      25             8262.5    3411106.25

Mean = 330.5
Variance = 28347.9167
Standard Deviation = 168.368396

3.
Ranked data (x):
65 70 70 71 75 76 76 77 79 80 83 83 85 86 86 89 89 89 90 90
Sum of x = 1609

Squared values (x^2):
4225 4900 4900 5041 5625 5776 5776 5929 6241 6400 6889 6889 7225 7396 7396 7921 7921 7921 8100 8100
Sum of x^2 = 130571

Mean = 80.45
Position of median = 10.5
Median = (80 + 83)/2 = 81.5
Mode = 89
Variance = 59.31316
s = 7.701504

4. Sturges' Formula
Number of classes, c = 1 + 3.3 log 40 = 6.2868 ≈ 6
Class width, i > (5.1 - 2)/6 = 0.5167 ≈ 0.6
Starting point = 2.0

Frequency Distribution Table

Class        f     CF    x      fx      fx^2
2.0 - 2.5    7     7     2.2    15.4    33.88
2.6 - 3.1    15    22    2.7    40.5    109.35
3.2 - 3.7    7     29    3.2    22.4    71.68
3.8 - 4.3    6     35    3.7    22.2    82.14
4.4 - 4.9    4     39    4.2    16.8    70.56
5.0 - 5.5    1     40    4.7    4.7     22.09
Total        40                 122     389.7

Class mode = second class
Mode = 2.85

Class median: 40/2 = 20, which falls in the second class
Median = 3.07

Mean = 3.05

Skewness (comparing mode, median and mean) = right skewed

Class Q1: 40/4 = 10
Q1 = 2.67
Q3 = 3.85
IQR = 1.18
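The first two lines of this answer can be reproduced with a short Python sketch. It uses the same form of Sturges' formula as the answer, c = 1 + 3.3 log10(n); the function names are illustrative.

```python
import math


def sturges_classes(n):
    """Number of classes suggested by c = 1 + 3.3 * log10(n), before rounding."""
    return 1 + 3.3 * math.log10(n)


def minimum_class_width(smallest, largest, classes):
    """Range of the data divided by the chosen number of classes."""
    return (largest - smallest) / classes


c = sturges_classes(40)
width = minimum_class_width(2.0, 5.1, 6)
print(f"c = {c:.4f}, class width > {width:.4f}")
```

The class width is then rounded up to a convenient value, here 0.6, so that six classes cover the whole range of the data.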

5.
x (number of text books owned)    w (number of students)    xw
5                                 12                        60
5                                 9                         45
3                                 11                        33
2                                 15                        30
1                                 10                        10
0                                 8                         0
Total                             65                        178

Mean = 178/65 = 2.74

6. Ranked data: 15 16 17 20 21 25 26 28 30 31 35 38 40 53 60

Position of Q1 = 4
Q1 = 20

Position of Q3 = 12
Q3 = 38

IQR = 18

7. CV (Mathematics) = 3/54 × 100% = 5.5556%

CV (History) = 2/75 × 100% = 2.6667%

Since the coefficient of variation for History is less than that for Mathematics, the student performs better in History.
