Hs550_week 1-3.pdf

  • Uploaded by: Praveen Choudhary
  • October 2019
  • PDF

HS550: Statistical Methods (3-0-2-4) (Feb-June Semester, 2019)

Course instructor: Shyamasree Dasgupta
Office: A1-303; Phone: 1905267122
Drop me an email anytime at [email protected]
Indian Institute of Technology Mandi

2/28/2019


Module 1: Representation of Data and Descriptive Statistics [Week 1-3 (7 lectures)]*

• How to represent the data that you have – tables and charts?
• Who is the “one” representative value of the dataset?
• What is the average deviation from the representative value?
• If you have data on two variables, how to check their relationship?

* No classes on 20th and 22nd Feb

Books (Also consult Basic Statistics by Nagar and Das)


Let’s appreciate the need for “quantity” as well as “quality”


Comparison between these two events is possible only when both quantitative and qualitative information is available.

Jallianwala Bagh:
• The British Indian Army killed a huge number of civilians in Jallianwala Bagh
• Around 400 people were killed
• Troops opened fire on a group of unarmed, nonviolent protesters and pilgrims
• It happened in the year 1919

Thuggees:
• Thuggees killed a huge number of civilians in various parts of India
• Around 2 million individuals were killed
• It was a part of the criminal activities of the Thuggees and was related to robbery
• It happened over a period of roughly 600 years (1290-1870)

• The distinction between quantitative and qualitative information, as it is often articulated, is misleading.
• Both are equally important to know whether an event or a finding is typical or atypical.

Tracing the history of data representation

Mortality Table of John Graunt (1661)

Overcoming the problem of “can’t see the forest for the trees”

Scottish imports/exports by W. Playfair (1786)


1859: Florence Nightingale’s polar area diagram

Not the numbers but the arrangement of the numbers tells the story! Click on the link!


Types of Data

• Nominal variable (or attribute-based variable): Pass/Fail? A category
• Cardinal variable: height? 5.5 ft. A number
• Ordinal variable: 1st/2nd/3rd/...../last but one/last? A rank

Observe Data_1 carefully and identify the variables as nominal, ordinal or cardinal. Also observe that all cardinal variables can be converted to ordinal variables.
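The remark that a cardinal variable can always be turned into an ordinal one can be illustrated with a small sketch. The function name `to_ranks` and the sample heights are hypothetical, and the choice of rank 1 for the tallest value with tied values sharing a rank is an assumption, not something fixed by the slides.

```python
def to_ranks(values):
    """Convert cardinal values to ordinal ranks (1 = largest).

    Assumption: tied values share the same rank (dense ranking).
    """
    distinct = sorted(set(values), reverse=True)
    rank_of = {value: position + 1 for position, value in enumerate(distinct)}
    return [rank_of[value] for value in values]


# Hypothetical heights in ft, used only for illustration.
heights_ft = [5.2, 5.9, 4.9, 5.6, 6.1]
ranks = to_ranks(heights_ft)
print(ranks)
```

The reverse conversion is not possible: once only the ranks are kept, the actual heights cannot be recovered.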

What are the heights of the students in IIT Mandi?
Height of Roll no. ..... is 5.2 ft
Height of Roll no. ..... is 4.9 ft
and so on....

When data are in a raw format, the first task is to arrange them in a meaningful manner. You may lose some of the details while doing it, but that’s fine!

Arrangement makes life easy!

• Height (in ft) of 30 students: 5.2, 5.9, 4.9, 5.6, 6.1, 4.9, 5.5, 5.8, 5.7, 6.0, 5.0, 6.2, 5.7, 4.8, 5.8, 5.6, 5.7, 6.0, 4.8, 5.7, 5.9, 5.4, 5.2, 4.8, 5.4, 5.2, 5.2, 5.4, 5.7, 5.7
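As a minimal sketch of what "arranging" the raw heights could look like, the 30 values can be sorted and tallied. The variable names are illustrative, and the third value is taken to be 4.9.

```python
from collections import Counter

# Heights (in ft) of the 30 students listed on the slide.
heights_ft = [
    5.2, 5.9, 4.9, 5.6, 6.1, 4.9, 5.5, 5.8, 5.7, 6.0,
    5.0, 6.2, 5.7, 4.8, 5.8, 5.6, 5.7, 6.0, 4.8, 5.7,
    5.9, 5.4, 5.2, 4.8, 5.4, 5.2, 5.2, 5.4, 5.7, 5.7,
]

ordered = sorted(heights_ft)   # arrangement 1: ascending order
freq = Counter(heights_ft)     # arrangement 2: how often each height occurs

for height in sorted(freq):
    print(f"{height:.1f} ft: {freq[height]} student(s)")
```

The tally no longer says which roll number has which height, which is the loss of detail the previous slide mentions.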

Tabular representation of data


A table is prepared to represent the summary of the data. The table that you create out of any raw data should depend on your research objective. The same data can be tabulated in various ways to answer the particular research question that you are trying to address. Further, ask yourself the following questions before you proceed to create any table.

• Tables based on cardinal data? Here you are the one to construct class intervals, which will act as categories.
• Tables based on nominal data? Here you know your categories.
• A table for one variable? This is rather simple!
• A table for more than one variable? Think carefully about how you would like to create subgroups of a variable!

Representing one variable in a table

Table 1: Distribution of households according to ownership of agricultural land (Based on Data_1)

Classes       Classes        Classes     Midpoint   No. of households   % of
              (in hectare)   (in acre)   (xi)       (fi)                households
Landless      0              0           0          11                  46%
Marginal      <1             <2.5        1.25       2                   8%
Small         1-2            2.5-5       3.75       2                   8%
Semi medium   2-4            5-10        7.5        0                   0%
Medium        4-10           10-30       20         2                   8%
Large         >10            >30         45         7                   29%
Total                                               24                  100%

Observe that there is a logic behind defining the classes in such a manner. [Note: In India, the following categories of landholdings are generally used: Marginal: <1 ha; Small: 1.01–2 ha; Semi-medium: 2–4 ha; Medium: 4–10 ha; Large: >10 ha. However, to use these categories as your classes, you have to convert the landholding from acre to hectare (ha), and 1 ha = 2.5 acre (approx).]
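The conversion step in the note can be sketched as a small classifier. It uses the approximate factor 1 ha = 2.5 acre from the note. The function name `land_class` is hypothetical, and the treatment of values that fall exactly on a class boundary (assigned to the lower class here) is an assumption.

```python
ACRES_PER_HECTARE = 2.5  # approximate factor quoted in the note


def land_class(acres):
    """Return the landholding class for a holding given in acres.

    Assumption: a holding exactly on a boundary (e.g. 2 ha) goes to the
    lower class; the slides do not specify this.
    """
    if acres == 0:
        return "Landless"
    hectares = acres / ACRES_PER_HECTARE
    if hectares < 1:
        return "Marginal"
    if hectares <= 2:
        return "Small"
    if hectares <= 4:
        return "Semi-medium"
    if hectares <= 10:
        return "Medium"
    return "Large"


# Illustrative holdings in acres (not taken from Data_1).
for acres in [0, 2, 4, 7.5, 20, 45]:
    print(acres, "acre ->", land_class(acres))
```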

Representing 2 variables in one table

Table 2: Distribution of households according to their castes (mentioned as 'category') in various villages (Based on Data_1)

Village \ Caste    SC   ST   OBC   Gen   Total
Paschim Malipur    0    0    0     9     9
Sherpara           0    0    1     6     7
Madanpur           0    0    0     3     3
Gopalpur           3    0    0     1     4
Purushia           0    1    0     1     2
Total              3    1    1     20    25
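The row and column totals of Table 2 can be checked with a short sketch. The cell counts are copied from the table; the variable names are illustrative.

```python
castes = ["SC", "ST", "OBC", "Gen"]

# Cell counts of Table 2: village -> households per caste, in the order of `castes`.
counts = {
    "Paschim Malipur": [0, 0, 0, 9],
    "Sherpara":        [0, 0, 1, 6],
    "Madanpur":        [0, 0, 0, 3],
    "Gopalpur":        [3, 0, 0, 1],
    "Purushia":        [0, 1, 0, 1],
}

row_totals = {village: sum(row) for village, row in counts.items()}
col_totals = [sum(column) for column in zip(*counts.values())]
grand_total = sum(row_totals.values())

print(row_totals)
print(dict(zip(castes, col_totals)))
print("Grand total:", grand_total)
```

Both sets of marginal totals must add up to the same grand total, which is a quick consistency check for any two-way table.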

Table 3: Distribution of households according to caste, religion and monthly expenditure

Caste:          SC          ST          OBC         Gen          Total
Religion:       H   M   T   H   M   T   H   M   T   H   M   T    H   M   T
Expenditure
in Rs.
<5000           2   0   2   0   0   0   0   1   1   5   5   10   7   6   13
5000-10000      1   0   1   0   0   0   0   0   0   4   2   6    5   2   7
10000-15000     0   0   0   1   0   1   0   0   0   1   1   2    2   1   3
>15000          0   0   0   0   0   0   0   0   0   0   1   1    0   1   1
Total           3   0   3   1   0   1   0   1   1   10  9   19   14  10  24

Parts of the table marked on the slide: Title, Caption (the column headings), Stub (the row headings), Body of the Table.

Observe that there is a column that displays the total number of households under each category. H: Hindu, M: Muslim, T: Total. Since one data point is missing under the variable monthly expenditure, the total count will remain 24.

Frequency Distribution

Table 1: Distribution of households according to ownership of agricultural land

Classes     Classes     Midpoint   No. of       Cumulative frequency (Fi)   Relative    Freq. density
            (in acre)   (xi)       households   < type      > type          frequency   (fi/class length)
                                   (fi)                                     (fi/N)
Landless    0           0          11           11          24              0.46        -
Marginal    0-2.5       1.25       2            13          13              0.08        0.8
Small       2.5-5       3.75       2            15          11              0.08        0.8
Semi-med    5-10        7.5        0            15          9               0.00        0
Medium      10-30       20         2            17          9               0.08        0.1
Large       30-60       45         7            24          7               0.29        0.23
Total                              N=24                                     1
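The derived columns of the frequency distribution can be reproduced from the class frequencies alone. The following is a minimal sketch with illustrative variable names; the Landless class has no class length, so its frequency density is left undefined (`None`).

```python
from itertools import accumulate

classes = ["Landless", "Marginal", "Small", "Semi-med", "Medium", "Large"]
freq = [11, 2, 2, 0, 2, 7]                   # f_i, from the table
class_length = [None, 2.5, 2.5, 5, 20, 30]   # in acres; None for Landless

N = sum(freq)

# "Less than" type: households up to and including each class.
less_than = list(accumulate(freq))
# "More than" type: households in each class and all classes above it.
more_than = [N - cum + f for cum, f in zip(less_than, freq)]

relative = [round(f / N, 2) for f in freq]
density = [
    None if length is None else round(f / length, 2)
    for f, length in zip(freq, class_length)
]

for row in zip(classes, freq, less_than, more_than, relative, density):
    print(row)
```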

Diagrammatic representation of data


Figure 1: Distribution of households according to ownership of agricultural land

[Bar/column diagram. Y-axis: No. of households. Bars: Landless 11, Marginal 2, Small 2, Semi medium 0, Medium 2, Large 7.]
[Pie chart. Landless 46%, Marginal 9%, Small 8%, Medium 8%, Large 29%.]

Table 2: Distribution of households according to their castes (mentioned as 'category') in various villages

Village \ Caste    SC   ST   OBC   Gen   Total
Paschim Malipur    0    0    0     9     9
Sherpara           0    0    1     6     7
Madanpur           0    0    0     3     3
Gopalpur           3    0    0     1     4
Purushia           0    1    0     1     2
Total              3    1    1     20    25

Figure 2: Distribution of households according to their castes in various villages

[Bar diagram. Y-axis: No. of households. X-axis: Paschim Malipur, Sherpara, Madanpur, Gopalpur, Purushia. Series: SC, ST, OBC, Gen.]
[Stacked bar diagrams, one in absolute numbers (0 to 10) and one in percentages (0% to 100%), for the same villages and castes.]
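The 100% stacked bar diagram shows, for each village, the share of each caste rather than the absolute count. A minimal sketch of that conversion, using the cell counts of Table 2 (function and variable names are illustrative):

```python
castes = ["SC", "ST", "OBC", "Gen"]

# Households per caste in each village (from Table 2).
counts = {
    "Paschim Malipur": [0, 0, 0, 9],
    "Sherpara":        [0, 0, 1, 6],
    "Madanpur":        [0, 0, 0, 3],
    "Gopalpur":        [3, 0, 0, 1],
    "Purushia":        [0, 1, 0, 1],
}


def percent_shares(row):
    """Express each count as a percentage of the row total (one decimal place)."""
    total = sum(row)
    return [round(100 * count / total, 1) for count in row]


shares = {village: percent_shares(row) for village, row in counts.items()}

for village, row in shares.items():
    print(village, dict(zip(castes, row)))
```

The percentage version makes villages of different sizes comparable, at the cost of hiding how many households each village has.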

Table 3: Distribution of households according to caste, religion and monthly expenditure

Caste:          SC          ST          OBC         Gen          Total
Religion:       H   M   T   H   M   T   H   M   T   H   M   T    H   M   T
Expenditure
<5000           2   0   2   0   0   0   0   1   1   5   5   10   7   6   13
5000-10000      1   0   1   0   0   0   0   0   0   4   2   6    5   2   7
10000-15000     0   0   0   1   0   1   0   0   0   1   1   2    2   1   3
>15000          0   0   0   0   0   0   0   0   0   0   1   1    0   1   1
Total           3   0   3   1   0   1   0   1   1   10  9   19   14  10  24

Observe that there is a column that displays the total number of households under each category. H: Hindu, M: Muslim, T: Total. Since one data point is missing under the variable monthly expenditure, the total count will remain 24.

Figure 3: Distribution of households according to caste, religion and monthly expenditure

[Stacked bar diagram. Y-axis: No. of households (0 to 12). X-axis: H, M, T within each caste (SC, ST, OBC, Gen). Stacks: <5000, 5000-10000, 10000-15000, >15000.]

• Histogram is another way of representation
  – isto-s: ‘mast’/something set upright
  – gram-ma: something written/graphics
  – Histogram

Not really!! This is a column diagram. Also remember, the term 'histogram' was coined by the statistician Karl Pearson while talking about the geometry of statistics (1892).

• Have a careful look at the monthly expenditure data
• Identify the highest value and the lowest value
• The highest is 18000 and the lowest is 2000 (in INR)
• So, consider the range 2000 INR – 18000 INR
• Bin the range into a series of intervals (contiguous but disjoint) and identify the frequency corresponding to each interval
• Bins may extend below the lowest value and above the highest value
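The binning step can be sketched as follows. The bin edges 0, 5000, ..., 20000 extend below the lowest value (2000) and above the highest (18000). The sample values are illustrative rather than the actual Data_1 records, and treating each interval as closed on the left and open on the right is an assumption about how boundary values are assigned.

```python
from bisect import bisect_right

# Bin edges in INR: four intervals of width 5000 covering 2000..18000.
edges = [0, 5000, 10000, 15000, 20000]


def bin_index(value):
    """Return the index of the interval [lower, upper) that contains value.

    Assumption: a value equal to an edge (e.g. 5000) belongs to the
    interval that starts at that edge.
    """
    if not edges[0] <= value < edges[-1]:
        raise ValueError(f"{value} lies outside the bins")
    return bisect_right(edges, value) - 1


# Illustrative expenditures only; not the actual data set.
sample = [2000, 4500, 7000, 12000, 18000]

counts = [0] * (len(edges) - 1)
for value in sample:
    counts[bin_index(value)] += 1

for lower, upper, count in zip(edges, edges[1:], counts):
    print(f"{lower}-{upper}: {count}")
```

Because the intervals share edges but each value is assigned to exactly one of them, every observation is counted once and only once.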

Frequency Table

Class interval          Midpoint   Frequency   Freq. density
(expenditure in INR)    (xi)       (fi)
0-5000                  2500       13          0.0026
5000-10000              7500       7           0.0014
10000-15000             12500      3           0.0006
15000-20000             17500      1           0.0002
Total                              24          0.0048
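The midpoint and frequency density columns follow directly from the class intervals and frequencies. A minimal sketch, with illustrative variable names:

```python
# Class intervals (expenditure in INR) and their frequencies, from the table.
intervals = [(0, 5000), (5000, 10000), (10000, 15000), (15000, 20000)]
freq = [13, 7, 3, 1]

# Midpoint of each class: average of its lower and upper limits.
midpoints = [(lower + upper) / 2 for lower, upper in intervals]

# Frequency density: frequency divided by class length.
density = [f / (upper - lower) for f, (lower, upper) in zip(freq, intervals)]

for (lower, upper), x, f, d in zip(intervals, midpoints, freq, density):
    print(f"{lower}-{upper}: midpoint={x:.0f}, frequency={f}, density={d:.4f}")
```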

[Histogram with frequency curve. Y-axis: Proportion of households (0.0000 to 0.0050). X-axis: Monthly expenditure in INR (class midpoints 2500, 7500, 12500, 17500).]

1. Widths are proportional to class lengths
2. Heights are proportional to frequency density
3. The area of each bar represents the frequency
4. Notice that a histogram is appropriate even when the class intervals are unequal
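Why the area of a bar equals the frequency can be written out in one line. The symbols $w_i$ (class length) and $h_i$ (bar height) are introduced here for illustration; $f_i$ and $N$ are as in the frequency tables.

```latex
h_i = \frac{f_i}{w_i}
\quad\Longrightarrow\quad
\text{area}_i = h_i \, w_i = f_i ,
\qquad
\sum_i \text{area}_i = \sum_i f_i = N
```

For the first expenditure class, for instance, $0.0026 \times 5000 = 13$. Since the class length cancels out of the area, the same argument holds when the class lengths differ, which is why a histogram remains appropriate for unequal class intervals.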

Central tendency and dispersion


Type of data          Measure of central tendency   Measure of dispersion
Cardinal              Mean                          Standard deviation
Ordinal               Median                        Quartile deviation
Nominal/Attributes    Mode                          Range
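As a rough illustration, the measures named in the table can be computed with Python's standard `statistics` module on the 30 heights from the earlier slide. Several choices here are assumptions rather than the course's conventions: the population form of the standard deviation (`pstdev`), the default quartile method of `statistics.quantiles`, and quartile deviation taken as half the interquartile range.

```python
import statistics

# Heights (in ft) of the 30 students from the earlier slide.
heights_ft = [
    5.2, 5.9, 4.9, 5.6, 6.1, 4.9, 5.5, 5.8, 5.7, 6.0,
    5.0, 6.2, 5.7, 4.8, 5.8, 5.6, 5.7, 6.0, 4.8, 5.7,
    5.9, 5.4, 5.2, 4.8, 5.4, 5.2, 5.2, 5.4, 5.7, 5.7,
]

mean = statistics.mean(heights_ft)
sd = statistics.pstdev(heights_ft)      # assumption: population standard deviation

median = statistics.median(heights_ft)
q1, _, q3 = statistics.quantiles(heights_ft, n=4)  # default 'exclusive' method
quartile_deviation = (q3 - q1) / 2      # half the interquartile range

mode = statistics.mode(heights_ft)
value_range = max(heights_ft) - min(heights_ft)

print(f"mean={mean:.2f}, sd={sd:.2f}")
print(f"median={median}, quartile deviation={quartile_deviation:.2f}")
print(f"mode={mode}, range={value_range:.1f}")
```

Height is a cardinal variable, so all three rows of the table can be computed for it; for an ordinal or nominal variable only the measures in its own row (and those below it) remain meaningful.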

Correlation: Chalk and Talk!


Food for thought…

1. The household that spends INR 18000 has 14 family members, whereas the household that spends INR 7000 has only 3 family members!! How do you take this information into account?
2. A histogram can be drawn with unequal class intervals. Why? Try creating a histogram based on the data on landholding.
3. How do you calculate the appropriate central value and dispersion for the variables in the data set?

Questions?

Cartoon courtesy: The Cartoon Guide to Statistics by Larry Gonick and Woollcott Smith
