Understanding Data: Dr. Rohit Vishal Kumar

UNDERSTANDING DATA
Dr. Rohit Vishal Kumar
Reader, Department of Marketing
Xavier Institute of Social Service
PO Box No 7, Purulia Road
Ranchi – 834001, Jharkhand, India
Email: [email protected]

(C) Rohit Vishal Kumar

Presented at WBUT 25-May-09


What is Data?
• Observations of a set of variables
• Lowest level of abstraction from which information is derived
• Each discipline has evolved its own method of classifying data
• Two broad classifications of data based on source:
  – Primary Data:
    • Data collected from a primary source
  – Secondary Data:
    • Data collected from a secondary source


Classification :: Statistics
• Categorical Data
  – The objects are grouped into categories based on some qualitative trait
  – The resultant data are merely labels or categories
  – Example:
    • Hair Color: Brown / Black / Red
    • Smoking Status: Favor / Neutral / Against

• Measurement Data
  – The objects are “measured” on some quantitative trait
  – The resultant data is a set of numbers
  – Example:
    • Age of the Students
    • JEMAT Score
    • Number of Students Not Attending Class


Categorical Data
• Nominal Data
  – A type of categorical data in which numbers act as labels without having any specific meaning
  – Example:
    • Male: 1
    • Female: 2

• Ordinal Data
  – A type of categorical data in which numbers act as a guide to the level of importance of the object
  – Example:
    • Mild
    • Moderate
    • Severe


Measurement Data
• Discrete Data
  – Only certain values are possible
  – There are gaps between the possible values
  – Generated through the process of counting
  – Example:
    • Number of students in the class
    • Number of employees absent from work

• Continuous Data
  – Any value within an interval is possible with a suitable measuring device
  – Theoretically, the number can be accurate to any desired number of decimal places
  – Generated through the process of measurement
  – Example:
    • Height in cm
    • Time to complete the assignment


Classification :: Scaling Theory
Scales are compared on three properties: ORDER, DISTANCE, ORIGIN

• Nominal Data
  – A type of categorical data in which numbers act as labels without having any specific meaning
  – Has none of the three properties
  – Example:
    • Male: 1
    • Female: 2

• Ordinal Data
  – A type of categorical data in which numbers act as a guide to the level of importance of the object
  – Has ORDER, but not DISTANCE or ORIGIN
  – Example:
    • Mild
    • Moderate
    • Severe

Classification :: Scaling Theory
• Interval Data
  – Quantitative data that does not have a real zero point
  – Allows comparison within the scale, but values cannot be compared outside the scale
  – Used in social research, but most researchers are not clear about the interval scale
  – Has ORDER and DISTANCE, but not ORIGIN
  – Example:
    • Definitely Will Buy / Probably Will Buy / May or May Not Buy / Probably Will Not Buy / Definitely Will Not Buy

• Ratio Data
  – Quantitative data that has a real zero point
  – Allows conversion to another scale while preserving magnitude
  – Has ORDER, DISTANCE, and ORIGIN
  – Example:
    • Distance in km

Why understand Data?
• The type of analysis depends on the type of data you have collected
• The general guideline is as follows (“+” means in addition to the statistics listed for the preceding scales):
  – Nominal Data: Mode, Chi-Square
  – Ordinal Data: + Median / Percentiles
  – Interval Data: + Mean / SD / Correlation / Regression / ANOVA
  – Ratio Data: + Geometric Mean / Harmonic Mean / Coefficient of Variation / Logarithms
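The guideline is cumulative: each scale permits its own statistics plus everything permitted for the scales before it. A minimal Python sketch of that lookup follows. The names `SCALE_ORDER`, `NEW_STATISTICS`, and `permissible_statistics` are illustrative and do not come from the slides.

```python
# Scales in increasing order of measurement level, as listed in the guideline.
SCALE_ORDER = ["nominal", "ordinal", "interval", "ratio"]

# Statistics that each scale adds on top of the scales before it.
NEW_STATISTICS = {
    "nominal": ["mode", "chi-square"],
    "ordinal": ["median", "percentiles"],
    "interval": ["mean", "standard deviation", "correlation", "regression", "ANOVA"],
    "ratio": ["geometric mean", "harmonic mean", "coefficient of variation", "logarithms"],
}


def permissible_statistics(scale):
    """Return every statistic allowed for `scale`, including inherited ones."""
    level = SCALE_ORDER.index(scale.lower())
    allowed = []
    for name in SCALE_ORDER[: level + 1]:
        allowed.extend(NEW_STATISTICS[name])
    return allowed


print(permissible_statistics("ordinal"))
```

Calling `permissible_statistics("ratio")` returns all thirteen entries, since the ratio scale sits at the top of the hierarchy.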


Some Points to Remember
• Tend to use interval scales
• Data need not be comparable with other studies
• Data has to make sense in your context
• Students fail to understand the importance of data
  – Wrong approach:
    • “I have collected the data… now what do I do?”
  – Right approach:
    • “What data do I need? Why do I need it? Where will I get it? How will I analyse it to get the answer?”


Descriptive Statistics :: A Quick Review


Measures of Central Tendency
• Central tendency is “loosely” defined as the location of the center of a distribution of data
• Three basic measures:
  – Arithmetic Mean
  – Median
  – Mode


Arithmetic Mean
• Advantages:
  – Easy to compute
  – Affected by every value in the set of observations
  – Defined by a rigid mathematical formula
  – Relatively reliable
  – Represents the “center of gravity” of the data

• Disadvantages:
  – Unduly affected by extremely small and/or large values
  – Cannot be calculated for data with open-ended classes
  – A good measure only when the distribution is fairly symmetric


Median
• Advantages:
  – Refers to the “middle value” of the distribution
  – It is a “positional measure”
  – Useful in the case of open-ended classes
  – Not seriously affected by extreme values
  – Most appropriate for dealing with qualitative rank data
  – Has a series of related positional measures such as quartiles, deciles, and percentiles

• Disadvantages:
  – It does not take every value into consideration
  – It is not capable of algebraic treatment
  – It is erratic if the number of items is small
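The mean's sensitivity to extreme values, and the median's resistance to them, can be seen in a small sketch using Python's standard `statistics` module. The salary figures are made up purely for illustration.

```python
import statistics

# Hypothetical monthly salaries (in thousands).
salaries = [20, 22, 25, 27, 30]
# The same data with one extreme value appended.
with_outlier = salaries + [300]

mean_before = statistics.mean(salaries)          # 124 / 5 = 24.8
median_before = statistics.median(salaries)      # middle value = 25

mean_after = statistics.mean(with_outlier)       # 424 / 6, roughly 70.67
median_after = statistics.median(with_outlier)   # (25 + 27) / 2 = 26

print("mean:  ", mean_before, "->", round(mean_after, 2))
print("median:", median_before, "->", median_after)
```

One extreme observation nearly triples the mean, while the median moves by a single unit.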


Mode
• Advantages:
  – It is the most typical or representative value of a distribution
  – Not unduly affected by extreme values
  – It can be used to describe qualitative phenomena

• Disadvantages:
  – A distribution may have no mode, or more than one mode
  – Not capable of algebraic treatment
  – It is not rigidly defined for calculation


Relation Between the 3 Measures
• In a moderately skewed distribution: Mode = 3 Median − 2 Mean
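As a worked example of this empirical relation, the sketch below estimates the mode from a given mean and median. The function name `estimated_mode` and the input figures are hypothetical, and the result is only an approximation that applies to moderately skewed data.

```python
def estimated_mode(mean, median):
    """Empirical estimate for moderately skewed data: Mode = 3 * Median - 2 * Mean."""
    return 3 * median - 2 * mean


# Hypothetical summary figures for a moderately right-skewed distribution.
mode_estimate = estimated_mode(mean=52.0, median=50.0)
print(mode_estimate)  # 3 * 50 - 2 * 52 = 46.0
```

The same relation can be read as Mean − Mode = 3 (Mean − Median): here 52 − 46 = 6 and 3 × (52 − 50) = 6.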


Measures of Dispersion
• Dispersion is defined as the degree to which data tend to spread about a central value
• Four absolute measures, each with a corresponding relative measure:
  – Range (relative measure: Coefficient of Range)
  – Quartile Deviation (relative measure: Coefficient of Quartile Deviation)
  – Mean Absolute Deviation (relative measure: Coefficient of MAD)
  – Standard Deviation (relative measure: Coefficient of Variation)

• Range and QD are positional measures of dispersion
• MAD and SD are calculated measures of dispersion


Range
• Range = Largest value (L) − Smallest value (S)
• Coefficient of Range = (L − S) / (L + S)

• Advantages:
  – Simplest to understand and compute

• Disadvantages:
  – Not based on each and every item in the data
  – Does not take into account the shape of the distribution
  – Cannot be computed in the case of open-ended classes
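A minimal sketch of both quantities, assuming the usual textbook definitions (range = largest value minus smallest value; coefficient of range = (L − S) / (L + S)). The function names and the marks are hypothetical.

```python
def value_range(data):
    """Absolute measure: largest value minus smallest value."""
    return max(data) - min(data)


def coefficient_of_range(data):
    """Relative measure: (L - S) / (L + S), where L and S are the extremes."""
    largest, smallest = max(data), min(data)
    return (largest - smallest) / (largest + smallest)


marks = [35, 42, 50, 58, 65]  # hypothetical exam marks
print(value_range(marks))           # 65 - 35 = 30
print(coefficient_of_range(marks))  # 30 / 100 = 0.3
```

Only the two extreme values enter either formula, which is why the range says nothing about how the remaining items are distributed.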


Quartile Deviation
• Inter Quartile Range (IQR) = Q3 − Q1
• Quartile Deviation (Semi IQR) = (Q3 − Q1) / 2
• Coefficient of QD = (Q3 − Q1) / (Q3 + Q1)


Quartile Deviation
• Advantages:
  – Can measure variation in open-ended distributions
  – Extremely useful in the case of erratic or badly skewed data
  – Not affected by extreme values

• Disadvantages:
  – Ignores 50% of the data
  – Not capable of mathematical manipulation
  – Not considered a measure of dispersion:
    • Effectively shows the distance between two positional points
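The quartile-based measures can be sketched with `statistics.quantiles` from the Python standard library, assuming the usual definitions IQR = Q3 − Q1, QD = (Q3 − Q1) / 2, and Coefficient of QD = (Q3 − Q1) / (Q3 + Q1). Quartile conventions differ between textbooks and packages; the default "exclusive" method used here is one choice among several, and the data are made up.

```python
import statistics


def quartile_measures(data):
    """Return (IQR, quartile deviation, coefficient of QD) for the data."""
    # quantiles(..., n=4) returns the three cut points Q1, Q2, Q3.
    q1, _q2, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    return iqr, iqr / 2, iqr / (q3 + q1)


values = [10, 20, 30, 40, 50, 60, 70]  # hypothetical observations
iqr, qd, coeff_qd = quartile_measures(values)
print(iqr, qd, coeff_qd)

# Replacing the largest value with an extreme one leaves Q1 and Q3 unchanged.
print(quartile_measures([10, 20, 30, 40, 50, 60, 7000]))
```

The second call shows why these measures are not affected by extreme values: they depend only on the positions of Q1 and Q3.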


Mean Absolute Deviation
• Mean Absolute Deviation (MAD) defined as: MAD = (1/n) Σ |x_i − A|, where A is the mean or the median
• Coefficient of MAD defined as: MAD / Median or MAD / Mean

• Advantages:
  – Simple to understand and compute
  – Based on each and every item in the data
  – Less affected by extreme values than other measures

• Disadvantage:
  – It is not capable of mathematical treatment
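A short sketch of the MAD and its coefficient, assuming the standard definition (the average of the absolute deviations from the mean or from the median). The function names, the `center` parameter, and the data are illustrative.

```python
import statistics


def central_value(data, center="mean"):
    """Return the mean or the median, depending on `center`."""
    if center == "mean":
        return statistics.mean(data)
    if center == "median":
        return statistics.median(data)
    raise ValueError("center must be 'mean' or 'median'")


def mean_absolute_deviation(data, center="mean"):
    """Average absolute distance of the values from the chosen central value."""
    a = central_value(data, center)
    return sum(abs(x - a) for x in data) / len(data)


def coefficient_of_mad(data, center="mean"):
    """MAD divided by the same central value it was measured from."""
    return mean_absolute_deviation(data, center) / central_value(data, center)


observations = [2, 4, 6, 8, 10]  # hypothetical data
print(mean_absolute_deviation(observations))  # (4 + 2 + 0 + 2 + 4) / 5 = 2.4
print(coefficient_of_mad(observations))       # 2.4 / 6, roughly 0.4
```

The absolute value in the formula is what makes further algebraic treatment awkward, which is the disadvantage noted on the slide.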


Standard Deviation
• Defined as “Root Mean Squared Deviation from Mean”: SD = √[ (1/n) Σ (x_i − x̄)² ]
• Coefficient of Variation: CV = (SD / Mean) × 100


Standard Deviation
• Advantages:
  – Best measure of dispersion
  – Possible to calculate the combined standard deviation of two or more groups
  – Chebyshev’s Theorem (P. L. Chebyshev, 1821-1894)
    • Whatever the distribution, at least 75% of the values fall within +/- 2 SD of the mean, and at least 88.9% fall within +/- 3 SD of the mean
  – Has relation with other measures (approximately, for a normal distribution):
    • QD = 0.667 SD
    • MD = 0.80 SD
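Chebyshev's bound, 1 − 1/k² for k standard deviations, can be checked empirically. The sketch below uses the population standard deviation (`statistics.pstdev`, matching the "root mean squared deviation" definition) on a deliberately skewed, made-up data set. The function names are illustrative.

```python
import statistics


def chebyshev_lower_bound(k):
    """Minimum fraction of values within k standard deviations of the mean (k > 1)."""
    return 1 - 1 / k ** 2


def fraction_within(data, k):
    """Observed fraction of values lying within k population SDs of the mean."""
    mean = statistics.mean(data)
    sd = statistics.pstdev(data)
    return sum(1 for x in data if abs(x - mean) <= k * sd) / len(data)


def coefficient_of_variation(data):
    """Population SD expressed as a percentage of the mean."""
    return statistics.pstdev(data) / statistics.mean(data) * 100


skewed = [1, 1, 2, 2, 3, 3, 4, 5, 8, 40]  # hypothetical, strongly right-skewed
for k in (2, 3):
    print(k, fraction_within(skewed, k), ">=", chebyshev_lower_bound(k))
print(round(coefficient_of_variation(skewed), 1))
```

Even for data this far from normal, the observed fractions stay at or above the guaranteed minimum, which is the point of the theorem.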


Skewness
• Refers to the asymmetry in the shape of the distribution
• Important to test skewness in data analysis, as skewed data suggest that the assumption of normality is violated


Kurtosis
• Kurtosis means “bulginess”
• Refers to the degree of flatness or peakedness in the region about the mode of the distribution:
  – Leptokurtic: the curve is more peaked than the normal curve
  – Mesokurtic: the curve is as peaked as the normal curve
  – Platykurtic: the curve is less peaked than the normal curve

• The peakedness of the normal curve is taken as 3
• Presence of kurtosis does not violate normality
• Important to check kurtosis because it shows the distribution of data around the mode
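The slides do not say which coefficients are meant, so the sketch below assumes the common moment-based ones: skewness = m3 / m2^1.5 and kurtosis = m4 / m2², where m_r is the r-th central moment. Under this definition the normal curve has kurtosis 3, consistent with the reference value above. All names and data are illustrative.

```python
def central_moment(data, order):
    """r-th central moment: the mean of (x - mean) ** order."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** order for x in data) / n


def skewness(data):
    """Moment coefficient of skewness: 0 for symmetric data, > 0 for a long right tail."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5


def kurtosis(data):
    """Moment coefficient of kurtosis: compared against 3 for the normal curve."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2


def kurtosis_label(value, reference=3.0):
    """Classify a kurtosis value relative to the normal-curve reference."""
    if value > reference:
        return "leptokurtic"
    if value < reference:
        return "platykurtic"
    return "mesokurtic"


symmetric = [1, 2, 3, 4, 5]                     # hypothetical, evenly spread
right_skewed = [1, 1, 2, 2, 3, 3, 4, 5, 8, 40]  # hypothetical, long right tail

print(skewness(symmetric), kurtosis(symmetric), kurtosis_label(kurtosis(symmetric)))
print(skewness(right_skewed) > 0, kurtosis_label(kurtosis(right_skewed)))
```

With real samples the kurtosis will almost never equal 3 exactly, so in practice the comparison is made with a tolerance or a formal test rather than strict equality.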


What is Descriptive Statistics?
• The following need to be reported:
  – Arithmetic Mean
  – Median
  – Mode
  – Standard Deviation
  – Variance
  – Kurtosis
  – Skewness
  – Range
  – Minimum
  – Maximum
  – Sum
  – Count
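A minimal sketch of such a report using only the Python standard library. It assumes population (divide-by-n) variance and moment-based skewness and kurtosis; statistical packages often report sample versions instead, so the figures may differ slightly. The function name `describe` and the data are hypothetical.

```python
import statistics


def describe(data):
    """Return the twelve descriptive statistics listed above as a dictionary."""
    n = len(data)
    mean = statistics.mean(data)
    variance = statistics.pvariance(data)  # population variance (divide by n)
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    return {
        "mean": mean,
        "median": statistics.median(data),
        "mode": statistics.multimode(data),  # a list, since there may be several modes
        "standard_deviation": variance ** 0.5,
        "variance": variance,
        "kurtosis": m4 / variance ** 2,
        "skewness": m3 / variance ** 1.5,
        "range": max(data) - min(data),
        "minimum": min(data),
        "maximum": max(data),
        "sum": sum(data),
        "count": n,
    }


scores = [2, 4, 4, 5, 7, 9, 11]  # hypothetical observations
report = describe(scores)
for name, value in report.items():
    print(f"{name:>20}: {value}")
```

The sketch does not guard against constant data, where the variance is zero and the skewness and kurtosis ratios are undefined.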
