STX 1110 INTRODUCTION TO QUANTITATIVE METHODS LECTURE 2 COLLECTING DATA 2
CONTENTS
1. Main Stages in a Statistical Investigation
2. Data
   - Overview
   - Definitions
3. Survey (Data Collection Method)
   - Interviews
   - Postal Questionnaire
4. Survey Guidelines
   - Questionnaire Design
   - Pilot Survey
   - Errors in Surveying
   - Validity and Precision
MAIN STAGES IN A STATISTICAL INVESTIGATION
1. Pose a question
2. Collect relevant data
3. Summarise and present the data
4. Analyse and interpret the results
OVERVIEW OF DATA
Types of data:
• Attributes (categorical)
  - Nominal*
  - Ordinal*
• Variables (numerical)
  - Discrete
  - Continuous
Data sources and examples of collection methods:
• Primary data: survey
  - Interview
  - Postal Questionnaire
• Secondary data: document extraction
  - Published Statistics
  - Annual Report
* Not under the syllabus of this Module.
DEFINITIONS
Data (the raw materials of statistics) is simply a scientific term for facts, figures, information and measurements, both numerical and non-numerical. Collected data are stored as variables and may be qualitative or quantitative in nature.
Attribute (Categorical or Qualitative) is a characteristic that an object either has or does not have. E.g. gender (male or female); blood group (A, AB, B, O); T-shirt size (S, M, L, XL). Qualitative data describe characteristics that cannot be measured.
DEFINITIONS (Cont’d)
Numerical (Quantitative) is something that can be measured or counted. E.g. number of children; height (in cm); weight (in kg).
Discrete variables are represented by whole numbers only (mainly counts). E.g. number of children.
Continuous variables may take on any value within a range and are typically measured rather than counted. E.g. distance; height.
DEFINITIONS (Cont’d)
Primary data are data collected specifically for the purpose of the survey being conducted.
Secondary data are data which have already been collected elsewhere, for some other purpose, but which can be used or adapted for the survey being conducted. E.g. financial figures extracted from a published annual report.
SURVEY (DATA COLLECTION METHOD)
In the absence of suitable secondary data, primary data will be generated through a survey or an experiment.
A survey is conducted through:
• Observation (observe people’s behaviour)
• Questionnaire (ask people questions)
  - Interviews
  - Postal Questionnaire
INTERVIEWS
Face to Face (Personal) Interviews
Advantages:
• High response rate
• More reliable in general
Disadvantages:
• Time consuming
• High cost
• Interviewer bias
• Respondents might not talk freely
Telephone Interviews
Advantages:
• Rapid response
• Cheaper cost
Disadvantages:
• Some people do not have a telephone
• Higher refusal rate
• Respondents might not talk freely
POSTAL QUESTIONNAIRE
Advantages:
• Cheap and easy to organise
• No interviewer bias
• Respondents might express themselves more freely
Disadvantages:
• Low response rate
• No clarification of respondents’ doubts is possible
A reasonable expectation for the response rate of a survey is generally 20%.
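The 20% figure can be turned into a rough planning calculation. The sketch below is illustrative only: the function name `questionnaires_to_send` and the target of 150 completed questionnaires are assumptions, not part of the lecture, and real response rates vary from survey to survey.

```python
def questionnaires_to_send(target_responses: int, response_rate_percent: int) -> int:
    """Smallest number of questionnaires whose expected responses reach the target.

    Hypothetical helper for illustration; assumes every questionnaire has the
    same chance of being returned.
    """
    if target_responses < 0 or not 0 < response_rate_percent <= 100:
        raise ValueError("target must be >= 0 and rate must be in (0, 100]")
    # Ceiling division in integer arithmetic avoids floating-point rounding.
    return -(-target_responses * 100 // response_rate_percent)


# Assumed example: 150 completed questionnaires wanted at a 20% response rate.
needed = questionnaires_to_send(target_responses=150, response_rate_percent=20)
print(f"Send at least {needed} questionnaires")
```

With a 20% response rate, roughly five questionnaires must be posted for each completed one returned.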
QUESTIONNAIRE DESIGN
Questions should:
• be as short as possible
• be simple, easy and clear (unambiguous)
• avoid technical jargon
• follow a logical sequence
• not be offensive or leading
• not involve calculations or tests of memory
• avoid open questions where possible (answer categories should be provided)
• be relevant to the survey
PILOT SURVEY
A pilot survey pre-tests the questionnaire, i.e. trials it on a few respondents before it is used to collect the required data.
Revise the questionnaire if any problems are discovered in the pilot survey. This may save a lot of time and cost later.
The final version of the questionnaire is then used to gather the required data.
ERRORS IN SURVEYING
• Sampling Error - Arises when the sample selected is not representative of the population.
• Response Error - Occurs when respondents are unable to respond (perhaps because they could not understand the questions) or answer incorrectly.
• Non-Response Error - Occurs when respondents refuse to take part in the survey.
VALIDITY AND PRECISION
Data Quality:
• Validity - The data obtained in the survey should be relevant, i.e. related to the objectives of the survey.
• Precision - The data obtained in the survey should be reliable and accurate.
  - The precision with which data are recorded can affect calculations and cause rounding errors.
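The effect of recording precision on a calculation can be seen in a small sketch. The four heights below are made-up values chosen for illustration; the point is only that rounding each measurement before adding gives a different total from adding first.

```python
from decimal import Decimal, ROUND_HALF_UP

# Hypothetical heights in cm, recorded to one decimal place.
heights_cm = [Decimal("172.4"), Decimal("168.4"), Decimal("175.4"), Decimal("180.4")]

# Total using the measurements as recorded.
exact_total = sum(heights_cm)

# Total if each height had been recorded to the nearest whole centimetre.
rounded_heights = [h.quantize(Decimal("1"), rounding=ROUND_HALF_UP) for h in heights_cm]
rounded_total = sum(rounded_heights)

# The accumulated rounding error in the total.
rounding_error = exact_total - rounded_total

print(f"Total from 1-decimal data : {exact_total} cm")
print(f"Total from whole-cm data  : {rounded_total} cm")
print(f"Rounding error in total   : {rounding_error} cm")
```

`Decimal` is used here so that the example itself is free of binary floating-point error; the discrepancy shown comes only from the coarser recording precision.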
SOME SURVEY QUESTIONS
• Do you often go to pubs and restaurants?
• Do you like Klinko coffee?
• How old are you?
• Are you angry about the government’s current plans to deal with housing?
• How much money do you have?
• How often do your parents visit the doctor?
• How did you travel to work today?
STX 1110 INTRODUCTION TO QUANTITATIVE METHODS LECTURE 2 SUMMARISING AND PRESENTING DATA 1
CONTENTS
• Ways/Methods of Presenting Data
• Format of Tables, Charts and Graphs
• Use Percentages to Compare Counts
• Interpretation of Tables, Charts and Graphs
• Advantages and Disadvantages of Each Method of Presenting Data
WAYS/METHODS OF PRESENTING DATA
• Frequency Table or Frequency Distribution
• Cross Tabulation / Contingency Table
• Pie Chart
• Bar Chart
• Pareto Chart
• Pictogram
• Group Frequency Distribution (to be discussed in Week 3)
• Histogram (to be discussed in Week 3)
• Frequency Polygons (to be discussed in Week 3)
• Line Graph (to be discussed in Week 3)
• Stem and Leaf Display (to be discussed in Week 4)
FREQUENCY TABLE / FREQUENCY DISTRIBUTION
A tabular summary of a set of data showing the frequency (or number) of data items in each category.
Gender for a Workforce
Gender    Frequency    Relative Frequency    %
Male      12           0.8                   80
Female    3            0.2                   20
Total     15           1.0                   100
For discussion purpose
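The relative frequency and percentage columns follow directly from the counts: each frequency is divided by the total, and the result is multiplied by 100. A minimal sketch using the workforce counts from the table (the variable names are illustrative):

```python
# Frequencies from the "Gender for a Workforce" table.
counts = {"Male": 12, "Female": 3}
total = sum(counts.values())

# For each category: (frequency, relative frequency, percentage).
table = {
    category: (n, n / total, 100 * n / total)
    for category, n in counts.items()
}

print(f"{'Gender':<8}{'Frequency':>10}{'Rel. freq.':>12}{'%':>6}")
for category, (n, rel, pct) in table.items():
    print(f"{category:<8}{n:>10}{rel:>12.1f}{pct:>6.0f}")
print(f"{'Total':<8}{total:>10}{1.0:>12.1f}{100:>6}")
```

Relative frequencies always sum to 1 and percentages to 100, which is a quick check on a hand-built table.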
FREQUENCY TABLE / FREQUENCY DISTRIBUTION (Cont’d)
[Figure: Gender for a Workforce]
FREQUENCY TABLE / FREQUENCY DISTRIBUTION (Cont’d)
Exercise
Construct a frequency table for the number of children in a family, based on the following data obtained from 23 families:
0 1 2 0 3 0 1 1 0 2 3 2 1 1 2 4 3 2 2 2 1 0 3
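One way to check a hand-built table for this exercise is to tally the raw data programmatically. The sketch below uses Python's standard `collections.Counter`; it is one possible approach, not a prescribed method for the module.

```python
from collections import Counter

# Number of children in each of the 23 families, as listed in the exercise.
children = [0, 1, 2, 0, 3, 0, 1, 1, 0, 2, 3, 2,
            1, 1, 2, 4, 3, 2, 2, 2, 1, 0, 3]

# Tally how many families have each number of children.
frequency = Counter(children)

print(f"{'Children':<10}{'Frequency':>10}")
for value in sorted(frequency):
    print(f"{value:<10}{frequency[value]:>10}")
print(f"{'Total':<10}{sum(frequency.values()):>10}")
```

The total of the frequency column should equal the number of families surveyed.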
CROSS TABULATION / CONTINGENCY TABLE
A table showing data for two variables simultaneously, which reflects the relationship between the two tabulated variables.
Workforce by Gender and Marital Status
Marital Status    Male    Female    Total
Single            1       1         2
Married           10      2         12
Widowed           1       0         1
Total             12      3         15
For discussion purpose
CROSS TABULATION / CONTINGENCY TABLE (Cont’d)
A cross tabulation can be summarised by calculating percentages of the row or column totals.
If one variable (the explanatory variable) is believed to influence the other (the response variable), then one normally takes percentages of the totals for the explanatory variable.
CROSS TABULATION / CONTINGENCY TABLE (Cont’d)
Workforce by Gender and Marital Status
Percentages of column totals:
Marital Status    Male    Female    Total
Single            8%      33%       13%
Married           83%     67%       80%
Widowed           8%      0%        7%
Total             100%    100%      100%
Percentages of row totals:
Marital Status    Male    Female    Total
Single            50%     50%       100%
Married           83%     17%       100%
Widowed           100%    0%        100%
Total             80%     20%       100%
Percentages are rounded to the nearest whole number, so they may not sum exactly to 100%.
Identify the “Explanatory Variable” and the “Response Variable” for each table.
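Row and column percentages can both be derived from the same table of counts; only the total used as the divisor changes. The following sketch uses the workforce counts from the cross tabulation (the variable names are illustrative, and percentages are printed to one decimal place):

```python
# Counts from "Workforce by Gender and Marital Status".
counts = {
    "Single":  {"Male": 1,  "Female": 1},
    "Married": {"Male": 10, "Female": 2},
    "Widowed": {"Male": 1,  "Female": 0},
}
genders = ["Male", "Female"]

# Totals for each column (gender) and each row (marital status).
column_totals = {g: sum(row[g] for row in counts.values()) for g in genders}
row_totals = {status: sum(row.values()) for status, row in counts.items()}

# Each cell as a percentage of its column total.
column_pct = {
    status: {g: 100 * row[g] / column_totals[g] for g in genders}
    for status, row in counts.items()
}

# Each cell as a percentage of its row total.
row_pct = {
    status: {g: 100 * row[g] / row_totals[status] for g in genders}
    for status, row in counts.items()
}

for title, pct in [("Column percentages", column_pct), ("Row percentages", row_pct)]:
    print(title)
    for status, row in pct.items():
        cells = "  ".join(f"{g}: {row[g]:5.1f}%" for g in genders)
        print(f"  {status:<8} {cells}")
```

Column percentages answer "of all men (or women), what share is in each marital status?", while row percentages answer "of all people in a marital status, what share is male (or female)?".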
CROSS TABULATION / CONTINGENCY TABLE (Cont’d)
Example 1
[Table: Production shift against type of defect for a furniture manufacturing process]
[Figure: Comparison of type of defect by shift]
Example 2
[Table: Cross tabulation of the quality of a meal by price]
[Figure: Comparison of the quality of a meal by price]
PIE CHART
A pie chart is used to show pictorially the relative sizes of component elements of a total.
Production Costs of Two Factories
[Figure: two pie charts, Factory A and Factory B. Slice labels: Admin 5% / 10%; Materials 35% / 20%; Overheads 20% / 45%; Labour 15% / 50%.]
For discussion purpose
PIE CHART (Cont’d)
Pie charts are very good for comparing the relative sizes of elements of a total.
Disadvantages:
• Actual numbers or percentages associated with each category need to be presented on the diagram.
• They are not a very good presentation method if there are too many different categories.
• The impression they give is easily distorted, for example by presenting a 3-dimensional pie chart.
BAR CHART
A chart in which quantities are shown in the form of bars.
3 main types:
• Simple bar chart
• Component bar chart, including Percentage component bar chart
• Multiple/Compound bar chart
BAR CHART (Cont’d)
Simple bar chart is a chart consisting of one or more bars, in which the length of each bar indicates the magnitude of the corresponding data item.
[Figure: simple bar chart, “Number of Computers Sold by Each Company”. Vertical axis: Frequency (0 to 14). Categories: Apple, Compaq, Gateway, IBM, Packard Bell.]
For discussion purpose
BAR CHART (Cont’d)
Component bar chart is a bar chart that gives a breakdown of each total into its components.
[Figure: “Category of Beds in Each Hospital”, shown both as a component bar chart (vertical axis 0 to 250) and as a percentage component bar chart (vertical axis 0% to 100%). Hospitals: Foothills, General, Southern, Heathview, St Johns. Components: Psychiatric, Medical, Surgical, Maternity.]
For discussion purpose
BAR CHART (Cont’d)
Multiple/Compound bar chart is a bar chart in which two or more separate bars are used to present sub-divisions of data.
[Figure: multiple bar chart, “Analysis of Marital Status by Gender”. Vertical axis: Count. Horizontal axis: Gender (Female, Male). Bars: Marital status (Unmarried, Married).]
For discussion purpose
PARETO CHART
Essentially a bar chart in which the categories are arranged in descending order of frequency, with the tallest bar at the left.
[Figure: Pareto chart, “Number of Computers Sold by Each Company”. Vertical axis: Frequency (0 to 14). Categories from left to right: Apple, Compaq, Packard Bell, IBM, Gateway.]
For discussion purpose
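The only step that distinguishes a Pareto chart from a simple bar chart is the ordering of the categories. A minimal sketch of that step follows; the sales figures are hypothetical values invented for illustration, since the lecture's chart values are not given in the text.

```python
# Hypothetical numbers of computers sold (illustrative values only).
sales = {"Apple": 13, "Compaq": 12, "Gateway": 5, "IBM": 9, "Packard Bell": 11}

# Arrange categories from highest to lowest frequency, as a Pareto chart requires.
pareto_order = sorted(sales.items(), key=lambda item: item[1], reverse=True)

# Text rendering: one '#' per unit sold, tallest bar listed first.
for company, sold in pareto_order:
    print(f"{company:<13}{'#' * sold} {sold}")
```

Sorting before plotting makes the most frequent categories stand out immediately, which is the purpose of the chart.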
PICTOGRAM
A form of visual presentation in which data are represented by pictures.
[Figure: pictogram, “Number of Chairs Sold by ABC Limited”. One chair symbol = 5000 chairs. Rows: 2001, 2000, 1999, 1998, 1997.]
For discussion purpose
PICTOGRAM (Cont’d)
• A very elementary form of visual representation.
• Can be informative and more effective than other methods of presenting data to the general public.
• Not an accurate form of presentation.
• Provides a lot of scope for confusion or misleading interpretations of the data.