Section 2 - Getting Started

Session 2 “Getting Started”
Core Skills for Data Processing
ORSC 2004 - Internal Training

Core Skill Training Session Six: “Data Analysis”

Objective

At the end of the training program, participants should be able to:

• Understand data layouts
• Understand what tables will look like
• Define data structures for various data formats
• Understand coding conventions
• Get an appreciation of the basic elements

Various data formats

• Questionnaire data can be computerised in many ways
• Market Research (MR) software mostly uses flat files
• Customised software is available for capturing MR data
• QINPUT, MERLIN and Surveycraft are some of the most popular packages

Single Card data

Each line is one record (R1, R2, ...). The leading field is the Serial Number/ Respondent ID.

R1  1000290022 00061860200310041324 040800100000000000 1.3979167
R2  1000390022 00061860200310041359 040800100000000000 0.6460563
R3  1001210022 00061860200310041249 040800100000000000 0.8865789
R4  1013240022 00061867200310051800 040800100000000000 0.6759740
R5  1013250022 00061867200310051831 040800100000000000 0.8857447
    1013260022 00061867200310051842 040800100000000000 1.3810526
    1013300022 00061867200310051857 040800100000000000 1.5300000
    1015240022 00062321200310041216 040800100000000000 1.4328262

• Record length is the number of columns in one record (line)
• Respondent ID is the unique ID for the record
• Number of lines in the file = Sample Size
• Maximum length of record = 32,767 (the largest value of a 16-bit signed integer)

Multicard data

R1  00048011 01 04070917213204070917374232570237550
    000480202837525750 111020744t242-345235849862468-2486
    0004803 1 111-4 208050505050810 245248609824096
    0004804001010 55334333333433453145555413155 646890
    0004805 2115245444433353443442343435514334333 425924
R2  00070011 01 040709173010040709175624 245982496
    000700201395277173 231019074646464060
    0007003 1 112-7 105080803050308 426246
    0007004030707 33543553245533535255452355555553
    0007005 21113123322&2133222122431232323212313

• Each respondent has more than one line of information; each line is called a “CARD”
• In general the length of a card is 99 characters, although cards can be longer
• The unique identification in this data format is Respondent ID + Card ID
• Maximum length of record = 32,767 (size of integer). In this case the record length is the sum of the record lengths of all cards
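The idea that Respondent ID + Card ID identifies a line can be sketched in Python (used here only for illustration; it is not Quantum syntax). The column positions are an assumption read off the sample above (serial in columns 1-5, card number in columns 6-7), and `group_cards` and the sample lines are hypothetical.

```python
from collections import defaultdict

# Assumed layout (inferred from the sample, not stated in the slides):
# columns 1-5 hold the respondent serial, columns 6-7 hold the card number.
SERIAL = slice(0, 5)
CARD = slice(5, 7)

def group_cards(lines):
    """Return {serial: {card_number: line}} for a multicard file."""
    respondents = defaultdict(dict)
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip empty lines
        serial = line[SERIAL]
        card = int(line[CARD])
        if card in respondents[serial]:
            raise ValueError(f"duplicate card {card} for serial {serial}")
        respondents[serial][card] = line
    return dict(respondents)

# Made-up lines that follow the assumed layout.
sample = [
    "0004801 first card",
    "0004802 second card",
    "0007001 first card",
]
grouped = group_cards(sample)
```

With this grouping, the sample size is the number of serials (`len(grouped)`), not the number of lines in the file.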

Quantum data format

• Quantum can handle both single card and multicard data formats
• In both formats, Quantum allows something called multipunch
• In multipunch data format, each column is capable of holding 12 values: the individual constants 0123456789-&
• Any combination of these 12 codes (punches) can exist in a single column
• The advantage of this format is that more data can be fitted into the available maximum record length of 32,767 characters
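One way to picture a multipunch column is as 12 on/off switches, one per code. The Python sketch below packs any combination of punches into a 12-bit number. It only illustrates the idea; it makes no claim about how Quantum stores data internally, and `encode_column` and `decode_column` are hypothetical helpers.

```python
# The 12 punch codes, in the order listed in the text.
CODES = "0123456789-&"

def encode_column(punches):
    """Pack a collection of punch codes into one 12-bit integer."""
    mask = 0
    for code in punches:
        mask |= 1 << CODES.index(code)  # raises ValueError for an unknown code
    return mask

def decode_column(mask):
    """Return the punch codes stored in a 12-bit mask, in CODES order."""
    return "".join(c for i, c in enumerate(CODES) if mask & (1 << i))

# Three answers held in a single column.
column = encode_column("15&")
```

Twelve switches give 2**12 = 4096 possible combinations per column, including the blank column with no punches.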

Introducing Quantum – What does it do?

• Check and validate the data
• Edit and correct the data
• Produce different types of lists and reports of data
• Produce new data files
• Recode data and produce new variables
• Generate tables
• Perform statistical calculations

Underlying concepts

Quantum consists of 2 phases or sections:

Edit Section (for each questionnaire)
• Check and correct data
• Modify/ Recode data

Tabulation Section
• Count questionnaires
• Produce tables
• Format tables

Underlying concepts

Edit section
• Data examination
• Data modification
• Data correction

Tables section
• Cross tabulation of data
• Control statements to determine layout

Layout of a table

[Figure: annotated sample table. Labelled parts: Table title, Project heading, X-break, Base size, Base title, Side headings, Frequency, Percentage, Mean score]

Coding conventions

• A Quantum program is a file created using a text editor
• The tables section consists of statement types
• Each statement starts on a new line
• Each statement consists of parameters and options
• A statement may be up to 200 characters
• The standard Quantum separator is the semi-colon (;)
• Long statements may be continued on new lines with a + in the first position. In certain cases long statements may be continued with a ++ in the first position
• Comments are denoted by /* at the start of the line. You may see Quantum programs that use C at the start of a line for comments

Coding conventions
A sample Quantum program

/*
/* Here is a comment
/*
tab q5 brk1;c=c115'1';nz
+dsp
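The line-level conventions (comments, continuation lines, the semi-colon separator) can be sketched as a small Python reader. This is an illustration only, not how Quantum parses programs. It assumes that + and ++ are both simply stripped and the rest appended to the previous line, and it does not handle the older C-style comment lines. The function names and the input lines are made up.

```python
def read_statements(lines):
    """Join continuation lines and drop comments; return a list of statements."""
    statements = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip() or line.startswith("/*"):
            continue  # blank line or comment
        if line.startswith("+"):
            if not statements:
                raise ValueError("continuation line with nothing to continue")
            # Assumption: '+' and '++' are stripped and the rest is appended as-is.
            statements[-1] += line.lstrip("+")
        else:
            statements.append(line)
    return statements

def split_fields(statement):
    """Split one statement on the standard separator, the semi-colon."""
    return statement.split(";")

# Hypothetical input, written so that the joined statement is unambiguous.
program = [
    "/* Here is a comment",
    "tab q5 brk1;c=c115'1';",
    "+nz;dsp",
]
statements = read_statements(program)
fields = split_fields(statements[0])
```

Here `statements` holds one joined statement and `fields` holds its four semi-colon separated parts.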

Fundamentals and Terminology


Fundamentals
Individual constants

• Individual constants are ASCII characters or multicodes, which are any combination of the codes 1234567890-& or a blank alone
• They are enclosed in single quotes: '1' '2' '123' ' '
• Punch codes are referenced in apostrophes. Punches are listed individually, and a range of punches is denoted by a slash (/), meaning "through", in the order &-01234567890-&

Examples:
'1'     Punch 1
'123'   Punches 1 or 2 or 3
'1/5'   Punches 1 or 2 or 3 or 4 or 5
' '     No punches (blank)

• Order of punches is & - 0 1 2 3 4 5 6 7 8 9 0 - &
• '&/9' is the same as '1/&': read along this order, each range covers all 12 punches
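The range rule can be checked with a short Python sketch. It assumes a range starts at the first occurrence of its first code in the order string and ends at the next occurrence of its last code; ranges that begin with 0, - or & could be read differently, so treat this as an illustration rather than Quantum's definition. `expand_punches` is a hypothetical name.

```python
# Order of punches as given in the text; 0, - and & appear twice.
PUNCH_ORDER = "&-01234567890-&"

def expand_punches(spec):
    """Expand the text between the quotes (e.g. 1/5 or 123) into a set of codes."""
    if spec.strip() == "":
        return set()  # blank means no punches
    if "/" in spec:
        first, last = spec.split("/")
        start = PUNCH_ORDER.index(first)
        end = PUNCH_ORDER.index(last, start)  # search only from start onwards
        return set(PUNCH_ORDER[start:end + 1])
    return set(spec)  # codes listed individually

one_to_five = expand_punches("1/5")
```

Under this reading, `expand_punches("&/9")` and `expand_punches("1/&")` return the same set of 12 codes.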

Fundamentals
Individual constants

The - punch is sometimes referred to as the 11th or X punch, and & is sometimes referred to as the 12th, Y or V punch.

Each code represents one answer to a question. For example, the question 'What is your favorite color?' has the following response list, coded into one column:

Red     : 1
Yellow  : 2
Blue    : 3
Green   : 4
Black   : 5
White   : 6

If my favorite color is green, this appears in the data file as a 4 in the appropriate column; if your favorite color is red, there is a 1 in that column.

Fundamentals
Strings of Data Constants

• Strings are lists of single ASCII characters
• Strings are referenced in dollar signs ($)
• Strings refer to more than one column of data

Examples: $1234$   $ABC$   $ $

Fundamentals
Numbers
• Whole numbers
• Real numbers

Variables
• Variables or arrays may be defined as being of data, integer or real type
• Names may be up to 10 characters long

Example:
int unit 1
real weight 10s

Whenever "s" is used, varn is interpreted as var(n); for example, weight3 is read as weight(3).

Variables/ column referencing

• Columns are referred to by their actual position in the data. If you open the data file in any editor and place the cursor on a character, the cursor's column position is the column number
• In a single card data file, the actual column position is used directly to refer to a column. For example, c12 refers to column 12 in a single card data file
• In a multicard data file, the column is referred to in combination with the card number. The format is "cXNN" if there are 9 or fewer cards and "cXXNN" if there are 10 or more, where X refers to the card number and NN to the column position. One-digit column positions are written with a leading "0"

Examples:
c108 refers to 1st card, 8th column
c412 refers to 4th card, 12th column
c1009 refers to 10th card, 9th column
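The cXNN / cXXNN rule amounts to "the last two digits are the column, the rest is the card". A minimal Python sketch of that reading follows; `parse_column` is a hypothetical helper, not part of Quantum.

```python
import re

def parse_column(ref, multicard=True):
    """Turn a reference such as 'c108' into (card, column).

    For a single card file the card is reported as None.
    """
    match = re.fullmatch(r"c(\d+)", ref)
    if match is None:
        raise ValueError(f"not a column reference: {ref!r}")
    digits = match.group(1)
    if not multicard:
        return None, int(digits)
    if len(digits) < 3:
        raise ValueError("multicard references need a card number and two column digits")
    # The last two digits are the column (NN); whatever precedes them is the card.
    return int(digits[:-2]), int(digits[-2:])

card, column = parse_column("c108")
```

The same function covers both formats, because the card number is simply whatever is left after removing the two column digits.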

Variables/ column referencing

• A series of columns may be treated as either string or numeric and is referenced as c(m,n), where m is the start column position and n is the end column position

Examples:
c(12,15) refers to columns 12 to 15 in a single card data file
c(106,110) refers to columns 6 to 10 of the 1st card in a multicard data file
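For a single card record, c(m,n) corresponds to a 1-based, inclusive slice of the line. The Python sketch below shows the string and numeric readings of the same columns; `field`, `numeric_field` and the record text are hypothetical.

```python
def field(record, m, n):
    """Return columns m..n of a record (1-based, both ends inclusive)."""
    if not 1 <= m <= n:
        raise ValueError("need 1 <= m <= n")
    return record[m - 1:n]

def numeric_field(record, m, n):
    """Read columns m..n as a whole number; blank columns give None."""
    text = field(record, m, n).strip()
    return int(text) if text else None

# Made-up single card record: ID in columns 1-10, a blank, then a 4-digit field.
record = "1000290022 0006"
as_string = field(record, 12, 15)
as_number = numeric_field(record, 12, 15)
```

The same four columns give the string 0006 or the number 6, depending on how they are treated.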

Describing Data Structure


Data Structure

• By default Quantum reads one record (one line) from your data file at a time. Each record may be up to 100 columns long
• Most Market Research surveys consist of multi-card records
• Some surveys consist instead of long records with more than 100 columns of data
• These data structures must be described on the struct statement
• Format: struct;options
• The "struct" statement must be the first statement in your program

Data Structure – contd..

Specifying long records

struct;reclen=n

where n is the length of the record in columns. The maximum length of a record is approximately 32,000 columns.

Specifying multi-card data sets

This is the most common form of the struct statement:

struct;read=2;ser=c(m,n);crd=c(p,q)

where
read=2 denotes a multi-card set
ser= defines the columns of the serial number
crd= defines the columns of the card number

Example: struct;read=2;ser=c(1,4);crd=c80

Data Structure – contd..

When a multi-card set is read, the cards are defined as follows:

Card 1    Columns 101-200
Card 2    Columns 201-300
Card 3    Columns 301-400
Card 4    Columns 401-500
.....
Card 10   Columns 1001-1100

By default a maximum of 9 cards is permitted in a set.

Reading multi-card data sets with 10 or more cards

The option max=n defines the maximum number of cards in the set.

Example: struct;read=2;ser=c(1,5);crd=c(6,7);max=19
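The card-to-column table follows one pattern: card k occupies columns k*100+1 to k*100+100. A Python sketch of that arithmetic, and of reading a range of columns across cards, is given below. The function names are hypothetical, and treating a missing card as blanks is an assumption made only for this illustration.

```python
CARD_WIDTH = 100  # each card is given a block of 100 columns

def card_columns(card):
    """Return the (first, last) column numbers that hold a given card."""
    first = card * CARD_WIDTH + 1
    return first, first + CARD_WIDTH - 1

def locate(column):
    """Map an absolute column number back to (card, position within the card)."""
    card, offset = divmod(column - 1, CARD_WIDTH)
    return card, offset + 1

def read_columns(cards, m, n):
    """Fetch columns m..n from a dict {card_number: card_text}."""
    out = []
    for column in range(m, n + 1):
        card, pos = locate(column)
        text = cards.get(card, "")
        # Assumption: a missing card or a short card reads as blanks.
        out.append(text[pos - 1] if pos <= len(text) else " ")
    return "".join(out)

# Made-up card 1 text; columns 106-110 are positions 6-10 of that card.
cards = {1: "0004801ABCDE"}
picked = read_columns(cards, 106, 110)
```

For example, `card_columns(10)` gives (1001, 1100), matching the last row of the table.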

Data Structure – contd..

Checking the structure of multi-card data sets

• Quantum automatically checks for duplicate card types within a serial number and for adjacent duplicate serial numbers
• It is not mandatory for all cards to be present for every respondent in a multicard data file
• It is possible to check that specific cards are present using req=

Example: struct;read=2;ser=c(1,5);crd=c(6,7);max=19;req=1,2

In this example each record must have cards 1 and 2 present. If either or both are missing, the record is rejected.

If you require a series of cards to be present, specify the first and last separated by a slash:

struct;read=2;ser=c(1,5);crd=c(6,7);max=19;req=1/5
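The req= check can be sketched in Python as "expand the list, then compare with the cards a respondent actually has". This is an illustration of the rule as described, not Quantum's implementation; `parse_req` and `missing_cards` are hypothetical names.

```python
def parse_req(spec):
    """Expand a req= value such as '1,2' or '1/5' into a set of card numbers."""
    required = set()
    for part in spec.split(","):
        if "/" in part:
            first, last = part.split("/")
            required.update(range(int(first), int(last) + 1))  # first through last
        else:
            required.add(int(part))
    return required

def missing_cards(present, spec):
    """Return the required cards that a respondent lacks, in ascending order."""
    return sorted(parse_req(spec) - set(present))

# A respondent with cards 1 and 3 only, checked against req=1,2.
missing = missing_cards([1, 3], "1,2")
```

A respondent is accepted under this sketch only when `missing_cards` returns an empty list.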
