4-1
A Comparison of Primary & Secondary Data (Table 4.1)

                     Primary Data              Secondary Data
Collection purpose   For the problem at hand   For other problems
Collection process   Very involved             Rapid & easy
Collection cost      High                      Relatively low
Collection time      Long                      Short
4-2
Uses of Secondary Data
* Identify the problem
* Better define the problem
* Develop an approach to the problem
* Formulate an appropriate research design (for example, by identifying the key variables)
* Answer certain research questions and test some hypotheses
* Interpret primary data more insightfully
4-3
A Classification of Secondary Data (Fig. 4.1)
Secondary Data
  Internal
    Ready to Use
    Requires Further Processing
  External
    Published Materials
    Computerized Databases
    Syndicated Services
4-4
A Classification of Published Secondary Sources (Fig. 4.2)
Published Secondary Data
  General Business Sources
    Guides
    Directories
    Indexes
    Statistical Data
  Government Sources
    Census Data
    Other Government Publications
4-5
A Classification of Computerized Databases (Fig. 4.3)
Computerized Databases
  By access: Online, Internet, Off-Line
  By content:
    Bibliographic Databases
    Numeric Databases
    Full-Text Databases
    Directory Databases
    Special-Purpose Databases
4-6
Syndicated Services: Consumers (Fig. 4.4 cont.)
Households / Consumers
  Panels
    Purchase
    Media
  Surveys
    Psychographic & Lifestyles
    Advertising Evaluation
    General
  Electronic scanner services
    Volume Tracking Data
    Scanner Diary Panels
    Scanner Diary Panels with Cable TV
4-7
Syndicated Services: Institutions (Fig. 4.4 cont.)
Institutions
  Retailers
    Audits
  Wholesalers
    Audits
  Industrial firms
    Direct Inquiries
    Clipping Services
    Corporate Reports
4-8
A Classification of Marketing Research Data (Fig. 5.1)
Marketing Research Data
  Secondary Data
  Primary Data
    Qualitative Data
    Quantitative Data
      Descriptive
        Survey Data
        Observational and Other Data
      Causal
        Experimental Data
4-9
Qualitative vs. Quantitative Research (Table 5.1)

Objective
  Qualitative Research: To gain a qualitative understanding of the underlying reasons and motivations
  Quantitative Research: To quantify the data and generalize the results from the sample to the population of interest
Sample
  Qualitative Research: Small number of nonrepresentative cases
  Quantitative Research: Large number of representative cases
Data Collection
  Qualitative Research: Unstructured
  Quantitative Research: Structured
Data Analysis
  Qualitative Research: Non-statistical
  Quantitative Research: Statistical
Outcome
  Qualitative Research: Develop an initial understanding
  Quantitative Research: Recommend a final course of action
4-10
A Classification of Qualitative Research Procedures (Fig. 5.2)
Qualitative Research Procedures
  Direct (Nondisguised)
    Focus Groups
    Depth Interviews
  Indirect (Disguised)
    Projective Techniques
      Association Techniques
      Completion Techniques
      Construction Techniques
      Expressive Techniques
4-11
Definition of Projective Techniques
An unstructured, indirect form of questioning that encourages respondents to project their underlying motivations, beliefs, attitudes or feelings regarding the issues of concern. In projective techniques, respondents are asked to interpret the behavior of others. In interpreting the behavior of others, respondents indirectly project their own motivations, beliefs, attitudes, or feelings into the situation.
4-12
Word Association In word association, respondents are presented with a list of words, one at a time and asked to respond to each with the first word that comes to mind. The words of interest, called test words, are interspersed throughout the list which also contains some neutral, or filler words to disguise the purpose of the study. Responses are analyzed by calculating: (1) the frequency with which any word is given as a response; (2) the amount of time that elapses before a response is given; and (3) the number of respondents who do not respond at all to a test word within a reasonable period of time.
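The three analysis measures listed above can be sketched in a few lines of Python. This is only an illustration: the response records, the test word they belong to, and the variable names are hypothetical, and a missing response is represented as `None`.

```python
from collections import Counter

# Hypothetical records for one test word: (response word, seconds to respond).
# (None, None) means the respondent gave no answer within the allowed time.
responses = [("cheap", 1.2), ("quality", 0.9), ("cheap", 2.5), (None, None), ("tools", 1.8)]

answered = [(word, secs) for word, secs in responses if word is not None]

# (1) Frequency with which each word is given as a response.
frequency = Counter(word for word, _ in answered)

# (2) Average time that elapses before a response is given.
mean_latency = sum(secs for _, secs in answered) / len(answered)

# (3) Number of respondents who did not respond at all.
no_response = len(responses) - len(answered)

print(frequency.most_common(), round(mean_latency, 2), no_response)
```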
4-13
Completion Techniques
In sentence completion, respondents are given incomplete sentences and asked to complete them. Generally, they are asked to use the first word or phrase that comes to mind.

A person who shops at Sears is ______________________
A person who receives a gift certificate good for Saks Fifth Avenue would be __________________________________
J. C. Penney is most liked by _________________________
When I think of shopping in a department store, I ________

A variation of sentence completion is paragraph completion, in which the respondent completes a paragraph beginning with the stimulus phrase.
4-14
Completion Techniques In story completion, respondents are given part of a story – enough to direct attention to a particular topic but not to hint at the ending. They are required to give the conclusion in their own words.
4-15
Construction Techniques With a picture response, the respondents are asked to describe a series of pictures of ordinary as well as unusual events. The respondent's interpretation of the pictures gives indications of that individual's personality. In cartoon tests, cartoon characters are shown in a specific situation related to the problem. The respondents are asked to indicate what one cartoon character might say in response to the comments of another character. Cartoon tests are simpler to administer and analyze than picture response techniques.
4-16
A Cartoon Test Figure 5.4
Sears
Let’s see if we can pick up some house wares at Sears
4-17
Expressive Techniques
In expressive techniques, respondents are presented with a verbal or visual situation and asked to relate the feelings and attitudes of other people to the situation.
Role playing: Respondents are asked to play the role or assume the behavior of someone else.
Third-person technique: The respondent is presented with a verbal or visual situation and is asked to relate the beliefs and attitudes of a third person rather than directly expressing personal beliefs and attitudes. This third person may be a friend, neighbor, colleague, or a “typical” person.
4-18
Advantages of Projective Techniques
* They may elicit responses that subjects would be unwilling or unable to give if they knew the purpose of the study.
* Helpful when the issues to be addressed are personal, sensitive, or subject to strong social norms.
* Helpful when underlying motivations, beliefs, and attitudes are operating at a subconscious level.
4-19
A Classification of Survey Methods (Fig. 6.1)
Survey Methods
  Telephone
    Traditional Telephone
    Computer-Assisted Telephone Interviewing
  Personal
    In-Home
    Mall Intercept
    Computer-Assisted Personal Interviewing
  Mail
    Mail Interview
    Mail Panel
  Electronic
    E-mail
    Internet
Observation Methods
4-20
Structured versus Unstructured Observation
For structured observation, the researcher specifies in detail what is to be observed and how the measurements are to be recorded, e.g., an auditor performing inventory analysis in a store. In unstructured observation, the observer monitors all aspects of the phenomenon that seem relevant to the problem at hand, e.g., observing children playing with new toys.
Observation Methods
4-21
Disguised versus Undisguised Observation
In disguised observation, the respondents are unaware that they are being observed. Disguise may be accomplished by using one-way mirrors, hidden cameras, or inconspicuous mechanical devices. Observers may be disguised as shoppers or sales clerks. In undisguised observation, the respondents are aware that they are under observation.
Observation Methods
Natural versus Contrived Observation
Natural observation involves observing behavior as it takes place in the environment. For example, one could observe the behavior of respondents eating fast food in Burger King. In contrived observation, respondents' behavior is observed in an artificial environment, such as a test kitchen.
4-22
4-23
A Classification of Observation Methods Fig. 6.3
Classifying Observation Methods
Observation Methods
Personal Observation
Mechanical Observation
Audit
Content Analysis
Trace Analysis
4-24
Concept of Causality
A statement such as "X causes Y" has a different meaning to an ordinary person than to a scientist.

Ordinary meaning: X is the only cause of Y.
Scientific meaning: X is only one of a number of possible causes of Y.

Ordinary meaning: X must always lead to Y (X is a deterministic cause of Y).
Scientific meaning: The occurrence of X makes the occurrence of Y more probable (X is a probabilistic cause of Y).

Ordinary meaning: It is possible to prove that X is a cause of Y.
Scientific meaning: We can never prove that X is a cause of Y. At best, we can infer that X is a cause of Y.
4-25
Definitions and Concepts
Independent variables are variables or alternatives that are manipulated and whose effects are measured and compared, e.g., price levels.
Test units are individuals, organizations, or other entities whose response to the independent variables or treatments is being examined, e.g., consumers or stores.
Dependent variables are the variables which measure the effect of the independent variables on the test units, e.g., sales, profits, and market shares.
Extraneous variables are all variables other than the independent variables that affect the response of the test units, e.g., store size, store location, and competitive effort.
4-26
Experimental Design
An experimental design is a set of procedures specifying:
* the test units and how these units are to be divided into homogeneous subsamples,
* what independent variables or treatments are to be manipulated,
* what dependent variables are to be measured, and
* how the extraneous variables are to be controlled.
4-27
Validity in Experimentation
Internal validity refers to whether the manipulation of the independent variables or treatments actually caused the observed effects on the dependent variables. Control of extraneous variables is a necessary condition for establishing internal validity.
External validity refers to whether the cause-and-effect relationships found in the experiment can be generalized. To what populations, settings, times, independent variables, and dependent variables can the results be projected?
4-28
Controlling Extraneous Variables
Randomization refers to the random assignment of test units to experimental groups by using random numbers. Treatment conditions are also randomly assigned to experimental groups.
Matching involves comparing test units on a set of key background variables before assigning them to the treatment conditions.
Statistical control involves measuring the extraneous variables and adjusting for their effects through statistical analysis.
Design control involves the use of experiments designed to control specific extraneous variables.
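Randomization, as described above, can be sketched as follows. This is a minimal illustration, not a prescribed procedure: the function name `randomize`, the store labels, and the three price treatments are hypothetical, and equal group sizes are an added assumption.

```python
import random

def randomize(test_units, conditions, seed=None):
    """Randomly assign test units to treatment conditions in (near-)equal groups."""
    rng = random.Random(seed)      # a seed only makes the illustration reproducible
    shuffled = list(test_units)
    rng.shuffle(shuffled)          # random order stands in for a table of random numbers
    groups = {condition: [] for condition in conditions}
    for i, unit in enumerate(shuffled):
        groups[conditions[i % len(conditions)]].append(unit)
    return groups

stores = [f"store_{i}" for i in range(1, 13)]          # hypothetical test units
groups = randomize(stores, ["low price", "medium price", "high price"], seed=42)
for condition, members in groups.items():
    print(condition, members)
```

Because every store has the same chance of landing in each group, extraneous variables such as store size should, on average, be balanced across treatments.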
4-29
A Classification of Experimental Designs (Figure 7.1)
Experimental Designs
  Pre-experimental
    One-Shot Case Study
    One Group Pretest-Posttest
    Static Group
  True Experimental
    Pretest-Posttest Control Group
    Posttest-Only Control Group
    Solomon Four-Group
  Quasi Experimental
    Time Series
    Multiple Time Series
  Statistical
    Randomized Blocks
    Latin Square
    Factorial Design
4-30
Factorial Design
Is used to measure the effects of two or more independent variables at various levels. A factorial design may also be conceptualized as a table. In a two-factor design, each level of one variable represents a row and each level of another variable represents a column.
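The table view of a two-factor design can be sketched by enumerating every combination of levels. The two factors and their levels below are hypothetical examples chosen only for illustration.

```python
from itertools import product

price_levels = ["low", "medium", "high"]   # hypothetical factor 1: one row per level
ad_levels = ["none", "heavy"]              # hypothetical factor 2: one column per level

# Each cell of the design is one (row level, column level) treatment combination.
cells = list(product(price_levels, ad_levels))

for price in price_levels:
    print(price.ljust(8), [f"{price} price / {ad} advertising" for ad in ad_levels])

print(len(cells), "treatment combinations")
```

A 3 x 2 design therefore needs six treatment groups, which is why the number of cells grows quickly as factors or levels are added.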
4-31
Selecting a Test-Marketing Strategy
[Figure: flow chart. Inputs: New Product Development, Research on Existing Products, Research on other Elements. Stages in sequence: Simulated Test Marketing, Controlled Test Marketing, Standard Test Marketing, National Introduction. Branches labeled "Very +ve" and "Other Factors" lead onward; branches labeled "-ve" lead to Stop and Reevaluate. Surrounding factors: Competition, Socio-Cultural Environment, Need for Secrecy, Overall Marketing Strategy.]
4-32
Criteria for the Selection of Test Markets
Test markets should have the following qualities:
1) Be large enough to produce meaningful projections. They should contain at least 2% of the potential actual population.
2) Be representative demographically.
3) Be representative with respect to product consumption behavior.
4) Be representative with respect to media usage.
5) Be representative with respect to competition.
6) Be relatively isolated in terms of media and physical distribution.
7) Have normal historical development in the product class.
8) Have marketing research and auditing services available.
9) Not be over-tested.
4-33
Measurement and Scaling
Measurement means assigning numbers or other symbols to characteristics of objects according to certain prespecified rules.
* There is a one-to-one correspondence between the numbers and the characteristics being measured.
* The rules for assigning numbers should be standardized and applied uniformly.
* Rules must not change over objects or time.
4-34
Measurement and Scaling Scaling involves creating a continuum upon which measured objects are located. Consider an attitude scale from 1 to 100. Each respondent is assigned a number from 1 to 100, with 1 = Extremely Unfavorable, and 100 = Extremely Favorable. Measurement is the actual assignment of a number from 1 to 100 to each respondent. Scaling is the process of placing the respondents on a continuum with respect to their attitude toward department stores.
4-35
Primary Scales of Measurement (Figure 8.1)

Scale      What the numbers represent                Values for three runners
Nominal    Numbers Assigned to Runners               7             8              3
Ordinal    Rank Order of Winners                     Third place   Second place   First place
Interval   Performance Rating on a 0 to 10 Scale     8.2           9.1            9.6
Ratio      Time to Finish, in Seconds                15.2          14.1           13.4
4-36
A Classification of Scaling Techniques (Figure 8.2)
Scaling Techniques
  Comparative Scales
    Paired Comparison
    Rank Order
    Constant Sum
    Q-Sort and Other Procedures
  Noncomparative Scales
    Continuous Rating Scales
    Itemized Rating Scales
      Likert
      Semantic Differential
      Stapel
4-37
A Comparison of Scaling Techniques
Comparative scales involve the direct comparison of stimulus objects. Comparative scale data must be interpreted in relative terms and have only ordinal or rank order properties. In noncomparative scales, each object is scaled independently of the others in the stimulus set. The resulting data are generally assumed to be interval or ratio scaled.
Preference for Toothpaste Brands Using Rank Order Scaling Figure 8.4 cont.
Form Brand
Rank Order
1. Crest
_________
2. Colgate
_________
3. Aim
_________
4. Gleem
_________
5. Macleans
_________
6. Ultra Brite
_________
7. Close Up
_________
8. Pepsodent
_________
9. Plus White
_________
10. Stripe
_________
4-38
Importance of Bathing Soap Attributes Using a Constant Sum Scale
4-39
Figure 8.5 cont.
Form: Average Responses of Three Segments

Attribute            Segment I   Segment II   Segment III
1. Mildness               8           2            4
2. Lather                 2           4           17
3. Shrinkage              3           9            7
4. Price                 53          17            9
5. Fragrance              9           0           19
6. Packaging              7           5            9
7. Moisturizing           5           3           20
8. Cleaning Power        13          60           15
Sum                     100         100          100
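As a quick check on the constant sum data above, the sketch below verifies that each segment's points add up to 100 and picks out the attribute each segment weights most heavily. The variable names are illustrative; the numbers are taken from the table.

```python
attributes = ["Mildness", "Lather", "Shrinkage", "Price",
              "Fragrance", "Packaging", "Moisturizing", "Cleaning Power"]

# Average points allocated by each segment (from Figure 8.5).
segments = {
    "I":   [8, 2, 3, 53, 9, 7, 5, 13],
    "II":  [2, 4, 9, 17, 0, 5, 3, 60],
    "III": [4, 17, 7, 9, 19, 9, 20, 15],
}

# A constant sum response is valid only if the points total the fixed sum (100 here).
totals = {name: sum(points) for name, points in segments.items()}

# The attribute with the most points is the most important one for that segment.
most_important = {name: attributes[points.index(max(points))]
                  for name, points in segments.items()}

print(totals)
print(most_important)
```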
4-40
Noncomparative Scaling Techniques
Respondents evaluate only one object at a time, and for this reason noncomparative scales are often referred to as monadic scales. Noncomparative techniques consist of continuous and itemized rating scales.
4-41
Likert Scale
The Likert scale requires the respondents to indicate a degree of agreement or disagreement with each of a series of statements about the stimulus objects. An X marks the category selected.

                                            Strongly               Neither agree              Strongly
                                            disagree   Disagree    nor disagree     Agree     agree
1. Sears sells high quality merchandise.       1          2X            3             4         5
2. Sears has poor in-store service.            1          2X            3             4         5
3. I like to shop at Sears.                    1          2             3X            4         5
The analysis can be conducted on an item-by-item basis (profile analysis), or a total (summated) score can be calculated. When arriving at a total score, the categories assigned to the negative statements by the respondents should be scored by reversing the scale.
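The summated score with reverse scoring can be sketched as below, using the three responses marked on the form (2, 2, and 3). Treating statement 2 ("poor in-store service") as the only negative statement follows from its wording; the variable and function names are illustrative.

```python
SCALE_MAX = 5                      # five-point Likert scale, 1 = strongly disagree

responses = {1: 2, 2: 2, 3: 3}     # statement number -> category marked on the form
negative_items = {2}               # statement 2 is worded negatively

def score(item, value):
    """Reverse the scale for negative statements so that higher is always more favorable."""
    return SCALE_MAX + 1 - value if item in negative_items else value

profile = {item: score(item, value) for item, value in responses.items()}  # item-by-item
total = sum(profile.values())                                              # summated score

print(profile, total)
```

Here the response of 2 to the negative statement becomes 4 after reversal, so the total is 2 + 4 + 3 = 9.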
4-42
Semantic Differential Scale
The semantic differential is a seven-point rating scale with end points associated with bipolar labels that have semantic meaning.

SEARS IS:
Powerful     --:--:--:--:-X-:--:--:  Weak
Unreliable   --:--:--:--:--:-X-:--:  Reliable
Modern       --:--:--:--:--:--:-X-:  Old-fashioned
The negative adjective or phrase sometimes appears at the left side of the scale and sometimes at the right. This controls the tendency of some respondents, particularly those with very positive or very negative attitudes, to mark the right- or left-hand sides without reading the labels. Individual items on a semantic differential scale may be scored on either a -3 to +3 or a 1 to 7 scale.
A Semantic Differential Scale for Measuring Self-Concepts, Person Concepts, and Product Concepts
1) Rugged
:---:---:---:---:---:---:---: Delicate
2) Excitable
:---:---:---:---:---:---:---: Calm
3) Uncomfortable
:---:---:---:---:---:---:---: Comfortable
4) Dominating
:---:---:---:---:---:---:---: Submissive
5) Thrifty
:---:---:---:---:---:---:---: Indulgent
6) Pleasant
:---:---:---:---:---:---:---: Unpleasant
7) Contemporary
:---:---:---:---:---:---:---: Obsolete
8) Organized
:---:---:---:---:---:---:---: Unorganized
9) Rational
:---:---:---:---:---:---:---: Emotional
10) Youthful
:---:---:---:---:---:---:---: Mature
11) Formal
:---:---:---:---:---:---:---: Informal
12) Orthodox
:---:---:---:---:---:---:---: Liberal
13) Complex
:---:---:---:---:---:---:---: Simple
14) Colorless
:---:---:---:---:---:---:---: Colorful
15) Modest
:---:---:---:---:---:---:---: Vain
4-43
4-44
Stapel Scale
The Stapel scale is a unipolar rating scale with ten categories numbered from -5 to +5, without a neutral point (zero). This scale is usually presented vertically.

SEARS

     +5               +5
     +4               +4
     +3               +3
     +2               +2X
     +1               +1
HIGH QUALITY     POOR SERVICE
     -1               -1
     -2               -2
     -3               -3
     -4X              -4
     -5               -5
The data obtained by using a Stapel scale can be analyzed in the same way as semantic differential data.
4-45
Some Unique Rating Scale Configurations Figure 9.3 Thermometer Scale Instructions: Please indicate how much you like McDonald’s hamburgers by coloring in the thermometer. Start at the bottom and color up to the temperature level that best indicates how strong your preference is.
Form:
Like very much
100 75 50 25 0
Dislike very much
Smiling Face Scale Instructions: Please point to the face that shows how much you like the Barbie Doll. If you do not like the Barbie Doll at all, you would point to Face 1. If you liked it very much, you would point to Face 5.
Form:
1
2
3
4
5
4-46
Validity
Construct validity addresses the question of what construct or characteristic the scale is, in fact, measuring. Construct validity includes convergent, discriminant, and nomological validity.
Convergent validity is the extent to which the scale correlates positively with other measures of the same construct.
Discriminant validity is the extent to which a measure does not correlate with other constructs from which it is supposed to differ.
Nomological validity is the extent to which the scale correlates in theoretically predicted ways with measures of different but related constructs.
4-47
Questionnaire Definition
A questionnaire is a formalized set of questions for obtaining information from respondents.
4-48
Questionnaire Design Process (Fig. 10.1)
1. Specify the Information Needed
2. Specify the Type of Interviewing Method
3. Determine the Content of Individual Questions
4. Design the Question to Overcome the Respondent’s Inability and Unwillingness to Answer
5. Decide the Question Structure
6. Determine the Question Wording
7. Arrange the Questions in Proper Order
8. Identify the Form and Layout
9. Reproduce the Questionnaire
10. Eliminate Bugs by Pre-testing
Choosing Question Structure
4-49
Unstructured Questions
Unstructured questions are open-ended questions that respondents answer in their own words.
Do you intend to buy a new car within the next six months? __________________________________
Choosing Question Structure
4-50
Structured Questions
Structured questions specify the set of response alternatives and the response format. A structured question may be multiple-choice, dichotomous, or a scale.
Choosing Question Structure
4-51
Multiple-Choice Questions
In multiple-choice questions, the researcher provides a choice of answers and respondents are asked to select one or more of the alternatives given.

Do you intend to buy a new car within the next six months?
____ Definitely will not buy
____ Probably will not buy
____ Undecided
____ Probably will buy
____ Definitely will buy
____ Other (please specify)
Choosing Question Structure
4-52
Dichotomous Questions
A dichotomous question has only two response alternatives: yes or no, agree or disagree, and so on. Often, the two alternatives of interest are supplemented by a neutral alternative, such as “no opinion,” “don't know,” “both,” or “none.”

Do you intend to buy a new car within the next six months?
_____ Yes
_____ No
_____ Don't know
Choosing Question Wording Use Ordinary Words
“Do you think the distribution of soft drinks is adequate?” (Incorrect)
“Do you think soft drinks are readily available when you want to buy them?” (Correct)
4-53
Choosing Question Wording Use Unambiguous Words
In a typical month, how often do you shop in department stores?
_____ Never
_____ Occasionally
_____ Sometimes
_____ Often
_____ Regularly
(Incorrect)

In a typical month, how often do you shop in department stores?
_____ Less than once
_____ 1 or 2 times
_____ 3 or 4 times
_____ More than 4 times
(Correct)
4-54
4-55
Flow Chart for Questionnaire Design (Fig. 10.2)
[Figure: flow chart. Introduction, then Ownership of Store, Bank, and Other Charge Cards, then Purchased Products in a Specific Department Store during the Last Two Months (Yes / No). Yes branch: How was Payment made? (Credit: Store Charge Card, Bank Charge Card, Other Charge Card; Cash; Other). No branch: Ever Purchased in a Department Store? (Yes / No). Final box: Intentions to Use Store, Bank, and other Charge Cards.]
4-56
Pretesting Pretesting refers to the testing of the questionnaire on a small sample of respondents to identify and eliminate potential problems.
A questionnaire should not be used in the field survey without adequate pretesting. All aspects of the questionnaire should be tested, including question content, wording, sequence, form and layout, question difficulty, and instructions. The respondents for the pretest and for the actual survey should be drawn from the same population. Pretests are best done by personal interviews, even if the actual survey is to be conducted by mail, telephone, or electronic means, because interviewers can observe respondents' reactions and attitudes.
4-57
Observational Forms
Department Store Project
Who: Purchasers, browsers, males, females, parents with children, or children alone.
What: Products/brands considered, products/brands purchased, size, price of package inspected, or influence of children or other family members.
When: Day, hour, date of observation.
Where: Inside the store, checkout counter, or type of department within the store.
Why: Influence of price, brand name, package size, promotion, or family members on the purchase.
Way: Personal observer disguised as sales clerk, undisguised personal observer, hidden camera, or obtrusive mechanical device.
4-58
Questionnaire Design Checklist (Table 10.1)
Step 1. Specify the Information Needed
Step 2. Type of Interviewing Method
Step 3. Individual Question Content
Step 4. Overcome Inability and Unwillingness to Answer
Step 5. Choose Question Structure
Step 6. Choose Question Wording
Step 7. Determine the Order of Questions
Step 8. Form and Layout
Step 9. Reproduce the Questionnaire
Step 10. Pretest
4-59
Sample vs. Census (Table 11.1)

                                        Conditions Favoring the Use of
Type of Study                           Sample           Census
1. Budget                               Small            Large
2. Time available                       Short            Long
3. Population size                      Large            Small
4. Variance in the characteristic       Small            Large
5. Cost of sampling errors              Low              High
6. Cost of nonsampling errors           High             Low
7. Nature of measurement                Destructive      Nondestructive
8. Attention to individual cases        Yes              No
4-60
The Sampling Design Process Fig. 11.1
1. Define the Population
2. Determine the Sampling Frame
3. Select Sampling Technique(s)
4. Determine the Sample Size
5. Execute the Sampling Process
4-61
Define the Target Population The target population is the collection of elements or objects that possess the information sought by the researcher and about which inferences are to be made. The target population should be defined in terms of elements, sampling units, extent, and time.
An element is the object about which or from which the information is desired, e.g., the respondent. A sampling unit is an element, or a unit containing the element, that is available for selection at some stage of the sampling process. Extent refers to the geographical boundaries. Time is the time period under consideration.
Sample Sizes Used in Marketing Research Studies
4-62
Table 11.2

Type of Study                                                    Minimum Size   Typical Range
Problem identification research (e.g. market potential)          500            1,000-2,500
Problem-solving research (e.g. pricing)                          200            300-500
Product tests                                                    200            300-500
Test marketing studies                                           200            300-500
TV, radio, or print advertising (per commercial or ad tested)    150            200-300
Test-market audits                                               10 stores      10-20 stores
Focus groups                                                     2 groups       4-12 groups
4-63
Classification of Sampling Techniques (Fig. 11.2)
Sampling Techniques
  Nonprobability Sampling Techniques
    Convenience Sampling
    Judgmental Sampling
    Quota Sampling
    Snowball Sampling
  Probability Sampling Techniques
    Simple Random Sampling
    Systematic Sampling
    Stratified Sampling
    Cluster Sampling
    Other Sampling Techniques
4-64
Data Preparation Process Fig. 14.1
1. Prepare Preliminary Plan of Data Analysis
2. Check Questionnaire
3. Edit
4. Code
5. Transcribe
6. Clean Data
7. Statistically Adjust the Data
8. Select Data Analysis Strategy
4-65
Selecting a Data Analysis Strategy (Fig. 14.5)
Earlier Steps (1, 2, & 3) of the Marketing Research Process
Known Characteristics of the Data
Properties of Statistical Techniques
Background and Philosophy of the Researcher
Data Analysis Strategy
4-66
A Classification of Univariate Techniques (Fig. 14.6)
Univariate Techniques
  Metric Data
    One Sample
      * t test
      * Z test
    Two or More Samples
      Independent
        * Two-Group t test
        * Z test
        * One-Way ANOVA
      Related
        * Paired t test
  Non-numeric Data
    One Sample
      * Frequency
      * Chi-Square
      * K-S
      * Runs
      * Binomial
    Two or More Samples
      Independent
        * Chi-Square
        * Mann-Whitney
        * Median
        * K-S
        * K-W ANOVA
      Related
        * Sign
        * Wilcoxon
        * McNemar
        * Chi-Square
4-67
A Classification of Multivariate Techniques (Fig. 14.7)
Multivariate Techniques
  Dependence Technique
    One Dependent Variable
      * Cross-Tabulation
      * Analysis of Variance and Covariance
      * Multiple Regression
      * Conjoint Analysis
    More Than One Dependent Variable
      * Multivariate Analysis of Variance and Covariance
      * Canonical Correlation
      * Multiple Discriminant Analysis
  Interdependence Technique
    Variable Interdependence
      * Factor Analysis
    Interobject Similarity
      * Cluster Analysis
      * Multidimensional Scaling
4-68
Frequency Distribution
In a frequency distribution, one variable is considered at a time. A frequency distribution for a variable produces a table of frequency counts, percentages, and cumulative percentages for all the values associated with that variable.
Statistics Associated with Frequency Distribution
4-69
Measures of Location
The mean, or average value, is the most commonly used measure of central tendency. The mean, $\bar{X}$, is given by

$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$$

where $X_i$ = observed values of the variable $X$ and $n$ = number of observations (sample size).
The mode is the value that occurs most frequently. It represents the highest peak of the distribution. The mode is a good measure of location when the variable is inherently categorical or has otherwise been grouped into categories.
Statistics Associated with Frequency Distribution
4-70
Measures of Location
The median of a sample is the middle value when the data are arranged in ascending or descending order. If the number of data points is even, the median is usually estimated as the midpoint between the two middle values – by adding the two middle values and dividing their sum by 2. The median is the 50th percentile.
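The three measures of location can be computed with Python's standard `statistics` module. The six observations below are hypothetical; with an even number of data points, `statistics.median` returns the midpoint of the two middle values, as described above.

```python
import statistics

data = [2, 4, 4, 5, 7, 9]            # hypothetical observations

mean = statistics.mean(data)         # sum of the values divided by n
median = statistics.median(data)     # midpoint of the two middle values (4 and 5)
mode = statistics.mode(data)         # most frequently occurring value

print(mean, median, mode)
```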
Statistics Associated with Frequency Distribution
4-71
Measures of Variability
The range measures the spread of the data. It is simply the difference between the largest and smallest values in the sample: Range = $X_{\text{largest}} - X_{\text{smallest}}$.
The interquartile range is the difference between the 75th and 25th percentile. For a set of data points arranged in order of magnitude, the pth percentile is the value that has p% of the data points below it and (100 - p)% above it.
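A short sketch of the range and interquartile range on hypothetical data follows. Note that software packages interpolate percentiles in slightly different ways; the `"inclusive"` method of `statistics.quantiles` used here is one such convention and is an assumption of this example, not something the slides specify.

```python
import statistics

data = [2, 4, 4, 5, 7, 9, 10, 12]    # hypothetical observations

data_range = max(data) - min(data)   # largest value minus smallest value

# quantiles(n=4) returns the three quartile cut points: the 25th, 50th and 75th percentiles.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1                        # 75th percentile minus 25th percentile

print(data_range, q1, q3, iqr)
```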
Statistics Associated with Frequency Distribution
4-72
Measures of Variability
The variance is the mean squared deviation from the mean. The variance can never be negative. The standard deviation is the square root of the variance.

$$s_x = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$$
The coefficient of variation is the ratio of the standard deviation to the mean expressed as a percentage, and is a unitless measure of relative variability.
$$CV = \frac{s_x}{\bar{X}}$$
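The variance, standard deviation, and coefficient of variation can be worked through by hand on a small hypothetical sample. The sketch uses the n - 1 divisor from the standard deviation formula above and expresses CV as a percentage.

```python
import math

data = [4, 8, 6, 5, 7]                                   # hypothetical observations
n = len(data)
mean = sum(data) / n                                     # 6.0

# Sum of squared deviations from the mean, divided by n - 1.
variance = sum((x - mean) ** 2 for x in data) / (n - 1)  # 10 / 4 = 2.5
std_dev = math.sqrt(variance)

# Ratio of the standard deviation to the mean, as a percentage.
cv_percent = 100 * std_dev / mean

print(variance, round(std_dev, 3), round(cv_percent, 1))
```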
Statistics Associated with Frequency Distribution
4-73
Measures of Shape
Skewness is the tendency of the deviations from the mean to be larger in one direction than in the other. It can be thought of as the tendency for one tail of the distribution to be heavier than the other.
Kurtosis is a measure of the relative peakedness or flatness of the curve defined by the frequency distribution. Measured relative to the normal curve (excess kurtosis), the kurtosis of a normal distribution is zero. If the kurtosis is positive, then the distribution is more peaked than a normal distribution. A negative value means that the distribution is flatter than a normal distribution.
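The slides do not give formulas for the shape measures, so the sketch below assumes one common moment-based definition: skewness as the third central moment divided by the 1.5 power of the second, and kurtosis as the fourth central moment divided by the square of the second, minus 3 so that a normal distribution scores zero. Statistical packages often apply additional small-sample corrections, so their output may differ slightly.

```python
def central_moment(data, k):
    """k-th central moment, dividing by n (an assumption of this sketch)."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** k for x in data) / len(data)

def skewness(data):
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5

def excess_kurtosis(data):
    return central_moment(data, 4) / central_moment(data, 2) ** 2 - 3

symmetric = [1, 2, 3, 4, 5]            # hypothetical symmetric data
right_skewed = [1, 1, 2, 2, 3, 10]     # hypothetical data with a heavy right tail

print(skewness(symmetric), excess_kurtosis(symmetric))
print(skewness(right_skewed))
```

The symmetric data have zero skewness and negative kurtosis (flatter than normal), while the second data set has positive skewness because its right tail is heavier.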
4-74
Skewness of a Distribution (Figure 15.2)
[Figure: (a) Symmetric Distribution and (b) Skewed Distribution, each marking the Mean, Median, and Mode.]
4-75
Steps Involved in Hypothesis Testing (Fig. 15.3)
1. Formulate H0 and H1
2. Select Appropriate Test
3. Choose Level of Significance, α
4. Collect Data and Calculate Test Statistic
5. Either:
   (a) Determine Probability Associated with Test Statistic, then Compare with Level of Significance, α; or
   (b) Determine Critical Value of Test Statistic, TSCR, then Determine if the Calculated Test Statistic Falls into the (Non) Rejection Region
6. Reject or Do Not Reject H0
7. Draw Marketing Research Conclusion
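The two decision routes in the steps above (compare a probability with α, or compare the test statistic with a critical value) can be illustrated with a one-sample, two-tailed Z test. Everything numeric here is hypothetical: the hypothesized mean, the known standard deviation, the sample size, and the sample mean are invented for the sketch, and 1.96 is the standard two-tailed critical value of Z at α = 0.05.

```python
import math

# Steps 1-3: H0: mu = 50 versus H1: mu != 50, Z test, alpha = 0.05 (all hypothetical).
mu_0, sigma, alpha = 50.0, 10.0, 0.05

# Step 4: collect data and calculate the test statistic.
n, sample_mean = 64, 52.5
z = (sample_mean - mu_0) / (sigma / math.sqrt(n))

# Step 5(a): probability associated with the test statistic (two-tailed p-value).
p_value = math.erfc(abs(z) / math.sqrt(2))
reject_by_p = p_value < alpha

# Step 5(b): critical value and rejection region.
z_critical = 1.96
reject_by_critical = abs(z) > z_critical

# Step 6: both routes lead to the same decision.
print(round(z, 2), round(p_value, 4), reject_by_p, reject_by_critical)
```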
4-76
A Broad Classification of Hypothesis Tests (Figure 15.6)
Hypothesis Tests
  Tests of Association
  Tests of Differences
    Distributions
    Means
    Proportions
    Median/Rankings
4-77
Cross-Tabulation
While a frequency distribution describes one variable at a time, a cross-tabulation describes two or more variables simultaneously. Cross-tabulation results in tables that reflect the joint distribution of two or more variables with a limited number of categories or distinct values, e.g., Table 15.3.
4-78
Gender and Internet Usage (Table 15.3)

                    Gender
Internet Usage      Male     Female     Row Total
Light (1)             5        10          15
Heavy (2)            10         5          15
Column Total         15        15
4-79
Internet Usage by Gender (Table 15.4)

                    Gender
Internet Usage      Male       Female
Light               33.3%      66.7%
Heavy               66.7%      33.3%
Column total        100%       100%
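The percentages in Table 15.4 can be reproduced from the counts in Table 15.3 by dividing each cell by its column (gender) total. The dictionary layout and variable names below are illustrative only.

```python
# Cell counts from Table 15.3: usage category -> gender -> count.
counts = {
    "Light": {"Male": 5, "Female": 10},
    "Heavy": {"Male": 10, "Female": 5},
}
genders = ["Male", "Female"]

# Column totals: number of respondents of each gender.
column_totals = {g: sum(row[g] for row in counts.values()) for g in genders}

# Each cell as a percentage of its gender column.
column_pct = {
    usage: {g: round(100 * row[g] / column_totals[g], 1) for g in genders}
    for usage, row in counts.items()
}

print(column_totals)
print(column_pct)
```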
4-80
Gender by Internet Usage (Table 15.5)

              Internet Usage
Gender        Light      Heavy      Total
Male          33.3%      66.7%      100.0%
Female        66.7%      33.3%      100.0%
4-81
Introduction of a Third Variable in Cross-Tabulation (Fig. 15.7)
Original Two Variables
  Some Association between the Two Variables
    Introduce a Third Variable
      Refined Association between the Two Variables
      No Association between the Two Variables
      No Change in the Initial Pattern
  No Association between the Two Variables
    Introduce a Third Variable
      Some Association between the Two Variables
4-82
Purchase of Fashion Clothing by Marital Status (Table 15.6)

Purchase of                Current Marital Status
Fashion Clothing           Married      Unmarried
High                        31%           52%
Low                         69%           48%
Column                     100%          100%
Number of respondents       700           300
4-83
Purchase of Fashion Clothing by Marital Status (Table 15.7)

                           Sex
Purchase of                Male                        Female
Fashion Clothing           Married    Not Married      Married    Not Married
High                        35%         40%             25%         60%
Low                         65%         60%             75%         40%
Column totals              100%        100%            100%        100%
Number of cases             400         120             300         180
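As a consistency check, the sex-specific figures in Table 15.7 can be pooled back into the two-variable figures of Table 15.6. The sketch converts each percentage into an approximate count of high purchasers, sums over sex, and divides by the pooled number of cases. The data structure and names are illustrative; the numbers come from the two tables.

```python
# (sex, marital status) -> (share with high purchase, number of cases), from Table 15.7.
cells = {
    ("Male", "Married"): (0.35, 400),
    ("Male", "Not married"): (0.40, 120),
    ("Female", "Married"): (0.25, 300),
    ("Female", "Not married"): (0.60, 180),
}

pooled = {}
for status in ("Married", "Not married"):
    high = sum(share * n for (sex, s), (share, n) in cells.items() if s == status)
    cases = sum(n for (sex, s), (share, n) in cells.items() if s == status)
    pooled[status] = high / cases      # share with high purchase, ignoring sex

print({status: round(100 * share) for status, share in pooled.items()})
```

The pooled shares come out at about 31% for married and 52% for unmarried respondents, matching Table 15.6, while the breakdown by sex shows that the difference is concentrated among women.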
Eating Frequently in Fast-Food Restaurants by Family Size
4-84
Table 15.12

Eat Frequently in             Family Size
Fast-Food Restaurants         Small      Large
Yes                            65%        65%
No                             35%        35%
Column totals                 100%       100%
Number of cases                500        500
Eating Frequently in Fast-Food Restaurants by Family Size & Income (Table 15.13)

                              Income: Low            Income: High
Eat Frequently in             Family size            Family size
Fast-Food Restaurants         Small      Large       Small      Large
Yes                            65%        65%         65%        65%
No                             35%        35%         35%        35%
Column totals                 100%       100%        100%       100%
Number of respondents          250        250         250        250
4-85
4-86
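Chi-square Distribution (Figure 15.8)
[Figure: chi-square distribution plotted against χ², with the Critical Value separating the Do Not Reject H0 region from the Reject H0 region.]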
Statistics Associated with Cross-Tabulation
4-87
Chi-Square
The chi-square statistic ($\chi^2$) is used to test the statistical significance of the observed association in a cross-tabulation. The expected frequency for each cell can be calculated by using a simple formula:
$f_e = \frac{n_r n_c}{n}$
where
$n_r$ = total number in the row
$n_c$ = total number in the column
$n$ = total sample size
Statistics Associated with Cross-Tabulation
4-88
Chi-Square
For the data in Table 15.3, the expected frequencies for the cells, going from left to right and from top to bottom, are:
$\frac{15 \times 15}{30} = 7.50$
$\frac{15 \times 15}{30} = 7.50$
$\frac{15 \times 15}{30} = 7.50$
$\frac{15 \times 15}{30} = 7.50$
Then the value of $\chi^2$ is calculated as follows:
$\chi^2 = \sum_{\text{all cells}} \frac{(f_o - f_e)^2}{f_e}$
Statistics Associated with Cross-Tabulation Chi-Square
For the data in Table 15.3, the value of $\chi^2$ is calculated as:
$\chi^2 = \frac{(5 - 7.5)^2}{7.5} + \frac{(10 - 7.5)^2}{7.5} + \frac{(10 - 7.5)^2}{7.5} + \frac{(5 - 7.5)^2}{7.5} = 0.833 + 0.833 + 0.833 + 0.833 = 3.333$
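The calculation for Table 15.3 can be sketched in a few lines of Python. This is a minimal illustration rather than a statistics-package routine; the function name `chi_square` is a hypothetical helper, and the counts are the cell frequencies from Table 15.3.

```python
# Observed counts from Table 15.3 (rows: Light, Heavy; columns: Male, Female).
observed = [[5, 10],
            [10, 5]]

def chi_square(table):
    """Return (chi-square statistic, expected frequencies) for a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected frequency for each cell: f_e = (n_r * n_c) / n
    expected = [[nr * nc / n for nc in col_totals] for nr in row_totals]
    # Sum (f_o - f_e)^2 / f_e over all cells.
    stat = sum((fo - fe) ** 2 / fe
               for obs_row, exp_row in zip(table, expected)
               for fo, fe in zip(obs_row, exp_row))
    return stat, expected

chi2, expected = chi_square(observed)
print(expected)
print(round(chi2, 3))
```

A 2 x 2 table has (2 - 1)(2 - 1) = 1 degree of freedom, for which the tabled critical value at the 0.05 level is 3.841. The computed statistic falls below that value, so under this sketch the null hypothesis of no association would not be rejected at that level.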
4-89
Statistics Associated with Cross-Tabulation
4-90
Lambda Coefficient
Asymmetric lambda measures the percentage improvement in predicting the value of the dependent variable, given the value of the independent variable. Lambda also varies between 0 and 1. A value of 0 means no improvement in prediction. A value of 1 indicates that the prediction can be made without error. This happens when each independent variable category is associated with a single category of the dependent variable. Asymmetric lambda is computed for each of the variables (treating it as the dependent variable). A symmetric lambda is also computed, which is a kind of average of the two asymmetric values. The symmetric lambda does not make an assumption about which variable is dependent. It measures the overall improvement when prediction is done in both directions.
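As a rough sketch, lambda can be computed directly from the cell counts. The code below uses the standard Goodman and Kruskal formulas, which are not spelled out on the slide and are therefore an assumption here; the function names are hypothetical, and the counts are taken from Table 15.3.

```python
def asymmetric_lambda(table):
    """Lambda for predicting the row variable from the column variable."""
    n = sum(sum(row) for row in table)
    # Without the predictor, the best guess is the modal row category.
    max_row_total = max(sum(row) for row in table)
    # With the predictor, guess the modal row category within each column.
    sum_col_max = sum(max(col) for col in zip(*table))
    return (sum_col_max - max_row_total) / (n - max_row_total)

def symmetric_lambda(table):
    """Lambda when neither variable is designated as dependent."""
    n = sum(sum(row) for row in table)
    max_row_total = max(sum(row) for row in table)
    max_col_total = max(sum(col) for col in zip(*table))
    sum_col_max = sum(max(col) for col in zip(*table))
    sum_row_max = sum(max(row) for row in table)
    return ((sum_col_max + sum_row_max - max_row_total - max_col_total)
            / (2 * n - max_row_total - max_col_total))

# Rows: Internet usage (Light, Heavy); columns: gender (Male, Female).
usage_by_gender = [[5, 10], [10, 5]]
gender_by_usage = [list(col) for col in zip(*usage_by_gender)]

lambda_usage = asymmetric_lambda(usage_by_gender)    # usage is dependent
lambda_gender = asymmetric_lambda(gender_by_usage)   # gender is dependent
lambda_sym = symmetric_lambda(usage_by_gender)
print(round(lambda_usage, 3), round(lambda_gender, 3), round(lambda_sym, 3))
```

For Table 15.3, knowing gender reduces the number of prediction errors for Internet usage from 15 to 10, which is where the one-third improvement comes from.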
4-91
A Classification of Hypothesis Testing Procedures for Examining Differences Fig. 15.9
Hypothesis Tests
  Parametric Tests (Metric Tests)
    One Sample: t test, Z test
    Two or More Samples
      Independent Samples: Two-Group t test, Z test
      Paired Samples: Paired t test
  Non-parametric Tests (Nonmetric Tests)
    One Sample: Chi-Square, K-S, Runs, Binomial
    Two or More Samples
      Independent Samples: Chi-Square, Mann-Whitney, Median, K-S
      Paired Samples: Sign, Wilcoxon, McNemar, Chi-Square
4-92
Non-Parametric Tests
Nonparametric tests are used when the independent variables are nonmetric. Like parametric tests, nonparametric tests are available for testing variables from one sample, two independent samples, or two related samples.
Non-Parametric Tests
4-93
One Sample
Sometimes the researcher wants to test whether the observations for a particular variable could reasonably have come from a particular distribution, such as the normal, uniform, or Poisson distribution. The Kolmogorov-Smirnov (K-S) one-sample test is one such goodness-of-fit test. The K-S compares the cumulative distribution function for a variable with a specified distribution. Ai denotes the cumulative relative frequency for each category of the theoretical (assumed) distribution, and Oi the comparable value of the sample frequency. The K-S test is based on the maximum value of the absolute difference between Ai and Oi. The test statistic is
$K = \max_i |A_i - O_i|$
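The K-S statistic can be illustrated with a short sketch. The category counts and the assumed uniform distribution below are hypothetical, as is the helper name `ks_statistic`; only the definition of K as the largest absolute gap between the two cumulative distributions comes from the text.

```python
from itertools import accumulate

def ks_statistic(observed_counts, theoretical_props):
    """K = max |A_i - O_i| over the ordered categories."""
    n = sum(observed_counts)
    # O_i: cumulative relative frequency observed in the sample.
    observed_cum = list(accumulate(c / n for c in observed_counts))
    # A_i: cumulative relative frequency under the assumed distribution.
    theoretical_cum = list(accumulate(theoretical_props))
    return max(abs(a - o) for a, o in zip(theoretical_cum, observed_cum))

# Hypothetical data: 20 respondents across four ordered categories,
# tested against a uniform distribution (25% per category).
counts = [8, 6, 4, 2]
uniform = [0.25, 0.25, 0.25, 0.25]
k = ks_statistic(counts, uniform)
print(round(k, 3))
```

The significance of K would still have to be judged against a K-S critical value for the sample size, which this sketch does not attempt.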
Non-Parametric Tests
4-94
One Sample
The chi-square test can also be performed on a single variable from one sample. In this context, the chi-square serves as a goodness-of-fit test. The runs test is a test of randomness for the dichotomous variables. This test is conducted by determining whether the order or sequence in which observations are obtained is random. The binomial test is also a goodness-of-fit test for dichotomous variables. It tests the goodness of fit of the observed number of observations in each category to the number expected under a specified binomial distribution.
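To make the runs test concrete, the sketch below counts runs in a dichotomous sequence and converts the count to a z value. The large-sample mean and variance formulas are the usual normal approximation for the number of runs, which the slide does not state, so treat them as an assumption; the response sequence and the function name are hypothetical.

```python
import math

def runs_test(sequence):
    """Return (number of runs, z) for a sequence with two distinct values."""
    values = sorted(set(sequence))
    if len(values) != 2:
        raise ValueError("the runs test needs exactly two distinct values")
    n1 = sum(1 for x in sequence if x == values[0])
    n2 = len(sequence) - n1
    # A new run starts wherever two neighbouring observations differ.
    runs = 1 + sum(1 for a, b in zip(sequence, sequence[1:]) if a != b)
    n = n1 + n2
    # Expected number of runs and its variance under randomness
    # (normal approximation, assumed here).
    mean = 2 * n1 * n2 / n + 1
    variance = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))
    z = (runs - mean) / math.sqrt(variance)
    return runs, z

# Hypothetical yes/no answers in the order they were collected.
answers = list("YYNNYNYYNN")
runs, z = runs_test(answers)
print(runs, round(z, 3))
```

A z value far from zero in either direction would suggest that the order of observations is not random: too few runs point to clustering, too many to systematic alternation.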
Non-Parametric Tests
4-95
Two Independent Samples
When the difference in the location of two populations is to be compared based on observations from two independent samples, and the variable is measured on an ordinal scale, the Mann-Whitney U test can be used. In the Mann-Whitney U test, the two samples are combined and the cases are ranked in order of increasing size. The test statistic, U, is computed as the number of times a score from sample or group 1 precedes a score from group 2. If the samples are from the same population, the distribution of scores from the two groups in the rank list should be random. An extreme value of U would indicate a nonrandom pattern, pointing to the inequality of the two groups. For samples of less than 30, the exact significance level for U is computed. For larger samples, U is transformed into a normally distributed z statistic. This z can be corrected for ties within ranks.
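The counting definition of U translates almost directly into code. In this sketch the two samples are hypothetical ordinal ratings, the function name is illustrative, and counting a tie as one half is a common convention assumed here rather than something the text specifies.

```python
def mann_whitney_u(group1, group2):
    """Count how often a group-1 score precedes (is smaller than) a group-2 score."""
    u = 0.0
    for x in group1:
        for y in group2:
            if x < y:
                u += 1.0
            elif x == y:
                u += 0.5  # assumed convention: a tie counts as half
    return u

# Hypothetical ordinal ratings from two independent samples.
group1 = [1, 2, 4, 5]
group2 = [3, 6, 7, 8]

u1 = mann_whitney_u(group1, group2)
u2 = mann_whitney_u(group2, group1)
print(u1, u2)
```

The two counts always add up to the product of the sample sizes (here 4 x 4 = 16), so a value of U close to 0 or close to that product is the "extreme value" that signals a nonrandom pattern.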
4-96
SPSS Windows
The main program in SPSS is FREQUENCIES. It produces a table of frequency counts, percentages, and cumulative percentages for the values of each variable. It gives all of the associated statistics. If the data are interval scaled and only the summary statistics are desired, the DESCRIPTIVES procedure can be used. The EXPLORE procedure produces summary statistics and graphical displays, either for all of the cases or separately for groups of cases. Mean, median, variance, standard deviation, minimum, maximum, and range are some of the statistics that can be calculated.
4-97
SPSS Windows
To select these procedures click:
Analyze>Descriptive Statistics>Frequencies
Analyze>Descriptive Statistics>Descriptives
Analyze>Descriptive Statistics>Explore
The major cross-tabulation program is CROSSTABS. This program will display the cross-classification tables and provide cell counts, row and column percentages, the chi-square test for significance, and all the measures of the strength of the association that have been discussed.
To select this procedure click:
Analyze>Descriptive Statistics>Crosstabs
4-98
SPSS Windows
The major program for conducting parametric tests in SPSS is COMPARE MEANS. This program can be used to conduct t tests on one sample or independent or paired samples.
To select these procedures using SPSS for Windows click:
Analyze>Compare Means>Means …
Analyze>Compare Means>One-Sample T Test …
Analyze>Compare Means>Independent-Samples T Test …
Analyze>Compare Means>Paired-Samples T Test …
4-99
SPSS Windows
The nonparametric tests discussed in this chapter can be conducted using NONPARAMETRIC TESTS.
To select these procedures using SPSS for Windows click:
Analyze>Nonparametric Tests>Chi-Square …
Analyze>Nonparametric Tests>Binomial …
Analyze>Nonparametric Tests>Runs …
Analyze>Nonparametric Tests>1-Sample K-S …
Analyze>Nonparametric Tests>2 Independent Samples …
Analyze>Nonparametric Tests>2 Related Samples …
4-100
Product Moment Correlation
The product moment correlation, r, summarizes the strength of association between two metric (interval or ratio scaled) variables, say X and Y. It is an index used to determine whether a linear or straight-line relationship exists between X and Y. As it was originally proposed by Karl Pearson, it is also known as the Pearson correlation coefficient. It is also referred to as simple correlation, bivariate correlation, or merely the correlation coefficient.
4-101
Product Moment Correlation
r varies between -1.0 and +1.0. The correlation coefficient between two variables will be the same regardless of their underlying units of measurement.
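A small sketch can show both properties: r stays within the range from -1.0 to +1.0, and it does not change when a variable is expressed in different units. The data are hypothetical (a duration in years, converted to months, against an attitude rating), and `pearson_r` is an illustrative helper, not a library function.

```python
import math

def pearson_r(x, y):
    """Product moment correlation between two metric variables."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: duration in years and an attitude rating.
years = [1, 2, 3, 4, 5]
attitude = [2, 4, 5, 4, 5]

r_years = pearson_r(years, attitude)
# Same durations expressed in months: the units change, r does not.
r_months = pearson_r([12 * v for v in years], attitude)
print(round(r_years, 4), round(r_months, 4))
```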
Statistics Associated with Bivariate Regression Analysis
4-102
Regression coefficient. The estimated parameter b is usually referred to as the nonstandardized regression coefficient.
Scattergram. A scatter diagram, or scattergram, is a plot of the values of two variables for all the cases or observations.
Standard error of estimate. This statistic, SEE, is the standard deviation of the actual Y values from the predicted Y values.
Standard error. The standard deviation of b, SEb, is called the standard error.
Statistics Associated with Bivariate Regression Analysis
Standardized regression coefficient. Also termed the beta coefficient or beta weight, this is the slope obtained by the regression of Y on X when the data are standardized.
Sum of squared errors. The distances of all the points from the regression line are squared and added together to arrive at the sum of squared errors, which is a measure of total error, $\sum e_j^2$.
t statistic. A t statistic with n - 2 degrees of freedom can be used to test the null hypothesis that no linear relationship exists between X and Y, or $H_0: \beta_1 = 0$, where $t = \frac{b}{SE_b}$.
4-103
Conducting Bivariate Regression Analysis Plot the Scatter Diagram
A scatter diagram, or scattergram, is a plot of the values of two variables for all the cases or observations. The most commonly used technique for fitting a straight line to a scattergram is the least-squares procedure.
In fitting the line, the least-squares procedure minimizes the sum of squared errors, $\sum e_j^2$.
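The least-squares fit and the statistics defined earlier (b, SEE, SEb, and the t statistic with n - 2 degrees of freedom) can be sketched as follows. The data are hypothetical, the function name is illustrative, and the closed-form expressions are the textbook bivariate formulas, which the slides reference but do not write out in full.

```python
import math

def bivariate_regression(x, y):
    """Least-squares fit of Y = a + bX with the associated error statistics."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx                      # b
    intercept = mean_y - slope * mean_x    # a
    # Sum of squared errors: squared vertical distances from the line.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    see = math.sqrt(sse / (n - 2))         # standard error of estimate
    se_b = see / math.sqrt(sxx)            # standard error of b
    return {"a": intercept, "b": slope, "sse": sse,
            "see": see, "se_b": se_b, "t": slope / se_b}

# Hypothetical data for five respondents.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
fit = bivariate_regression(x, y)
print({name: round(value, 4) for name, value in fit.items()})
```

Any other choice of a and b would give a larger sum of squared errors for these points, which is what "least squares" means in practice.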
4-104
4-105
Conducting Bivariate Regression Analysis Fig. 17.2
Plot the Scatter Diagram
Formulate the General Model
Estimate the Parameters
Estimate Standardized Regression Coefficients
Test for Significance
Determine the Strength and Significance of Association
Check Prediction Accuracy
Examine the Residuals
Cross-Validate the Model
4-106
Multiple Regression
The general form of the multiple regression model is as follows:
$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \dots + \beta_k X_k + e$
which is estimated by the following equation:
$\hat{Y} = a + b_1 X_1 + b_2 X_2 + b_3 X_3 + \dots + b_k X_k$
As before, the coefficient a represents the intercept, but the b's are now the partial regression coefficients.
4-107
Multicollinearity
Multicollinearity arises when intercorrelations among the predictors are very high. Multicollinearity can result in several problems, including:
The partial regression coefficients may not be estimated precisely. The standard errors are likely to be high.
The magnitudes as well as the signs of the partial regression coefficients may change from sample to sample.
It becomes difficult to assess the relative importance of the independent variables in explaining the variation in the dependent variable.
Predictor variables may be incorrectly included or removed in stepwise regression.
4-108
SPSS Windows
The CORRELATE program computes Pearson product moment correlations and partial correlations with significance levels. Univariate statistics, covariance, and cross-product deviations may also be requested.
To select these procedures using SPSS for Windows click:
Analyze>Correlate>Bivariate …
Analyze>Correlate>Partial …
Scatterplots can be obtained by clicking:
Graphs>Scatter …>Simple>Define
REGRESSION calculates bivariate and multiple regression equations, associated statistics, and plots. It allows for an easy examination of residuals. This procedure can be run by clicking:
Analyze>Regression>Linear …
Similarities and Differences between ANOVA, Regression, and Discriminant Analysis Table 18.1
                                         ANOVA          REGRESSION     DISCRIMINANT ANALYSIS
Similarities
  Number of dependent variables          One            One            One
  Number of independent variables        Multiple       Multiple       Multiple
Differences
  Nature of the dependent variables      Metric         Metric         Categorical
  Nature of the independent variables    Categorical    Metric         Metric
4-109
4-110
Discriminant Analysis
Discriminant analysis is a technique for analyzing data when the criterion or dependent variable is categorical and the predictor or independent variables are interval in nature. The objectives of discriminant analysis are as follows:
Development of discriminant functions, or linear combinations of the predictor or independent variables, which will best discriminate between the categories of the criterion or dependent variable (groups).
Examination of whether significant differences exist among the groups, in terms of the predictor variables.
Determination of which predictor variables contribute to most of the intergroup differences.
Classification of cases to one of the groups based on the values of the predictor variables.
Evaluation of the accuracy of classification.
4-111
Statistics Associated with Discriminant Analysis
Canonical correlation. Canonical correlation measures the extent of association between the discriminant scores and the groups. It is a measure of association between the single discriminant function and the set of dummy variables that define the group membership.
Centroid. The centroid is the mean value of the discriminant scores for a particular group. There are as many centroids as there are groups, one for each group. The means for a group on all the functions are the group centroids.
Classification matrix. Sometimes also called the confusion or prediction matrix, the classification matrix contains the number of correctly classified and misclassified cases.
4-112
Statistics Associated with Discriminant Analysis
Discriminant function coefficients. The discriminant function coefficients (unstandardized) are the multipliers of variables, when the variables are in the original units of measurement.
Discriminant scores. The unstandardized coefficients are multiplied by the values of the variables. These products are summed and added to the constant term to obtain the discriminant scores.
Eigenvalue. For each discriminant function, the eigenvalue is the ratio of between-group to within-group sums of squares. Large eigenvalues imply superior functions.
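Several of these statistics can be illustrated from a set of discriminant scores alone. The sketch below assumes two groups with hypothetical scores on a single discriminant function; it computes the centroids, the eigenvalue as the between-group to within-group ratio, and a classification matrix. Two choices are assumptions not taken from the slides: the canonical correlation is derived through the standard relation sqrt(eigenvalue / (1 + eigenvalue)), and cases are assigned to the nearest centroid. All function names are illustrative.

```python
import math

def discriminant_summary(scores_by_group):
    """Centroids, eigenvalue and canonical correlation for one function."""
    all_scores = [s for group in scores_by_group.values() for s in group]
    grand_mean = sum(all_scores) / len(all_scores)
    # Centroid: mean discriminant score of each group.
    centroids = {name: sum(group) / len(group)
                 for name, group in scores_by_group.items()}
    between_ss = sum(len(group) * (centroids[name] - grand_mean) ** 2
                     for name, group in scores_by_group.items())
    within_ss = sum((s - centroids[name]) ** 2
                    for name, group in scores_by_group.items() for s in group)
    eigenvalue = between_ss / within_ss
    canonical_corr = math.sqrt(eigenvalue / (1 + eigenvalue))  # assumed relation
    return centroids, eigenvalue, canonical_corr

def classification_matrix(scores_by_group, centroids):
    """Counts of actual group (outer key) versus predicted group (inner key)."""
    matrix = {actual: {pred: 0 for pred in centroids}
              for actual in scores_by_group}
    for actual, group in scores_by_group.items():
        for s in group:
            # Assumed rule: assign each case to the nearest centroid.
            predicted = min(centroids, key=lambda name: abs(s - centroids[name]))
            matrix[actual][predicted] += 1
    return matrix

# Hypothetical discriminant scores for two groups.
scores = {"group_1": [1.2, 0.8, 1.5, 0.9],
          "group_2": [-1.1, -0.7, -1.4, -1.2]}

centroids, eigenvalue, canonical_corr = discriminant_summary(scores)
matrix = classification_matrix(scores, centroids)
print(centroids, round(eigenvalue, 3), round(canonical_corr, 3))
print(matrix)
```

The diagonal of the classification matrix holds the correctly classified cases; dividing their sum by the total number of cases gives the hit ratio for these hypothetical data.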
4-113
Conducting Discriminant Analysis Fig. 18.1
Formulate the Problem
Estimate the Discriminant Function Coefficients
Determine the Significance of the Discriminant Function
Interpret the Results
Assess Validity of Discriminant Analysis
4-114
SPSS Windows
The DISCRIMINANT program performs both two-group and multiple discriminant analysis.
To select this procedure using SPSS for Windows click:
Analyze>Classify>Discriminant …
4-115
Factor Analysis
Factor analysis is a general name denoting a class of procedures primarily used for data reduction and summarization. Factor analysis is an interdependence technique in that an entire set of interdependent relationships is examined without making the distinction between dependent and independent variables. Factor analysis is used in the following circumstances:
To identify underlying dimensions, or factors, that explain the correlations among a set of variables.
To identify a new, smaller, set of uncorrelated variables to replace the original set of correlated variables in subsequent multivariate analysis (regression or discriminant analysis).
To identify a smaller set of salient variables from a larger set for use in subsequent multivariate analysis.
4-116
Factor Analysis Model
It is possible to select weights or factor score coefficients so that the first factor explains the largest portion of the total variance. Then a second set of weights can be selected, so that the second factor accounts for most of the residual variance, subject to being uncorrelated with the first factor. This same principle could be applied to selecting additional weights for the additional factors.
4-117
Conducting Factor Analysis Fig. 19.1
Problem Formulation
Construction of the Correlation Matrix
Method of Factor Analysis
Determination of Number of Factors
Rotation of Factors
Interpretation of Factors
Selection of Surrogate Variables
Calculation of Factor Scores
Determination of Model Fit
Conducting Factor Analysis
4-118
Determine the Number of Factors
A Priori Determination. Sometimes, because of prior knowledge, the researcher knows how many factors to expect and thus can specify the number of factors to be extracted beforehand.
Determination Based on Eigenvalues. In this approach, only factors with Eigenvalues greater than 1.0 are retained. An Eigenvalue represents the amount of variance associated with the factor. Hence, only factors with a variance greater than 1.0 are included. Factors with variance less than 1.0 are no better than a single variable, since, due to standardization, each variable has a variance of 1.0. If the number of variables is less than 20, this approach will result in a conservative number of factors.
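The eigenvalue rule is simple to apply once the eigenvalues are available. The sketch below uses hypothetical eigenvalues for six standardized variables (so they sum to 6.0, one unit of variance per variable); the function name and the numbers are illustrative assumptions.

```python
def retain_factors(eigenvalues, cutoff=1.0):
    """Keep factors whose eigenvalue exceeds the cutoff.

    With standardized variables the total variance equals the number of
    variables, so the share explained is the retained sum divided by that count.
    """
    retained = [ev for ev in eigenvalues if ev > cutoff]
    share_explained = sum(retained) / len(eigenvalues)
    return retained, share_explained

# Hypothetical eigenvalues from a six-variable correlation matrix.
eigenvalues = [2.6, 1.9, 0.7, 0.4, 0.3, 0.1]

retained, share_explained = retain_factors(eigenvalues)
print(len(retained), round(share_explained, 2))
```

Here two factors pass the cutoff and together account for three quarters of the total variance; the remaining four factors each explain less than a single original variable would.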
4-119
SPSS Windows
To select this procedure using SPSS for Windows click:
Analyze>Data Reduction>Factor …
4-120
Cluster Analysis
Cluster analysis is a class of techniques used to classify objects or cases into relatively homogeneous groups called clusters. Objects in each cluster tend to be similar to each other and dissimilar to objects in the other clusters. Cluster analysis is also called classification analysis, or numerical taxonomy. Both cluster analysis and discriminant analysis are concerned with classification. However, discriminant analysis requires prior knowledge of the cluster or group membership for each object or case included, to develop the classification rule. In contrast, in cluster analysis there is no a priori information about the group or cluster membership for any of the objects. Groups or clusters are suggested by the data, not defined a priori.
4-121
An Ideal Clustering Situation Fig. 20.1
[Figure: scatter plot of objects on two axes, Variable 1 and Variable 2.]
4-122
Conducting Cluster Analysis Fig. 20.3
Formulate the Problem
Select a Distance Measure
Select a Clustering Procedure
Decide on the Number of Clusters
Interpret and Profile Clusters
Assess the Validity of Clustering
4-123
A Classification of Clustering Procedures Fig. 20.4
Clustering Procedures
  Hierarchical
    Agglomerative
      Linkage Methods: Single, Complete, Average
      Variance Methods: Ward’s Method
      Centroid Methods
    Divisive
  Nonhierarchical
    Sequential Threshold
    Parallel Threshold
    Optimizing Partitioning
Conducting Cluster Analysis
4-124
Select a Clustering Procedure – Hierarchical
Hierarchical clustering is characterized by the development of a hierarchy or tree-like structure. Hierarchical methods can be agglomerative or divisive. Agglomerative clustering starts with each object in a separate cluster. Clusters are formed by grouping objects into bigger and bigger clusters. This process is continued until all objects are members of a single cluster. Divisive clustering starts with all the objects grouped in a single cluster. Clusters are divided or split until each object is in a separate cluster. Agglomerative methods are commonly used in marketing research. They consist of linkage methods, error sums of squares or variance methods, and centroid methods.
Conducting Cluster Analysis
4-125
Select a Clustering Procedure – Linkage Method
The single linkage method is based on minimum distance, or the nearest neighbor rule. At every stage, the distance between two clusters is the distance between their two closest points (see Figure 20.5). The complete linkage method is similar to single linkage, except that it is based on the maximum distance or the furthest neighbor approach. In complete linkage, the distance between two clusters is calculated as the distance between their two furthest points. The average linkage method works similarly. However, in this method, the distance between two clusters is defined as the average of the distances between all pairs of objects, where one member of the pair is from each of the clusters (Figure 20.5).
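The three linkage rules differ only in how they summarize the pairwise distances between two clusters, which a short sketch makes explicit. Euclidean distance and the point coordinates are assumptions chosen for illustration, and the function names are hypothetical.

```python
import math
from itertools import product

def pairwise_distances(cluster_a, cluster_b):
    """Euclidean distance for every pair with one object from each cluster."""
    return [math.dist(p, q) for p, q in product(cluster_a, cluster_b)]

def single_linkage(cluster_a, cluster_b):
    # Nearest neighbor rule: distance between the two closest points.
    return min(pairwise_distances(cluster_a, cluster_b))

def complete_linkage(cluster_a, cluster_b):
    # Furthest neighbor rule: distance between the two furthest points.
    return max(pairwise_distances(cluster_a, cluster_b))

def average_linkage(cluster_a, cluster_b):
    # Average of the distances between all cross-cluster pairs.
    distances = pairwise_distances(cluster_a, cluster_b)
    return sum(distances) / len(distances)

# Hypothetical objects measured on two variables.
cluster_1 = [(0.0, 0.0), (1.0, 0.0)]
cluster_2 = [(4.0, 0.0), (6.0, 0.0)]

print(single_linkage(cluster_1, cluster_2),
      complete_linkage(cluster_1, cluster_2),
      average_linkage(cluster_1, cluster_2))
```

In an agglomerative procedure, whichever rule is chosen would be evaluated for every pair of current clusters at each stage, and the pair with the smallest resulting distance would be merged.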
4-126
Linkage Methods of Clustering Fig. 20.5
[Figure: three panels, each showing Cluster 1 and Cluster 2. Single Linkage: minimum distance. Complete Linkage: maximum distance. Average Linkage: average distance.]
4-127
Other Agglomerative Clustering Methods Fig. 20.6
Ward’s Procedure
Centroid Method
4-128
SPSS Windows
To select these procedures using SPSS for Windows click:
Analyze>Classify>Hierarchical Cluster …
Analyze>Classify>K-Means Cluster …