Marketing Research 4

  • November 2019
4-1

A Comparison of Primary & Secondary Data
Table 4.1

                     Primary Data              Secondary Data
Collection purpose   For the problem at hand   For other problems
Collection process   Very involved             Rapid & easy
Collection cost      High                      Relatively low
Collection time      Long                      Short

4-2

Uses of Secondary Data

• Identify the problem
• Better define the problem
• Develop an approach to the problem
• Formulate an appropriate research design (for example, by identifying the key variables)
• Answer certain research questions and test some hypotheses
• Interpret primary data more insightfully

4-3

A Classification of Secondary Data Fig. 4.1

Secondary Data

Internal

Ready to Use

Requires Further Processing

External

Published Materials

Computerized Databases

Syndicated Services

4-4

A Classification of Published Secondary Sources Fig. 4.2 Published Secondary Data

General Business Sources

Guides

Directories

Indexes

Government Sources

Statistical Data

Census Data

Other Government Publications

4-5

A Classification of Computerized Databases Fig. 4.3 Computerized Databases

Online

Bibliographic Databases

Numeric Databases

Internet

Full-Text Databases

Off-Line

Directory Databases

Special-Purpose Databases

4-6

Syndicated Services: Consumers
Fig. 4.4 cont.

Households / Consumers
  Panels
    Purchase
    Media
  Surveys
    Psychographic & Lifestyles
    Advertising Evaluation
    General
  Electronic scanner services
    Volume Tracking Data
    Scanner Diary Panels
    Scanner Diary Panels with Cable TV

4-7

Syndicated Services: Institutions Fig. 4.4 cont.

Retailers

Institutions

Wholesalers

Industrial firms

Audits

Direct Inquiries

Clipping Services

Corporate Reports

4-8

A Classification of Marketing Research Data
Fig. 5.1

Marketing Research Data
  Secondary Data
  Primary Data
    Qualitative Data
    Quantitative Data
      Descriptive
        Survey Data
        Observational and Other Data
      Causal
        Experimental Data

4-9

Qualitative vs. Quantitative Research
Table 5.1

Objective
  Qualitative Research: To gain a qualitative understanding of the underlying reasons and motivations
  Quantitative Research: To quantify the data and generalize the results from the sample to the population of interest

Sample
  Qualitative Research: Small number of nonrepresentative cases
  Quantitative Research: Large number of representative cases

Data Collection
  Qualitative Research: Unstructured
  Quantitative Research: Structured

Data Analysis
  Qualitative Research: Non-statistical
  Quantitative Research: Statistical

Outcome
  Qualitative Research: Develop an initial understanding
  Quantitative Research: Recommend a final course of action

4-10

A Classification of Qualitative Research Procedures
Fig. 5.2

Qualitative Research Procedures
  Direct (Nondisguised)
    Focus Groups
    Depth Interviews
  Indirect (Disguised)
    Projective Techniques
      Association Techniques
      Completion Techniques
      Construction Techniques
      Expressive Techniques

4-11

Definition of Projective Techniques

• An unstructured, indirect form of questioning that encourages respondents to project their underlying motivations, beliefs, attitudes, or feelings regarding the issues of concern.
• In projective techniques, respondents are asked to interpret the behavior of others.
• In interpreting the behavior of others, respondents indirectly project their own motivations, beliefs, attitudes, or feelings into the situation.

4-12

Word Association

In word association, respondents are presented with a list of words, one at a time, and asked to respond to each with the first word that comes to mind. The words of interest, called test words, are interspersed throughout the list, which also contains some neutral, or filler, words to disguise the purpose of the study. Responses are analyzed by calculating:

(1) the frequency with which any word is given as a response;
(2) the amount of time that elapses before a response is given; and
(3) the number of respondents who do not respond at all to a test word within a reasonable period of time.
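The three measures used to analyze word association responses can be sketched in a few lines of Python. This is only an illustration: the response data, the tuple layout, and the function name `summarize_word_association` are assumptions made for the example, not part of the original material.

```python
from collections import Counter
from statistics import mean

# Hypothetical responses to ONE test word: (response word, seconds to respond).
# (None, None) marks a respondent who gave no answer within the time limit.
responses = [
    ("clean", 1.2),
    ("fresh", 2.5),
    ("clean", 0.9),
    (None, None),
    ("soap", 3.1),
]

def summarize_word_association(responses):
    """Return the three summary measures for a single test word."""
    answered = [(word, secs) for word, secs in responses if word is not None]
    # (1) frequency with which each word is given as a response
    frequency = Counter(word for word, _ in answered)
    # (2) average time that elapses before a response is given
    mean_latency = mean(secs for _, secs in answered) if answered else None
    # (3) number of respondents who did not respond in time
    non_responses = len(responses) - len(answered)
    return frequency, mean_latency, non_responses

frequency, mean_latency, non_responses = summarize_word_association(responses)
print(frequency.most_common(1), mean_latency, non_responses)
```

A long average latency or a high non-response count for a test word, relative to the filler words, would be the kind of signal such a summary is meant to surface.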

4-13

Completion Techniques

In sentence completion, respondents are given incomplete sentences and asked to complete them. Generally, they are asked to use the first word or phrase that comes to mind.

A person who shops at Sears is ______________________
A person who receives a gift certificate good for Saks Fifth Avenue would be __________________________________
J. C. Penney is most liked by _________________________
When I think of shopping in a department store, I ________

A variation of sentence completion is paragraph completion, in which the respondent completes a paragraph beginning with the stimulus phrase.

4-14

Completion Techniques

In story completion, respondents are given part of a story: enough to direct attention to a particular topic but not to hint at the ending. They are required to give the conclusion in their own words.

4-15

Construction Techniques

With a picture response, the respondents are asked to describe a series of pictures of ordinary as well as unusual events. The respondent's interpretation of the pictures gives indications of that individual's personality.

In cartoon tests, cartoon characters are shown in a specific situation related to the problem. The respondents are asked to indicate what one cartoon character might say in response to the comments of another character. Cartoon tests are simpler to administer and analyze than picture response techniques.

4-16

A Cartoon Test Figure 5.4

Sears

Let’s see if we can pick up some house wares at Sears

4-17

Expressive Techniques

In expressive techniques, respondents are presented with a verbal or visual situation and asked to relate the feelings and attitudes of other people to the situation.

Role playing: Respondents are asked to play the role or assume the behavior of someone else.

Third-person technique: The respondent is presented with a verbal or visual situation and is asked to relate the beliefs and attitudes of a third person rather than directly expressing personal beliefs and attitudes. This third person may be a friend, neighbor, colleague, or a “typical” person.

4-18

Advantages of Projective Techniques

• They may elicit responses that subjects would be unwilling or unable to give if they knew the purpose of the study.
• Helpful when the issues to be addressed are personal, sensitive, or subject to strong social norms.
• Helpful when underlying motivations, beliefs, and attitudes are operating at a subconscious level.

4-19

A Classification of Survey Methods
Fig. 6.1

Survey Methods
  Telephone
    Traditional Telephone
    Computer-Assisted Telephone Interviewing
  Personal
    In-Home
    Mall Intercept
    Computer-Assisted Personal Interviewing
  Mail
    Mail Interview
    Mail Panel
  Electronic
    E-mail
    Internet

Observation Methods

4-20

Structured versus Unstructured Observation

• For structured observation, the researcher specifies in detail what is to be observed and how the measurements are to be recorded, e.g., an auditor performing inventory analysis in a store.
• In unstructured observation, the observer monitors all aspects of the phenomenon that seem relevant to the problem at hand, e.g., observing children playing with new toys.

Observation Methods

4-21

Disguised versus Undisguised Observation

• In disguised observation, the respondents are unaware that they are being observed. Disguise may be accomplished by using one-way mirrors, hidden cameras, or inconspicuous mechanical devices. Observers may be disguised as shoppers or sales clerks.
• In undisguised observation, the respondents are aware that they are under observation.

Observation Methods

Natural versus Contrived Observation

• Natural observation involves observing behavior as it takes place in the environment. For example, one could observe the behavior of respondents eating fast food in Burger King.
• In contrived observation, respondents' behavior is observed in an artificial environment, such as a test kitchen.

4-22

4-23

A Classification of Observation Methods Fig. 6.3

Classifying Observation Methods

Observation Methods

Personal Observation

Mechanical Observation

Audit

Content Analysis

Trace Analysis

4-24

Concept of Causality

A statement such as "X causes Y" will have the following meaning to an ordinary person and to a scientist.

Ordinary meaning: X is the only cause of Y.
Scientific meaning: X is only one of a number of possible causes of Y.

Ordinary meaning: X must always lead to Y (X is a deterministic cause of Y).
Scientific meaning: The occurrence of X makes the occurrence of Y more probable (X is a probabilistic cause of Y).

Ordinary meaning: It is possible to prove that X is a cause of Y.
Scientific meaning: We can never prove that X is a cause of Y. At best, we can infer that X is a cause of Y.

4-25

Definitions and Concepts

• Independent variables are variables or alternatives that are manipulated and whose effects are measured and compared, e.g., price levels.
• Test units are individuals, organizations, or other entities whose response to the independent variables or treatments is being examined, e.g., consumers or stores.
• Dependent variables are the variables which measure the effect of the independent variables on the test units, e.g., sales, profits, and market shares.
• Extraneous variables are all variables other than the independent variables that affect the response of the test units, e.g., store size, store location, and competitive effort.

4-26

Experimental Design

An experimental design is a set of procedures specifying

• the test units and how these units are to be divided into homogeneous subsamples,
• what independent variables or treatments are to be manipulated,
• what dependent variables are to be measured, and
• how the extraneous variables are to be controlled.

4-27

Validity in Experimentation

• Internal validity refers to whether the manipulation of the independent variables or treatments actually caused the observed effects on the dependent variables. Control of extraneous variables is a necessary condition for establishing internal validity.
• External validity refers to whether the cause-and-effect relationships found in the experiment can be generalized. To what populations, settings, times, independent variables, and dependent variables can the results be projected?

4-28

Controlling Extraneous Variables

• Randomization refers to the random assignment of test units to experimental groups by using random numbers. Treatment conditions are also randomly assigned to experimental groups.
• Matching involves comparing test units on a set of key background variables before assigning them to the treatment conditions.
• Statistical control involves measuring the extraneous variables and adjusting for their effects through statistical analysis.
• Design control involves the use of experiments designed to control specific extraneous variables.
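Randomization, the first of the controls listed above, lends itself to a short sketch. The example below is a minimal illustration under stated assumptions: the store names, the three price treatments, and the helper name `randomize` are hypothetical, and a fixed seed is used only so the result is repeatable.

```python
import random

def randomize(test_units, treatments, seed=None):
    """Randomly assign test units to groups of near-equal size, then
    randomly assign one treatment condition to each group."""
    rng = random.Random(seed)
    units = list(test_units)
    rng.shuffle(units)  # random order of test units
    k = len(treatments)
    # Deal the shuffled units round-robin into k experimental groups.
    groups = [units[i::k] for i in range(k)]
    conditions = list(treatments)
    rng.shuffle(conditions)  # treatments are also assigned at random
    return dict(zip(conditions, groups))

# Hypothetical test units (stores) and treatments (price levels).
stores = [f"store_{i}" for i in range(1, 13)]
assignment = randomize(stores, ["low price", "medium price", "high price"], seed=42)
for treatment, group in assignment.items():
    print(treatment, group)
```

Because every unit has the same chance of landing in any group, extraneous variables such as store size tend to be spread evenly across treatments rather than being confounded with them.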

4-29

A Classification of Experimental Designs
Figure 7.1

Experimental Designs
  Pre-experimental
    One-Shot Case Study
    One Group Pretest-Posttest
    Static Group
  True Experimental
    Pretest-Posttest Control Group
    Posttest-Only Control Group
    Solomon Four-Group
  Quasi Experimental
    Time Series
    Multiple Time Series
  Statistical
    Randomized Blocks
    Latin Square
    Factorial Design

4-30

Factorial Design

• A factorial design is used to measure the effects of two or more independent variables at various levels.
• A factorial design may also be conceptualized as a table.
• In a two-factor design, each level of one variable represents a row and each level of another variable represents a column.
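The table view of a two-factor design can be sketched as follows. The two factors and their levels (price and advertising) are hypothetical choices for illustration; the point is only that every row level is crossed with every column level.

```python
from itertools import product

# Hypothetical factors: one defines the rows, the other the columns.
price_levels = ["low", "medium", "high"]   # rows
ad_levels = ["no ads", "ads"]              # columns

# Every combination of levels is one cell (treatment) of the design.
cells = list(product(price_levels, ad_levels))

# The same design laid out as a table: table[row][column] -> cell.
table = {p: {a: (p, a) for a in ad_levels} for p in price_levels}

print(f"{len(price_levels)} x {len(ad_levels)} design, {len(cells)} cells")
for p in price_levels:
    print(p.ljust(8), [table[p][a] for a in ad_levels])
```

With three price levels and two advertising levels, the design has six cells, and each test unit would be assigned to exactly one of them.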

4-31

Selecting a Test-Marketing Strategy

[Figure: flow chart. New Product Development, Research on Existing Products, and Research on other Elements lead into a sequence of stages: Simulated Test Marketing, Controlled Test Marketing, Standard Test Marketing, and National Introduction. Arrows labeled "Very +ve / Other Factors" lead onward through the stages; arrows labeled "-ve" lead to "Stop and Reevaluate". The surrounding influences shown are Competition, Socio-Cultural Environment, Need for Secrecy, and Overall Marketing Strategy.]

4-32

Criteria for the Selection of Test Markets

Test markets should have the following qualities:

1) Be large enough to produce meaningful projections. They should contain at least 2% of the potential actual population.
2) Be representative demographically.
3) Be representative with respect to product consumption behavior.
4) Be representative with respect to media usage.
5) Be representative with respect to competition.
6) Be relatively isolated in terms of media and physical distribution.
7) Have normal historical development in the product class.
8) Have marketing research and auditing services available.
9) Not be over-tested.

4-33

Measurement and Scaling

Measurement means assigning numbers or other symbols to characteristics of objects according to certain prespecified rules.

• One-to-one correspondence between the numbers and the characteristics being measured.
• The rules for assigning numbers should be standardized and applied uniformly.
• Rules must not change over objects or time.

4-34

Measurement and Scaling

Scaling involves creating a continuum upon which measured objects are located.

Consider an attitude scale from 1 to 100. Each respondent is assigned a number from 1 to 100, with 1 = Extremely Unfavorable, and 100 = Extremely Favorable. Measurement is the actual assignment of a number from 1 to 100 to each respondent. Scaling is the process of placing the respondents on a continuum with respect to their attitude toward department stores.

4-35

Primary Scales of Measurement
Figure 8.1

Scale      What the numbers represent               Values for three runners at the finish
Nominal    Numbers Assigned to Runners              7             8              3
Ordinal    Rank Order of Winners                    Third place   Second place   First place
Interval   Performance Rating on a 0 to 10 Scale    8.2           9.1            9.6
Ratio      Time to Finish, in Seconds               15.2          14.1           13.4

4-36

A Classification of Scaling Techniques
Figure 8.2

Scaling Techniques
  Comparative Scales
    Paired Comparison
    Rank Order
    Constant Sum
    Q-Sort and Other Procedures
  Noncomparative Scales
    Continuous Rating Scales
    Itemized Rating Scales
      Likert
      Semantic Differential
      Stapel

4-37

A Comparison of Scaling Techniques

• Comparative scales involve the direct comparison of stimulus objects. Comparative scale data must be interpreted in relative terms and have only ordinal or rank order properties.
• In noncomparative scales, each object is scaled independently of the others in the stimulus set. The resulting data are generally assumed to be interval or ratio scaled.

Preference for Toothpaste Brands Using Rank Order Scaling
Figure 8.4 cont.

Form

Brand             Rank Order
1. Crest          _________
2. Colgate        _________
3. Aim            _________
4. Gleem          _________
5. Macleans       _________
6. Ultra Brite    _________
7. Close Up       _________
8. Pepsodent      _________
9. Plus White     _________
10. Stripe        _________

4-38

Importance of Bathing Soap Attributes Using a Constant Sum Scale

4-39

Figure 8.5 cont.

Form

Average Responses of Three Segments

Attribute            Segment I   Segment II   Segment III
1. Mildness               8           2             4
2. Lather                 2           4            17
3. Shrinkage              3           9             7
4. Price                 53          17             9
5. Fragrance              9           0            19
6. Packaging              7           5             9
7. Moisturizing           5           3            20
8. Cleaning Power        13          60            15
Sum                     100         100           100
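As a small worked check of the constant sum data above, the sketch below verifies that each segment's points add up to 100 and picks out the attribute with the most points. The variable and function names are illustrative assumptions; the numbers are the ones in the table.

```python
attributes = [
    "Mildness", "Lather", "Shrinkage", "Price",
    "Fragrance", "Packaging", "Moisturizing", "Cleaning Power",
]

# Average points allocated by each segment (from the constant sum table).
segments = {
    "Segment I":   [8, 2, 3, 53, 9, 7, 5, 13],
    "Segment II":  [2, 4, 9, 17, 0, 5, 3, 60],
    "Segment III": [4, 17, 7, 9, 19, 9, 20, 15],
}

def is_valid_constant_sum(points, total=100):
    """A constant sum response is usable only if the points add to the total."""
    return sum(points) == total

def top_attribute(points):
    """Attribute that received the largest share of points."""
    return attributes[points.index(max(points))]

for name, points in segments.items():
    print(name, is_valid_constant_sum(points), top_attribute(points))
```

Run on these figures, the check passes for all three segments and shows that each segment weights a different attribute most heavily.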

4-40

Noncomparative Scaling Techniques

• Respondents evaluate only one object at a time, and for this reason noncomparative scales are often referred to as monadic scales.
• Noncomparative techniques consist of continuous and itemized rating scales.

4-41

Likert Scale

The Likert scale requires the respondents to indicate a degree of agreement or disagreement with each of a series of statements about the stimulus objects.

Response categories: 1 = Strongly disagree, 2 = Disagree, 3 = Neither agree nor disagree, 4 = Agree, 5 = Strongly agree. An X marks the response given.

1. Sears sells high quality merchandise.    1    2X   3    4    5
2. Sears has poor in-store service.         1    2X   3    4    5
3. I like to shop at Sears.                 1    2    3X   4    5

• The analysis can be conducted on an item-by-item basis (profile analysis), or a total (summated) score can be calculated.
• When arriving at a total score, the categories assigned to the negative statements by the respondents should be scored by reversing the scale.
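A summated score with reverse scoring can be sketched as below, using the three Sears responses shown above (2, 2, and 3). Treating item 2 ("poor in-store service") as the negative statement is an assumption read off its wording, and the function name is hypothetical.

```python
SCALE_MAX = 5  # five-point Likert scale, scored 1..5

# Responses marked with an X in the example: item number -> category chosen.
responses = {1: 2, 2: 2, 3: 3}

# Item 2 is worded negatively, so its scale is reversed before summing.
negative_items = {2}

def summated_score(responses, negative_items, scale_max=SCALE_MAX):
    """Total Likert score, reversing the scale for negative statements."""
    total = 0
    for item, score in responses.items():
        if item in negative_items:
            score = scale_max + 1 - score  # 1<->5, 2<->4, 3 stays 3
        total += score
    return total

total = summated_score(responses, negative_items)
print(total)  # 2 + (6 - 2) + 3
```

After reversal, a higher total consistently means a more favorable attitude, which is what makes the item scores meaningful to add together.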

4-42

Semantic Differential Scale

The semantic differential is a seven-point rating scale with end points associated with bipolar labels that have semantic meaning.

SEARS IS:
Powerful     --:--:--:--:-X-:--:--:  Weak
Unreliable   --:--:--:--:--:-X-:--:  Reliable
Modern       --:--:--:--:--:--:-X-:  Old-fashioned

• The negative adjective or phrase sometimes appears at the left side of the scale and sometimes at the right.
• This controls the tendency of some respondents, particularly those with very positive or very negative attitudes, to mark the right- or left-hand sides without reading the labels.
• Individual items on a semantic differential scale may be scored on either a -3 to +3 or a 1 to 7 scale.

A Semantic Differential Scale for Measuring Self-Concepts, Person Concepts, and Product Concepts

1) Rugged          :---:---:---:---:---:---:---:  Delicate
2) Excitable       :---:---:---:---:---:---:---:  Calm
3) Uncomfortable   :---:---:---:---:---:---:---:  Comfortable
4) Dominating      :---:---:---:---:---:---:---:  Submissive
5) Thrifty         :---:---:---:---:---:---:---:  Indulgent
6) Pleasant        :---:---:---:---:---:---:---:  Unpleasant
7) Contemporary    :---:---:---:---:---:---:---:  Obsolete
8) Organized       :---:---:---:---:---:---:---:  Unorganized
9) Rational        :---:---:---:---:---:---:---:  Emotional
10) Youthful       :---:---:---:---:---:---:---:  Mature
11) Formal         :---:---:---:---:---:---:---:  Informal
12) Orthodox       :---:---:---:---:---:---:---:  Liberal
13) Complex        :---:---:---:---:---:---:---:  Simple
14) Colorless      :---:---:---:---:---:---:---:  Colorful
15) Modest         :---:---:---:---:---:---:---:  Vain

4-43

4-44

Stapel Scale

The Stapel scale is a unipolar rating scale with ten categories numbered from -5 to +5, without a neutral point (zero). This scale is usually presented vertically.

SEARS

     +5                 +5
     +4                 +4
     +3                 +3
     +2                 +2X
     +1                 +1
HIGH QUALITY       POOR SERVICE
     -1                 -1
     -2                 -2
     -3                 -3
     -4X                -4
     -5                 -5

The data obtained by using a Stapel scale can be analyzed in the same way as semantic differential data.

4-45

Some Unique Rating Scale Configurations Figure 9.3 Thermometer Scale Instructions: Please indicate how much you like McDonald’s hamburgers by coloring in the thermometer. Start at the bottom and color up to the temperature level that best indicates how strong your preference is.

Form:

Like very much

100 75 50 25 0

Dislike very much

Smiling Face Scale Instructions: Please point to the face that shows how much you like the Barbie Doll. If you do not like the Barbie Doll at all, you would point to Face 1. If you liked it very much, you would point to Face 5.

Form:

1

2

3

4

5

4-46

Validity

• Construct validity addresses the question of what construct or characteristic the scale is, in fact, measuring. Construct validity includes convergent, discriminant, and nomological validity.
• Convergent validity is the extent to which the scale correlates positively with other measures of the same construct.
• Discriminant validity is the extent to which a measure does not correlate with other constructs from which it is supposed to differ.
• Nomological validity is the extent to which the scale correlates in theoretically predicted ways with measures of different but related constructs.

4-47

Questionnaire Definition

• A questionnaire is a formalized set of questions for obtaining information from respondents.

4-48

Questionnaire Design Process
Fig. 10.1

1. Specify the Information Needed
2. Specify the Type of Interviewing Method
3. Determine the Content of Individual Questions
4. Design the Question to Overcome the Respondent’s Inability and Unwillingness to Answer
5. Decide the Question Structure
6. Determine the Question Wording
7. Arrange the Questions in Proper Order
8. Identify the Form and Layout
9. Reproduce the Questionnaire
10. Eliminate Bugs by Pre-testing

Choosing Question Structure

4-49

Unstructured Questions

• Unstructured questions are open-ended questions that respondents answer in their own words.

Do you intend to buy a new car within the next six months?
__________________________________

Choosing Question Structure

4-50

Structured Questions

• Structured questions specify the set of response alternatives and the response format. A structured question may be multiple-choice, dichotomous, or a scale.

Choosing Question Structure

4-51

Multiple-Choice Questions

• In multiple-choice questions, the researcher provides a choice of answers and respondents are asked to select one or more of the alternatives given.

Do you intend to buy a new car within the next six months?
____ Definitely will not buy
____ Probably will not buy
____ Undecided
____ Probably will buy
____ Definitely will buy
____ Other (please specify)

Choosing Question Structure

4-52

Dichotomous Questions

• A dichotomous question has only two response alternatives: yes or no, agree or disagree, and so on.
• Often, the two alternatives of interest are supplemented by a neutral alternative, such as “no opinion,” “don't know,” “both,” or “none.”

Do you intend to buy a new car within the next six months?
_____ Yes
_____ No
_____ Don't know

Choosing Question Wording
Use Ordinary Words

“Do you think the distribution of soft drinks is adequate?” (Incorrect)

“Do you think soft drinks are readily available when you want to buy them?” (Correct)

4-53

Choosing Question Wording
Use Unambiguous Words

In a typical month, how often do you shop in department stores?
_____ Never
_____ Occasionally
_____ Sometimes
_____ Often
_____ Regularly
(Incorrect)

In a typical month, how often do you shop in department stores?
_____ Less than once
_____ 1 or 2 times
_____ 3 or 4 times
_____ More than 4 times
(Correct)

4-54

4-55

Flow Chart for Questionnaire Design
Fig. 10.2

Introduction
Ownership of Store, Bank, and Other Charge Cards
Purchased Products in a Specific Department Store during the Last Two Months
  Yes: How was Payment made?
    Credit
      Store Charge Card
      Bank Charge Card
      Other Charge Card
    Cash
    Other
  No: Ever Purchased in a Department Store?
    Yes
    No
Intentions to Use Store, Bank, and other Charge Cards

4-56

Pretesting

Pretesting refers to the testing of the questionnaire on a small sample of respondents to identify and eliminate potential problems.

• A questionnaire should not be used in the field survey without adequate pretesting.
• All aspects of the questionnaire should be tested, including question content, wording, sequence, form and layout, question difficulty, and instructions.
• The respondents for the pretest and for the actual survey should be drawn from the same population.
• Pretests are best done by personal interviews, even if the actual survey is to be conducted by mail, telephone, or electronic means, because interviewers can observe respondents' reactions and attitudes.

4-57

Observational Forms
Department Store Project

• Who: Purchasers, browsers, males, females, parents with children, or children alone.
• What: Products/brands considered, products/brands purchased, size, price of package inspected, or influence of children or other family members.
• When: Day, hour, date of observation.
• Where: Inside the store, checkout counter, or type of department within the store.
• Why: Influence of price, brand name, package size, promotion, or family members on the purchase.
• Way: Personal observer disguised as sales clerk, undisguised personal observer, hidden camera, or obtrusive mechanical device.

4-58

Questionnaire Design Checklist
Table 10.1

Step 1. Specify the Information Needed
Step 2. Type of Interviewing Method
Step 3. Individual Question Content
Step 4. Overcome Inability and Unwillingness to Answer
Step 5. Choose Question Structure
Step 6. Choose Question Wording
Step 7. Determine the Order of Questions
Step 8. Form and Layout
Step 9. Reproduce the Questionnaire
Step 10. Pretest

4-59

Sample vs. Census
Table 11.1

                                       Conditions Favoring the Use of
                                       Sample           Census
1. Budget                              Small            Large
2. Time available                      Short            Long
3. Population size                     Large            Small
4. Variance in the characteristic      Small            Large
5. Cost of sampling errors             Low              High
6. Cost of nonsampling errors          High             Low
7. Nature of measurement               Destructive      Nondestructive
8. Attention to individual cases       Yes              No

4-60

The Sampling Design Process Fig. 11.1

1. Define the Population
2. Determine the Sampling Frame
3. Select Sampling Technique(s)
4. Determine the Sample Size
5. Execute the Sampling Process

4-61

Define the Target Population

The target population is the collection of elements or objects that possess the information sought by the researcher and about which inferences are to be made. The target population should be defined in terms of elements, sampling units, extent, and time.

• An element is the object about which or from which the information is desired, e.g., the respondent.
• A sampling unit is an element, or a unit containing the element, that is available for selection at some stage of the sampling process.
• Extent refers to the geographical boundaries.
• Time is the time period under consideration.

Sample Sizes Used in Marketing Research Studies

4-62

Table 11.2

Type of Study                                                    Minimum Size   Typical Range
Problem identification research (e.g. market potential)          500            1,000-2,500
Problem-solving research (e.g. pricing)                          200            300-500
Product tests                                                    200            300-500
Test marketing studies                                           200            300-500
TV, radio, or print advertising (per commercial or ad tested)    150            200-300
Test-market audits                                               10 stores      10-20 stores
Focus groups                                                     2 groups       4-12 groups

4-63

Classification of Sampling Techniques
Fig. 11.2

Sampling Techniques
  Nonprobability Sampling Techniques
    Convenience Sampling
    Judgmental Sampling
    Quota Sampling
    Snowball Sampling
  Probability Sampling Techniques
    Simple Random Sampling
    Systematic Sampling
    Stratified Sampling
    Cluster Sampling
    Other Sampling Techniques

4-64

Data Preparation Process Fig. 14.1

1. Prepare Preliminary Plan of Data Analysis
2. Check Questionnaire
3. Edit
4. Code
5. Transcribe
6. Clean Data
7. Statistically Adjust the Data
8. Select Data Analysis Strategy

4-65

Selecting a Data Analysis Strategy
Fig. 14.5

The data analysis strategy is selected on the basis of:

• Earlier Steps (1, 2, & 3) of the Marketing Research Process
• Known Characteristics of the Data
• Properties of Statistical Techniques
• Background and Philosophy of the Researcher

4-66

A Classification of Univariate Techniques
Fig. 14.6

Univariate Techniques
  Metric Data
    One Sample
      * t test
      * Z test
    Two or More Samples
      Independent
        * Two-Group test
        * Z test
        * One-Way ANOVA
      Related
        * Paired t test
  Non-numeric Data
    One Sample
      * Frequency
      * Chi-Square
      * K-S
      * Runs
      * Binomial
    Two or More Samples
      Independent
        * Chi-Square
        * Mann-Whitney
        * Median
        * K-S
        * K-W ANOVA
      Related
        * Sign
        * Wilcoxon
        * McNemar
        * Chi-Square

4-67

A Classification of Multivariate Techniques
Fig. 14.7

Multivariate Techniques
  Dependence Technique
    One Dependent Variable
      * Cross-Tabulation
      * Analysis of Variance and Covariance
      * Multiple Regression
      * Conjoint Analysis
    More Than One Dependent Variable
      * Multivariate Analysis of Variance and Covariance
      * Canonical Correlation
      * Multiple Discriminant Analysis
  Interdependence Technique
    Variable Interdependence
      * Factor Analysis
    Interobject Similarity
      * Cluster Analysis
      * Multidimensional Scaling

4-68

Frequency Distribution

• In a frequency distribution, one variable is considered at a time.
• A frequency distribution for a variable produces a table of frequency counts, percentages, and cumulative percentages for all the values associated with that variable.
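A frequency distribution of this kind can be sketched in a few lines. The ratings below are hypothetical data invented for the example, and `frequency_distribution` is an illustrative helper name.

```python
from collections import Counter

def frequency_distribution(values):
    """Return rows of (value, count, percentage, cumulative percentage)."""
    counts = Counter(values)
    n = len(values)
    rows = []
    cumulative = 0.0
    for value in sorted(counts):
        pct = 100 * counts[value] / n
        cumulative += pct
        rows.append((value, counts[value], pct, cumulative))
    return rows

# Hypothetical responses of ten respondents on a 1-5 rating item.
ratings = [2, 3, 3, 4, 4, 4, 5, 5, 2, 4]

rows = frequency_distribution(ratings)
print("value  count  percent  cumulative")
for value, count, pct, cum in rows:
    print(f"{value:>5}  {count:>5}  {pct:>7.1f}  {cum:>10.1f}")
```

The cumulative percentage in the last row should reach 100, which is a quick sanity check that no value was dropped.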

Statistics Associated with Frequency Distribution

4-69

Measures of Location

• The mean, or average value, is the most commonly used measure of central tendency. The mean, $\bar{X}$, is given by

$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$$

where
$X_i$ = observed values of the variable $X$
$n$ = number of observations (sample size)

• The mode is the value that occurs most frequently. It represents the highest peak of the distribution. The mode is a good measure of location when the variable is inherently categorical or has otherwise been grouped into categories.

Statistics Associated with Frequency Distribution

4-70

Measures of Location

• The median of a sample is the middle value when the data are arranged in ascending or descending order. If the number of data points is even, the median is usually estimated as the midpoint between the two middle values, obtained by adding the two middle values and dividing their sum by 2. The median is the 50th percentile.
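The three measures of location can be computed with Python's standard `statistics` module. The data are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
from statistics import mean, median, mode

# Hypothetical ratings from ten respondents (an even number of data points).
ratings = [2, 3, 3, 4, 4, 4, 5, 5, 2, 4]

rating_mean = mean(ratings)      # sum of the values divided by n
rating_median = median(ratings)  # even n: midpoint of the two middle values
rating_mode = mode(ratings)      # value that occurs most frequently

# Even-sized sample whose two middle values differ: (3 + 5) / 2.
small_median = median([1, 3, 5, 7])

print(rating_mean, rating_median, rating_mode, small_median)
```

For categorical variables only the mode is meaningful, which is why it is singled out as the measure of location for grouped or inherently categorical data.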

Statistics Associated with Frequency Distribution

4-71

Measures of Variability

• The range measures the spread of the data. It is simply the difference between the largest and smallest values in the sample.

$$\text{Range} = X_{\text{largest}} - X_{\text{smallest}}$$

• The interquartile range is the difference between the 75th and 25th percentile. For a set of data points arranged in order of magnitude, the pth percentile is the value that has p% of the data points below it and (100 - p)% above it.

Statistics Associated with Frequency Distribution

4-72

Measures of Variability

• The variance is the mean squared deviation from the mean. The variance can never be negative.
• The standard deviation is the square root of the variance.

$$s_x = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$$

• The coefficient of variation is the ratio of the standard deviation to the mean expressed as a percentage, and is a unitless measure of relative variability.

$$CV = \frac{s_x}{\bar{X}}$$

(multiplied by 100 when expressed as a percentage)
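The measures of variability can be worked through on a small sample. The data are hypothetical, and `sample_std` is an illustrative helper that follows the n - 1 formula; it is compared against `statistics.stdev`, which uses the same divisor.

```python
import math
from statistics import mean, stdev

# Hypothetical sample of eight observations.
data = [2, 4, 4, 4, 5, 5, 7, 9]

def sample_std(values):
    """Standard deviation with n - 1 in the denominator."""
    n = len(values)
    x_bar = sum(values) / n
    squared_deviations = sum((x - x_bar) ** 2 for x in values)
    return math.sqrt(squared_deviations / (n - 1))

data_range = max(data) - min(data)   # largest minus smallest value
s_x = sample_std(data)               # standard deviation
variance = s_x ** 2                  # variance is the squared standard deviation
cv_percent = 100 * s_x / mean(data)  # coefficient of variation, as a percentage

print(data_range, round(variance, 3), round(s_x, 3), round(cv_percent, 1))
```

Because the coefficient of variation divides by the mean, it is only meaningful when the variable is measured on a ratio scale with a nonzero mean.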

Statistics Associated with Frequency Distribution

4-73

Measures of Shape

• Skewness is the tendency of the deviations from the mean to be larger in one direction than in the other. It can be thought of as the tendency for one tail of the distribution to be heavier than the other.
• Kurtosis is a measure of the relative peakedness or flatness of the curve defined by the frequency distribution. The kurtosis of a normal distribution is zero. If the kurtosis is positive, then the distribution is more peaked than a normal distribution. A negative value means that the distribution is flatter than a normal distribution.

4-74

Skewness of a Distribution Figure 15.2

Symmetric Distribution

Skewed Distribution Mean Median Mode (a) Mean Median Mode (b)

4-75

Steps Involved in Hypothesis Testing
Fig. 15.3

1. Formulate H0 and H1
2. Select Appropriate Test
3. Choose Level of Significance
4. Collect Data and Calculate Test Statistic
5. Either:
   a. Determine Probability Associated with Test Statistic, then Compare with Level of Significance, α; or
   b. Determine Critical Value of Test Statistic TSCR, then Determine if TSCR falls into (Non) Rejection Region
6. Reject or Do not Reject H0
7. Draw Marketing Research Conclusion

4-76

A Broad Classification of Hypothesis Tests Figure 15.6 Hypothesis Tests

Tests of Differences

Tests of Association

Distributions

Means

Proportions

Median/ Rankings

4-77

Cross-Tabulation

• While a frequency distribution describes one variable at a time, a cross-tabulation describes two or more variables simultaneously.
• Cross-tabulation results in tables that reflect the joint distribution of two or more variables with a limited number of categories or distinct values, e.g., Table 15.3.

4-78

Gender and Internet Usage
Table 15.3

                   Gender
Internet Usage     Male     Female     Row Total
Light (1)            5        10          15
Heavy (2)           10         5          15
Column Total        15        15          30

4-79

Internet Usage by Gender
Table 15.4

                   Gender
Internet Usage     Male      Female
Light              33.3%     66.7%
Heavy              66.7%     33.3%
Column total       100%      100%

4-80

Gender by Internet Usage
Table 15.5

            Internet Usage
Gender      Light     Heavy     Total
Male        33.3%     66.7%     100.0%
Female      66.7%     33.3%     100.0%
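The percentages in these tables can be derived from the counts in Table 15.3. The sketch below computes percentages in both directions, within each gender column and within each usage row; the dictionary layout and function names are assumptions made for the example. Because the counts are symmetric, both directions happen to give 33.3% and 66.7%.

```python
# Cell counts from Table 15.3: usage level -> gender -> number of respondents.
counts = {
    "Light": {"Male": 5, "Female": 10},
    "Heavy": {"Male": 10, "Female": 5},
}

def column_percentages(table):
    """Percentages within each column (each gender sums to 100%)."""
    col_totals = {}
    for row in table.values():
        for col, n in row.items():
            col_totals[col] = col_totals.get(col, 0) + n
    return {r: {c: 100 * n / col_totals[c] for c, n in row.items()}
            for r, row in table.items()}

def row_percentages(table):
    """Percentages within each row (each usage level sums to 100%)."""
    return {r: {c: 100 * n / sum(row.values()) for c, n in row.items()}
            for r, row in table.items()}

by_gender = column_percentages(counts)
by_usage = row_percentages(counts)
print(round(by_gender["Light"]["Male"], 1), round(by_usage["Light"]["Female"], 1))
```

Which direction to use depends on which variable is treated as the independent one: percentages are normally computed within the categories of the independent variable.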

4-81

Introduction of a Third Variable in Cross-Tabulation
Fig. 15.7

Original Two Variables
  Some Association between the Two Variables
    Introduce a Third Variable
      Refined Association between the Two Variables
      No Association between the Two Variables
      No Change in the Initial Pattern
  No Association between the Two Variables
    Introduce a Third Variable
      Some Association between the Two Variables


Purchase of Fashion Clothing by Marital Status Table 15.6

                                  Current Marital Status
Purchase of Fashion Clothing      Married      Unmarried
High                                  31%            52%
Low                                   69%            48%
Column total                         100%           100%
Number of respondents                 700            300


Purchase of Fashion Clothing by Marital Status Table 15.7

                                                      Sex
                                         Male                      Female
Purchase of Fashion Clothing      Married   Not Married     Married   Not Married
High                                  35%           40%         25%           60%
Low                                   65%           60%         75%           40%
Column totals                        100%          100%        100%          100%
Number of cases                       400           120         300           180
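As a check on how the three-variable table relates to the two-variable one, the sketch below pools the male and female columns of Table 15.7, weighting each percentage by its number of cases. The dictionary layout and the function name are illustrative choices, not part of the source.

```python
# (sex, marital status) -> (share with "High" purchase, number of cases),
# copied from Table 15.7.
table_15_7 = {
    ("Male", "Married"): (0.35, 400),
    ("Male", "Not married"): (0.40, 120),
    ("Female", "Married"): (0.25, 300),
    ("Female", "Not married"): (0.60, 180),
}


def high_share_by_marital_status(table):
    """Pool over sex: weight each 'High' share by its number of cases."""
    high_counts = {}
    case_counts = {}
    for (_sex, marital), (share, cases) in table.items():
        high_counts[marital] = high_counts.get(marital, 0.0) + share * cases
        case_counts[marital] = case_counts.get(marital, 0) + cases
    return {m: (high_counts[m] / case_counts[m], case_counts[m])
            for m in case_counts}


collapsed = high_share_by_marital_status(table_15_7)
married_share, married_n = collapsed["Married"]
unmarried_share, unmarried_n = collapsed["Not married"]
```

The pooled shares round to 31% of 700 married respondents and 52% of 300 unmarried respondents, which matches Table 15.6. Introducing sex as a third variable therefore refines the original association rather than contradicting it.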

Eating Frequently in Fast-Food Restaurants by Family Size Table 15.12

                                              Family Size
Eat Frequently in Fast-Food Restaurants      Small     Large
Yes                                            65%       65%
No                                             35%       35%
Column totals                                 100%      100%
Number of cases                                500       500

Eating Frequently in Fast-Food Restaurants by Family Size & Income Table 15.13

                                                            Income
                                                   Low                 High
                                               Family Size         Family Size
Eat Frequently in Fast-Food Restaurants      Small     Large     Small     Large
Yes                                            65%       65%       65%       65%
No                                             35%       35%       35%       35%
Column totals                                 100%      100%      100%      100%
Number of respondents                          250       250       250       250


Chi-square Distribution Figure 15.8

[Figure: a chi-square distribution plotted against χ². The critical value separates the "Do Not Reject H0" region from the "Reject H0" region.]

Statistics Associated with Cross-Tabulation

Chi-Square

• The chi-square statistic (χ²) is used to test the statistical significance of the observed association in a cross-tabulation.
• The expected frequency for each cell can be calculated by using a simple formula:

$f_e = \frac{n_r n_c}{n}$

where
$n_r$ = total number in the row
$n_c$ = total number in the column
$n$ = total sample size

For the data in Table 15.3, the expected frequencies for the cells, going from left to right and from top to bottom, are:

$\frac{15 \times 15}{30} = 7.50 \qquad \frac{15 \times 15}{30} = 7.50$

$\frac{15 \times 15}{30} = 7.50 \qquad \frac{15 \times 15}{30} = 7.50$

Then the value of χ² is calculated as follows:

$\chi^2 = \sum_{\text{all cells}} \frac{(f_o - f_e)^2}{f_e}$

For the data in Table 15.3, the value of χ² is calculated as:

$\chi^2 = \frac{(5 - 7.5)^2}{7.5} + \frac{(10 - 7.5)^2}{7.5} + \frac{(10 - 7.5)^2}{7.5} + \frac{(5 - 7.5)^2}{7.5}$

$= 0.833 + 0.833 + 0.833 + 0.833 = 3.333$
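The calculation for Table 15.3 can be reproduced with a short sketch. The function name and the list-of-rows layout are illustrative assumptions; the arithmetic follows the expected-frequency formula (row total times column total, divided by the sample size) and the chi-square sum over all cells.

```python
def chi_square(observed):
    """Return (expected frequencies, chi-square statistic) for a table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    # Expected frequency of a cell: f_e = n_r * n_c / n
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    # Chi-square: sum over all cells of (f_o - f_e)^2 / f_e
    statistic = sum(
        (f_o - f_e) ** 2 / f_e
        for obs_row, exp_row in zip(observed, expected)
        for f_o, f_e in zip(obs_row, exp_row)
    )
    return expected, statistic


# Table 15.3: rows are Light/Heavy usage, columns are Male/Female.
table_15_3 = [[5, 10], [10, 5]]
expected, chi2 = chi_square(table_15_3)
```

For a 2 x 2 table the statistic has (2 - 1)(2 - 1) = 1 degree of freedom. The critical value at the 0.05 level with 1 degree of freedom is 3.841, so the computed value of 3.333 does not lead to rejection of H0 at that level.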


Lambda Coefficient

• Asymmetric lambda measures the percentage improvement in predicting the value of the dependent variable, given the value of the independent variable.
• Lambda also varies between 0 and 1. A value of 0 means no improvement in prediction. A value of 1 indicates that the prediction can be made without error. This happens when each independent variable category is associated with a single category of the dependent variable.
• Asymmetric lambda is computed for each of the variables (treating it as the dependent variable).
• A symmetric lambda is also computed, which is a kind of average of the two asymmetric values. The symmetric lambda does not make an assumption about which variable is dependent. It measures the overall improvement when prediction is done in both directions.
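A minimal sketch of asymmetric lambda follows, using the standard proportional-reduction-in-error form: prediction errors made when only the modal category of the dependent variable is known, compared with errors made when the modal category within each independent-variable category is used. The function name and table layout (rows for the dependent variable, columns for the independent variable) are assumptions for illustration.

```python
def asymmetric_lambda(table):
    """Asymmetric lambda; rows = dependent variable, columns = independent variable.

    Assumes the dependent variable is not constant (otherwise there are
    no prediction errors to reduce and the ratio is undefined).
    """
    n = sum(sum(row) for row in table)
    # Errors when always predicting the largest dependent-variable category.
    errors_without = n - max(sum(row) for row in table)
    # Errors when predicting the largest cell within each column.
    errors_with = n - sum(max(col) for col in zip(*table))
    return (errors_without - errors_with) / errors_without


# Table 15.3 with Internet usage (rows) treated as the dependent variable.
lambda_usage = asymmetric_lambda([[5, 10], [10, 5]])
# Each gender falls in a single usage category: prediction without error.
lambda_perfect = asymmetric_lambda([[15, 0], [0, 15]])
# Knowing the column does not change the best guess: no improvement.
lambda_none = asymmetric_lambda([[10, 10], [5, 5]])
```

For Table 15.3 the errors fall from 15 to 10, so lambda is 5/15, or about 0.333.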

A Classification of Hypothesis Testing Procedures for Examining Differences Fig. 15.9

Hypothesis Tests
  Parametric Tests (Metric Tests)
    One Sample
      * t test
      * Z test
    Two or More Samples
      Independent Samples
        * Two-Group t test
        * Z test
      Paired Samples
        * Paired t test
  Non-parametric Tests (Nonmetric Tests)
    One Sample
      * Chi-Square
      * K-S
      * Runs
      * Binomial
    Two or More Samples
      Independent Samples
        * Chi-Square
        * Mann-Whitney
        * Median
        * K-S
      Paired Samples
        * Sign
        * Wilcoxon
        * McNemar
        * Chi-Square


Non-Parametric Tests

Nonparametric tests are used when the independent variables are nonmetric. Like parametric tests, nonparametric tests are available for testing variables from one sample, two independent samples, or two related samples.

One Sample

Sometimes the researcher wants to test whether the observations for a particular variable could reasonably have come from a particular distribution, such as the normal, uniform, or Poisson distribution. The Kolmogorov-Smirnov (K-S) one-sample test is one such goodness-of-fit test. The K-S test compares the cumulative distribution function for a variable with a specified distribution. $A_i$ denotes the cumulative relative frequency for each category of the theoretical (assumed) distribution, and $O_i$ the comparable value of the sample frequency. The K-S test is based on the maximum value of the absolute difference between $A_i$ and $O_i$. The test statistic is

$K = \max_i \left| A_i - O_i \right|$
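The K-S statistic can be sketched as the largest gap between the two cumulative relative frequencies. The category counts below are hypothetical, and the theoretical distribution is assumed to be uniform over five categories; only the statistic is computed, not its significance level.

```python
def ks_statistic(observed_counts, theoretical_probs):
    """K = max |A_i - O_i| over categories, using cumulative relative frequencies."""
    n = sum(observed_counts)
    cum_observed = 0.0     # O_i: cumulative relative frequency in the sample
    cum_theoretical = 0.0  # A_i: cumulative relative frequency assumed under H0
    k = 0.0
    for count, prob in zip(observed_counts, theoretical_probs):
        cum_observed += count / n
        cum_theoretical += prob
        k = max(k, abs(cum_theoretical - cum_observed))
    return k


# Hypothetical counts for five ordered categories (n = 100),
# tested against a uniform distribution (0.2 per category).
counts = [8, 12, 20, 30, 30]
uniform = [0.2] * 5
k = ks_statistic(counts, uniform)
```

Here the cumulative sample frequencies are 0.08, 0.20, 0.40, 0.70, and 1.00 against 0.2, 0.4, 0.6, 0.8, and 1.0, so the largest absolute difference is 0.2.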

• The chi-square test can also be performed on a single variable from one sample. In this context, the chi-square serves as a goodness-of-fit test.
• The runs test is a test of randomness for dichotomous variables. This test is conducted by determining whether the order or sequence in which observations are obtained is random.
• The binomial test is also a goodness-of-fit test for dichotomous variables. It tests the goodness of fit of the observed number of observations in each category to the number expected under a specified binomial distribution.

Two Independent Samples

• When the difference in the location of two populations is to be compared based on observations from two independent samples, and the variable is measured on an ordinal scale, the Mann-Whitney U test can be used.
• In the Mann-Whitney U test, the two samples are combined and the cases are ranked in order of increasing size.
• The test statistic, U, is computed as the number of times a score from sample or group 1 precedes a score from group 2.
• If the samples are from the same population, the distribution of scores from the two groups in the rank list should be random. An extreme value of U would indicate a nonrandom pattern, pointing to the inequality of the two groups.
• For samples of fewer than 30 cases, the exact significance level for U is computed. For larger samples, U is transformed into a normally distributed z statistic. This z can be corrected for ties within ranks.
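The counting definition of U can be sketched directly. The two groups of scores are hypothetical, and counting a tie as one half is a common convention assumed here rather than something the text specifies. The sketch computes only U, not its significance level.

```python
def mann_whitney_u(group1, group2):
    """Count how often a score from group 1 precedes (is smaller than) one from group 2."""
    u = 0.0
    for a in group1:
        for b in group2:
            if a < b:
                u += 1.0
            elif a == b:
                u += 0.5  # assumption: a tie counts as half
    return u


# Hypothetical ordinal scores for two independent samples.
group1 = [3, 5, 8]
group2 = [4, 6, 9, 10]

u_1_before_2 = mann_whitney_u(group1, group2)
u_2_before_1 = mann_whitney_u(group2, group1)
```

The two counts always add up to the number of pairs (here 3 x 4 = 12). In this example group 1 precedes group 2 in 9 of the 12 pairs.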


SPSS Windows

• The main SPSS program for frequency distributions is FREQUENCIES. It produces a table of frequency counts, percentages, and cumulative percentages for the values of each variable. It gives all of the associated statistics.
• If the data are interval scaled and only the summary statistics are desired, the DESCRIPTIVES procedure can be used.
• The EXPLORE procedure produces summary statistics and graphical displays, either for all of the cases or separately for groups of cases. Mean, median, variance, standard deviation, minimum, maximum, and range are some of the statistics that can be calculated.


To select these procedures click:

Analyze>Descriptive Statistics>Frequencies
Analyze>Descriptive Statistics>Descriptives
Analyze>Descriptive Statistics>Explore

The major cross-tabulation program is CROSSTABS. This program will display the cross-classification tables and provide cell counts, row and column percentages, the chi-square test for significance, and all the measures of the strength of the association that have been discussed.

To select this procedure click:

Analyze>Descriptive Statistics>Crosstabs


The major program for conducting parametric tests in SPSS is COMPARE MEANS. This program can be used to conduct t tests on one sample or independent or paired samples.

To select these procedures using SPSS for Windows click:

Analyze>Compare Means>Means …
Analyze>Compare Means>One-Sample T Test …
Analyze>Compare Means>Independent-Samples T Test …
Analyze>Compare Means>Paired-Samples T Test …


The nonparametric tests discussed in this chapter can be conducted using NONPARAMETRIC TESTS.

To select these procedures using SPSS for Windows click:

Analyze>Nonparametric Tests>Chi-Square …
Analyze>Nonparametric Tests>Binomial …
Analyze>Nonparametric Tests>Runs …
Analyze>Nonparametric Tests>1-Sample K-S …
Analyze>Nonparametric Tests>2 Independent Samples …
Analyze>Nonparametric Tests>2 Related Samples …


Product Moment Correlation

• The product moment correlation, r, summarizes the strength of association between two metric (interval or ratio scaled) variables, say X and Y.
• It is an index used to determine whether a linear or straight-line relationship exists between X and Y.
• As it was originally proposed by Karl Pearson, it is also known as the Pearson correlation coefficient. It is also referred to as simple correlation, bivariate correlation, or merely the correlation coefficient.


• r varies between -1.0 and +1.0.
• The correlation coefficient between two variables will be the same regardless of their underlying units of measurement.
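Both properties can be checked with a small sketch that computes r from deviations about the means. The data are hypothetical, and the rescaling (multiplying X by 100, as when converting dollars to cents) is an illustrative change of units.

```python
from math import sqrt
from statistics import fmean


def pearson_r(x, y):
    """Product moment correlation between two equally long lists of numbers."""
    mean_x, mean_y = fmean(x), fmean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical metric data for five cases.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)
# Same relationship with X expressed in different units.
r_rescaled = pearson_r([100 * value for value in x], y)
```

For these data r is 6 / sqrt(60), about 0.775, and it is unchanged after rescaling X.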

Statistics Associated with Bivariate Regression Analysis

• Regression coefficient. The estimated parameter b is usually referred to as the nonstandardized regression coefficient.
• Scattergram. A scatter diagram, or scattergram, is a plot of the values of two variables for all the cases or observations.
• Standard error of estimate. This statistic, SEE, is the standard deviation of the actual Y values from the predicted Y values.
• Standard error. The standard deviation of b, $SE_b$, is called the standard error.

• Standardized regression coefficient. Also termed the beta coefficient or beta weight, this is the slope obtained by the regression of Y on X when the data are standardized.
• Sum of squared errors. The distances of all the points from the regression line are squared and added together to arrive at the sum of squared errors, which is a measure of total error, $\sum e_j^2$.
• t statistic. A t statistic with n - 2 degrees of freedom can be used to test the null hypothesis that no linear relationship exists between X and Y, or $H_0\colon \beta_1 = 0$, where $t = \frac{b}{SE_b}$.


Conducting Bivariate Regression Analysis

Plot the Scatter Diagram

• A scatter diagram, or scattergram, is a plot of the values of two variables for all the cases or observations.
• The most commonly used technique for fitting a straight line to a scattergram is the least-squares procedure. In fitting the line, the least-squares procedure minimizes the sum of squared errors, $\sum e_j^2$.
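A minimal sketch of the least-squares fit for one predictor is given below. It uses the standard closed-form solution for the intercept a and slope b, then derives the sum of squared errors, the standard error of estimate (SEE), the standard error of b, and the t statistic with n - 2 degrees of freedom. The data and the function name are hypothetical.

```python
from math import sqrt
from statistics import fmean


def bivariate_regression(x, y):
    """Least-squares fit of Y = a + bX with the associated summary statistics."""
    n = len(x)
    mean_x, mean_y = fmean(x), fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx              # slope that minimizes the sum of squared errors
    a = mean_y - b * mean_x    # intercept
    # Sum of squared errors: squared vertical distances from the fitted line.
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    see = sqrt(sse / (n - 2))  # standard error of estimate
    se_b = see / sqrt(sxx)     # standard error of b
    t = b / se_b               # tests H0: beta_1 = 0 with n - 2 df
    return {"a": a, "b": b, "sse": sse, "see": see, "se_b": se_b, "t": t}


# Hypothetical data for five cases.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
fit = bivariate_regression(x, y)
```

For these data the fitted line is Y = 2.2 + 0.6X, the sum of squared errors is 2.4, and t is about 2.12 with 3 degrees of freedom.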


Conducting Bivariate Regression Analysis Fig. 17.2

1. Plot the Scatter Diagram
2. Formulate the General Model
3. Estimate the Parameters
4. Estimate Standardized Regression Coefficients
5. Test for Significance
6. Determine the Strength and Significance of Association
7. Check Prediction Accuracy
8. Examine the Residuals
9. Cross-Validate the Model


Multiple Regression

The general form of the multiple regression model is as follows:

$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \dots + \beta_k X_k + e$

which is estimated by the following equation:

$\hat{Y} = a + b_1 X_1 + b_2 X_2 + b_3 X_3 + \dots + b_k X_k$

As before, the coefficient a represents the intercept, but the b's are now the partial regression coefficients.


Multicollinearity

• Multicollinearity arises when intercorrelations among the predictors are very high.
• Multicollinearity can result in several problems, including:
  • The partial regression coefficients may not be estimated precisely. The standard errors are likely to be high.
  • The magnitudes as well as the signs of the partial regression coefficients may change from sample to sample.
  • It becomes difficult to assess the relative importance of the independent variables in explaining the variation in the dependent variable.
  • Predictor variables may be incorrectly included or removed in stepwise regression.


SPSS Windows

The CORRELATE program computes Pearson product moment correlations and partial correlations with significance levels. Univariate statistics, covariance, and cross-product deviations may also be requested.

To select these procedures using SPSS for Windows click:

Analyze>Correlate>Bivariate …
Analyze>Correlate>Partial …

Scatterplots can be obtained by clicking:

Graphs>Scatter …>Simple>Define

REGRESSION calculates bivariate and multiple regression equations, associated statistics, and plots. It allows for an easy examination of residuals. This procedure can be run by clicking:

Analyze>Regression>Linear …

Similarities and Differences between ANOVA, Regression, and Discriminant Analysis Table 18.1

                                        ANOVA          Regression     Discriminant Analysis
Similarities
Number of dependent variables           One            One            One
Number of independent variables         Multiple       Multiple       Multiple
Differences
Nature of the dependent variables       Metric         Metric         Categorical
Nature of the independent variables     Categorical    Metric         Metric


Discriminant Analysis

Discriminant analysis is a technique for analyzing data when the criterion or dependent variable is categorical and the predictor or independent variables are interval in nature. The objectives of discriminant analysis are as follows:

• Development of discriminant functions, or linear combinations of the predictor or independent variables, which will best discriminate between the categories of the criterion or dependent variable (groups).
• Examination of whether significant differences exist among the groups, in terms of the predictor variables.
• Determination of which predictor variables contribute to most of the intergroup differences.
• Classification of cases to one of the groups based on the values of the predictor variables.
• Evaluation of the accuracy of classification.


Statistics Associated with Discriminant Analysis

• Canonical correlation. Canonical correlation measures the extent of association between the discriminant scores and the groups. It is a measure of association between the single discriminant function and the set of dummy variables that define the group membership.
• Centroid. The centroid is the set of mean values of the discriminant scores for a particular group. There are as many centroids as there are groups, one for each group. The means for a group on all the functions are the group centroids.
• Classification matrix. Sometimes also called confusion or prediction matrix, the classification matrix contains the number of correctly classified and misclassified cases.


• Discriminant function coefficients. The discriminant function coefficients (unstandardized) are the multipliers of variables, when the variables are in the original units of measurement.
• Discriminant scores. The unstandardized coefficients are multiplied by the values of the variables. These products are summed and added to the constant term to obtain the discriminant scores.
• Eigenvalue. For each discriminant function, the Eigenvalue is the ratio of between-group to within-group sums of squares. Large Eigenvalues imply superior functions.


Conducting Discriminant Analysis Fig. 18.1

1. Formulate the Problem
2. Estimate the Discriminant Function Coefficients
3. Determine the Significance of the Discriminant Function
4. Interpret the Results
5. Assess Validity of Discriminant Analysis


SPSS Windows

The DISCRIMINANT program performs both two-group and multiple discriminant analysis. To select this procedure using SPSS for Windows click:

Analyze>Classify>Discriminant …


Factor Analysis

• Factor analysis is a general name denoting a class of procedures primarily used for data reduction and summarization.
• Factor analysis is an interdependence technique in that an entire set of interdependent relationships is examined without making the distinction between dependent and independent variables.
• Factor analysis is used in the following circumstances:
  • To identify underlying dimensions, or factors, that explain the correlations among a set of variables.
  • To identify a new, smaller, set of uncorrelated variables to replace the original set of correlated variables in subsequent multivariate analysis (regression or discriminant analysis).
  • To identify a smaller set of salient variables from a larger set for use in subsequent multivariate analysis.


Factor Analysis Model

• It is possible to select weights or factor score coefficients so that the first factor explains the largest portion of the total variance.
• Then a second set of weights can be selected, so that the second factor accounts for most of the residual variance, subject to being uncorrelated with the first factor.
• This same principle could be applied to selecting additional weights for the additional factors.


Conducting Factor Analysis Fig. 19.1

1. Problem Formulation
2. Construction of the Correlation Matrix
3. Method of Factor Analysis
4. Determination of Number of Factors
5. Rotation of Factors
6. Interpretation of Factors
7. Calculation of Factor Scores or Selection of Surrogate Variables
8. Determination of Model Fit

Conducting Factor Analysis

Determine the Number of Factors

• A Priori Determination. Sometimes, because of prior knowledge, the researcher knows how many factors to expect and thus can specify the number of factors to be extracted beforehand.
• Determination Based on Eigenvalues. In this approach, only factors with Eigenvalues greater than 1.0 are retained. An Eigenvalue represents the amount of variance associated with the factor. Hence, only factors with a variance greater than 1.0 are included. Factors with variance less than 1.0 are no better than a single variable, since, due to standardization, each variable has a variance of 1.0. If the number of variables is less than 20, this approach will result in a conservative number of factors.


SPSS Windows

To select this procedure using SPSS for Windows click:

Analyze>Data Reduction>Factor …


Cluster Analysis

• Cluster analysis is a class of techniques used to classify objects or cases into relatively homogeneous groups called clusters. Objects in each cluster tend to be similar to each other and dissimilar to objects in the other clusters. Cluster analysis is also called classification analysis, or numerical taxonomy.
• Both cluster analysis and discriminant analysis are concerned with classification. However, discriminant analysis requires prior knowledge of the cluster or group membership for each object or case included, to develop the classification rule. In contrast, in cluster analysis there is no a priori information about the group or cluster membership for any of the objects. Groups or clusters are suggested by the data, not defined a priori.


An Ideal Clustering Situation Fig. 20.1

[Figure: scatter plot with axes Variable 1 and Variable 2.]


Conducting Cluster Analysis Fig. 20.3

1. Formulate the Problem
2. Select a Distance Measure
3. Select a Clustering Procedure
4. Decide on the Number of Clusters
5. Interpret and Profile Clusters
6. Assess the Validity of Clustering


A Classification of Clustering Procedures Fig. 20.4

Clustering Procedures
  Hierarchical
    Agglomerative
      Linkage Methods
        Single
        Complete
        Average
      Variance Methods
        Ward's Method
      Centroid Methods
    Divisive
  Nonhierarchical
    Sequential Threshold
    Parallel Threshold
    Optimizing Partitioning

Conducting Cluster Analysis

Select a Clustering Procedure – Hierarchical

• Hierarchical clustering is characterized by the development of a hierarchy or tree-like structure. Hierarchical methods can be agglomerative or divisive.
• Agglomerative clustering starts with each object in a separate cluster. Clusters are formed by grouping objects into bigger and bigger clusters. This process is continued until all objects are members of a single cluster.
• Divisive clustering starts with all the objects grouped in a single cluster. Clusters are divided or split until each object is in a separate cluster.
• Agglomerative methods are commonly used in marketing research. They consist of linkage methods, error sums of squares or variance methods, and centroid methods.

Select a Clustering Procedure – Linkage Method

• The single linkage method is based on minimum distance, or the nearest neighbor rule. At every stage, the distance between two clusters is the distance between their two closest points (see Figure 20.5).
• The complete linkage method is similar to single linkage, except that it is based on the maximum distance or the furthest neighbor approach. In complete linkage, the distance between two clusters is calculated as the distance between their two furthest points.
• The average linkage method works similarly. However, in this method, the distance between two clusters is defined as the average of the distances between all pairs of objects, where one member of the pair is from each of the clusters (Figure 20.5).
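The three linkage rules differ only in how they summarize the pairwise distances between two clusters, which the following sketch makes explicit. Euclidean distance is assumed as the distance measure, and the two small clusters of points are hypothetical.

```python
from itertools import product
from math import dist  # Euclidean distance, available from Python 3.8


def linkage_distances(cluster1, cluster2):
    """Distance between two clusters under single, complete, and average linkage."""
    # All distances between pairs with one member taken from each cluster.
    pair_distances = [dist(p, q) for p, q in product(cluster1, cluster2)]
    return {
        "single": min(pair_distances),    # two closest points
        "complete": max(pair_distances),  # two furthest points
        "average": sum(pair_distances) / len(pair_distances),
    }


# Hypothetical clusters of points measured on two variables.
cluster1 = [(0, 0), (0, 1)]
cluster2 = [(3, 0), (4, 0)]
distances = linkage_distances(cluster1, cluster2)
```

Here the single linkage distance is 3 (between (0, 0) and (3, 0)), the complete linkage distance is the square root of 17 (between (0, 1) and (4, 0)), and the average linkage distance lies between the two.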


Linkage Methods of Clustering Fig. 20.5

[Figure: three panels, each showing Cluster 1 and Cluster 2. Single linkage: minimum distance. Complete linkage: maximum distance. Average linkage: average distance.]


Other Agglomerative Clustering Methods Fig. 20.6

[Figure: two panels, Ward's Procedure and Centroid Method.]


SPSS Windows

To select these procedures using SPSS for Windows click:

Analyze>Classify>Hierarchical Cluster …
Analyze>Classify>K-Means Cluster …
