Marketing Research Notes Chapter12

RESEARCH METHODOLOGY

LESSON 12: MEASUREMENT & SCALING

Levels of Measurement

We know that the level of measurement is the scale by which a variable is measured. For fifty years, with few detractors, science has used Stevens's (1951) typology of measurement levels (scales). There are three things to remember about this typology:
• Anything that can be measured falls into one of four types.
• The higher the level of measurement, the greater the precision of measurement.
• Each level contains all the properties of the previous level.

The four levels of measurement, from lowest to highest, are:
• Nominal
• Ordinal
• Interval
• Ratio

Types of Measurement Scales
Ordinal and nominal data are always discrete; continuous data must be at either the interval or the ratio level of measurement. Now let us discuss each level in detail.

Nominal Level of Measurement
Nominal variables include demographic characteristics such as sex, race, and religion. The nominal level of measurement describes variables that are categorical in nature; the characteristics of the data you are collecting fall into distinct categories:
• If there are a limited number of distinct categories (often only two), you are dealing with a dichotomous variable.
• If there are an unlimited or infinite number of distinct categories, you are dealing with a continuous variable.

Ordinal Level of Measurement
• The ordinal level of measurement describes variables that can be ordered or ranked in some order of importance.
• It describes most judgments about things, such as big or little, strong or weak.
• Most opinion and attitude scales or indexes in the social sciences are ordinal in nature.

Interval Level of Measurement
The interval level of measurement describes variables that have more or less equal intervals, or meaningful distances, between their ranks. For example, if you were to ask somebody whether they were a first-, second-, or third-generation immigrant, the assumption is that the distance, or number of years, between each generation is the same.

Ratio Level of Measurement
The ratio level of measurement describes variables that have equal intervals and a fixed zero (or reference) point. It is possible to have zero income, zero education, and no involvement in crime, but we rarely see ratio-level variables in social science, since it is almost impossible to have zero attitude about something, although "not at all", "often", and "twice as often" might qualify as ratio-level measurement.

Advanced statistics require:
• at least interval-level measurement, so the researcher always strives for this level,
• accepting ordinal-level measurement (which is the most common) only when necessary.
Variables should be conceptually and operationally defined with levels of measurement in mind, since the level chosen will affect the analysis of the data later.
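Because each level of measurement inherits all the properties of the levels below it, the operations a statistic requires determine which levels it can be used with. The sketch below illustrates this cumulative hierarchy; the level names come from the text, but the dictionary of operations and the function names are our own illustration, not a standard library.

```python
# Stevens's four levels, lowest to highest. Each level inherits all
# operations from the levels below it, so higher levels permit more
# statistics. The operation labels here are illustrative, not standard.

LEVELS = ["nominal", "ordinal", "interval", "ratio"]

# Operations newly introduced at each level.
NEW_OPERATIONS = {
    "nominal": {"count", "mode"},
    "ordinal": {"median", "rank"},
    "interval": {"mean", "difference"},
    "ratio": {"ratio", "true_zero"},
}

def allowed_operations(level):
    """Return every operation meaningful at `level`, including those
    inherited from all lower levels of measurement."""
    ops = set()
    for lvl in LEVELS[: LEVELS.index(level) + 1]:
        ops |= NEW_OPERATIONS[lvl]
    return ops

# An ordinal variable (e.g. an attitude scale) supports counting and
# ranking, but not means or ratios.
print(sorted(allowed_operations("ordinal")))
```

This makes the textbook rule concrete: an ordinal attitude item licenses a median but not a mean, which is why the researcher strives for at least interval-level measurement.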

© Copy Right: Rai University

Reliability and Validity

Figure: Levels of Measurement

For a research study to be accurate, its findings must be both reliable and valid.

Reliability
Reliability means that the findings would be consistently the same if the study were done over again. Anything you do to standardize or clarify your measurement instrument to reduce user error will add to its reliability. It is also important to consider, as early as possible, the time frame that is appropriate for what you are studying. Some social and psychological phenomena (most notably those involving behaviour or action) lend themselves to a snapshot in time; if so, your research need only be carried out over a short period, perhaps a few weeks or a couple of months, and your time frame is referred to as cross-sectional. Cross-sectional research is sometimes criticized as being unable to determine cause and effect. When cross-sectional data fail to depict a cause-and-effect relationship, a longer time frame, called longitudinal, is required, which may add years to carrying out your research. There are many different types of longitudinal research, such as time-series designs (for example, tracking a developing nation's economic development over four years or so). The general rule is to use longitudinal research the greater the number of variables operating in your study and the more confident you want to be about cause and effect.

Validity
A valid measure is one that provides the information it was intended to provide. The purpose of a thermometer, for example, is to provide information on temperature; if it works correctly, it is a valid thermometer. A study can be reliable but not valid, but it cannot be valid without first being reliable.

Figure: target diagrams contrasting "Reliable but not Valid", "Not Reliable (so not valid either)", and "Reliable AND Valid"

There are many different threats to validity as well as reliability, but an important early consideration is to ensure internal validity. This means that you are using the most appropriate research design for what you are studying (experimental, quasi-experimental, survey, qualitative, or historical), and that you have screened out spurious variables and thought through the possible contamination of other variables creeping into your study.

Methods of Measuring Reliability
How, then, will you measure the reliability of a particular measure? There are four good methods:
• Test-retest
• Multiple forms
• Inter-rater
• Split-half

Test-retest
The test-retest technique administers your test, instrument, survey, or measure to the same group of people at two different points in time. Most researchers administer a pretest for this purpose, troubleshooting bugs at the same time. Reliability estimates are usually expressed as correlation coefficients, so here all you do is calculate the correlation between the two sets of scores and report it as your reliability coefficient.

Multiple Forms
The multiple-forms technique goes by other names, such as parallel forms and disguised test-retest, but it is simply the scrambling or mixing up of the questions on your survey, for example, and giving it to the same group twice. It is a more rigorous test of reliability.

Inter-rater
Inter-rater reliability is most appropriate when you use assistants to do interviewing or content analysis for you. To calculate this kind of reliability, you report the percentage of agreement on the same subjects between your raters, or assistants.

Split-half
Split-half reliability is estimated by taking half of your test, instrument, or survey and analyzing that half as if it were the whole thing. Then you compare the results of this analysis with your overall analysis.
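The test-retest and inter-rater estimates described above reduce to short calculations: a correlation coefficient between two administrations, and a percentage of agreement between two raters. A minimal sketch with invented toy data follows; the function names are ours, and a real study would normally use a statistics package.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient: the usual test-retest
    reliability estimate between two administrations of a measure."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def percent_agreement(rater1, rater2):
    """Inter-rater reliability: the share of subjects on which two
    raters (or assistants) assigned the same code."""
    same = sum(1 for a, b in zip(rater1, rater2) if a == b)
    return same / len(rater1)

# Toy data: five respondents measured at two points in time.
time1 = [10, 12, 9, 15, 11]
time2 = [11, 12, 10, 14, 12]
print(round(pearson_r(time1, time2), 3))

# Two assistants coding the same six interview transcripts.
codes_a = ["pos", "neg", "pos", "pos", "neu", "neg"]
codes_b = ["pos", "neg", "neu", "pos", "neu", "neg"]
print(percent_agreement(codes_a, codes_b))
```

A coefficient near 1.0 indicates a highly reliable measure; the agreement figure is simply the fraction of identical codes.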

Methods of Measuring Validity
Once you find that your measure of the variable under study is reliable, you will want to assess its validity. There are four good methods of estimating validity:
• Face
• Content
• Criterion
• Construct

Face Validity
Face validity is the least statistical estimate (validity overall is not as easily quantified as reliability), as it is simply an assertion by the researcher that they have reasonably measured what they intended to measure. It is essentially a "take my word for it" kind of validity. Usually, the researcher asks a colleague or expert in the field to vouch that the items measure what they were intended to measure.

Content Validity
Content validity goes back to the ideas of conceptualization and operationalization. If the researcher has focused too closely on only one type or narrow dimension of a construct or concept, then it is conceivable that other indicators were overlooked; in such a case, the study lacks content validity. Content validity is making sure you have covered all the conceptual space. There are different ways to estimate it, but one of the most common is a reliability approach, in which you correlate scores on one domain or dimension of a concept on your pretest with scores on that domain or dimension on the actual test. Another way is simply to look over your inter-item correlations.

Criterion Validity
Criterion validity uses some standard or benchmark that is known to be a good indicator. There are different forms of criterion validity:
• Concurrent validity is how well something estimates actual day-to-day behavior.
• Predictive validity is how well something estimates some future event or manifestation that has not happened yet; it is commonly found in criminology.

Construct Validity
Construct validity is the extent to which your items tap into the underlying theory or model of behavior. It is how well the items hang together (convergent validity) or distinguish different people on certain traits or behaviors (discriminant validity). It is the most difficult validity to achieve: you have to either do years and years of research or find a group of people to test who have the exact opposite traits or behaviors you are interested in measuring.

Attitude Measurement
Many of the questions in a marketing research survey are designed to measure attitudes. Attitudes are a person's general evaluation of something. Customer attitude is an important factor for the following reasons:
• Attitude helps to explain how ready one is to do something.
• Attitudes do not change much over time.
• Attitudes produce consistency in behavior.
• Attitudes can be related to preferences.

Attitudes can be measured using the following procedures:
• Self-reporting - subjects are asked directly about their attitudes. Self-reporting is the most common technique used to measure attitude.
• Observation of behaviour - assuming that one's behaviour is a result of one's attitudes, attitudes can be inferred by observing behaviour. For example, one's attitude about an issue can be inferred from whether he or she signs a petition related to it.
• Indirect techniques - use unstructured stimuli such as word-association tests.
• Performance of objective tasks - assumes that one's performance depends on attitude. For example, the subject can be asked to memorize the arguments of both sides of an issue; he or she is likely to do a better job on the arguments that favour his or her stance.
• Physiological reactions - the subject's response to a stimulus is measured by electronic or mechanical means. While intensity can be measured this way, it is difficult to know whether the attitude is positive or negative.
• Multiple measures - a mixture of techniques can be used to validate the findings; this is especially worthwhile when self-reporting is used.

There are several types of attitude rating scales, discussed in the sections that follow.
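Self-reported agreement items are usually combined into a single summated attitude score, with negatively worded items reverse-coded so that a high total always indicates a favourable attitude. The sketch below is hypothetical: the item names, the five-point coding, and the set of reverse-coded items are invented for illustration.

```python
# Hypothetical sketch of scoring self-reported attitude items on a
# five-point agreement scale (1 = Strongly Disagree ... 5 = Strongly
# Agree). Negatively worded items are reverse-coded before summing so
# that a high total always means a favourable attitude.

SCALE_MAX = 5

def attitude_score(responses, reverse=frozenset()):
    """Sum the item responses, reverse-coding the items named in
    `reverse` (a response r becomes SCALE_MAX + 1 - r)."""
    total = 0
    for item, r in responses.items():
        total += (SCALE_MAX + 1 - r) if item in reverse else r
    return total

answers = {
    "store_is_clean": 4,      # agree
    "staff_is_helpful": 5,    # strongly agree
    "checkout_is_slow": 2,    # disagree with a negatively worded item
}
# "checkout_is_slow" is negatively worded, so its 2 becomes 4.
print(attitude_score(answers, reverse={"checkout_is_slow"}))  # 4 + 5 + 4
```

Note that the resulting total is still ordinal-level data, which matters when choosing the statistics applied to it later.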

Scaling Defined
Scaling is a "procedure for the assignment of numbers (or other symbols) to a property of objects in order to impart some of the characteristics of numbers to the properties in question."1 Thus, one assigns a number scale to the various levels of heat and cold and calls it a thermometer.

Response Methods
Questioning is a widely used stimulus for measuring concepts. A manager may be asked his or her views concerning an employee. The response is "a good machinist", "a troublemaker", "a union activist", "reliable", or "a fast worker with a poor record of attendance". These answers represent different frames of reference for evaluating the worker and are often of limited value to the researcher. Two approaches improve the usefulness of such replies. First, the various properties may be separated and the respondent asked to judge each specific facet; here, several questions are substituted for a single one. Second, the free-response reply may be replaced with structuring devices.

To quantify dimensions that are essentially qualitative, rating scales or ranking scales are used.

Rating Scales
Rating scales are used to judge properties of objects without reference to other similar objects. The ratings may take such forms as "like-dislike", "approve-indifferent-disapprove", or other classifications using even more categories. There is little conclusive support for choosing a three-point scale over scales with five or more points. Some researchers think that more points on a rating scale provide an opportunity for greater sensitivity of measurement and extraction of variance. The most widely used scales range from three to seven points, and it does not seem to make much difference which number is used, with two exceptions.4 First, a larger number of scale points is needed to produce accuracy with single-item, as opposed to multiple-item, scales. Second, in cross-cultural measurement the culture may condition respondents to a standard metric, such as a ten-point scale in Italy.

Ranking Scales
In ranking scales, the subject directly compares two or more objects and makes choices among them. Frequently, the respondent is asked to select one as the "best" or the "most preferred". When there are only two choices, this approach is satisfactory, but it often results in "ties" when more than two choices are offered. For example, suppose respondents are asked to select the most preferred among three or more models of a product, and 40 percent choose model A, 30 percent choose model B, and 30 percent choose model C. Which is the preferred model? The analyst would be taking a risk in suggesting that A is most preferred. Perhaps that interpretation is correct, but 60 percent of the respondents chose some model other than A, and perhaps all B and C voters would place A last, preferring either B or C to it. This ambiguity can be avoided by using some of the techniques described in this section. Some of the measurement scales are discussed below.

Equal-appearing Interval Scaling
In this scale, a set of statements is assembled. The statements are selected according to their position on an interval scale of favorableness, choosing statements that have a small degree of dispersion. Respondents are then asked to indicate the statements with which they agree.

Likert Method of Summated Ratings
In this scale, a statement is made and the respondents indicate their degree of agreement or disagreement on a five-point scale (Strongly Disagree, Disagree, Neither Agree Nor Disagree, Agree, Strongly Agree). It actually extends beyond the simple ordinal choices of "strongly agree", "agree", "disagree", and "strongly disagree". Likert scaling initially assigns weights through a process that calculates the average index score for each item in an index and then ranks the items in order of intensity (recall the process for constructing Thurstone scales). Once ordinality has been assigned, the assumption is that a respondent choosing a response weighted with, say, 15 out of 20 on an increasing scale of intensity is placed at that level of the index.

Example of a Likert Scale
How would you rate the following aspects of your food store?
(1 = Extremely important ... 7 = Extremely unimportant)
Service      1  2  3  4  5  6  7
Check-outs   1  2  3  4  5  6  7
Bakery       1  2  3  4  5  6  7
Deli         1  2  3  4  5  6  7

Semantic Differential Scale
A semantic differential scale is constructed using phrases describing attributes of the product to anchor each end. For example, the left end may state "Hours are inconvenient" and the right end "Hours are convenient". The respondent then marks one of the seven blanks between the statements to indicate his or her opinion about the attribute. The semantic differential employs an approach similar to Likert scaling in that it seeks a range of responses between extreme polarities, but it places the ordinal range of responses between two keywords expressing opposite "ideas" or concepts. Babbie's example provides the best illustration of the concept.

Semantic Differential: Feelings about Musical Selections
             Very Much  Somewhat  Neither  Somewhat  Very Much
Enjoyable    ___        ___       ___      ___       ___        Unenjoyable
Simple       ___        ___       ___      ___       ___        Complex
Discordant   ___        ___       ___      ___       ___        Harmonic
Traditional  ___        ___       ___      ___       ___        Modern

One of the first things that strikes you is the highly interpretative nature of Babbie's example. Choices such as "enjoyable" and "unenjoyable" simply reflect preference, but the other choices are sufficiently ambiguous to invite imprecise understanding. If you are seeking nothing more than attitudinal information about an abstract social artifact such as a piece of music, the semantic differential may be usable; otherwise, its ambiguity in application remains problematic.

Guttman Scaling
As with the Likert, Bogardus, and Thurstone scales, Guttman scaling seeks to place indicators into an ordinal progression from "weak" indicators to "strong" ones (that, after all, is the difference between a scale and an index in the first place). Similarly, it assumes that a respondent indicating a given level of preference, attitude, or belief will also demonstrate all "weaker" indicators of the same thing. However, the premise of the Guttman scale extends even further, in that it examines all of the responses to the survey and separates out the number of responses that do not exactly reflect the scalar

pattern; that is, the number of response sets that do not reflect the assumption that a respondent choosing one level of response would give the same type of response at all inferior levels. The number of response sets that violate the scalar pattern is compared to the number that do reflect the pattern, yielding what is referred to as a coefficient of reproducibility. Again, Babbie's illustration provides a very clear picture.

Guttman Scaling and Coefficient of Reproducibility

              Response Pattern  Number of Cases  Index Scores  Scale Scores  Total Scale Errors
Scale types   +++               612              3             3             0
              ++=               448              2             2             0
              +==               92               1             1             0
              ===               79               0             0             0
Mixed types   =+=               15               1             2             15
              +=+               5                2             3             5
              ==+               2                1             0             2
              =++               5                2             3             5

Coefficient of Reproducibility = 1 - (Number of Errors / Number of Guesses)

In the example: 1 - 27 / (1,258 x 3) = 1 - 27/3,774 = .993, or 99.3%.

The entire exercise is really just a way of indicating that the degree to which a set of responses accurately reflects the scalar assumptions is the degree to which the entire set could be recreated from the scale itself. What the illustration shows is that a projection made from the coefficient of reproducibility of 99.3% would reflect the real sample to that degree. Guttman scaling shows that a well-constructed scale can very accurately reproduce the profile of a response set. But you only know the coefficient of reproducibility after you have run the survey and crunched the numbers, so it is not a predictive tool; it is a proof of the strength of the scale as a measure.

A brief word on typologies is in order. So far we have limited ourselves to an examination of unidirectional variables; that is, one thing in one direction (attitudes for or against abortion, etc.). Often, relationships are better explained as the function of the intersection of several variables; this is referred to as a typology. Remember what we have noted about making sure that your indices and scales are composed of single-dimension indicators. While "religion" can have a strong correlation with "attitudes on abortion", that does not mean a question on religion belongs in an index or scale of questions on "attitudes on abortion". But if you wish to examine the intersection of the two, you can construct a typology showing, for example, that "Catholics" may be "conservative" on "abortion" but remain "liberal" on "other human rights". Babbie warns us that typologies are useful as independent variables ("religion" may be a good causal factor in "attitudes on abortion") but can be problematic as dependent variables (explaining the "why" isn't always clear). Catholics may be more anti-abortion because the church has forbidden it, but what of other groups? You can get onto some very shaky ground using typologies as the "effect", or dependent, variable.

Example of Semantic Differential
How would you describe Kmart, Target, and Wal-Mart on the following scale?
Clean         ___  ___  ___  ___  ___  Dirty
Bright        ___  ___  ___  ___  ___  Dark
Low quality   ___  ___  ___  ___  ___  High quality
Conservative  ___  ___  ___  ___  ___  Innovative

Stapel Scale
The Stapel scale is similar to the semantic differential scale except that numbers identify the points on the scale, only one statement is used, the respondent marks a negative number to disagree, and there are 10 positions instead of seven. This scale does not require that bipolar adjectives be developed, and it can be administered by telephone.

Q-sort Technique

In the Q-sort technique, the respondent is forced to construct a normal distribution by placing a specified number of cards into one of 11 stacks according to how desirable he or she finds the characteristics written on the cards. This technique is faster and less tedious for subjects than paired-comparison measures. It also forces the subject to conform to quotas at each point of the scale so as to yield a normal or quasi-normal distribution. The objective of the Q-technique is thus the intensive study of individuals.

Selection of an Appropriate Attitude Measurement Scale
We have examined a number of different techniques available for the measurement of attitudes. Each method has certain strengths and weaknesses, and although almost any technique can be used to measure any component of attitude, not all techniques are suitable for all purposes. The selection depends upon the stage and size of the research. Generally, the Q-sort and semantic differential scales are preferred in the preliminary stages. The Likert scale is used for item analysis, and for specific attributes the semantic differential scale is very appropriate. Overall, the semantic differential is simple in concept, and the results obtained are comparable with those of more complex, one-dimensional methods; hence it is widely used.

Limitations of Attitude Measurement Scales
The main limitation of these tools is their emphasis on describing attitudes rather than predicting behaviour, primarily because of a lack of models that relate attitudes to behaviour.
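The coefficient-of-reproducibility arithmetic from the Guttman example earlier in this lesson can be checked directly. A small sketch follows (the function name is ours), reproducing the 99.3% figure from 27 scale errors among 1,258 three-item response sets.

```python
def coefficient_of_reproducibility(errors, cases, items):
    """Guttman coefficient of reproducibility:
    1 - (number of scale errors / number of guesses),
    where the number of guesses is cases * items."""
    return 1 - errors / (cases * items)

# Figures from the example: 1,258 response sets to a three-item scale,
# of which the mixed types contribute 27 scalar-pattern violations.
cr = coefficient_of_reproducibility(errors=27, cases=1258, items=3)
print(round(cr, 3))  # .993, i.e. 99.3%
```

As the text notes, this is a diagnostic computed after the survey has been run, not a predictive tool.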

Tutorial
Prepare a questionnaire on any one of the following objectives:
1. To assess corporate productivity.
2. Job analysis: needs and satisfaction level of employees, motivation level of employees, job involvement, etc.
3. Product testing and feedback on after-sales services.

References
• Cooper, Donald R., Business Research Methods, Tata McGraw-Hill.
• Kothari, C. R., Quantitative Techniques, Vikas Publishing House, 3rd ed.
• Levin, R. I. & Rubin, D. S., Statistics for Management, Prentice Hall of India, 2002.
