COPY TESTING

Copy testing is a specialized field of marketing research: the study of television commercials prior to airing them. More broadly, it is defined as research to determine an ad’s effectiveness based on customers’ responses to the ad, and it covers all media, including print, TV, radio, and the Internet.

Although the field is also known as copy testing, pretesting is considered the more accurate, modern name (Young, p. 4) for the prediction of how effectively an ad will perform, based on the analysis of feedback gathered from the target audience. Each test will either qualify the ad as strong enough to meet company action standards for airing or identify opportunities to improve the performance of the ad through editing. (Young, p. 213) Pretesting is also used to identify weak spots within an ad campaign, to more effectively edit 60-second ads down to 30-second ads or 30-second ads down to 15-second ads, to select images from the spot to use in an integrated campaign’s print ad, to pull out the key moments for use in ad tracking, and to identify branding moments. [1]

Features of a Good Copy Testing System
In 1982, a consortium of 21 leading advertising agencies, including N. W. Ayer, D’Arcy, Grey, McCann-Erickson, Needham Harper & Steers, Ogilvy & Mather, J. Walter Thompson, and Young & Rubicam, released a public document laying out the PACT (Positioning Advertising Copy Testing) principles on what constitutes a good copy testing system. According to PACT, a good copy testing system is one that meets the following criteria:
1. Provides measurements which are relevant to the objectives of the advertising.
2. Requires agreement about how the results will be used in advance of each specific test.
3. Provides multiple measurements, because single measurements are generally inadequate to assess the performance of an advertisement.
4. Is based on a model of human response to communications: the reception of a stimulus, the comprehension of the stimulus, and the response to the stimulus.
5. Allows for consideration of whether the advertising stimulus should be exposed more than once.
6. Recognizes that the more finished a piece of copy is, the more soundly it can be evaluated, and requires, as a minimum, that alternative executions be tested in the same degree of finish.
7. Provides controls to avoid the biasing effects of the exposure context.
8. Takes into account basic considerations of sample definition.
9. Demonstrates reliability and validity.

Contents
1 Four Types of Copy Testing Scores
  1.1 Report Card Measures
    1.1.1 Obstacles
  1.2 Diagnostic Measures
    1.2.1 Obstacles
  1.3 Non-Verbal Measures
    1.3.1 Obstacles
    1.3.2 Solutions
  1.4 Moment-by-Moment Measures
    1.4.1 Obstacles
    1.4.2 Solutions
2 The Future: Seven Trends
3 Relevant Terms
4 Copy Testing Companies
5 References

Four Types of Copy Testing Scores
There are four general themes woven into the last century of copy testing. To understand how the different types of measures relate to one another, see the heuristic advertising model (Ameritest TV Ad Model).

Report Card Measures
The first theme is the quest for a valid, single-number statistic to capture the overall performance of the advertising creative. This search has spawned various report card measures, which are used to filter commercial executions and help management make the go/no-go decision about which ads to air. (Young, p. 7)

The predominant copy testing measure of the 1950s and 1960s, Day-After Recall (DAR), was interpreted as measuring an ad’s ability to “break through” into the mind of the consumer and register a message from the brand in long-term memory. (Honomichl) Once this measure was adopted by Procter and Gamble, it became a research staple. (Honomichl)

In the 1970s and 1980s, after DAR was determined to be a poor predictor of sales, the research industry began to depend on the measure of persuasion as an accurate predictor of sales. This shift was led, in part, by researcher Horace Schwerin, who pointed out, “the obvious truth is that a claim can be well remembered but completely unimportant to the prospective buyer of the product – the solution the marketer offers is addressed to the wrong need.” (Honomichl) As with DAR, it was Procter and Gamble’s acceptance of the persuasion measure (also known as motivation) that made it an industry standard. Recall scores were still provided in copy testing reports, with the understanding that persuasion was the measure that mattered. (Honomichl)

The 1970s also saw a re-examination of the “breakthrough” measure. As a result, an important distinction was made between the attention-getting power of the creative execution and how well “branded” the ad was. Thus, the separate measures of attention and branding were born. (Young, p. 12)

Obstacles
In the 1970s, 1980s, and 1990s, tests were conducted to validate a link between the recall score and actual sales. For example, Procter and Gamble reviewed 10 years’ worth of split-cable tests (100 in total) and found no significant relationship between recall scores and sales. (Young, pp. 3-30) In addition, the Wharton School’s marketing guru Leonard Lodish conducted an even more extensive review of test market results and also failed to find a relationship between recall and sales. (Lodish, pp. 125-139) Harold Ross of Mapes & Ross found that persuasion was a better predictor of sales than recall. (Ross, pp. 13-16)

Diagnostic Measures
The second theme is the development of diagnostic copy testing, the main purpose of which is optimization. Understanding why diagnostic measures such as attention, brand linkage, and motivation are high or low can help advertisers identify creative opportunities to improve executions. (Young, p. 7)

Obstacles
Research companies have developed different approaches to determining the report card measures of attention, brand linkage, and motivation. For example, Unilever analyzed a database of commercials “triple-tested” using the three leading approaches to the measure of branding (Ameritest, ASI, and Millward Brown). The analysis showed that each of the three measures something uncorrelated with, and therefore different from, the other two. (Kastenholtz, Kerr & Young)

Non-Verbal Measures
The third theme is the development of non-verbal measures, in response to the belief of many advertising professionals that much of a commercial’s effect (for example, its emotional impact) may be difficult for respondents to put into words or to scale on verbal rating statements. In fact, many believe the commercial’s effects may be operating below the level of consciousness. (Young, p. 7) According to researcher Chuck Young, “There is something in the lovely sounds of our favorite music that we cannot verbalize – and it moves us in ways we cannot express.” (Young, p. 22)

Obstacles
In the 1970s, researchers such as Herbert Krugman sought to capture these non-verbal responses biologically by tracking brain-wave activity as respondents watched commercials. (Krugman) Others experimented with galvanic skin response, voice pitch analysis, and eye-tracking. (Young, p. 22) These efforts were not widely adopted, in part because of the limitations of the technology and in part because of the poor cost-effectiveness of what was widely perceived as academic rather than actionable research.

Solutions
In the 1990s, Picture Sorts were created as a method of deconstructing a viewer’s dynamic response to the film on multiple levels. A Flow of Attention graph, as one example of a Picture Sort, measures how the eye pre-consciously filters the visual information in an ad and serves both as a gatekeeper for human consciousness and as an interactive search engine. More mainstream than the biological measures, Picture Sorts have been used extensively for online ad testing and, because they are not language-dependent, have been used around the world by major advertisers as diverse as IBM and Unilever. (Young, p. 24)

[Figure: Example of an Ameritest Flow of Attention graph]

More recently, research companies have started to use psychological tests, such as those based on the Stroop effect, to measure the emotional impact of copy. These techniques exploit the notion that viewers do not know why they react to a product, image, or ad in a certain way (or that they reacted at all) because such reactions occur outside of awareness, through changes in networks of thoughts, ideas, and images.

Moment-by-Moment Measures
The fourth theme, which is a variation on the previous two, is the development of moment-by-moment measures to describe the internal dynamic structure of the viewer’s experience of the commercial, as a diagnostic counterpoint to the various gestalt measures of commercial performance or predicted impact. (Young, p. 7)

In the early 1980s, the analytical perspective shifted from treating a commercial as the fundamental unit of measurement, to be rated in its entirety, to treating it as a structured flow of experience. This shift gave rise to experimentation with moment-by-moment systems. The most popular of these was the dial-a-meter response, which required respondents to turn a meter, in degrees, toward one end of a scale or the other to reflect their opinion of what was on screen at that moment.

Obstacles
First, unless the dial-a-meter is calibrated by normalizing the data to each individual’s reaction time, the aggregate sample data will be spread across many measurement intervals. Second, because respondents differ in their response times, dial-a-meter readings carry an uncertainty range around which moment is actually being measured. Finally, relatively little has been published to validate dial-a-meter diagnostics against traditional measures of overall ad performance such as recall and persuasion.

Solutions
In the 1990s, the Ameritest Picture Sorts shifted the frame of measurement from clock time (the dial-a-meter approach) to the “subjective time” of experience, which is tied to the rate of information flow in the film, or the ad’s visual complexity. Instead of providing a rating whenever the alarm rings, respondents rate a Picture Sort image only when the mood, message, or image changes significantly. The resulting data are clear, easy to understand, and visually appealing. (Young, p. 23) Examples of an Ameritest Flow of Emotion graph can be seen in The Advertising Research Handbook (Young, p. 202) and in Exhibit 2 of [2].

In addition, the dial-a-meter’s single-scale limitations are overcome with a set of moment-by-moment measures in three dimensions: Flow of Attention, which measures the memorability of each moment; Flow of Emotion, which measures the positive or negative emotional response to each moment; and Flow of Meaning, which measures how well the brand’s strategic values are being communicated in each moment.
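
The dial-a-meter calibration problem noted under Obstacles above can be illustrated with a small sketch. The source does not describe a calibration procedure, so the following makes a simplifying assumption: each respondent has a known, constant reaction lag, expressed in measurement intervals, and each trace is shifted earlier by that lag before the sample is averaged. The function names, the three respondents, their readings, and their lags are all hypothetical.

```python
from statistics import mean

def shift_trace(trace, lag):
    """Attribute each reading to the moment `lag` intervals earlier.

    Assumes the respondent's reaction lag is known and constant
    (a hypothetical simplification, not a documented procedure).
    """
    return list(trace[lag:]) + [None] * lag

def average_traces(traces):
    """Average readings across respondents, interval by interval."""
    averaged = []
    for readings in zip(*traces):
        present = [r for r in readings if r is not None]
        averaged.append(mean(present) if present else None)
    return averaged

# Hypothetical data: one on-screen event at interval 2. Each respondent
# turns the dial to 100 only after his or her own reaction lag.
raw = {
    "r1": ([50, 50, 50, 100, 50, 50, 50, 50], 1),
    "r2": ([50, 50, 50, 50, 100, 50, 50, 50], 2),
    "r3": ([50, 50, 50, 50, 50, 100, 50, 50], 3),
}

uncalibrated = average_traces([trace for trace, _ in raw.values()])
calibrated = average_traces(
    [shift_trace(trace, lag) for trace, lag in raw.values()]
)

print("uncalibrated:", uncalibrated)
print("calibrated:  ", calibrated)
```

In this toy sample, the uncalibrated average smears the single reaction across three intervals, while the lag-corrected average concentrates it at the interval where the event occurred. Real reaction lags would have to be estimated and may vary within a session, which this sketch does not attempt to model.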