A Methodology for the Calculation of Usability Economics Applied to Voice Activated Services
Juan Carlos Luengo Patrocinio
Sebastián Sánchez Prieto
Daniel Tapias Merino
Telefónica Móviles–Universidad de Alcalá, Distrito C, Edf. Este 3, Pl. 3ª, Ronda Comunicación s/n, 28050 MADRID
[email protected]
Universidad de Alcalá, Esc. Politécnica Carretera Madrid-Barcelona Km. 33.6 28871 Alcalá de Henares, MADRID
[email protected]
Telefónica Móviles Distrito C, Edf. Sur 1, Pl. 8 Ronda Comunicación s/n, 28050 MADRID
[email protected]
Pedro Concejero
Juan José Rodríguez
Telefónica Investigación y Desarrollo, Emilio Vargas 6, 28043 MADRID
[email protected]
Telefónica Investigación y Desarrollo, Emilio Vargas 6, 28043 MADRID
ABSTRACT In this paper we present a first approach to the methodology we have developed to calculate the economics of voice activated services. The methodology still has to be validated against the experiments that are currently being defined.
Keywords Usability, Vocal Access, Mobile Services, Economics
Area Usability, Vocal Access, Mobile Services, Economics
1. INTRODUCTION Usability is widely recognized as a key aspect in the design of all kinds of products. Although the concept has been closely tied to technical issues, usability applies to every field. In daily life, usability is something we talk about mostly when we miss it: nobody comments on a tool that is easy to use, but as soon as a device is difficult to handle the question “who designed this?” arises. Within the narrower area of operators and technology-based services, the usability community tends to consider the matter mainly in connection with web pages and software development. These were the first areas in which the importance of usability was evaluated, but there are many others in which deeper work is required to clarify the role that usability plays in the success of a given service. On the other hand, modern companies need a clear and simple “cost model” to understand the return (not only the ROI as a financial indicator) on each euro spent inside the company. This is why it is important to have a clear picture of the return that an investment in usability gives to a company. The word investment is chosen deliberately: usability yields a return and is not just another expense. Work on usability ROI began several years ago but, as mentioned above, it has focused mainly on web usability [1]. In this paper, we focus on creating a methodology that allows an economic evaluation of the benefits of usability.
2. VOICE ACTIVATED SERVICES The Telefónica Group has invested more than 20 years in the research and development of speech technology and in its use in real services over the telephone. Over this time, voice activated services have evolved considerably. In the past, the design of voice user interfaces (VUIs) was often driven by the need to overcome the limitations of the underlying technology; now that the technology has improved, more effective and user-friendly designs have become the focus of the design process. Usability in speech applications is therefore not simply a matter of the core technology, but mostly of the tools and skills used in developing applications; that is, ease of use is the result of systematic usability activities carried out throughout the project lifecycle. This service development methodology applied to speech applications, together with our current state-of-the-art speech technology, has made it possible to deploy more than 3,000 telephone channels for speech recognition and text-to-speech conversion, which provide our customers with different services worldwide. The voice portal (404) and the information service (11818), with a vocabulary of about 10,000 words, are good examples of services of this kind in Spain.
2.1 The Design of Voice Activated Services It is therefore important to have a clear idea of the requirements of this kind of service in order to design it properly. Both the public information on the subject [2] and more confidential information from the work carried out inside Telefónica Móviles España to improve its Voice Portal Service [3] give several general recommendations that must be considered when developing a service in order to achieve an appropriate level of usability. From these two references we can extract the different aspects that must be considered when designing speech interfaces. It is not the objective of this paper to cover in detail all the questions that should be carefully studied before implementing the interface, but we think it is important to put them on the table, since this gives skeptics a better idea of the amount of work required to define an interface properly. As mentioned before, and as we will repeat several times in this paper, usability is something we talk about when we miss it. The main points, briefly presented here, are covered at length in [2]:
1) Physical properties of Speech Machines. Several parameters have to be controlled and properly defined before a vocal application is ready to be offered to the public: loudness and noise (including the volume), the sound quality of the spoken output (including the sex selected for the output voice), timing values (a timing value has to be defined for almost every response to avoid incorrect interaction behaviour), the proportions of speech and silence, rhythm and rate.
2) Adequate selection of the wording. The voice, person, tense and mood are important, together with an appropriate selection of the “opening greeting”.
3) Prompts. The design of the prompts, the type of prompt used, and the way they are included in the general flow of the speech application.
4) Feedback. The types of feedback (trying to avoid literal feedback and replacing apology and blame with prompts) and the way this feedback is included in the application flow are other points to consider.
In addition, as presented in [4], Telefónica Móviles España is working hard to evaluate how important vocal access to mobile services is, and what the best way to develop these vocal interfaces is. For that study, Yavoy (the commercial name in Telefónica Móviles España for the Ring Back Tones service) was selected, and a pilot was carried out to clarify these points. However, as is clear in other areas too and as was mentioned in [4], usability is not the only parameter to fine-tune. There are other actions that we must carry out to encourage customers to use services. Taking the ring back tones service as an example, other important measures increase the purchase of tones by users:
1. Regularly update the list of available ring back tones (musical hits, jokes or monologues).
2. Launch marketing campaigns to make the service well known among potential users.
Even though this is a clear example of a “viral service”, good marketing campaigns always influence service usage. That is, an excellent interface is not enough to boost the service if the content is not interesting and/or potential users do not know that the service exists. However, once all these conditions are met, usability is essential.
3. GENERAL ECONOMICAL CONSIDERATIONS ABOUT USABILITY Nowadays almost every aspect of our daily life can be quantified and given an economic value. Although this is a debatable claim, the truth is that almost every sector works this way, and in the business arena it matters even more. This need leads us to the objective of understanding all the economic implications of usability. As a basic premise when talking about usability economics, and even though everybody has this idea in mind, we think it is important to state the following clearly: “Usability will always have a positive impact in the Company Balance Sheet”. The work built around this statement consists in clearly identifying these impacts, establishing their degree of importance and, as the main objective of our work, trying to quantify them.
In the following sections of this paper we make a first approach to this objective, first classifying these impacts and then explaining how they could be quantified.
3.1 Qualifying the different effects of usability Our initial classification considers different levels of importance in the positive impact that usability generates. Depending on the type of service in which the speech application is used, different concepts should be considered, as explained in the definition of each degree.
First Degree. Here the impact to consider clearly depends on the type of service (the same distinction is made in [2]). For Cost Saving Applications (CSA), the impact to evaluate is the reduction in the cost of the activities concerned that results from using the new system. For Value Added Applications (VAA), the direct increase in the usage of the service can be measured, and it generates new revenues. In this second case a further distinction has to be made according to the tariff scheme defined. The tariff scheme can change during the lifetime of a service, but it is much more difficult (not to say impossible) for a VAA to become a CSA.
Second Degree. Two concepts are included at this level: Churn Reduction, and Innovation Perception or Technological Leadership. The first is widely used and is already quantified by most operators, considering the cost of acquiring a new client and the even higher cost of winning back a client who has left. All the features and interfaces that simplify the use of the services and let the customer easily obtain the required information obviously act as an “exit wall”. The second concept is included here as a second degree effect because our experience shows that this perception is very important for technology-related products, although some people consider it a third degree effect (in fact, part of the third degree concept described next).
Third Degree. Branding Perception or Corporate Reputation is included at this level.
This parameter is affected by every action and decision that a Company takes. However, the effect of these decisions is normally considered only when it is negative, ignoring the positive effect that new features of this kind have on the brand. A first approach and more information about this concept can be found in [7], where different methodologies to evaluate and quantify Corporate Reputation are presented. For this approach we have selected RepTrak™ as the model, since it is the world’s first standardized and integrated tool for tracking corporate reputation internationally, and it uses statistically independent dimensions. Basically, the model considers the following parameters for the evaluation: Products/Services, Innovation, Workplace, Governance, Citizenship, Leadership and Performance. Each parameter has a weight. As a general assumption, which could probably be modified slightly depending on the sector and the concrete company, the weights are the following: Products/Services 33%, Innovation 10%, Workplace 17%, Governance 15%, Citizenship 10%, Leadership 10%, Performance 7%.
If this new feature is properly explained to the general public, this third degree effect could become a second degree effect. Another, less probable effect is that potential new customers become interested and begin to consider becoming our customers when they learn about the service. This case is obviously more related to marketing campaigns, but usability plays an important role, since it can allow a “trial” user to get into the service easily, pushing him/her to become a customer. At this level we can also consider the concept presented in [6] as Internal Social ROI, that is, the importance for the Usability Group of a clear recognition of its work by colleagues in the Company. This will obviously improve the functioning of the group and, likewise, the results of internal satisfaction surveys, but its influence is too difficult to evaluate. Normally, when defining the return on a concrete usability investment, only First Degree effects are considered. Whether the service is a CSA or a VAA only determines whether the money is saved or earned. We consider it important to move on to the Second and Third Degrees as well, since they have the positive impact mentioned and could therefore be quantified, although this is probably the most difficult part of this work.
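The weighting of the reputation dimensions described in Section 3.1 can be illustrated with a short sketch. It assumes that each dimension is scored on a common scale (here 0 to 100) and that the overall reputation score is the weighted average of those scores; the function name and the survey values are hypothetical. The weights are the ones listed in the text, which add up to 102%, so the sketch normalises by their sum.

```python
# Dimension weights as listed in the text (in percent).
# They add up to 102, so the score below is divided by their sum.
WEIGHTS = {
    "Products/Services": 33,
    "Innovation": 10,
    "Workplace": 17,
    "Governance": 15,
    "Citizenship": 10,
    "Leadership": 10,
    "Performance": 7,
}


def reputation_score(dimension_scores, weights=WEIGHTS):
    """Weighted average of per-dimension scores given on a common scale."""
    missing = set(weights) - set(dimension_scores)
    if missing:
        raise ValueError(f"missing dimension scores: {sorted(missing)}")
    total_weight = sum(weights.values())
    weighted_sum = sum(weights[d] * dimension_scores[d] for d in weights)
    return weighted_sum / total_weight


# Hypothetical survey results before and after a usability improvement
# that is assumed to move only two dimensions.
before = {dimension: 60.0 for dimension in WEIGHTS}
after = {**before, "Products/Services": 66.0, "Innovation": 65.0}

delta = reputation_score(after) - reputation_score(before)
print(f"Change in reputation score: {delta:.2f} points")
```

A difference of this kind is one possible input for the Branding Perception effect; how it is turned into money is left to each company.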
3.2 Quantifying the previously qualified aspects There is an important statement in [5]: “adding speech recognition capabilities to an existing touch-tone IVR platform increase usage by 20% to 60%.” Note that the sentence only says “adding speech recognition capabilities”, and not “adding adequately designed speech recognition capabilities”. An incorrect design will probably increase usage by 20% and a correct design by 60% (this is not stated in the study; it is only our conjecture). This statement alone is enough to consider usability in voice activated services a major investment. However, as mentioned above, it only covers what we have called First Degree effects. Quantifying the Second and Third Degree effects is the most recurrent task that usability managers face when explaining the activities of their team to Top Management. “Quantification” requires a model and/or a mathematical formula that translates the results obtained with the methodology explained in the next section into concrete return figures. There are different ways of quantifying Second Degree (SD) and Third Degree (TD) effects (First Degree (FD) effects are quantified more directly), and the one selected should fit the general perception that the company has of these cases.
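As a rough illustration of the First Degree effect for a Value Added Application, the sketch below applies the 20% to 60% usage increase quoted from [5] to a baseline revenue. It assumes, as a simplification that is not stated in the source, that revenue grows in proportion to usage; the function name and the baseline figure are hypothetical.

```python
def first_degree_revenue_range(baseline_annual_revenue,
                               low_uplift=0.20, high_uplift=0.60):
    """Return (low, high) extra annual revenue for a usage uplift range.

    Assumes revenue is proportional to usage, which depends on the
    tariff scheme and may not hold for a real service.
    """
    if baseline_annual_revenue < 0:
        raise ValueError("baseline revenue must be non-negative")
    if not 0 <= low_uplift <= high_uplift:
        raise ValueError("uplifts must satisfy 0 <= low <= high")
    return (baseline_annual_revenue * low_uplift,
            baseline_annual_revenue * high_uplift)


# Hypothetical service earning 100,000 euros per year before the change.
low, high = first_degree_revenue_range(100_000.0)
print(f"Expected extra revenue between {low:,.0f} and {high:,.0f} euros")
```

For a Cost Saving Application the same arithmetic would be applied to the cost of the activity being replaced rather than to revenue.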
4. METHODOLOGY TO EVALUATE The work in progress at Telefónica Móviles España and Universidad de Alcalá aims to generate an economic model or, at least as a first step, a methodology that allows usability and/or design teams to evaluate the impact that usability improvements have on services. If this model or methodology can be applied to several services, we will eventually obtain a range of values that can be used to estimate in advance the economic importance of usability. Initially we propose a methodology; further work is required to obtain the mathematical model, a first approach to which is presented in this paper.
4.1 First Definition of Methodology Here we indicate the steps needed to obtain the required information properly, following [8] regarding the way the Usability Lab has to be designed, as well as [9] and [10]. A further point about the evaluation of these features is that it is very difficult to obtain valuable data from Focus Groups that have not used the service. This premise is taken into account in the definition of the methodology.
4.1.1 Step1: Define new service Define a pilot of the service that incorporates the usability comments made during the design process (new service). Also maintain a pilot of the service as it was originally defined (old service).
4.1.2 Step2: Create Parallel Groups Create two groups whose members have the same profile, and offer the “old service” to Group A and the “new service” to Group B. (Note: when the improvements concern an existing, commercially launched service, information about the “old service” can be obtained from normal users.) The groups could initially follow the scheme of a Focus Group; only if required, a deeper analysis could be carried out with more groups and more members in each.
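One way to obtain two groups with the same profile mix is to shuffle the participants within each profile and deal them alternately to Group A and Group B. The following is a minimal sketch under that assumption; the participant records, the profile labels and the function name are hypothetical, and the source does not prescribe any particular assignment procedure.

```python
import random
from collections import defaultdict


def split_parallel_groups(participants, profile_key, seed=0):
    """Split participants into Group A (old service) and Group B (new service).

    Participants are shuffled within each profile and dealt alternately,
    so both groups receive the same number of members per profile
    (Group A gets the extra member when a profile has an odd size).
    """
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    by_profile = defaultdict(list)
    for person in participants:
        by_profile[profile_key(person)].append(person)

    group_a, group_b = [], []
    for profile in sorted(by_profile):
        members = by_profile[profile]
        rng.shuffle(members)
        for index, person in enumerate(members):
            (group_a if index % 2 == 0 else group_b).append(person)
    return group_a, group_b


# Hypothetical panel of eight users with two usage profiles.
people = [{"id": i, "profile": "heavy" if i % 2 else "light"} for i in range(8)]
group_a, group_b = split_parallel_groups(people, lambda p: p["profile"], seed=1)
print(len(group_a), len(group_b))
```

With real panels the profile would normally combine several attributes (age band, tariff plan, previous use of the service), which can be encoded in the value returned by `profile_key`.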
4.1.3 Step3: Make tests Two kinds of tests are necessary: Oriented Tests (in which the user must complete several concrete, predefined tasks) and General Tests (in which no concrete tasks are defined, but the user is asked to use the service for a minimum time per day/week/month). Once the main requirements for the creation of the questionnaire have been listed, the following steps should be covered.
Step 3.1: For Oriented Tests, create a questionnaire to be filled out as soon as the test is finished. It could consist of “closed-ended” questions, which provide respondents with a list of the possible answers.
Step 3.2: For General Tests, create a questionnaire that the customer can fill out at any time. Here a combination of “closed-ended” and “open-ended” questions could be used. “Open-ended” questions allow respondents to explain their perception of the service in their own words. Evaluating these open answers is of course much more costly (in time and resources), but we think it is very useful to compare the responses with a previously created list of “key words” that allows us to extract the user's perception of the service and, by extension, of the Company. This list could be split into two, one with positive terms and one with negative terms, and the number of matches in each gives information about the user's stance towards the service.
Step 3.3: If possible, that is, whenever the tests (Oriented or General) are carried out on Company premises, record the users' reactions while they use the service in the Usability Lab, so that they can be studied properly. Encourage the members of the groups to voice aloud any thought, idea or question about the process that they consider worth mentioning. At the beginning of the test users will probably not be totally open to this, but as soon as they get into the test they will relax and make more useful observations.
An adequate definition of the questionnaires to be filled out is necessary here. As indicated in [11], badly defined questionnaires give heavily biased information, and when concrete economic questions are at stake this is not an option. [11] also presents some general rules that must be followed to ensure that the questions defined obtain enough information:
1) Ensure that each question asks only a single question. Avoid double-barrelled questions, which give uninterpretable answers.
2) Give all the alternatives for each “closed-ended” question.
3) If the wording is biased by the use of “positive or negative” words, this could favor one option and affect the results.
4) For sensitive questions, whose honest answer might not be socially acceptable, it could be necessary to “counterbias” the question, that is, to word it in such a way that respondents find it easy to answer properly.
5) Avoid words with “vague” meaning, or “slang” words that could have a different or offensive meaning.
Figure 1 gives a graphical representation of the process, considering the steps mentioned above.
[Figure 1.- Graphical representation of the procedure to follow. Elements shown: Creation of the Group; Definition of the task; Oriented Test and General Test; Execution of the task; Recording of the execution; Fulfilment of the questionnaire; Recording of the fulfilment; Questionnaire to obtain information; Final Report.]
All this information then has to be evaluated and processed; comparing the results for the “old service” and the “new service” should provide enough information to quantify the usability improvements. To do this properly, a measurement scale is needed. There are three commonly used category scale formats, as presented in [11]:
1) Semantic differential scale. A sequence of unlabeled categories between two bipolar adjectives.
2) Stapel scale. A single adjective and a range of values from –X to +X.
3) Likert scale. A number of statements are developed to measure different aspects that might influence the respondent’s overall attitude toward the service.
These scales are in general use, but their validity and reliability have to be assessed for the concrete case in which they are used. Scale accuracy refers to the degree to which a measurement scale is free of both bias and random error. Our main preference is the Likert scale, but the other scales mentioned could also be used with good and interesting results.
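The two kinds of processing described in this section, comparing closed-ended ratings between the “old service” and “new service” groups and matching open-ended answers against lists of positive and negative key words, can be sketched as follows. This is only an illustration: it assumes a 1 to 5 Likert scale, a simple difference of mean ratings per statement, and exact word matching; the key-word lists, statement identifiers and function names are hypothetical.

```python
import re
from statistics import mean

# Hypothetical key-word lists; a real study would define its own.
POSITIVE_TERMS = {"easy", "fast", "clear", "useful"}
NEGATIVE_TERMS = {"slow", "confusing", "difficult", "annoying"}


def likert_means(responses):
    """Mean rating per statement; each response maps statement id -> rating."""
    statements = responses[0].keys()
    return {s: mean(r[s] for r in responses) for s in statements}


def likert_shift(old_group, new_group):
    """Difference in mean rating (new minus old) for each statement."""
    old, new = likert_means(old_group), likert_means(new_group)
    return {s: new[s] - old[s] for s in old}


def keyword_balance(answer):
    """Count positive and negative key words in one open-ended answer."""
    words = re.findall(r"[a-z']+", answer.lower())
    positives = sum(word in POSITIVE_TERMS for word in words)
    negatives = sum(word in NEGATIVE_TERMS for word in words)
    return positives, negatives


# Hypothetical ratings from Group A (old service) and Group B (new service).
group_a = [{"q1": 2, "q2": 3}, {"q1": 3, "q2": 3}]
group_b = [{"q1": 4, "q2": 4}, {"q1": 5, "q2": 3}]
shift = likert_shift(group_a, group_b)

balance = keyword_balance("The menu was easy and clear, but the voice was slow.")
print(shift, balance)
```

A difference of means says nothing about statistical significance; with the small groups typical of a Focus Group the shift should be read as an indication only.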
4.2 How the Methodology fits with the effects listed. Once the list of First, Second and Third Degree (FD, SD and TD) effects to consider has been defined, and the way these effects can be quantified has been explained, it is necessary to identify how the information required for the quantification can be extracted from the activities covered by the methodology. After a detailed study of the different sources that could be used, and considering how important and reliable they are, Table 1 shows the sources that have to be considered when quantifying. We differentiate between Primary Sources (PS), which are mandatory to obtain valuable information about a given Degree, and Secondary Sources (SS), which add granularity and information to the result but are not mandatory. Table 1 shows that, in general, every Degree can draw on information obtained from every Step, but some Steps are primary sources (PS) and others are secondary sources (SS).
Step/Degree    First    Second    Third
3.1            PS       PS        --
3.2            PS       SS        PS
3.3            SS       SS        SS
Table 1.- Sources of information and where they can be used
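Table 1 can also be held as a small lookup structure, which makes it easy to ask which steps must be completed before a given Degree can be quantified. The sketch below simply transcribes the table; the dictionary layout and the function name are hypothetical.

```python
# Transcription of Table 1: step -> degree -> source type.
# None stands for the "--" cell (step 3.1 is not a source for the Third Degree).
SOURCES = {
    "3.1": {"First": "PS", "Second": "PS", "Third": None},
    "3.2": {"First": "PS", "Second": "SS", "Third": "PS"},
    "3.3": {"First": "SS", "Second": "SS", "Third": "SS"},
}


def steps_for(degree, kind):
    """List the steps that act as `kind` ("PS" or "SS") for a degree."""
    return [step for step, row in SOURCES.items() if row[degree] == kind]


# Steps that are mandatory (primary sources) for each degree.
for degree in ("First", "Second", "Third"):
    print(degree, steps_for(degree, "PS"))
```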
4.3 First Approach to the Final Model. With these relationships identified, the next step is to define how to combine all the information into a clear and simple model that gives a clear measure of the ROI obtained from usability decisions and activities. This is probably the most delicate, and therefore the least evident, part of the work, since there are many different ways to translate the general data obtained into final economic results. Our first approach, which has to be confirmed and tested against several real situations before it can be fully accepted, is based on experience. With these limitations in mind, our first approach is given by the following formula, which has two versions depending on whether we are dealing with a CSA or a VAA. For a VAA:
ROI = ∆E(VAA) + ChR * VChP + IP * VIPP + BP * VBPP
where
∆E(VAA): Earning increment obtained in one year due to the improvements in service usability.
ChR: Churn Reduction; VChP: Value of Churn Point.
IP: Innovation Perception; VIPP: Value of Innovation Perception Point.
BP: Branding Perception; VBPP: Value of Branding Perception Point.
For a CSA, ∆E(VAA) is replaced by ∆S(CSA), the saving increment obtained in one year due to the improvements in service usability, and the rest of the parameters stay the same:
ROI = ∆S(CSA) + ChR * VChP + IP * VIPP + BP * VBPP
The information obtained from the questionnaires described in the previous sections is the basis for calculating ChR, IP and BP. How the final value is obtained will depend on the information gathered, but a general model is being defined in the course of this work. The values of VChP, VIPP and VBPP have to be defined by each company for each case and moment, since the value of one point of churn, for example, may differ depending on market share, the general situation and other factors that vary over time.
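The formula of Section 4.3 translates directly into code. The sketch below is a plain transcription of it; the function and parameter names and all the numbers in the example are hypothetical, chosen only to show how the terms combine. The same function serves both versions of the formula, since only the meaning of the first term changes (∆E for a VAA, ∆S for a CSA).

```python
def usability_roi(first_degree, chr_points, vchp,
                  ip_points, vipp, bp_points, vbpp):
    """ROI = first_degree + ChR * VChP + IP * VIPP + BP * VBPP.

    first_degree is the yearly earning increment (VAA) or the yearly
    saving increment (CSA); the point values are set by each company.
    """
    return (first_degree
            + chr_points * vchp
            + ip_points * vipp
            + bp_points * vbpp)


# Hypothetical yearly figures, in euros, for a Value Added Application.
roi = usability_roi(
    first_degree=50_000.0,  # earning increment due to usability improvements
    chr_points=0.5,         # ChR: churn reduction, in points
    vchp=40_000.0,          # VChP: value of one churn point
    ip_points=2.0,          # IP: innovation perception, in points
    vipp=5_000.0,           # VIPP: value of one innovation perception point
    bp_points=1.0,          # BP: branding perception, in points
    vbpp=8_000.0,           # VBPP: value of one branding perception point
)
print(f"Estimated yearly return: {roi:,.0f} euros")
```

Setting the second and third degree points to zero reduces the result to the First Degree effect alone, which is the figure that is normally reported.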
5. CONCLUSIONS AND FUTURE WORK The importance of usability in the design and development of products and services has grown in recent years. In the case of voice activated services, usability becomes crucial once the application reaches a minimum level of complexity. Additionally, as we have shown, making services easy to use requires the inclusion of technical and usability considerations in the design and development process.
This paper shows the effects that usability has on Company Economics and defines a methodology for properly obtaining information about the benefits of an adequate design of the services. This methodology should lead us to a model that gives designers a quick evaluation of usability benefits. A first approach to this final model has been presented, but the translation from the information obtained in the questionnaires to the values required by the formula still has to be defined. The other pending task is to use the model in different real cases, and then to assess the model and confirm its value. Short-term future work must cover this quantitative approach in depth, carrying out different studies that follow the proposed methodology and evaluating the information obtained. From all this information, a concrete model should be extracted and applied to other applications.
6. REFERENCES
[1] J. Nielsen, S. Gilutz, “Usability Return on Investment”, Nielsen Norman Group, http://www.NNgroup.com/reports/roi (January 2003).
[2] Bruce Balentine, David P. Morgan, “How to Build a Speech Recognition Application; A Style Guide for Telephony Dialogues”, Second Edition, EIG Press (2001).
[3] “Design Recommendations for Telefónica Móviles Voice Portal”, EIG and Infospeech (December 2003).
[4] J.C. Luengo, C. Lázaro, D. Tapias, P. Concejero, J.J. Rodríguez, “Usability Considerations in the Design and Development of Voice Activated Services”, ICIN 2006, Bordeaux (France) (May-June 2006).
[5] Donna M. Fluss, “The Practical Guide to Speech Recognition. Using Speech Recognition to Decrease Cost and Increase Revenue”, DMG Consulting (2007).
[6] Randolph G. Bias, Deborah J. Mayhew, “Cost-Justifying Usability: An Update for the Internet Age”, 2nd Edition, Elsevier (2005).
[7] Cees B.M. van Riel, Charles J. Fombrun, “Essentials of Corporate Communication”, Routledge (2007).
[8] J.J. Rodríguez, P. Concejero, S. Diego, J.A. Collado, D. Tapias, A.J. Sánchez, “Laboratorio de Usabilidad de Telefónica Móviles España”, Boletín de Factores Humanos de Telefónica I+D No 27 (Agosto 2005).
[9] Joseph S. Dumas, Janice C. Redish, “A Practical Guide to Usability Testing”, Revised Edition, Intellect Books (1999).
[10] Jeffrey Rubin, “Handbook of Usability Testing: How to Plan, Design and Conduct Effective Tests”, John Wiley & Sons (1994).
[11] Melvin Crask, Richard J. Fox, Roy G. Stout, “Marketing Research: Principles and Applications”, Prentice Hall (1995).