Farmers' Exchange Experiment Design Team: Matt Gedigian, Jessica Santana, Joyce Tsai
Project: Farmers' Exchange
Sessions:
4/12, 7:00-9:45 • The Berkeley team further discussed implementation and brainstormed experiment design.
4/13, 3:00-3:30 • Matt, Joyce, and Neil discussed the experiment design and the interactive prototype over the phone.
Contributions:
All: Brainstormed experiment design, including qualitative and quantitative methods, which users to recruit, where to test, and the format of the test.
Matt: Control conditions, experiment variants
Jessica: Quantitative and qualitative evaluation, tasks
Joyce: Control conditions, experiment variants, evaluation
Tasks
The primary goal of our experiment with this prototype is to determine how farmers react to the system components, namely the phone tree structure and the search function. We have simplified our prototype to test these factors, removing extraneous options. Testing this stripped-down version will allow us to get feedback on the basic design and to evaluate two different designs. Although we have designed a more comprehensive system capable of handling some edge cases, we chose not to test it because it requires more complex scenarios for the test subjects. We have also added a second search structure that allows the user to browse questions and answers based on a topic hierarchy. By having users test both hierarchical browsing and keyword search, we can identify user preferences and areas for improvement. After validating the basic design and determining preferences, we can pursue refinements that enhance features and usability in future work.
Our other goal in designing these tasks is to learn more about the farmer's mental model in approaching the system. We have incorporated open-ended tasks as well as pre-determined scenarios to learn what the farmer would expect to do in each case without external guidance. In this manner, we can observe the questions that farmers ask, whether they provide enough information when posing a question or will more often need to follow up with additional information, and whether the answer provided will suffice or the farmer will need more information.

We will have the participants complete an informed consent form at the start of the session. Before the first task, we will briefly describe the system to the farmer to appropriately set their expectations. We will then ask the farmer a series of short questions about demographics and how they currently share and receive agricultural advice (which crops they produce, how long they've farmed, where they are from, where they get most of their agricultural advice, and how and in what cases they share advice). We will explain that they are to complete the tasks on their own and that we will not intervene or provide assistance. Each participant will perform all four tasks. We will ask questions about each task immediately after it is completed, as well as a set of overall questions at the end.

TASK 1. Leave a question. Proposed scenario: Your crops [we can personalize this based on the farmer] are wilting and you aren't sure why. You need to find out what could be the cause. Call the Farmers' Exchange phone number and leave a question about this problem.
TASK 2. Retrieve an answer. Farmers' Exchange will call to notify you that your question has been answered. Follow the system's instructions to retrieve your answer.
TASK 3. Browse for a specific topic using keyword search. Proposed scenario: You are searching for information on which pesticides you can use on your organic crops. Call the Farmers' Exchange phone number and search for related questions and answers.
TASK 4. Browse for a specific topic using hierarchical search. Proposed scenario: You are searching for information on which pesticides you can use on your organic crops. Call the Farmers' Exchange phone number and browse for this topic.
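To make the structure under test concrete, here is a minimal sketch of how the stripped-down phone tree behind these four tasks could be modeled. The menu labels, key assignments, and re-prompt behavior are our assumptions for illustration only, not the prototype's actual prompts.

```python
# Hypothetical sketch of the simplified phone tree; labels and keys are assumed.
MENU = {
    "main": {
        "prompt": ("Press 1 to ask a question. Press 2 to retrieve an answer. "
                   "Press 3 to search by keyword. Press 4 to browse by topic."),
        "options": {"1": "ask", "2": "retrieve", "3": "keyword_search", "4": "topic_browse"},
    },
    "topic_browse": {
        "prompt": "Press 1 for crops. Press 2 for pests and disease. Press 3 for marketing.",
        "options": {"1": "crops", "2": "pests_disease", "3": "marketing"},
    },
}

def run_menu(get_key=input, state="main"):
    """Walk the phone tree until a leaf is reached, re-prompting on invalid keys."""
    errors = 0
    while state in MENU:
        node = MENU[state]
        key = get_key(node["prompt"] + " ")
        if key in node["options"]:
            state = node["options"][key]
        else:
            errors += 1  # an invalid key press counts toward the error metric
            print("Sorry, that is not a valid option.")
    return state, errors  # e.g., ("ask", 0) for Task 1 with no mistakes
```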
Control Conditions
Since there is no existing technology serving the needs of our target users, there is no obvious control condition. One possibility would be to compare our service against a farmer directly asking an advisor, interpreter, or fellow farmer for information. This is problematic for a few reasons. There is a large degree of variability in the existing methods of getting information, so we would essentially have to choose how well we want the competing method to perform. Moreover, in our test setting, we can't simulate a process that takes several days (leaving messages, playing phone tag, asking follow-up questions). For these reasons, we chose not to include a control condition; instead, we will use A/B testing (to compare different designs) and ask subjects to compare these methods to their existing alternatives. Our experiment variants are: 1) whether users try asking first and then browsing, or vice versa, and 2) browsing-via-search compared with browsing-via-navigation. We are switching the order of asking and browsing so that we get a set of users who are effectively new to the system for both ask and browse. For browsing-via-search and browsing-via-navigation, we will test both versions of browse on all users. This is based on research demonstrating that users who see more than one version of a prototype are more likely to critique the prototype and offer negative opinions. Since the browse functionality of the phone system is the least familiar, we want to be particularly sure of getting its interaction design right.
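As an illustration of how the order variants could be balanced across participants, the sketch below rotates a participant pool through the four order combinations so each appears roughly equally often. The condition labels and participant IDs are placeholders, not part of the actual protocol.

```python
import itertools
import random

def assign_conditions(participants, seed=0):
    """Rotate participants through the four order combinations so each
    combination appears (nearly) equally often across the pool."""
    orders = list(itertools.product(["ask-first", "browse-first"],
                                    ["keyword-search-first", "topic-browse-first"]))
    rng = random.Random(seed)
    shuffled = participants[:]
    rng.shuffle(shuffled)  # decouple condition order from recruitment order
    return {p: orders[i % len(orders)] for i, p in enumerate(shuffled)}

# With six recruits, each combination is seen by one or two participants.
print(assign_conditions(["P1", "P2", "P3", "P4", "P5", "P6"]))
```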
Users
We will recruit at least 6 farmers to test our phone interface. Although the Farmers' Exchange system is designed for low-English-proficiency (LEP) farmers, we will do our initial testing with English-speaking farmers. However, like our target audience, the test users will be small farmers who do not use the internet frequently. Testing with LEP farmers would necessitate hiring or otherwise finding an interpreter, and because this is our first round of testing with actual users, we want to have a more polished version of the system before spending resources on bringing in interpreters. Also, one of our partner groups, CAFF (Community Alliance with Family Farmers), recently lost their Hmong interpreter. We plan to recruit test participants via the Small Farm Program and CAFF. We will not offer monetary compensation, although we will bring cookies. Since each of our farmer testers will be completing four tasks, we anticipate some learning fatigue. However, because we anticipate each task taking less than a minute to complete, the learning fatigue should not be severe. Furthermore, since our farmers will be using the real system while working or at the end of the day, when they are exhausted, we would like to know how their usage differs between when they have just started testing and when learning fatigue has set in. If our advisor and interpreter phone interface is ready, we plan to demo it at a FarmLink meeting on 4/20. However, for the scope of this assignment, we will focus on testing the farmer interface.
Evaluation
The success of our system depends on how quickly a user can complete their task and how much value that task adds to the user's work. If a user has to repeat a prompt or has difficulty finding their destination, the user will take more time to complete the task. Here we equate non-completion with infinite time. We will measure time-based user behavior with a stopwatch and will also have the system log each conversation as a backup. We will measure satisfaction using survey techniques (binary and scaled responses) as well as open-ended questions.
Quantitative Metrics
Completion Rate:
• How many participants complete each task.
• Measured by "Proportion of Tasks Completed per User" and "Average User Completion Rate."
Time Spent on Each Task:
• How long each participant takes to complete each task.
• Accounts for a participant's consistency in taking a longer or shorter amount of time on all tasks.
Errors:
• Number of invalid keys pressed or invalid commands.
• Number of lacks of response where a response is required, resulting in repeated prompts.
• Number of unrecognized voice commands.
• Accounting for noise, including side comments.
Assists:
• Number of requested assists.

The quantitative metrics we will analyze include completion rate, time spent on each task, error rate, and assist rate. Completion rate is calculated based on how many participants complete each task; it consists of both the proportion of tasks completed per user and the average user completion rate. We will also ask participants questions in the qualitative survey about tasks that took longer to complete: why they took longer and how we might improve the system. Time spent on each task is calculated based on how long each participant takes to complete each task. Some participants may dedicate more time to every task than others, so we will weight the results to account for a participant's consistent pacing across all tasks. Error rate is calculated based on the number of invalid keys pressed or invalid voice commands, the number of lacks of response where a response is required (resulting in repeated prompts), and the number of unrecognized voice commands. These errors will be divided into participant-based errors and system-based errors. We will account for noise, such as side comments, in our analysis. Finally, assist rate is calculated based on the number of requested assists. We will explain to the participant before the tasks that we are unable to assist them, but we still anticipate participants requesting assistance if they become confused. We will signal to the participant that we cannot assist them, and will stop the experiment if they are obviously unable to complete the task. Any signal from the user requesting assistance will be counted as an assist.
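As a sketch of how these metrics could be computed from the system's conversation logs, consider the following; the log layout, field names, and sample values are hypothetical. Consistent with equating non-completion with infinite time, a non-completed task lowers the completion rate but contributes no finite value to the timing averages.

```python
from statistics import mean

# One hypothetical record per (participant, task).
log = [
    {"user": "P1", "task": 1, "completed": True,  "seconds": 42,   "errors": 1, "assists": 0},
    {"user": "P1", "task": 2, "completed": True,  "seconds": 35,   "errors": 0, "assists": 0},
    {"user": "P2", "task": 1, "completed": False, "seconds": None, "errors": 3, "assists": 1},
    # ... one entry per participant per task
]

def completion_rate(log, user):
    """Proportion of tasks the given user completed."""
    tasks = [r for r in log if r["user"] == user]
    return sum(r["completed"] for r in tasks) / len(tasks)

def avg_time(log, task):
    """Mean completion time for a task; non-completions carry no finite time."""
    times = [r["seconds"] for r in log if r["task"] == task and r["completed"]]
    return mean(times) if times else None

users = {r["user"] for r in log}
avg_user_completion_rate = mean(completion_rate(log, u) for u in users)
errors_per_task_attempt = sum(r["errors"] for r in log) / len(log)
assists_per_task_attempt = sum(r["assists"] for r in log) / len(log)
```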
Qualitative Metrics
Degree of Experiment Reality
• Was there anything you would change about the question you asked?
• On a scale of 1 to 5, with 1 being least important and 5 most important, how important would you say the question you asked is to your work? Why?
Work Flow Disruption Level
• Would you call the Farmers' Exchange phone number while you were working outside, when you are not working, both, or neither? Why?
• What activities can you imagine yourself doing when you decide to ask a question like this?
Satisfaction/Engagement/Interest
• What kinds of questions would you have after you received the answer to your question? How would you look for answers to such questions?
• How can we make Farmers' Exchange easier to use?
• What did you like most about the system?
Frustration
• What did you like least about the system?
Likelihood to Use Again
• Would you call this number again?
• Would you use the browse feature again? In what instance can you imagine using it?
• Which browse feature (keyword search or topic browse) did you prefer? Why?
Likelihood to Tell Others
• Would you tell your friends and colleagues about Farmers' Exchange? Why or why not?

The qualitative metrics we will analyze are packaged into a qualitative survey that we will administer to each participant after each experiment. These questions can be categorized as the degree of experiment reality, the level of work flow disruption, satisfaction, frustration, likelihood to use again, and likelihood to tell others. The degree of experiment reality will alert us to any misrepresentation of reality in the experiment. Our aim is to mimic a real scenario as closely as possible; if we fail to incorporate a significant facet of reality, the results of our experiment may fail to indicate likely results in actual use of the system. We gauge the degree of experiment reality based on how realistic the questions posed to the system are. The level of work flow disruption indicates how well the system fits into the participant's lifestyle. Our goal is to design a system that fits seamlessly into the participant's routine; the participant may reject the system if their perception of the value the system adds is less than the level of work flow disruption it causes. We gauge the level of work flow disruption based on the environments in which the participant can imagine using the system [asked after testing the system]. Engagement or interest in the experiment indicates the participant's level of satisfaction with the prototype. We measure satisfaction based on the amount of detail the participant provides about how they would continue to use the system, and frustration based on the amount of detail they provide about how they would improve the system. Likelihood to use again indicates, more specifically than level of satisfaction, the participant's willingness to use individual features again. Likelihood to tell others about the system indicates the participant's identification with the system. If the farmer is unwilling to be associated with the system, this alerts us that the system has significant failures that must be remedied; we would follow up on this question to determine why the farmer is not satisfied with the system. In addition to the survey questions, we may also ask qualitative questions after each task to clarify the participant's responses.
Conditions of Success
We are using a composite measure of success that combines values from the different metrics. Although we are interested in the results from all our metrics, the three most important ones we will be looking at are high satisfaction, high likelihood to use again, and high likelihood to tell others, as these three seem most correlated with how widely Farmers' Exchange will be adopted. We currently have no numerical benchmark, as we do not know what the average is and therefore cannot compare. Our first experiments will most likely provide numbers for benchmarking, and we will compare later experiments with the first to see if there is any improvement.
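To illustrate what such a composite measure might look like once benchmarking numbers exist, here is a minimal sketch; the equal weights and the shared 1-to-5 scale are assumptions for illustration, not calibrated values.

```python
# Placeholder weights over the three adoption-related metrics; not calibrated.
WEIGHTS = {"satisfaction": 1/3, "use_again": 1/3, "tell_others": 1/3}

def composite_score(ratings):
    """ratings: metric name -> value on a shared 1-to-5 scale."""
    return sum(WEIGHTS[m] * ratings[m] for m in WEIGHTS)

# First-round scores become the benchmark that later rounds are compared against.
baseline = composite_score({"satisfaction": 4, "use_again": 3, "tell_others": 5})
```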
Supporting Materials
• Qualitative survey
• Quantitative checklist: one team member will time the user according to the metrics on the checklist
• Timer for quantitative data
• Video/audio recording equipment: we plan on recording the user's face and his or her entire interaction with Farmers' Exchange
• Script with the scenario, whether the user is ask-first or browse-first, and scenarios for helping confused users
• Informed consent/permission forms
• Phone with speakerphone: we will put the user on speakerphone so the entire team can hear the same prompts the user does