My Masters Thesis

The University of Queensland Faculty of Business, Economics & Law Department of Commerce

Information Request Ambiguity and End User Query Performance: Theory and Empirical Evidence

A Thesis submitted to the Department of Commerce, the University of Queensland, in partial fulfilment of the requirements for the degree of Master of Information Systems.

By Micheal Axelsen

15th June 2000

Supervisor: Dr Paul Bowen

Acknowledgments

I wish to express my appreciation and thanks to my supervisor, Dr Paul Bowen, for his assistance, advice, and patience in the preparation of this thesis. To my mother I offer thanks for making it all possible. I also express sincere gratitude to my wife, Leeanne Klan, whose obstinate patience continues to assist in putting the world in focus.

I also thank workshop participants at Nanyang Technological University in Singapore for their comments and contributions to this thesis.


Abstract

The increasing reliance of organisations on information technology and the persistent shortage of IT/IS professionals require end users to satisfy many information requests by querying complex information systems. Because many business decisions are now based on the results of end users' queries, information request ambiguity has extensive ramifications for business practices. Where the queries do not match the requirements of the information requests, the business decisions are likely to be fundamentally flawed.

This paper develops a theory of ambiguity in information requests and reports the results of an initial empirical investigation of that theory. The theory identifies seven ambiguities: lexical, syntactical, inflective, pragmatic, extraneous, emphatic, and suggestive. A laboratory experiment with sixty-six participants was used to investigate the empirical effect of ambiguity on end user query performance. End user query performance was measured by the number of total errors in the proposed solution, the time taken to complete the solution, and the end user's confidence in the solution.

The results indicate that ambiguity significantly degrades end user query performance. The seven types of ambiguity were analysed to determine their individual effects on end user query performance. Actual (pragmatic, extraneous) and imaginary (emphatic, suggestive) ambiguities show significant relationships with total errors and duration. In general, potential (lexical, syntactical, and inflective) ambiguities were not significantly associated with total errors or end user confidence. The results should have important implications for consulting firms, for organisations with ad hoc work groups, and for entities that make extensive use of electronic mail for information requests.


Table of Contents

1. Introduction

2. Information Request Ambiguity and End User Query Performance
   2.1 A Theoretical Model of Information Request Ambiguity
   2.2 The Nature of Ambiguity
       2.2.1 Potential Ambiguity
             Lexical Ambiguity
             Syntactical Ambiguity
             Inflective Ambiguity
       2.2.2 Actual Ambiguity
             Pragmatic Ambiguity
             Extraneous Ambiguity
       2.2.3 Imaginary Ambiguity
             Emphatic Ambiguity
             Suggestive Ambiguity
       2.2.4 Ambiguity in Practice
   2.3 Task Complexity
   2.4 Theoretical Model Summary

3. Methodology
   3.1 Experimental Design
   3.2 Experiment Participants
   3.3 Assessment of Participant Responses

4. Results and Discussion
   4.1 Overview of Experimental Results
   4.2 Regression Analysis
   4.3 Ambiguity Treatment Multiple Linear Regression Model Results
   4.4 Multiple Linear Regression Model: Seven Types of Ambiguity
   4.5 Summary of Results
       4.5.1 Potential Ambiguity
       4.5.2 Actual Ambiguity
       4.5.3 Imaginary Ambiguity
       4.5.4 Complexity

5. Implications For Business Practice
       5.1.1 Electronic Mail
       5.1.2 Personnel Turnover and Work Teams

6. Contributions, Limitations, and Future Research
   6.1 Research Contributions
   6.2 Research Limitations
   6.3 Future Research

References

Appendix A: Experiment Information Requests and Model Answers
Appendix B: Experiment Instruction Sheet
Appendix C: Command Interpreter Unix Shell Script
Appendix D: Experiment Entity-Relationship Diagram
Appendix E: Experimental Design
Appendix F: Error Marking Sheets
Appendix G: Annotated Corrected Participant Response
Appendix H: Pearson Correlation Matrix of Variables
Appendix I: Analysis of Ambiguity's Effect On Error Type
Appendix J: Seven Ambiguity Types Question Assessment Ratings
Appendix K: Ambiguity Assessment Instrument
Appendix L: Internal Validity of the Experiment

Figures

Figure 1 Types of Ambiguity (adapted from Walton 1996)

Figure 2 The Theoretical Model of Ambiguity, Complexity, and End User Query Performance

Figure 3 Depicting graphically the relationship between the treatment received (ambiguous or clear information request) and the total errors in the participant's response.

Figure 4 Depicting graphically the relationship between the treatment received (ambiguous or clear information request) and the duration taken for the participant to prepare the response.

Figure 5 Depicting graphically the relationship between the treatment received (ambiguous or clear information request) and the participant's confidence in the response.

Tables

Table 1 Summary and Examples of the Seven Types of Ambiguity in Natural Language Information Requests

Table 2 Participant Demographic Information and Descriptive Statistics: Course Background of Group A and Group B

Table 3 Participant Demographic Information and Descriptive Statistics: Academic Record of Group A and Group B

Table 4 Participant Demographic Information and Descriptive Statistics: Participant Age in Group A and Group B

Table 5 Comparative Statistics for all Participant Responses Grouped by Question (Q) and Treatment (T). Note that for T, a = ambiguous, c = clear

Table 6 Confidence Rating Transformation to a Numerical Scale

Table 7 Ambiguity Assessment Scale for the analysis of the Seven Ambiguity Types Regression Model

Table 8 Regression Analysis Results for the General Ambiguity Regression Model

Table 9 Regression Analysis Results for the Seven Ambiguity Types Regression Model

Table 10 Summary of Analysis' Support for Hypotheses

Table 11 Participant Strata Classes

1. Introduction

Keen (1993) predicts that innovative applications of information technology will change the competitive landscape to such an extent that fifty percent of companies in some industries may not survive the next decade. This rise in the importance of information technology innovation and application has led to an increased need for relevant, timely information at the point where that information is used and understood (Conger 1994; Delligatta and Umbaugh 1994; Nath and Lederer 1996).

The demand for information systems (IS) professionals vastly exceeds the available supply, both now and for the foreseeable future (Freeman et al. 2000; Rosenthal and Jategaonkar 1995; Australian Bureau of Statistics 1997). Hence, the use of computerised information systems by end users has become compulsory in most business organisations (Cardinali 1992; Athey and Wickham 1995-1996). To provide appropriate, relevant information requires identifying and eliminating ambiguities in communication between the stakeholders or managers requesting information and the end users querying the information systems.

Traditional structured methodologies reduce ambiguity at the expense of timeliness, flexibility, and learning. The insights that end users can achieve during interactive, iterative query sessions are also of benefit. The need for timeliness, flexibility, learning, and end user insights, as well as the shortage of IS professionals, has led to the general decline of structured reports (Ryan 1993). The use of ad hoc and iterative end user reports has increased (Tayntor 1994). Nonetheless, many end users now use more formalised processes in developing their reports than previously (Conger 1994; Tayntor 1994).


Information request ambiguity has potentially real and large impacts on business organisations. An ambiguous information request can result in a report that, although it appears acceptable to the person making the information request, does not contain the desired information. If that wrong report is then used to make business decisions that the correct report would not have supported, then information request ambiguity can cause substantial negative impacts.

This paper develops a theory of the impact of ambiguity in information requests on end user query performance, and tests that theory empirically. It empirically examines the strength and direction of the relationships between ambiguity types (lexical, syntactical, inflective, pragmatic, extraneous, emphatic, and suggestive), complexity, and end user query performance. The current study extends previous work (Suh and Jenkins 1992; Borthick et al. 1997; Rho and March 1997; Borthick et al. 2000) and builds upon the theory of end users' query performance in the tradition of Dubin (1978).


2. Information Request Ambiguity and End User Query Performance

Different forms of ambiguity can be present in a natural language information request. The primary aim of this research is to explore the impact of ambiguity on end user query performance. This chapter develops a theory of the relationship between information request ambiguity and end user query performance.

2.1 A Theoretical Model of Information Request Ambiguity

The development of an accurate SQL query by an end user depends on the user's knowledge of the information needed, the database structure, and the query language (Ogden et al. 1986). A lack of skill in any of these three domains will lead to inaccurate SQL queries (Ogden et al. 1986).

A natural language information request requires end users to transform the natural language constructs into the query components consisting of lexical items (Katzeff 1990). End users must conceptualise the information requirement and then mentally map this conceptualisation to their understanding of the database structure. Reisner (1977) proposed a template model for the manner in which users create SQL queries from a natural language information request. Each query's operator components (Halstead 1977) are drawn from a set of known query language components to address the requirements of the natural language information request.

Ambiguity affects the user's interpretation of the information needed. Because information requests are expressed using a natural language, they are ambiguous and uncertain. End users must interpret and analyse the information requests to develop queries that meet the requestors' needs. The end users' uncertainty in determining the required response affects the required cognitive effort because multiple interpretations of the actual information required may be legitimately constructed (Almuallim et al. 1997).

The impact of natural language's seven types of ambiguity has not previously been examined in the context of end user query performance. These seven types of ambiguity are lexical, syntactical, inflective, pragmatic, extraneous, emphatic and suggestive (Walton 1996; Fowler and Aaron 1998). These ambiguities affect the number of legitimate interpretations of the natural language statement of the information request. The information request has "multiplicity of meaning" (Walton 1996).

Tasks that are more complex require increased cognitive effort (Campbell 1988). In the context of database queries, task complexity generally negatively impacts end user query performance (Borthick et al. 1997; Borthick et al. 2000). Task complexity is included in this research to control for complexity's established impact on end user query performance. Query performance can be measured on a number of dimensions including correctness, time required, and confidence.

Hence, the following hypotheses are proposed:

H1a: Higher ambiguity in the information request leads to an increase in the total errors in the query formulation.

H1b: Higher ambiguity in the information request leads to an increase in the time taken to complete the query formulation.

H1c: Higher ambiguity in the information request leads to lower end user confidence in the accuracy of the query formulation.

2.2 The Nature of Ambiguity

Ambiguity is an inherent property of all natural languages, including English (Jespersen 1922; Williamson 1994). Absolute precision of a language is pragmatically undesirable, because the language is unable to adapt to new concepts (Williamson 1994). The communication needed to ensure effective and efficient report production, however, requires complete clarity. Hence, a tension exists between the natural language's need for flexibility in the long term and the need for precision in the short term. Natural language is at once both dysfunctional and poorly adapted to the functions language needs to perform, yet flexible and broad-based such that it is useable in practice (Chomsky 1990).

Interest in linguistic ambiguity has an extensive history, and has been recognised as a separate branch of study since at least Aristotle's time (Kooij 1971). Aristotle noted that language must be ambiguous, as a language has limited words but an infinite number of things and concepts to which those words must apply (Kooij 1971).

Russell (1923) recognised that all natural languages are vague and ambiguous. Excluding the realm of mathematical symbolism, constructing completely unambiguous expressions is not possible with the syntax and vocabulary tools available within natural languages (Williamson 1994). To endure and survive, language requires the flexibility to communicate new concepts. Ambiguity necessarily derives from the flexibility of natural language.


Kooij (1971) states that ambiguity arises where a sentence can be interpreted in more than one way. Similarly, Walton (1996) considers a sentence or statement to be more ambiguous as the number of legitimate interpretations of the sentence (or paragraph) increases. Ambiguity implies multiplicity of meaning (Walton 1996).

In classical analysis, the multiplex (Latin for "multiple meaning") categorisation of Alexander of Aphrodisias (Hamblin 1970) suggests a basis for the identification of categories of ambiguity. Alexander of Aphrodisias identified three categories of ambiguity: potential, actual, and imaginary. Walton (1996) adapts this classical multiplex categorisation to his identified types of ambiguity.

Walton (1996) identifies six classical types of ambiguity in natural language: lexical, syntactical, inflective, pragmatic, emphatic, and suggestive. In addition to Walton's (1996) taxonomy, extraneous information and noise in the communication can also be a source of ambiguity. Extraneous ambiguity arises where the communication is not parsimonious, or the communication includes information that is not directly relevant to the message being communicated (Fowler and Aaron 1998). Extraneous ambiguity is an actual ambiguity within the Walton (1996) taxonomy.

Each ambiguity type can be independently present within the communication. Walton's (1996) modified taxonomy and model of ambiguity is presented in Figure 1.

[Figure 1: tree diagram. Ambiguity divides into three multiplex categories, each containing specific types of ambiguity: Potential (lexical, syntactical, inflective); Actual (pragmatic, extraneous); Imaginary (emphatic, suggestive).]

Figure 1 Types of Ambiguity (adapted from Walton 1996)

2.2.1 Potential Ambiguity

Potential ambiguity arises when a term or a sentence is ambiguous in and of itself, that is, before its use in the context of a sentence or paragraph. Three types of ambiguity are categorised as potential ambiguity: lexical, syntactical, and inflective.

Lexical Ambiguity

Lexical ambiguity is the most commonly known form of ambiguity (Reilly 1991; Walton 1996). It occurs when words have more than one meaning as commonly defined and understood. Considerable potential ambiguity arises when a word with various meanings is used in a statement of an information request. For example, "bank" may variously mean the "bank" of a river (noun), to "bank" as related to an aeroplane or a roller-coaster (verb), a savings "bank" (noun), to "bank" money (verb), or a "bank" of computer terminals (noun) (Turner 1987). Lexical ambiguity is often reduced or mitigated by the context of the sentence.

In the case of an information request, lexical ambiguity exists in the statement "A report of our clients for our marketing brochure mail-out". The word "report" may have several meanings, independent of its context. A gunshot report may echo across the hillside. A student can report to the lecturer. A heavy report can be dropped on the foot. Although the context may make the meaning clear, the lexical ambiguity contributes to the overall ambiguity of the statement and increases cognitive effort.

The following hypotheses are proposed:

H2a: Higher lexical ambiguity in the information request leads to an increase in the total errors in the query formulation.

H2b: Higher lexical ambiguity in the information request leads to an increase in the time taken to complete the query formulation.

H2c: Higher lexical ambiguity in the information request leads to lower end user confidence in the accuracy of the query formulation.

Syntactical Ambiguity

Syntactical ambiguity is a structural or grammatical ambiguity of a whole sentence that occurs in a sub-part of a sentence (Reilly 1991; Walton 1996). Syntactical ambiguity is a grammatical construct, and results from the difficulty of applying universal grammatical laws to sentence structure. An example of syntactical ambiguity is "Bob hit the man with the stick". This phrasing is unclear as to whether a man was hit with a stick, or whether a man with a stick was struck by Bob. The context can substantially reduce syntactical ambiguity. For example, knowing that either Bob, or the man, but not both, had a stick resolves the syntactical ambiguity.


Comparing the phrase "Bob hit the man with the stick" to the analogous "Bob hit the man with the scar" provides some insights. As a scar is little suited to physical, violent use, the latter formulation clearly conveys that the man with the scar was struck by Bob (Kooij 1971).

In the case of an information request, syntactical ambiguity exists in the request "A report of poor-paying clients and client managers. Determine their effect on our profitability for the last twelve months." The request is syntactically ambiguous because the end user can interpret "their" to mean the poor paying clients, the client managers, or both. Although the context may reduce or negate the ambiguity, syntactically the request is ambiguous.
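The three readings of "their" can be made concrete with a minimal Python sketch using the standard-library sqlite3 module. The schema (the `client` and `invoice` tables, the `manager` and `days_to_pay` columns), the sample rows, and the 60-day definition of "poor-paying" are all hypothetical and are not the experiment's database; the twelve-month restriction is omitted for brevity. The sketch only illustrates that each reading plausibly maps to a different query and a different report.

```python
import sqlite3

# Hypothetical schema and data, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE client (id INTEGER PRIMARY KEY, name TEXT,
                         manager TEXT, days_to_pay INTEGER);
    CREATE TABLE invoice (client_id INTEGER, profit REAL);
    INSERT INTO client VALUES (1, 'Acme', 'Jones', 95),
                              (2, 'Birch', 'Jones', 20),
                              (3, 'Cole', 'Smith', 120);
    INSERT INTO invoice VALUES (1, 100.0), (1, -40.0), (2, 300.0), (3, -75.0);
""")

POOR_PAYER = "c.days_to_pay > 60"  # assumed definition of "poor-paying"
JOIN = "FROM client c JOIN invoice i ON i.client_id = c.id"

# Reading 1: "their" = the poor-paying clients.
by_client = conn.execute(
    f"SELECT c.name, SUM(i.profit) {JOIN} "
    f"WHERE {POOR_PAYER} GROUP BY c.name ORDER BY c.name"
).fetchall()

# Reading 2: "their" = the client managers (across all of their clients).
by_manager = conn.execute(
    f"SELECT c.manager, SUM(i.profit) {JOIN} "
    "GROUP BY c.manager ORDER BY c.manager"
).fetchall()

# Reading 3: "their" = both: poor-paying clients listed with their managers.
by_both = conn.execute(
    f"SELECT c.manager, c.name, SUM(i.profit) {JOIN} "
    f"WHERE {POOR_PAYER} GROUP BY c.manager, c.name "
    "ORDER BY c.manager, c.name"
).fetchall()

print(by_client)
print(by_manager)
print(by_both)
```

Each query is a defensible answer to the same sentence, yet the three reports differ in both rows and totals, which is the sense in which the request has multiple legitimate interpretations.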

The following hypotheses are proposed:

H3a: Higher syntactical ambiguity in the information request leads to an increase in the total errors in the query formulation.

H3b: Higher syntactical ambiguity in the information request leads to an increase in the time taken to complete the query formulation.

H3c: Higher syntactical ambiguity in the information request leads to lower end user confidence in the accuracy of the query formulation.

Inflective Ambiguity

As Walton (1996) notes, inflective ambiguity is a composite ambiguity, containing elements of both lexical and syntactical ambiguity. Like syntactical ambiguity, inflective ambiguity is grammatical in nature. Inflection arises where a word is used more than once in a sentence or paragraph, but with different meanings each time (Walton 1996). An example of inflective ambiguity is to use the word "scheme" with two different meanings in the fallacious argument, "Bob has devised a scheme to save costs by recycling paper. Therefore, Bob is a schemer, and should not be trusted" (Ryle 1971; Walton 1996).

In the case of an information request, inflective ambiguity exists in the example, "A report showing the product of our marketing campaign for our accounting software product". Ambiguity derives from using the word "product" in two different senses in the one statement (Walton 1996; Fowler and Aaron 1998).

The following hypotheses are proposed:

H4a: Higher inflective ambiguity in the information request leads to an increase in the total errors in the query formulation.

H4b: Higher inflective ambiguity in the information request leads to an increase in the time taken to complete the query formulation.

H4c: Higher inflective ambiguity in the information request leads to lower end user confidence in the accuracy of the query formulation.

2.2.2 Actual Ambiguity

Actual ambiguity refers to ambiguity that occurs in the act of speaking. It arises when a word or phrase, without variation either in itself or in the way the word is put forward, has different meanings. The statement does not contain adequate information to resolve the ambiguity, resulting in a number of legitimate interpretations. Two distinct types of ambiguity are categorised as actual ambiguity: pragmatic and extraneous.


Pragmatic Ambiguity

Pragmatic ambiguity arises when the statement is not specific, and the context does not provide the information needed to clarify the statement. Information is missing, and must be inferred. An example of pragmatic ambiguity is the story of King Croesus and the Oracle of Delphi (adapted from Copi and Cohen 1990):

"King Croesus consulted the Oracle of Delphi before warring with Cyrus of Persia. The Oracle replied that, 'If Croesus went to war with Cyrus, he would destroy a mighty kingdom'. Delighted, Croesus attacked Persia, and Croesus' army and kingdom were crushed. Croesus complained bitterly to the Oracle's priests, who replied that the Oracle had been entirely right. By going to war with Persia, Croesus had destroyed a mighty kingdom - his own."

Pragmatic ambiguity arises when the statement is not specific, and the context does not provide the information needed to clarify the statement (Walton 1996). The information necessary to clearly understand the message is omitted. Due to the need to infer the missing information, pragmatically ambiguous statements have multiple possible interpretations (Walton 1996). Croesus interpreted the Oracle's statement as indicating his success in battle, the response he desired. As noted by Hamblin (1970), Croesus' logical response to the oracular reply would have been to immediately ask the Oracle, "Which kingdom?" Further information is needed to resolve pragmatic ambiguity.

In the case of an information request, pragmatic ambiguity exists in the request for "A report of all the clients for a department." The ambiguity is that the request does not refer to a specific department. The end user could legitimately prepare a report for any department. Further information is needed to resolve this actual ambiguity in this case.
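A minimal Python sketch (standard-library sqlite3) of this situation follows. The `client` table, its columns, and the department names are hypothetical, chosen only for illustration. The request leaves the department as a free parameter, so each department value yields its own equally legitimate report.

```python
import sqlite3

# Hypothetical table and rows, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE client (name TEXT, department TEXT);
    INSERT INTO client VALUES ('Acme', 'Audit'), ('Birch', 'Tax'), ('Cole', 'Tax');
""")

def clients_for(department):
    """Return the client names for one department, sorted by name."""
    rows = conn.execute(
        "SELECT name FROM client WHERE department = ? ORDER BY name",
        (department,),
    ).fetchall()
    return [name for (name,) in rows]

# The request never says which department, so every value is a candidate.
departments = [
    d for (d,) in conn.execute(
        "SELECT DISTINCT department FROM client ORDER BY department"
    )
]
reports = {d: clients_for(d) for d in departments}

for department, names in reports.items():
    print(department, names)
```

Nothing in the data can tell the end user which of these reports was wanted; only further information from the requestor can.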


The following hypotheses are proposed:

H5a: Higher pragmatic ambiguity in the information request leads to an increase in the total errors in the query formulation.

H5b: Higher pragmatic ambiguity in the information request leads to an increase in the time taken to complete the query formulation.

H5c: Higher pragmatic ambiguity in the information request leads to lower end user confidence in the accuracy of the query formulation.

Extraneous Ambiguity

In contrast to pragmatic ambiguity, in which information necessary to clearly understand the message is omitted, extraneous ambiguity arises from an excess of information. Clearer communication arises where the minimally sufficient words needed to convey the message of the statement are used (Fowler and Aaron 1998). Where more words are used than necessary, or where unnecessary detail is provided in the communication that is not part of the message, ambiguity arises. The excess detail obscures the essential message and contributes to different emphases or interpretations.

The use of passive voice, vacuous words, or the repetition of phrases with the same meaning all contribute to lack of clarity (Fowler and Aaron 1998). The use of clichés and the over-use of figures of speech add volume to the statement, but add little or no meaning. Pretentious and indirect writing also adds to the bulk of the statement, but without adding meaning. Fowler and Aaron (1998) provide the following comparative example:

Pretentious: To perpetuate our endeavour of providing funds for our elderly citizens as we do at the present moment, we will face the exigency of enhanced contributions from all our citizens.

Revised: We cannot continue to fund Social Security and Medicare for the elderly unless we raise taxes.

The extra volume contributes to vagueness in the first statement, and adds to the multiplicity of legitimate interpretations of the statement. The first statement exhibits extraneous ambiguity. The second statement communicates forcefully and concisely.

An example of extraneous ambiguity in an information request is "A report of all clients (and their names and addresses only) for the Tax and Business Services department. Some of those clients are our biggest earners, you know". The last sentence is extraneous, and contains detail that is redundant, uninformative, or misleading relative to the fundamental message. In information theoretic terms, extraneous ambiguity is "noise" in the communication (Axley 1984; Eisenberg and Phillips 1991; Severin and Tankard 1997).

The following hypotheses are proposed:

H6a: Higher extraneous ambiguity in the information request leads to an increase in the total errors in the query formulation.

H6b: Higher extraneous ambiguity in the information request leads to an increase in the time taken to complete the query formulation.

H6c: Higher extraneous ambiguity in the information request leads to lower end user confidence in the accuracy of the query formulation.


2.2.3 Imaginary Ambiguity

Imaginary ambiguity occurs when a word with a fixed meaning seems to have a different one. Imaginary ambiguity derives from the optional interpretation that the recipient of the communication places on the information received. Two distinct types of ambiguity can be categorised as imaginary ambiguity: emphatic and suggestive.

Emphatic Ambiguity

The question of ambiguity deriving from accent, or emphasis in speaking, is an ancient one (Hamblin 1970). When a phrasing is rendered in written form, the verbal emphasis may only be crudely indicated. Significant meaning and context are lost. Rescher (1964) provides the following example of emphatic ambiguity: the intended meaning of the democratic credo "Men were created equal" can be altered by stressing the word "created" (implying "that's how men started out, but they are no longer so"). The verbal emphasis creates an inference of meaning that is a legitimate interpretation of the phrasing. That is, changes in intonation can yield different interpretations.

In the case of an information request, emphatic ambiguity occurs in the example information request of "A report of our good clients". Ambiguity can derive from placing different emphases on the words. Depending on the context or on emphasis used, "good clients" could be legitimately interpreted to be clients that pay on time or clients that have the highest dollar-value sales. Indeed, with an ironic emphasis on the word "good", this request could be interpreted as a list of our worst clients - those that do not pay. The information necessary to resolve the ambiguity is often difficult to convey using only printed media.
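The competing readings of "good clients" can be made concrete as queries. The following is a minimal sketch only: the client table, its columns, the sample rows, and the thresholds (30 days, 50,000 dollars) are all hypothetical and are not taken from the experimental database.

```python
import sqlite3

# Hypothetical schema and rows, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE client (
        name TEXT PRIMARY KEY,
        total_sales REAL,        -- dollar value of sales to the client
        avg_days_to_pay INTEGER  -- average days taken to pay an invoice
    );
    INSERT INTO client VALUES ('Acme', 90000, 75);
    INSERT INTO client VALUES ('Bolt', 12000, 14);
    INSERT INTO client VALUES ('Crux', 55000, 21);
""")

# Reading 1: "good" means the client pays on time (assumed: within 30 days).
pays_on_time = [row[0] for row in conn.execute(
    "SELECT name FROM client WHERE avg_days_to_pay <= 30 ORDER BY name")]

# Reading 2: "good" means high dollar-value sales (assumed: above 50,000).
high_sales = [row[0] for row in conn.execute(
    "SELECT name FROM client WHERE total_sales > 50000 ORDER BY name")]

# Reading 3 (ironic emphasis): the clients that do not pay on time.
slow_payers = [row[0] for row in conn.execute(
    "SELECT name FROM client WHERE avg_days_to_pay > 30 ORDER BY name")]

print(pays_on_time)  # ['Bolt', 'Crux']
print(high_sales)    # ['Acme', 'Crux']
print(slow_payers)   # ['Acme']
```

Each query is a legitimate formulation of the same written request, yet the three result sets differ.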


The following hypotheses are proposed:

H7a: Higher emphatic ambiguity in the information request leads to an increase in the total errors in the query formulation.

H7b: Higher emphatic ambiguity in the information request leads to an increase in the time taken to complete the query formulation.

H7c: Higher emphatic ambiguity in the information request leads to lower end user confidence in the accuracy of the query formulation.

Suggestive Ambiguity

Despite the apparent clarity of the sentence in question, suggestive ambiguity carries innuendo that can lead the reader to different implications (Walton 1996). Fischer (1970) provides an example: The First Mate of a ship docked in China returned drunk from shore leave, and was unable to write up the ship's log. The displeased Captain completed the log, adding, "The Mate was drunk all day". The next day, the now-sober Mate challenged the Captain over the entry, as it would reflect poorly on him. The Captain responded that the comment was true, and must stand. Whereupon the Mate added to that day's log, "The Captain was sober all day". In reply to the Captain's challenge, the Mate responded "the comment is true, and must stand" (derived from Trow 1905, pp. 14-15). The phrase "The Captain was sober all day" contains suggestive ambiguity. As a further example, the statement, "The President is now an honest man", is perfectly clear, and yet considerable innuendo exists. The fact that the President's current honesty is worthy of comment implies that the President was previously dishonest.


Both phrases are perfectly clear, and, indeed, true. However, considerable innuendo exists. The fact that the Captain's sobriety, or the President's honesty, is singled out for special comment implies that such a state of affairs is unusual (Walton 1996). The statements are suggestively ambiguous.

In the case of an information request, an example of this ambiguity is, "A report of the clients of this accounting practice that have lodged taxation returns in the past five years in accordance with the requirements of the Australian Taxation Office". The request for information is quite clear. By definition, however, all taxation returns should be lodged in accordance with the Australian Taxation Office's requirements. The extra phrase introduces suggestive ambiguity into the information request by suggesting that the report will not consist of all taxation clients, because some clients may not have complied with the Tax Office's requirements.

The following hypotheses are proposed:

H8a: Higher suggestive ambiguity in the information request leads to an increase in the total errors in the query formulation.

H8b: Higher suggestive ambiguity in the information request leads to an increase in the time taken to complete the query formulation.

H8c: Higher suggestive ambiguity in the information request leads to lower end user confidence in the accuracy of the query formulation.


2.2.4 Ambiguity in Practice

Table 1 summarises the seven types of ambiguity identified in this paper and provides an example of each.

Table 1 Summary and Examples of the Seven Types of Ambiguity in Natural Language Information Requests

| Ambiguity Type | Information Request | Explanation |
|---|---|---|
| Lexical | A report of our clients for our marketing brochure mail-out. | The word "report" may have several meanings, independent of its context. For example, there may be: a gunshot report echoing through the hillside; the Lieutenant reported to the Captain; I dropped the heavy report on my toe, etc. Although the context may make the meaning clear, the lexical ambiguity adds to cognitive effort and contributes to ambiguity overall. |
| Syntactical | A report of poor-paying clients and client managers. Determine their effect on our profitability for the last twelve months. | It is not clear whose effect on profitability is meant. Another example is "Bob hit the man with a stick". It is not clear, syntactically, whether the man with a stick was hit, or whether the man was hit, by Bob, with a stick. |
| Inflective | A report showing what the product of our last marketing campaign for sales of our accounting software product in the last month was. | Ambiguity here derives from the use of the word "product" with two different meanings in the one information request. |
| Pragmatic | A report of all the clients for a department. | The ambiguity here is that the department has not been specified. Information necessary to clearly understand the message is omitted. It would be legitimate to prepare a report for any department. Further information is needed to resolve this actual ambiguity. |
| Extraneous | A report of all clients (and their names and addresses only) for the Tax and Business Services department. Some of those clients are our biggest earners, you know. | The last sentence is extraneous. Unlike pragmatic ambiguity, the sentence contains information that is redundant, uninformative, or not necessary to derive the statement's message. "Noise" in the communication exists. More words are used than are necessary to make the statement. |
| Emphatic | A report of our good clients. | Ambiguity here could derive from the inability to convey emphasis on particular words in written form. Depending on the emphasis used, "good clients" could be legitimately interpreted to be clients that pay on time, clients that have the most dollar-value sales, or even, with the correct ironic emphasis on the spoken word, our worst clients - those that do not pay. |
| Suggestive | A report of the clients of this accounting practice that have lodged taxation returns in the past five years in accordance with the requirements of the Australian Taxation Office. | The request for information is quite clear until the phrase "in accordance with the requirements of the Australian Taxation Office". By definition, all taxation returns should be lodged in accordance with these requirements. The extra phrase introduces suggestive ambiguity into the information request by suggesting that the report will not necessarily consist of all taxation clients. |

2.3 Task Complexity

More complex tasks require more cognitive effort and hence have a generally negative impact on the user's performance in deriving database queries (Campbell 1988; Borthick et al. 1997; Borthick et al. 2000). Task complexity, in the context of query development, consists of the inherent task complexity associated with the query syntax, and the data structure complexity associated with the organisation of the tables and attributes (Liew 1995).

Campbell (1988) and Wood (1986) document the general impact of task complexity. Jih et al. (1989) studied task complexity and user performance in the context of the use of entity-relationship diagrams and relational data models. Complexity in this context is generally measured as a function of the total number of elementary mental discriminations required to write a query (Halstead 1977).
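As an illustration of the kind of measure involved, Halstead's difficulty is conventionally computed as D = (n1 / 2) * (N2 / n2), where n1 is the number of distinct operators, n2 the number of distinct operands, and N2 the total number of operand occurrences. The sketch below applies that formula to a hypothetical query; how query tokens are divided into operators and operands here is an assumption for illustration and may differ from the classification used in this study.

```python
def halstead_difficulty(operators, operands):
    """Halstead difficulty D = (n1 / 2) * (N2 / n2).

    n1: number of distinct operators
    n2: number of distinct operands
    N2: total number of operand occurrences
    """
    n1 = len(set(operators))
    n2 = len(set(operands))
    total_operands = len(operands)
    if n2 == 0:
        raise ValueError("at least one operand is required")
    return (n1 / 2) * (total_operands / n2)


# Hypothetical query: SELECT name, dept FROM client WHERE dept = 'Tax'
# The operator/operand split below is an assumed tokenisation.
operators = ["SELECT", ",", "FROM", "WHERE", "="]
operands = ["name", "dept", "client", "dept", "'Tax'"]

difficulty = halstead_difficulty(operators, operands)
print(difficulty)  # 3.125
```

Under this measure, a query that reuses many operands across many distinct operators scores higher, which matches the intuition that it demands more mental discriminations.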

The following hypotheses are proposed:

H9a: Higher complexity in the information request leads to more total errors in the query formulation.

H9b: Higher complexity in the information request leads to more time taken to complete the query formulation.

H9c: Higher complexity in the information request leads to lower end user confidence in the accuracy of the query formulation.

2.4 Theoretical Model Summary

Figure 2 summarises the theoretical model presented in this paper. Complexity and the seven types of ambiguity have a negative impact on end user query performance as they increase. Hypotheses 1 through 9 are derived from these hypothesised relationships.

[Figure 2 diagram: the Information Request box contains Ambiguity (Lexical, Syntactical, Inflective; Pragmatic, Extraneous; Emphatic, Suggestive) and Complexity. Ambiguity and Complexity are each connected to End User Query Performance by a link labelled "Negative Relationship With".]

Figure 2 The Theoretical Model of Ambiguity, Complexity, and End User Query Performance


3. Methodology

3.1 Experimental Design

A laboratory experiment was conducted to test the hypotheses presented in this study. A two-factor, within-groups experimental design was used (Huck et al. 1974). Participants were randomly assigned to two groups (Group A and Group B). Each participant was presented with up to sixteen questions. Each question was presented in either a clear or ambiguous formulation.

Group A's question formulations were alternately ambiguous and clear. Group B's question formulations were alternately clear and ambiguous. Using alternating formulations helped promote equitable treatment of the two groups. That is, the alternating formulations ensured that both groups would complete approximately the same number of questions during the allotted time, expend approximately the same amount of cognitive effort, and would experience approximately the same level of frustration in dealing with ambiguous information requests. All participants spent two hours on the experiment. Appendix A shows the questions presented to students together with the model answers.
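The alternating assignment described above can be sketched as a small function. This is an illustrative reconstruction of the schedule only, not the Unix shell script used in the experiment; the function name is hypothetical.

```python
def formulation(group, question):
    """Return 'ambiguous' or 'clear' for a group ('A' or 'B') and a 1-based question number."""
    if group not in ("A", "B"):
        raise ValueError("group must be 'A' or 'B'")
    odd = question % 2 == 1
    # Group A starts with an ambiguous formulation, Group B with a clear one.
    if group == "A":
        return "ambiguous" if odd else "clear"
    return "clear" if odd else "ambiguous"


schedule_a = [formulation("A", q) for q in range(1, 5)]
schedule_b = [formulation("B", q) for q in range(1, 5)]
print(schedule_a)  # ['ambiguous', 'clear', 'ambiguous', 'clear']
print(schedule_b)  # ['clear', 'ambiguous', 'clear', 'ambiguous']
```

For any question, exactly one group receives the ambiguous formulation, so both treatments are observed for every question that both groups reach.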

A set of instructions (Appendix B), including a synopsis of the query language syntax, was provided to the participants. A Unix shell script (Appendix C) presented the questions electronically to the participants and automatically captured their responses in text files. An entity-relationship diagram describing the database is presented in Appendix D, and was available to subjects. Further details regarding the experimental process are provided in Appendix E.


3.2 Experiment Participants

Forty-seven undergraduate and nineteen postgraduate students participated in the experiment. Participating students were enrolled either in an advanced undergraduate or in a postgraduate database subject within the business school at the University of Queensland. All students enrolled in the two database subjects participated in the experiment.

The motivation for student participation was the receipt of five percent of the students' final mark for the subject (2.5% for participation, 2.5% for performance). Participants were aware that they were participating in an experiment.

Participants had been previously trained in the use of the SQL query language, and had been afforded the opportunity to practise SQL on the university systems. All practice took place on databases different from the one used for the experiment. Generally, student expertise with SQL was low to intermediate. The experiment, for most students, was the first practical application of their SQL skills.

3.3 Assessment of Participant Responses

Participant responses were captured in text files that showed each interactive response and captured the start and end time of each question. This file was edited into a suitable format for marking by two examiners. Each response was independently assessed by each examiner to determine whether the response was the participant's final complete response. Responses where participants did not finish the query formulation were removed from the study.


In some instances, the state of completion of the response was indeterminate. If the response could only be corrected with substantial rework of the submitted response, the examiners erred on the side of caution and removed these responses from the study.

Examiners then corrected the answers according to the model answers (Appendix A), using the Semantic Error Counting, SQL Challenge Error Counting, and Intermediate Error Counting Forms shown in Appendix F. Each examiner independently assessed the participant responses and corrected the response. Each discrete alteration (addition or deletion of a query component) counted as one "micro error" in the Semantic Error Counting Form (Appendix F).

The corrected response that determined the total error count was the response that required the fewest changes to the participant's response, and still produced the required result set. This approach ensured a lower error count than a strict modification of the response to ensure an exact match to the model answer. Appendix G provides an example corrected response.

The examiners then compared their independent assessments to ensure that all errors had been found and corrected and that the proposed formulations or corrected formulations produced the correct output. If more than one correction method was found to produce a correct query, the correction method that produced the smallest number of errors was used. A diary of common errors and their corrections was kept to ensure consistency throughout the assessment process. The final, moderated, error sheets were transcribed to a relational database for analysis.
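The fewest-changes rule can be sketched as follows. This is a simplified illustration, not the examiners' procedure: it assumes each query has already been broken into a list of components, and the component strings and the two acceptable answers shown are hypothetical.

```python
from collections import Counter


def micro_errors(response, target):
    """Count the additions plus deletions needed to turn response into target."""
    have, want = Counter(response), Counter(target)
    deletions = sum((have - want).values())
    additions = sum((want - have).values())
    return additions + deletions


def total_errors(response, acceptable_answers):
    """Smallest correction cost over every answer that yields the required result set."""
    return min(micro_errors(response, answer) for answer in acceptable_answers)


# Hypothetical query components for one participant response.
response = ["name", "FROM client", "dept = 'TBS'"]
# Two hypothetical formulations that both return the required result set.
model_answer = ["name", "address", "FROM client", "FROM department",
                "client.dept = department.id", "department.name = 'TBS'"]
alternative = ["name", "address", "FROM client", "dept = 'TBS'"]

print(micro_errors(response, model_answer))                # 5
print(total_errors(response, [model_answer, alternative]))  # 1
```

Marking against the model answer alone would charge five micro errors here, whereas the closest correct formulation needs only one addition, which is the count the fewest-changes rule would record.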


4. Results and Discussion

4.1 Overview of Experimental Results

Participant demographic information and statistics are presented in Tables 2, 3, and 4. The demographic information indicates that the assignment of participants to ensure homogeneity between Group A and Group B was successful. The groups are relatively homogeneous in terms of course background, grade point average (GPA), and age. In any case, both Group A and Group B received the treatment effect of ambiguity on alternate questions, mitigating concerns of the effect of a selection bias on experimental results.

Table 2 Participant Demographic Information and Descriptive Statistics: Course Background of Group A and Group B

| Enrolled Degree | Group A | Group B | Total |
|---|---|---|---|
| Undergraduate Arts | 3 | 3 | 6 |
| Undergraduate Business | 20 | 18 | 38 |
| Undergraduate Computer Science/Information Systems | 3 | 0 | 3 |
| Postgraduate Business | 2 | 1 | 3 |
| Postgraduate Computer Science/Information Systems | 5 | 11 | 16 |
| Total Participants: | 33 | 33 | 66 |

Table 3 Participant Demographic Information and Descriptive Statistics: Academic Record of Group A and Group B

| Academic Record | Average | Standard Deviation | Min | Max |
|---|---|---|---|---|
| GPA (65 students with academic records) | 4.94 | 0.90 | 3.26 | 7.00 |
| GPA (Group A: 33 students with academic records) | 5.04 | 0.83 | 3.26 | 6.84 |
| GPA (Group B: 32 students with academic records) | 4.83 | 0.97 | 3.29 | 7.00 |


Table 4 Participant Demographic Information and Descriptive Statistics: Participant Age in Group A and Group B

| Age (in Years) | Average | Standard Deviation | Min | Max |
|---|---|---|---|---|
| Average Age (65 Students with date of birth available) | 24.94 | 7.72 | 18.74 | 61.25 |
| Average Age (Group A, 33 Students with date of birth available) | 24.76 | 7.29 | 19.50 | 48.53 |
| Average Age (Group B, 32 Students with date of birth available) | 25.13 | 8.26 | 18.74 | 61.25 |

Participants completed 425 responses in the experiment. The experiment contained sixteen questions, each available in both an ambiguous and a clear formulation. Due to the two-hour time constraint, no participant completed more than twelve questions. Forty participants (60.61% of the sample population) completed six questions. On average, participants completed 6.44 questions, with a standard deviation of 1.75.
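These summary figures can be reproduced from the response counts reported in Table 5. The short check below is a sketch for the reader's convenience; the variable names are illustrative.

```python
# Response counts for each row of Table 5 (Questions 1 to 12, by treatment).
response_counts = [32, 33, 33, 33, 33, 33, 32, 33, 33, 30, 17, 23,
                   15, 15, 6, 10, 3, 2, 1, 4, 2, 1, 1]
participants = 66
completed_six = 40  # the forty participants reported in the text

total_responses = sum(response_counts)
mean_completed = total_responses / participants
share_completed_six = 100 * completed_six / participants

print(total_responses)                 # 425
print(round(mean_completed, 2))        # 6.44
print(round(share_completed_six, 2))   # 60.61
```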

Table 5 provides an overview of the participants' results in the experiment. Total errors is calculated as the average of the micro errors counted using the Semantic Error Counting Sheet shown in Appendix F. Appendix H provides a Pearson correlation matrix of the dependent and independent variables measured in the experiment. Appendix I provides detailed reports of the errors participants made on each individual question.


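Table 5 Comparative Statistics for all Participant Responses Grouped by Question (Q) and Treatment (T). Note that for T, a = ambiguous, c = clear. A dash indicates that no standard deviation is available because only one response was recorded.

| Q | T | Halstead's Complexity | Group | Response Count | Attempts Average | Attempts Standard Deviation | Confidence Average | Confidence Standard Deviation | Duration Average | Duration Standard Deviation | Total Errors Average | Total Errors Standard Deviation |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | a | 1.6927 | A | 32 | 3.31 | 1.99 | 6.22 | 1.36 | 10.51 | 4.63 | 1.59 | 3.66 |
| 1 | c | 1.6927 | B | 33 | 3.18 | 2.16 | 6.42 | 0.87 | 11.63 | 6.60 | 1.12 | 2.48 |
| 2 | a | 5.4186 | B | 33 | 9.21 | 8.88 | 5.21 | 1.47 | 20.74 | 11.30 | 4.27 | 8.18 |
| 2 | c | 5.4186 | A | 33 | 3.61 | 3.43 | 6.30 | 1.05 | 9.03 | 6.89 | 0.30 | 0.81 |
| 3 | a | 6.8908 | A | 33 | 7.94 | 6.04 | 5.91 | 1.57 | 11.84 | 7.72 | 3.97 | 3.50 |
| 3 | c | 6.8908 | B | 33 | 5.09 | 6.18 | 6.27 | 1.42 | 8.63 | 5.29 | 1.03 | 2.86 |
| 4 | a | 4.4697 | B | 32 | 7.31 | 4.75 | 5.38 | 1.64 | 15.57 | 8.95 | 4.03 | 5.54 |
| 4 | c | 4.4697 | A | 33 | 6.52 | 7.36 | 6.21 | 1.47 | 10.95 | 8.46 | 0.67 | 2.23 |
| 5 | a | 12.2917 | A | 33 | 9.24 | 6.63 | 5.24 | 2.21 | 18.54 | 11.06 | 9.42 | 10.39 |
| 5 | c | 12.2917 | B | 30 | 7.07 | 5.98 | 5.37 | 2.16 | 15.65 | 9.74 | 5.20 | 7.70 |
| 6 | a | 18.8000 | B | 17 | 11.41 | 7.21 | 5.59 | 1.33 | 23.59 | 7.93 | 32.94 | 13.21 |
| 6 | c | 18.8000 | A | 23 | 14.91 | 9.36 | 4.87 | 1.91 | 25.63 | 10.13 | 8.00 | 10.49 |
| 7 | a | 16.0076 | A | 15 | 11.07 | 6.10 | 5.07 | 1.49 | 18.78 | 5.46 | 7.27 | 8.65 |
| 7 | c | 16.0076 | B | 15 | 7.67 | 4.20 | 5.07 | 1.98 | 15.31 | 7.86 | 6.13 | 7.41 |
| 8 | a | 16.2684 | B | 6 | 6.83 | 8.42 | 5.83 | 1.60 | 13.24 | 8.36 | 2.33 | 4.08 |
| 8 | c | 16.2684 | A | 10 | 6.40 | 2.46 | 5.00 | 1.94 | 12.53 | 5.35 | 6.40 | 6.52 |
| 9 | a | 23.8970 | A | 3 | 12.33 | 2.08 | 3.00 | 1.73 | 16.43 | 7.77 | 18.00 | 10.54 |
| 9 | c | 23.8970 | B | 2 | 6.50 | 3.54 | 6.50 | 0.71 | 15.36 | 2.51 | 15.50 | 21.92 |
| 10 | a | 19.4819 | B | 1 | 7.00 | - | 5.00 | - | 9.93 | - | 20.00 | - |
| 10 | c | 19.4819 | A | 4 | 7.25 | 3.20 | 4.25 | 2.50 | 9.56 | 1.40 | 5.00 | 2.58 |
| 11 | a | 22.4000 | A | 2 | 7.00 | 4.24 | 5.00 | 2.83 | 8.53 | 2.13 | 22.50 | 13.44 |
| 11 | c | 22.4000 | B | 1 | 4.00 | - | 7.00 | - | 9.45 | - | 8.00 | - |
| 12 | c | 29.1633 | B | 1 | 14.00 | - | 4.00 | - | 10.10 | - | 8.00 | - |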

The relationships between the dependent variables (duration, confidence, and total errors) and the independent variables (complexity, ambiguity) are graphically depicted in Figures 3, 4, and 5. These figures illustrate that the hypothesised relationships for complexity and ambiguity were supported for most measures by most queries.

[Chart: Questions by Treatment and Error. Horizontal axis: Question (1 to 12). Vertical axis: Average Errors (0.00 to 35.00). Series: Ambiguous, Clear.]

Figure 3 Depicting graphically the relationship between the treatment received (ambiguous or clear information request) and the total errors in the participant's response.


[Chart: Questions by Treatment and Duration. Horizontal axis: Question (1 to 12). Vertical axis: Average Duration in minutes (0.00 to 30.00). Series: Ambiguous, Clear.]

Figure 4 Depicting graphically the relationship between the treatment received (ambiguous or clear information request) and the duration taken for the participant to prepare the response.

[Chart: Questions by Treatment and Confidence. Horizontal axis: Question (1 to 12). Vertical axis: Average Confidence (0.00 to 8.00). Series: Ambiguous, Clear.]

Figure 5 Depicting graphically the relationship between the treatment received (ambiguous or clear information request) and the participant's confidence in the response.

Question Six, with an average of 32.94 errors (standard deviation of 13.21), caused the most problems for participants in its ambiguous formulation. Nonetheless, the seventeen respondents to Question Six in its ambiguous formulation took on average slightly less time to complete the response (23.59 average minutes, 7.93 standard deviation) than the twenty-three respondents for the clear formulation (25.63 average minutes, 10.13 standard deviation).


Participants that completed Question Eight in the clear formulation made more average errors (6.40, standard deviation of 6.52) than those with the ambiguous formulation (average of 2.33 and standard deviation of 4.08). Participants also exhibited higher average confidence ratings for the ambiguous formulation of this question (5.83, standard deviation of 1.60) than participants receiving the clear formulation (5.00, standard deviation of 1.94).

A reason for these results may be that extraneous ambiguity is apparent in the clear formulation due to the formulation's length. Question Eight had only sixteen completed responses, however (six respondents for the ambiguous formulation, ten for the clear formulation), which limits the weight that can be placed on this question's result. Because of the small number of participants completing Questions Nine through Twelve, analysis of differences in these individual questions is not appropriate.

4.2 Regression Analysis

Two multiple linear regression models were used to analyse the experimental results. The model used to test H1a-c and H9a-c, for the effects of ambiguity and complexity respectively, was:

(1) Performance = Ambiguity + Complexity

where ambiguity was a dichotomous variable and complexity was measured using the Halstead (1977) complexity measure for difficulty.
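To make the form of such a model concrete, the following minimal sketch fits a regression with a dichotomous ambiguity variable and a continuous complexity variable by ordinary least squares, using only the Python standard library. The data are synthetic and noise-free, generated from made-up coefficients, so the fit simply recovers those coefficients; none of the numbers are results from this experiment, and the statistical software actually used for the analysis is not implied.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def ols(rows, y):
    """Least-squares coefficients for y = b0 + b1*x1 + b2*x2 + ... via the normal equations."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)


# Synthetic observations: (ambiguity dummy, complexity score).
rows = [(0, 1.0), (1, 1.0), (0, 4.0), (1, 4.0), (0, 9.0), (1, 9.0)]
# Made-up generating coefficients, NOT estimates from the experiment.
errors = [2.0 + 5.0 * amb + 0.5 * cx for amb, cx in rows]

intercept, b_ambiguity, b_complexity = ols(rows, errors)
print(round(intercept, 6), round(b_ambiguity, 6), round(b_complexity, 6))
```

The coefficient on the dummy variable is the estimated shift in the performance measure when a request moves from the clear to the ambiguous formulation, holding complexity constant.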


The model used to test the seven individual types of ambiguity in H2a-c to H8a-c was:

(2) Performance = Lexical + Syntactical + Inflective + Pragmatic + Extraneous + Emphatic + Suggestive + Complexity

where the ambiguity types were measured as shown in Appendix J, according to the ambiguity assessment instrument presented in Appendix K.

Performance is end user query performance. The dependent variables that proxy for end user query performance are total errors, duration, and confidence. Duration was measured as decimal minutes. The Confidence Rating was self-assessed by participants and was transformed to a numerical rating in accordance with Table 6. The numerical rating was used as the measure for confidence in the regression analysis.

Table 6 Confidence Rating Transformation to a Numerical Scale

| Confidence Rating | Numerical Rating |
|---|---|
| >85-100% | 7 |
| 70-85% | 6 |
| 55-70% | 5 |
| 40-55% | 4 |
| 25-40% | 3 |
| 10-25% | 2 |
| <10% | 1 |
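The transformation in Table 6 amounts to a lookup from the band a participant selected to its numerical rating. A minimal sketch follows; the function name and the sample selections are hypothetical.

```python
# Confidence bands and their numerical ratings, as listed in Table 6.
CONFIDENCE_RATING = {
    ">85-100%": 7,
    "70-85%": 6,
    "55-70%": 5,
    "40-55%": 4,
    "25-40%": 3,
    "10-25%": 2,
    "<10%": 1,
}


def numerical_rating(band):
    """Translate a self-assessed confidence band into its numerical rating."""
    try:
        return CONFIDENCE_RATING[band]
    except KeyError:
        raise ValueError(f"unknown confidence band: {band!r}") from None


# Hypothetical selections made by three participants.
ratings = [numerical_rating(b) for b in ["70-85%", ">85-100%", "40-55%"]]
print(ratings)  # [6, 7, 4]
```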

In all regression models, the Halstead (1977) complexity measure for difficulty was used to assess the complexity of the required model answer. This measure has been used in several end user query performance studies (Jih et al. 1989).

For testing H1a-c and H9a-c, a dichotomous variable of 0 (clear formulation, or pseudo-SQL) and 1 (ambiguous formulation, or manager-English) was used to indicate whether the participant had received a clear formulation or an ambiguous formulation of the information request. For testing H2a-c to H8a-c, the seven independent ambiguity parameters were assessed in accordance with the scale presented in Table 7. Each question was assessed by two independent non-researchers who had been briefed in the definitions of the seven types of ambiguity. The initial scores were moderated by discussion and consideration between the independent third parties and the researcher to ensure consistent and correct interpretation of the seven ambiguity definitions. Cronbach's alpha (Cronbach 1951) for the two third parties' ambiguity measurement scores was 0.6887, indicating that a moderately reliable measure for ambiguity across the two assessors was achieved.

Table 7 Ambiguity Assessment Scale for the analysis of the Seven Ambiguity Types Regression Model

| Ambiguity Assessment Rating | Meaning |
|---|---|
| 0 | No ambiguity of this type present |
| 1 | A little ambiguity of this type present |
| 2 | Some ambiguity of this type present |
| 3 | Much ambiguity of this type present |
| 4 | A great deal of ambiguity of this type present |

Each question formulation, clear and ambiguous, for each information request was assessed to provide a scale of ambiguity. The instrument used to undertake this finer assessment of ambiguity for questions for which responses exist is reproduced in Appendix K. Using a five-point scale for the ambiguity assessment rating provides a finer measure than would a dichotomous variable.
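For readers unfamiliar with the reliability statistic used for the two assessors' scores, Cronbach's alpha is conventionally computed as alpha = (k / (k - 1)) * (1 - sum of item variances / variance of the summed scores), with each assessor treated as one item. The sketch below applies this formula to made-up ratings on the 0 to 4 scale; the ratings are hypothetical and do not reproduce the 0.6887 reported for this study.

```python
from statistics import variance


def cronbach_alpha(ratings_by_rater):
    """Cronbach's alpha, treating each rater's list of scores as one item."""
    k = len(ratings_by_rater)
    if k < 2:
        raise ValueError("need at least two raters")
    item_variances = sum(variance(scores) for scores in ratings_by_rater)
    totals = [sum(scores) for scores in zip(*ratings_by_rater)]
    return (k / (k - 1)) * (1 - item_variances / variance(totals))


# Made-up 0-4 ambiguity ratings from two hypothetical assessors.
rater_1 = [0, 1, 2, 3, 4, 2]
rater_2 = [0, 2, 2, 3, 3, 1]

alpha = cronbach_alpha([rater_1, rater_2])
print(round(alpha, 2))  # 0.91
```

Alpha approaches 1 as the assessors' scores move together and falls as their disagreements grow relative to the overall spread of scores.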

4.3 Ambiguity Treatment Multiple Linear Regression Model Results

Table 8 provides the results of the multiple linear regression (Newbold 1984) shown for model (1) for the Total Errors, Duration, and Confidence measures of end user query performance. These results provide evidence regarding H1a-c and H9a-c. All relationships are in the hypothesised direction (positive for H1a, H1b, H9a, and H9b, and negative for H1c and H9c), and indicate strong support for each hypothesis.

Table 8 Regression Analysis Results for the General Ambiguity Regression Model

| Source | DF | Mean Square | F-Value | Pr > T (2 tailed) | Parameter Estimate | R2 (n=425) |
|---|---|---|---|---|---|---|
| Model (Total Errors) | 2 | 5430.30 | 88.44 | 0.0001 | | 0.2954 |
| Error | 422 | 61.40 | | | | |
| Ambiguity (H1a) | 1 | 2447.98 | 39.87 | 0.0001 | 4.8042 | |
| Complexity (H9a) | 1 | 8705.38 | 141.78 | 0.0001 | 0.7582 | |
| Model (Duration) | 2 | 2236.60 | 28.59 | 0.0001 | | 0.1193 |
| Error | 422 | 78.23 | | | | |
| Ambiguity (H1b) | 1 | 1250.63 | 15.99 | 0.0001 | 3.4339 | |
| Complexity (H9b) | 1 | 3352.81 | 42.86 | 0.0001 | 0.4705 | |
| Model (Confidence) | 2 | 42.87 | 16.25 | 0.0001 | | 0.0715 |
| Error | 422 | 2.64 | | | | |
| Ambiguity (H1c) | 1 | 13.03 | 4.94 | 0.0268 | -0.3505 | |
| Complexity (H9c) | 1 | 74.68 | 28.31 | 0.0001 | -0.0702 | |

Ambiguity in an information request has a strong impact on the three measures of end user query performance presented in H1a, H1b, and H1c. Total errors, duration, and end user confidence are significantly and strongly affected by the presence of ambiguity in the information request. The result is confirmatory of the general hypothesis of the model presented in this paper: that an ambiguous information request is likely to result in a query formulation that is less accurate, takes longer to prepare, and in which the end user is less confident. Ceteris paribus, a clearly formulated information request is more effective and efficient than an information request that is ambiguous and poorly specified.

The relationship between ambiguity and end user confidence, however, is generally weaker than expected, although still significant at the 5% level. The small R2 (0.0715) for the confidence model indicates that the ambiguity and complexity of an information request had little impact on each participant's confidence in their query formulation.


Ambiguity is significant for all three models. The R2 for each model (0.2954, 0.1193, and 0.0715) provides strong support for the assertion that ambiguity and complexity negatively impact end user query performance.
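The R2 values quoted above can be cross-checked against the mean squares and degrees of freedom in Table 8, since each sum of squares is the mean square multiplied by its degrees of freedom and R2 is the model sum of squares divided by the total. The following sketch performs that check; the function name is illustrative, and small differences from the reported values are to be expected because the table entries are rounded.

```python
def r_squared(model_df, model_ms, error_df, error_ms):
    """R2 = SS_model / (SS_model + SS_error), where each SS = DF * mean square."""
    ss_model = model_df * model_ms
    ss_error = error_df * error_ms
    return ss_model / (ss_model + ss_error)


# (model DF, model mean square, error DF, error mean square) from Table 8.
table_8 = {
    "Total Errors": (2, 5430.30, 422, 61.40),
    "Duration": (2, 2236.60, 422, 78.23),
    "Confidence": (2, 42.87, 422, 2.64),
}

computed = {measure: r_squared(*row) for measure, row in table_8.items()}
for measure, value in computed.items():
    print(measure, round(value, 4))
```

The computed values agree with the reported 0.2954, 0.1193, and 0.0715 to within rounding.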

4.4 Multiple Linear Regression Model: Seven Types of Ambiguity

Table 9 provides the results of the multiple linear regression model shown for model (2) for the Total Errors, Duration, and Confidence measures of end user query performance. This testing examines hypotheses H2a-c through H8a-c for individual types of ambiguity.

Table 9 Regression Analysis Results for the Seven Ambiguity Types Regression Model

| Source | DF | Mean Square | F-Value | Pr > T (2 tailed) | Parameter Estimate | R2 (n=425) |
|---|---|---|---|---|---|---|
| Model (Total Errors) | 8 | 2177.52 | 46.81 | 0.0001 | | 0.4737 |
| Error | 416 | 46.52 | | | | |
| Lexical (H2a) | 1 | 78.41 | 1.69 | 0.1949 | -1.5545 | |
| Syntactical (H3a) | 1 | 7.99 | 0.17 | 0.6789 | -0.2274 | |
| Inflective (H4a) | 1 | 0.79 | 0.02 | 0.8963 | -0.4143 | |
| Pragmatic (H5a) | 1 | 385.36 | 8.28 | 0.0042 | 1.2621 | |
| Extraneous (H6a) | 1 | 254.77 | 5.48 | 0.0197 | 3.3940 | |
| Emphatic (H7a) | 1 | 394.51 | 8.48 | 0.0038 | 2.6906 | |
| Suggestive (H8a) | 1 | 167.54 | 3.60 | 0.0584 | 2.9079 | |
| Complexity | 1 | 2605.34 | 56.01 | 0.0001 | 0.4899 | |
| Model (Duration) | 8 | 832.24 | 11.23 | 0.0001 | | 0.1776 |
| Error | 416 | 74.10 | | | | |
| Lexical (H2b) | 1 | 1272.66 | 17.17 | 0.0001 | 6.2626 | |
| Syntactical (H3b) | 1 | 600.95 | 8.11 | 0.0046 | 1.9725 | |
| Inflective (H4b) | 1 | 780.00 | 10.53 | 0.0013 | -13.0021 | |
| Pragmatic (H5b) | 1 | 4.65 | 0.06 | 0.8023 | -0.1387 | |
| Extraneous (H6b) | 1 | 1008.31 | 13.61 | 0.0003 | 6.7520 | |
| Emphatic (H7b) | 1 | 129.85 | 1.75 | 0.1863 | -1.5436 | |
| Suggestive (H8b) | 1 | 457.05 | 6.17 | 0.0134 | -4.8029 | |
| Complexity | 1 | 1926.10 | 25.99 | 0.0001 | 0.4213 | |
| Model (Confidence) | 8 | 14.66 | 5.64 | 0.0001 | | 0.0978 |
| Error | 417 | 2.60 | | | | |
| Lexical (H2c) | 1 | 8.81 | 3.39 | 0.0664 | -0.5211 | |
| Syntactical (H3c) | 1 | 0.02 | 0.01 | 0.9292 | -0.0115 | |
| Inflective (H4c) | 1 | 1.27 | 0.49 | 0.4844 | 0.5253 | |
| Pragmatic (H5c) | 1 | 2.83 | 1.09 | 0.2973 | -0.1082 | |
| Extraneous (H6c) | 1 | 0.10 | 0.04 | 0.8435 | 0.0677 | |
| Emphatic (H7c) | 1 | 0.07 | 0.03 | 0.8697 | -0.0358 | |
| Suggestive (H8c) | 1 | 1.91 | 0.74 | 0.3915 | 0.3107 | |
| Complexity | 1 | 76.02 | 29.24 | 0.0001 | -0.0837 | |

4.5 Summary of Results

The experimental results indicate that the taxonomy presented in this paper explains a great deal of the effect of ambiguity on end user query performance. The results also indicate that further refinement of the theory presented in this paper is required. Table 10 provides a summary of the results obtained in this experiment. All hypotheses indicated as "supported" are significant at the p = 0.05 level or below according to a one-tailed test. The two-tailed p-value is shown, and is immediately followed by the one-tailed p-value in brackets.
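The decision rule implied by this reporting convention can be sketched as follows: the bracketed one-tailed value is half the two-tailed value, and a hypothesis counts as supported only when that value is at or below 0.05 and the parameter estimate has the hypothesised sign. The function below is an illustrative reading of that convention, not code used in the study.

```python
def assess(two_tailed_p, estimate, expected_sign, alpha=0.05):
    """Return (one_tailed_p, supported) for a directional hypothesis.

    expected_sign is +1 when the hypothesis predicts a positive parameter
    (total errors, duration) and -1 when it predicts a negative one (confidence).
    """
    one_tailed_p = two_tailed_p / 2
    in_direction = estimate * expected_sign > 0
    return one_tailed_p, in_direction and one_tailed_p <= alpha


print(assess(0.0664, -0.5211, -1))   # H2c: (0.0332, True)
print(assess(0.0013, -13.0021, +1))  # H4b: (0.00065, False)
```

H2c is therefore supported even though its two-tailed p-value exceeds 0.05, while H4b is not supported despite a very small p-value, because its parameter estimate runs against the hypothesised direction.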

Table 10 Summary of Analysis' Support for Hypotheses

H1a H1b H1c H2a

H2b H2c

H3a H3b

H3c

H4a H4b

Hypothesis Statement Higher ambiguity in the information request leads to an increase in the total errors in the query formulation. Higher ambiguity in the information request leads to an increase in the time taken to complete the query formulation. Higher ambiguity in the information request leads to lower end user confidence in the accuracy of the query formulation. Higher levels of lexical ambiguity in the information request lead to more total errors in the query formulation. Higher levels of lexical ambiguity in the information request lead to more time taken to complete the query formulation. Higher levels of lexical ambiguity in the information request leads to lower end user confidence in the accuracy of the query formulation. Higher levels of syntactical ambiguity in the information request lead to more total errors in the query formulation. Higher levels of syntactical ambiguity in the information request lead to more time taken to complete the query formulation. Higher levels of syntactical ambiguity in the information request leads to lower end user confidence in the accuracy of the query formulation. Higher levels of inflective ambiguity in the information request lead to more total errors in the query formulation. Higher levels of inflective ambiguity in the information request lead to more time taken to complete the query formulation.

32

Result Supported p=0.0001 (0.0001) Supported p=0.0001 (0.0001) Supported p=0.0268 (0.0134) Not Supported p=0.1949 (0.0975) (negative parameter) Supported p=0.0001 (0.0001) Supported p=0.0664 (0.0332) Not Supported p=0.6789 (0.3395) Supported p=0.0046 (0.0023) Not Supported p=0.9292 (0.4646) Not Supported p=0.8963 (0.4482) Not Supported p=0.0013 (0.0007) (negative parameter)

Hypothesis. Hypothesis Statement. Result

H4c. Higher levels of inflective ambiguity in the information request lead to lower end user confidence in the accuracy of the query formulation. Result: Not Supported p=0.4844 (0.2422)

H5a. Higher levels of pragmatic ambiguity in the information request lead to more total errors in the query formulation. Result: Supported p=0.0042 (0.0021)

H5b. Higher levels of pragmatic ambiguity in the information request lead to more time taken to complete the query formulation. Result: Not Supported p=0.8023 (0.4012)

H5c. Higher levels of pragmatic ambiguity in the information request lead to lower end user confidence in the accuracy of the query formulation. Result: Not Supported p=0.2973 (0.1487)

H6a. Higher levels of extraneous ambiguity in the information request lead to more total errors in the query formulation. Result: Supported p=0.0197 (0.0099)

H6b. Higher levels of extraneous ambiguity in the information request lead to more time taken to complete the query formulation. Result: Supported p=0.0003 (0.0002)

H6c. Higher levels of extraneous ambiguity in the information request lead to lower end user confidence in the accuracy of the query formulation. Result: Not Supported p=0.8435 (0.4218)

H7a. Higher levels of emphatic ambiguity in the information request lead to more total errors in the query formulation. Result: Supported p=0.0038 (0.0019)

H7b. Higher levels of emphatic ambiguity in the information request lead to more time taken to complete the query formulation. Result: Not Supported p=0.1863 (0.0932) (negative parameter)

H7c. Higher levels of emphatic ambiguity in the information request lead to lower end user confidence in the accuracy of the query formulation. Result: Not Supported p=0.8697 (0.4349)

H8a. Higher levels of suggestive ambiguity in the information request lead to more total errors in the query formulation. Result: Supported p=0.0584 (0.0292)

H8b. Higher levels of suggestive ambiguity in the information request lead to more time taken to complete the query formulation. Result: Not Supported p=0.0134 (0.0067) (negative parameter)

H8c. Higher levels of suggestive ambiguity in the information request lead to lower end user confidence in the accuracy of the query formulation. Result: Not Supported p=0.3915 (0.1958)

H9a. Higher complexity in the information request leads to more total errors in the query formulation. Result: Supported p=0.0001 (0.0001)

H9b. Higher complexity in the information request leads to more time taken to complete the query formulation. Result: Supported p=0.0001 (0.0001)

H9c. Higher complexity in the information request leads to lower end user confidence in the accuracy of the query formulation. Result: Supported p=0.0001 (0.0001)
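
The paired p-values in the Result column can be read with a simple rule, sketched here in Python. The sketch assumes that the first value is a two-tailed p-value and the bracketed value is its one-tailed counterpart (the discussion quotes the bracketed value for H2c as a one-tailed t-test), that a hypothesis counts as supported only when the parameter has the hypothesised sign, and that the significance level is 0.05. The 0.05 level and the function name `assess` are assumptions made for illustration, not details taken from the thesis.

```python
def assess(two_tailed_p, in_hypothesised_direction, alpha=0.05):
    """Return (verdict, one_tailed_p) for a directional hypothesis.

    alpha=0.05 is an assumed significance level. The halved p-value is
    returned even when the parameter has the wrong sign, mirroring how the
    table reports such rows with a "(negative parameter)" note.
    """
    one_tailed_p = round(two_tailed_p / 2, 4)
    supported = in_hypothesised_direction and one_tailed_p < alpha
    return ("Supported" if supported else "Not Supported", one_tailed_p)


print(assess(0.0664, True))   # H2c -> ('Supported', 0.0332)
print(assess(0.0584, True))   # H8a -> ('Supported', 0.0292)
print(assess(0.0134, False))  # H8b -> ('Not Supported', 0.0067)
```

Under these assumptions the rule gives the tabled verdicts for H2c, H8a, and H8b. H8b shows why the sign check matters: its p-value is small, but the parameter runs against the hypothesised direction, so the hypothesis is not supported.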

4.5.1 Potential Ambiguity

The generally weak measured effects for the potential ambiguities assessed by the experiment (lexical and syntactical) do not support the hypotheses presented in this paper. As the theoretical model indicates, potential ambiguities derive their ambiguity independently of the context of the statement. A statement may contain lexical or syntactical ambiguity, but the context of the statement resolves the ambiguity measured. The hypothesised effects were not measurable due to the clarification of the ambiguity by the context.

Lexical ambiguity did not show a statistically significant relationship with total errors (H2a). Lexical ambiguity did demonstrate a statistically significant relationship with duration (H2b, p=0.0001) and confidence (H2c, p=0.0332 for a one-tailed t-test). The implication of these results is that lexical ambiguity requires more cognitive effort by the end users to determine the meaning of the request. Once the meaning of the request has been determined, however, users do not make significantly more errors in their query formulations. Lexical ambiguity did result in end users being slightly less confident in their queries.

Although in the hypothesised direction (positive), the relationship between syntactical ambiguity and total errors (H3a) is not significant (p=0.6789). Syntactical ambiguity does show a significant relationship with the time taken to complete the query (H3b, p=0.0046), which indicates that greater cognitive effort is required to resolve the contextual ambiguity. Syntactical ambiguity's relationship with end user confidence is not significant (H3c, p=0.9292).

Inflective ambiguity does not show a significant relationship in the hypothesised direction for H4a (p=0.8963), H4b (negative parameter, p=0.0013), or H4c (p=0.4844). Interestingly, inflective ambiguity shows a significant negative relationship with duration, which is in the opposite direction to that hypothesised. This result must be considered with caution, however, as the level of inflective ambiguity present in the questions presented to subjects was low (Appendix J).

4.5.2 Actual Ambiguity

The role of the actual ambiguity types (pragmatic and extraneous) in the theoretical model is strongly supported by the empirical results. Actual ambiguities are not clarified by the context of the statement, i.e., the context does not resolve pragmatic and extraneous ambiguities. Actual ambiguities generally show a strong relationship with total errors, and extraneous ambiguity (but not pragmatic ambiguity) displays a strong relationship with duration. Neither pragmatic nor extraneous ambiguity shows a significant relationship with end user confidence.

Pragmatic ambiguities are not clarified by context, and arise where information necessary to properly answer the information request is missing. The hypothesised relationship between pragmatic ambiguity and total errors is strongly supported (H5a, p=0.0042). The hypothesised effects of pragmatic ambiguity on duration (H5b, negative parameter, p=0.8023) and end user confidence (H5c, p=0.2973) were not significant. Pragmatic ambiguity may require the end user to infer the missing information, which increases total errors. In the current experiment, the need to infer missing information did not significantly affect the time necessary to complete the query response or end user confidence in their query.

Extraneous ambiguity occurs when more information than is required is provided or when the information request is indirectly and pretentiously written. Extraneous ambiguity misleads end users as to the required response. The hypothesised effects of extraneous ambiguity on total errors (H6a, p=0.0197) and duration (H6b, p=0.0003) in the end user query formulation were strongly supported. Extraneous ambiguity, where more information is provided than is required, appears to require more time and cognitive effort to resolve the ambiguity, and the query response is more likely to be inaccurate.

The parameter estimates (Table 9) for total errors (3.3940) and for duration (6.7520) indicate that extraneous information produces severe negative impacts on end user query efficiency and effectiveness. The result for H6c, which hypothesised that extraneous ambiguity decreases end user confidence, is not significant (p=0.8435). Where information needs to be inferred (pragmatic ambiguity), end users appear to recognise and grapple with the ambiguity. End users appear less able to recognise and adjust for extraneous ambiguity than for pragmatic ambiguity.

4.5.3 Imaginary Ambiguity

The results for imaginary ambiguities support the hypothesised relationships between these ambiguities and query errors. The results do not support the hypothesised relationships with duration or end user confidence. Imaginary ambiguities result in more total errors, but appear to result in less time taken to complete the requests. These outcomes are important, because, although not hypothesised, imaginary ambiguities appear to lead end users to infer the requirements of the question more quickly (leading to a shorter duration required) and to formulate the query response on that basis (leading to higher total errors). This result should be treated with caution, as the imaginary ambiguities were not at a high level in this experiment (Appendix J).

Emphatic ambiguity arises from the limited ability to convey intonation in written form. The hypothesis regarding the effect of emphatic ambiguity on total errors (H7a) is strongly supported (p=0.0038). Neither H7b (duration) nor H7c (confidence) was statistically significant. Where the emphasis of the information request cannot be clearly expressed, end users are required to supply their own emphasis when interpreting the meaning of the information request. While they appear to make their interpretation quickly, the end users did not recognise that their queries were more likely to contain errors.

The hypothesised relationship between suggestive ambiguity and total errors (H8a) is strongly supported (p=0.0292 for a one-tailed t-test). The relationship between suggestive ambiguity and duration (H8b), however, is opposite to the hypothesised direction, and significant (negative parameter, p=0.0134). The hypothesised relationship with end user confidence (H8c) is not supported (p=0.3915). As with extraneous ambiguity, the results for suggestive ambiguity indicate that end users are not able to recognise its negative impact on their query formulations. This anomalous result requires further research to determine its cause and to search for ways to ameliorate these problems for end user query formulations.

4.5.4 Complexity

The results indicate strong support for the hypotheses regarding complexity (H9a, H9b, and H9c all with p=0.0001). Task complexity increases total errors and duration, and decreases the end user's overall confidence in the query formulation. These results are consistent with previous research (e.g., Borthick et al. 1997; Borthick et al. 2000).

5. Implications For Business Practice

This research has developed an initial theory of ambiguity and end user queries. It empirically investigated seven ambiguities, and measured how they differentially affect end user query performance. Some ambiguities, e.g., lexical, extraneous, pragmatic, and emphatic, affect end user query performance more than others. For some ambiguities, i.e., extraneous and suggestive, the results indicate that end users may make decisions based on query results that are inaccurate or misleading without recognising the problem.

5.1.1 Electronic Mail

In the business world, electronic mail is often used to transmit information requests, frequently without the benefit of other channels of communication (Star 1995). Furthermore, these information requests are hurriedly written (Star 1995; Fowler and Aaron 1998). Such haste contributes to syntactical, lexical, and inflective ambiguities. The use of shorthand notations often miscommunicates the intended message. Electronic mails frequently leave assumptions about the business process unstated. These omissions contribute to pragmatic ambiguity. The hurried state of the specification and the lack of a formal specification process also contribute to extraneous ambiguity (Fowler and Aaron 1998).

Lexical, syntactical, inflective, and, to some extent, extraneous, ambiguity types are functions of the grammar used to write the information request. The longer the request, the more likely the request is to contain these ambiguities (Fowler and Aaron 1998). Concise writing is important to reduce ambiguity. Good written communication skills on the part of the individual making the information request are required.

All seven ambiguities arise in the daily business specification of reports. Several strategies are available to reduce their impact. Electronic mails containing information requests need to be concisely drafted and proofread to reduce pragmatic ambiguity. Providing concise specifications and avoiding indirect writing, e.g., pretentious writing and passive voice, reduce the lexical, syntactical, inflective, and extraneous ambiguity of information requests (Fowler and Aaron 1998).

Emoticons (Sanderson 1993) and generally accepted formatting styles can be used to add emphasis to electronic mail. These techniques can reduce emphatic ambiguity.

An objective reading of the information request to reduce innuendo addresses suggestive ambiguity. Explaining the reason for the information request as much as possible will enhance clarity and reduce the perception of hidden agendas.

Each of the above techniques enhances the clarity of the information request and thus increases the effectiveness and efficiency of the response received. These techniques initially increase the time necessary to write the information request. Nonetheless, this paper's results indicate that the result will be an increase in the timeliness, accuracy, and relevance of the information received.

5.1.2 Personnel Turnover and Work Teams

Information systems personnel and end users are frequently engaged on short-term contracts. Turnover in many organisations, and especially within work groups, is high (Moore 2000). As turnover increases, the ambiguity of information requests also tends to increase. End users have less experience and understanding of the organisational culture and thus do not understand the context and assumptions made in information requests. Especially when faced with high turnover of information systems personnel and end users, strategies for reducing the seven ambiguities can significantly benefit the organisation.

Jessup and Valacich (1993) suggest strategies for retaining group memory and enhancing organisational learning. For work teams that often have new members, a library of previous information requests and associated query responses will assist team members to reduce information request ambiguity by providing a context for the request. To function properly, new team members must understand the organisational procedures and have a context within which to function.
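
One way to picture such a library is as a small searchable archive that pairs each past request with the query that answered it. The Python sketch below is illustrative only: the class `ArchivedRequest`, the function `find_similar`, and the word-overlap matching rule are hypothetical choices, not something proposed in the thesis. The two archived entries reuse the clear formulations and model answers of requests 1 and 4 in Appendix A.

```python
from dataclasses import dataclass


@dataclass
class ArchivedRequest:
    request: str  # the information request as it was worded
    query: str    # the query accepted as answering it


def words(text):
    """Lower-case the text and strip simple punctuation from each word."""
    return {w.strip(".,?") for w in text.lower().split()}


def find_similar(library, new_request, min_shared_words=3):
    """Return archived entries that share enough words with the new request,
    most similar first. The threshold is an arbitrary illustrative value."""
    new_words = words(new_request)
    scored = []
    for entry in library:
        shared = len(new_words & words(entry.request))
        if shared >= min_shared_words:
            scored.append((shared, entry))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [entry for _, entry in scored]


library = [
    ArchivedRequest(
        "List the distinct suppliers of the items we stock.",
        "select distinct(item_maker) from inventory;",
    ),
    ArchivedRequest(
        "List country, average credit limit, and standard deviation of "
        "customer credit limit grouped by country.",
        "select country, avg(credit_limit), stddev(credit_limit) "
        "from customer group by country;",
    ),
]

hits = find_similar(library, "Compare the average credit limit of customers by country")
print(hits[0].query)
```

A new team member who receives a loosely worded request about credit limits by country would retrieve the earlier request and its query, which supplies the context that the new request leaves unstated.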

Business would benefit from candidly assessing its methodology of making information requests. Using methodologies that result in less ambiguity through formalisation of the information request will reduce errors and improve the efficient use of the time of skilled end users.
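
As an illustration of what formalisation might look like, the Python sketch below represents an information request as a structured template and reports which parts have been left empty, which is where pragmatic ambiguity would otherwise enter. The class `InformationRequest`, its field names, the choice of mandatory fields, and the example purpose text are all hypothetical. The field contents of the complete example follow the clear formulation of request 2 in Appendix A.

```python
from dataclasses import dataclass, field


@dataclass
class InformationRequest:
    purpose: str = ""                             # why the report is wanted
    columns: list = field(default_factory=list)   # what to list
    filters: list = field(default_factory=list)   # which rows qualify
    grouping: list = field(default_factory=list)  # how to aggregate, if at all

    def missing_parts(self):
        """Return the names of the assumed-mandatory parts that are empty."""
        gaps = []
        if not self.purpose.strip():
            gaps.append("purpose")
        if not self.columns:
            gaps.append("columns")
        return gaps


vague = InformationRequest(purpose="Find overstocked inventory items")
complete = InformationRequest(
    purpose="Find overstocked inventory items",
    columns=["item number", "item name", "quantity on hand", "quantity on order"],
    filters=["quantity on hand is greater than 2 * quantity ordered"],
)

print(vague.missing_parts())     # ['columns']
print(complete.missing_parts())  # []
```

A template of this kind cannot remove every ambiguity, but it makes omissions visible to the requester before the request reaches the end user.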

6. Contributions, Limitations, and Future Research

6.1 Research Contributions

This paper provided significant, unique contributions to the theory of ambiguity, complexity, and end user query performance. The theory of communication linguistics has been applied to end user query performance theory. The theory identified seven ambiguities: lexical, syntactical, inflective, pragmatic, extraneous, emphatic and suggestive. The empirical results obtained for the developed theory are robust, and indicate substantial support.

An instrument to measure ambiguity in an information request, at a finer level than previously available, was developed and applied. Although requiring further refinement, this instrument is a significant advance in the measurement of information request ambiguity.

This paper identifies areas for future research, and examines the implications for business practices. This paper represents a significant advancement of the theory and application to ensure the efficient and effective development of queries by end users.

6.2 Research Limitations

Huck et al. (1974) identify seven issues for the internal validity of experiments. Appendix L provides a detailed analysis of these issues and outlines how this experiment's design controlled for each of them.

As with most controlled laboratory experiments that use students as subjects, there are external validity issues. Generalisation from student subjects to the business setting may be invalid. Students' motivations to obtain a high grade may differ from the motivations of business end users. This experiment's use of advanced business and systems undergraduate students as subjects, however, implies that this generalisation to the business setting is meaningful, as these subjects are reflective of the skill levels of end users in a business context.

Generalising from this paper's results to a business setting is invalid to the extent that the experimental information requests are not representative of information requests made in a business setting. The information requests nonetheless are based on a close model of the business world, undertaking likely real world tasks.

Another limitation is the need to extend the results to more extreme levels of ambiguity. The ambiguity present in the experiment's questions was not extreme. Hence, generalising from the results of the current experiment to more extreme levels of ambiguity may not be valid.

6.3 Future Research

Replication of this experiment, with more ambiguous information requests than those of the current experiment, would strengthen the theoretical model. An experiment designed to examine contextual reduction of the potential ambiguities (lexical, syntactical, and inflective) would also be valuable. The weaker results of the current experiment may derive from a lack of variation in ambiguity for some of the seven types of ambiguity. Instantiating ambiguity into the experiment over a greater range and variation of ambiguity in the information requests would add empirical insight into the theoretical model.

This paper presents what initially appear to be anomalous results for inflective and suggestive ambiguity in the context of duration. A future experiment would do well to investigate the circumstances of these results, and to empirically analyse the relationship between inflective ambiguity, suggestive ambiguity, and duration.

A future experiment having particular regard to end user confidence would significantly assist the development of the theoretical model. None of the hypotheses, with the exception of lexical ambiguity (H2c), is supported for end user confidence. On the basis of the current results, end user confidence often does not reflect the true state of affairs of the query response's accuracy. End users do not appear to know when the query response is inaccurate.

Outside of the domain of laboratory research, an avenue for future research would be a field experiment of ambiguity and the performance of business end users. This experiment would allow the researcher to examine the prevalence and effects of the seven types of ambiguity in actual business settings. Such a study would also make a contribution by assessing the extent to which the current experimental results generalise to the business setting.

An experiment designed to analyse the empirical effectiveness of strategies to mitigate each ambiguity in a business setting would hold considerable value for research and business practice. This would allow the development and subsequent assessment of strategies to reduce the effect of ambiguity on end user query performance.

The development and empirical testing of the ambiguity assessment instrument (Appendix K) would provide the opportunity to refine and enhance the current initial instrument. Future research is necessary to develop a reliable and robust instrument for the measurement of ambiguity in information requests.

References

Almuallim, H., Akiba, Y., Yamazaki, T., and Kaneda, S. "Learning Verb Translation Rules from Ambiguous Examples and a Large Semantic Hierarchy," Computational Learning Theory and Natural Learning Systems, (4), 1997, pp. 323-336.
Athey, S., and Wickham, M. "Required Skills for Information Systems Jobs in Australia," Journal of Computer Information Systems, (36:2), 1995-1996.
Australian Bureau of Statistics. "8669.0 Computing Services Industry, Australia, 1995-96," Australian Bureau of Statistics, 1997.
Axley, S.R. "Managerial and organizational communication in terms of the conduit metaphor," Academy of Management Review, (9), 1984, pp. 428-437.
Borthick, A.F., Bowen, P.L., and Diery, R.G. "Complexity and Errors in SQL Queries: Development and Empirical Comparison of Complexity Measures," Workshop on Information Technologies and Systems (WITS '97), pp. 31-40, December 13-14 1997.
Borthick, A.F., Bowen, P.L., Jones, D.R., and Tse, M.H.K. "The Effects of Information Request Ambiguity and Construct Incongruence on Query Development," Proceedings of the Pacific Asia Conference on Information Systems, June 2000.
Campbell, D. J. "Task Complexity: A Review and Analysis," Academy of Management Review, (13:1), 1988, pp. 40-52.
Cardinali, R. "Information Systems - A Key Ingredient to Achieving Organizational Competitive Strategy," Computers in Industry, (18:3), 1992, pp. 241-245.
Chomsky, N. "Language and Mind," in Ways of Communicating, Cambridge University Press, Cambridge, 1991, pp. 56-80.
Conger, S. The New Software Engineering, Wadsworth Publishing, Belmont, California, 1994.
Copi, I. M., and Cohen, C. Introduction to Logic (8th ed.), Macmillan, New York, New York, 1990.
Cronbach, L. J. "Coefficient Alpha and the Internal Structure of Tests," Psychometrika, (16), 1951, pp. 297-334.
Delligatta, A., and Umbaugh, R. E. "EUC Becomes Enterprise Computing," Information Systems Management, Fall 1993, pp. 53-55.
Dubin, R. Theory Building, Collier Macmillan Publishers, London, 1978.
Eisenberg, E.M., and Phillips, S.R. "Miscommunication in Organizations," in "Miscommunication" and Problematic Talk, Sage Publications, London, 1991.
Fischer, D. H. Historians' Fallacies, Harper & Row, New York, 1970.
Fowler, H. R., and Aaron, J. E. The Little, Brown Handbook (7th ed.), Addison-Wesley Publishers Inc., New York, New York, 1998.
Freeman, L.A., Jarvenpaa, S.L., and Wheeler, B. C. "The Supply and Demand of Information Systems Doctorates: Past, Present and Future," MIS Quarterly, (24:2), June 2000.
Halstead, M. H. Elements of Software Science, Elsevier North-Holland Inc, Purdue University, 1977.
Hamblin, C. L. Fallacies, Methuen, London, 1970.
Huck, S. W., Cormier, W. H., and Bounds, W. G. Jr. Reading Statistics and Research, Harper & Row, New York, New York, 1974.
Jespersen, O. Language: its nature, development and origin, Allen & Unwin, London, 1922.
Jessup, L.M., and Valacich, J.S. Group Support Systems, Macmillan Publishing Company, New York, New York, 1993.
Jih, W.J.K., Bradbard, D.A., Snyder, C.A., and Thompson, N.G.A. "The Effects of Relational and Entity-Relationship Data Models on Query Performance of End Users," International Journal of Man-Machine Studies, (31), 1989, pp. 257-267.
Katzeff, C. "Systems Demands on Mental Models for a Fulltext Database," International Journal of Man-Machine Studies, (32), 1990, pp. 483-509.
Keen, P.G.W. "Information Technology and the Management Difference: A Fusion Map," IBM Systems Journal, (32:1), 1993, pp. 17-38.
Kooij, J.G. Ambiguity in Natural Language, North-Holland Publishing Company, Amsterdam, Holland, 1971.
Liew, S.T. "The Effects of Normalization on Query Errors: An Experimental Evaluation," Unpublished Thesis, University of Queensland, 1995.
Moore, J.E. "One Road to Turnover: An Examination of Work Exhaustion in Technology Professionals," MIS Quarterly, (24:1), March 2000, pp. 141-168.
Nath, R., and Lederer, A.L. "Team Building for IS Success," Information Systems Management, Spring 1996, pp. 32-37.
Newbold, P. Statistics for Business and Economics, Prentice-Hall Inc, Englewood Cliffs, New Jersey, 1984.
Ogden, W.C., Korenstein, R., and Smelcer, J.B. An Intelligent Front-End for SQL, IBM General Products Division, San Jose, California, 1986.
Reilly, R.G. "Miscommunication at the Person-Machine Interface," in "Miscommunication" and Problematic Talk, Sage Publications, London, 1991.
Reisner, P. "Use of Psychological Experimentation as an Aid to Development of a Query Language," IEEE Transactions on Software Engineering, SE3:3, 1977, pp. 218-299.
Rescher, N. Introduction to Logic, St Martin's Press, New York, New York, 1964.
Rho, S., and March, S.T. "An Analysis of Semantic Overload in Database Access Systems using Multi-Table Query Formulation," Journal of Database Management, (8:2), Spring 1997, pp. 3-14.
Rosenthal, D.A., and Jategaonkar, V.A. "Wanted: Qualified IS Professionals," Information Systems Management, Spring 1995, pp. 27-31.
Russell, B.A.W. "Vagueness," Australasian Journal of Philosophy and Psychology, (1), 1923, pp. 84-92.
Ryan, H.W. "User-Driven Systems Development: Defining a New Role for IS," Information Systems Management, Summer 1993, pp. 66-68.
Ryle, G. Collected Papers, (2), Hutchinson, London, 1971.
Sanderson, D. Smileys, O'Reilly, Sebastopol, California, 1993.
Sekine, S., Carroll, J.J., Ananiadou, S., and Tsujii, J. "Automatic learning for Semantic Collocation," Third Conference on Applied Natural Language Processing, 1992, pp. 104100.
Severin, W.J., and Tankard, J.W. Communication Theories: Origins, Methods, and Uses in the Mass Media, Addison Wesley Longman, Inc., New York, New York, 1997.
Star, S.L. The Cultures of Computing, Blackwell Publishers/The Sociological Review, Oxford, U.K., 1995.
Suh, K.S., and Jenkins, A.M. "A Comparison of Linear Keyword and Restricted Natural Language Database Interfaces for Novice Users," Information Systems Research, (3:3), 1992, pp. 252-272.
Tayntor, C.B. "New Challenges or the End of EUC?," Information Systems Management, Summer 1994, pp. 86-88.
Trow, C.E. The Old Shipmasters of Salem, New York, New York, 1905.
Turner, G.W. (Editor). The Australian Concise Oxford Dictionary of Current English, Oxford University Press, Melbourne, 1987.
Walton, D. Fallacies Arising from Ambiguity, Kluwer Academic Publishers, Dordrecht, 1996.
Williamson, T. Vagueness, Routledge, New York, New York, 1994.
Wood, R.E. "Task Complexity: Definition of the Construct," Organizational Behaviour and Human Decision Processes, (37), 1986, pp. 60-82.

Appendix A: Experiment Information Requests and Model Answers

1. Ambiguous: Management wants a list of each of our suppliers with no duplicates in the list.
Clear: List the distinct suppliers of the items we stock.
Model Answer (Halstead’s Complexity: 1.6927):
    select distinct(item_maker)
    from inventory;

2. Ambiguous: Produce a report that lists the inventory items where the quantity on hand is much larger, on a percentage basis, than the quantity ordered.
Clear: List item number, item name, quantity on hand, quantity on order where quantity on hand is greater than 2 * quantity ordered.
Model Answer (Halstead’s Complexity: 5.4186):
    select item_no, item_name, qty_hand, qty_ordered
    from inventory
    where qty_hand > 2 * qty_ordered;

3. Ambiguous: Management wants a list of all Japanese customers and customers with credit limits over $15,000.
Clear: List customer numbers, customer names, country, and credit limit of customers with credit limits greater than $15,000 or of customers in Japan.
Model Answer (Halstead’s Complexity: 6.8908):
    select cust_no, cust_name, country, credit_limit
    from customer
    where country = 'Japan' or credit_limit > 15000;

4. Ambiguous: Produce a report that statistically compares the credit limits for customers in different countries.
Clear: List country, average credit limit, and standard deviation of customer credit limit grouped by country.
Model Answer (Halstead’s Complexity: 4.4697):
    select country, avg(credit_limit), stddev(credit_limit)
    from customer
    group by country;

5. Ambiguous: Produce a report of clients that prefer the Speedair carrier and addresses.
Clear: List customer number, customer name, street, city, post code, and country where the customer's preferred carrier is Speedair.
Model Answer (Halstead’s Complexity: 12.2917):
    select cust_no, cust_name, street, city, state, post_code, country
    from customer, carrier
    where customer.pref_carrier_code = carrier.carrier_code
      and carrier_name = 'Speedair';

6. Ambiguous: We're wondering if some of our winemakers are using poor quality packaging and bottles - we've had a few complaints. Can you get us a report that gives us some sort of idea about what items we are shipping compared to what the customers are taking delivery of? It would probably be a good idea while you're at it to give a comparative percentage of the stuff shipped that doesn't make it just so the vintners won't try and weasel their way out of it, you understand, they're good at that.
Clear: List item maker, item number, item name, and 100 * (sum of quantity shipped less sum of quantity accepted) / (sum of quantity shipped) where the type of alcohol is wine.
Model Answer (Halstead’s Complexity: 18.8):
    select item_maker, inventory.item_no, item_name,
           100 * (sum(qty_shipped - qty_accepted) / sum(qty_shipped))
    from inventory, invoiceitem
    where inventory.item_no = invoiceitem.item_no
      and type_of_alc = 'wine'
    group by item_maker, inventory.item_no, item_name;

7. Ambiguous: Prepare a report that provides *all* customer's details and indicates the number of different products they have ordered from us.
Clear: List customer number, and customer name for *all* customers, and, if they have ordered anything, a count of unique items ordered.
Model Answer (Halstead’s Complexity: 16.0076):
    select customer.cust_no, cust_name, count(distinct(item_no))
    from customer, invoice, invoiceitem
    where customer.cust_no = invoice.cust_no (+)
      and invoice.invoice_no = invoiceitem.invoice_no (+)
    group by customer.cust_no, cust_name;

8. Ambiguous: Management wants to know which customers we've shipped goods more than 10 times to them by the shipper that they requested.
Clear: List customer number, name, and count of invoices, where the actual carrier is the same as the customer's preferred carrier, having more than 10 shipments.
Model Answer (Halstead’s Complexity: 16.2684):
    select customer.cust_no, cust_name, count(*)
    from invoice, customer
    where invoice.cust_no = customer.cust_no
      and invoice.carrier_code = customer.pref_carrier_code
    group by customer.cust_no, cust_name
    having count(*) > 10;

9. Ambiguous: Produce a report, with best items first, on the gross contribution to profitability of each inventory item for July 1999.
Clear: List item number, item description, and (unit price less unit cost) multiplied by units sold in July 1999. Sort your output by descending gross contribution to profitability.
Model Answer (Halstead’s Complexity: 23.897):
    select inventory.item_no, item_name,
           avg(avg_unit_price - avg_unit_cost) * sum(qty_accepted)
    from invoice, invoiceitem, inventory
    where invoice.invoice_no = invoiceitem.invoice_no
      and invoiceitem.item_no = inventory.item_no
      and deliver_date between '1-Jul-99' and '31-Jul-99'
    group by inventory.item_no, item_name
    order by 3 desc;

10. Ambiguous: Produce a report with the relevant customer details that gives us an idea of how much of our business is exposed to foreign currency fluctuations.
Clear: List customer number, customer name, customer country, and a total of the amount paid where the settlement currency code for the invoice is not equal to the currency code for Australian dollars. Group results by customer number.
Model Answer (Halstead’s Complexity: 19.4819):
    select customer.cust_no, cust_name, country, sum(amt_paid)
    from customer, invoice, currency
    where customer.cust_no = invoice.cust_no
      and invoice.currency_code = currency.currency_code
      and currency.currency_name <> 'Australian Dollar'
    group by customer.cust_no, cust_name, country;

11. Ambiguous: Management is concerned about current slow-moving inventory items, based on shipments since 1 June 1999. Produce a report of the items that they might be most concerned about.
Clear: List inventory item number, item description, quantity on hand, and sum(quantity shipped) with ship dates greater than 1 June 1999 that have sums of the quantity shipped less than the sums of the quantity on hand.
Model Answer (Halstead’s Complexity: 22.4):
    select inventory.item_no, item_name, sum(qty_hand), sum(qty_shipped)
    from inventory, invoiceitem, invoice
    where inventory.item_no = invoiceitem.item_no
      and invoiceitem.invoice_no = invoice.invoice_no
      and ship_date > '1-Jun-99'
    group by inventory.item_no, item_name
    having sum(qty_shipped) < sum(qty_hand);

12. Ambiguous: Produce a report that gives some idea about our best USA export items where the amount since March is bigger than $5,000.
Clear: List item numbers, item descriptions and the total accepted quantity times agreed price of each item for items shipped to US customers since 1 March 1999 and having a total accepted quantity times agreed price greater than $5,000.
Model Answer (Halstead’s Complexity: 29.1633):
    select inventory.item_no, item_name, sum(qty_accepted * agreed_unit_price)
    from invoice, invoiceitem, inventory, customer
    where invoice.invoice_no = invoiceitem.invoice_no
      and invoiceitem.item_no = inventory.item_no
      and customer.cust_no = invoice.cust_no
      and ship_date > '1-Mar-99'
      and country = 'USA'
    group by inventory.item_no, item_name
    having sum(qty_accepted * agreed_unit_price) > 5000;

13. Ambiguous: Produce a report showing our Japanese client base that didn't order anything in July. We're going to need an idea of how many invoices and things like that that we have for them. We're concerned about why our orders have dropped off. Can you use that statistical thing (you know, the one that gives an idea of how the numbers are varying, not variance, the other one) to show whether the date the stuff is delivered is different to the date they wanted the stuff?
Clear: List customer number, customer name, number of invoices, and standard deviation of the difference between the deliver date and the want date for Japanese customers who did not place an order in July 1999.
Model Answer (Halstead’s Complexity: 24.0168):
    select customer.cust_no, cust_name, count(invoice_no),
           stddev(deliver_date - want_date)
    from customer, invoice
    where customer.cust_no = invoice.cust_no
      and country = 'Japan'
      and customer.cust_no not in
          (select cust_no from invoice
           where order_date between '1-Jul-99' and '31-Jul-99')
    group by customer.cust_no, cust_name;

14. Ambiguous: We want to have a mail-out to our best customers (say, those who paid us more than $5000 or so recently, and those with credit limits over $20,000). We're interested in seeing if we can move that new Hunter Valley shipment. Can you get us a mailing list?
Clear: List customer number, name, street, city, state, post code, and country for those customers with credit limits greater than $20,000 or since 1 July 1999 have total paid invoices of more than $5,000.
Model Answer (Halstead’s Complexity: 29.9607):
    select customer.cust_no, cust_name, street, city, state, post_code, country
    from customer, invoice
    where customer.cust_no = invoice.cust_no
    group by customer.cust_no, cust_name, street, city, state, post_code, country
    having sum(amt_paid) > 5000
    union
    select customer.cust_no, cust_name, street, city, state, post_code, country
    from customer
    where credit_limit > 20000;


15.

Ambiguous

Produce a report that shows the percentage of orders where we're not meeting customers' delivery date expectations in each country.

Clear

Count all invoices, where the date the order was delivered was larger than the date the customer wanted the order. Group by country. Calculate the percentage of late orders by country.

Model Answer (Halstead’s Complexity: 34.992): Create View TotalOrders as select country, count(*) Total_Orders from customer, invoice where customer.cust_no = invoice.cust_no group by country; Create view LateOrders as select country, count(*) Late_Orders from customer, invoice where customer.cust_no = invoice.cust_no and deliver_date > want_date group by country; Select totalorders.country, 100*(late_orders / total_orders) Percent_Late_Orders from lateorders, totalorders where totalorders.country = lateorders.country;

16.

Ambiguous

Produce a report that shows, by country, which carriers are, on average, not meeting their expected delivery times.

Clear

List carrier code, carrier name, country, and average of (delivery days less the difference between delivery date and ship date) by country having that average difference greater than 1 day.

Model Answer (Halstead’s Complexity: 40.1661): select carrier.carrier_code, carrier_name, delivdays.country, avg((deliver_date - ship_date) - deliver_days) from carrier, invoice, customer, delivdays where carrier.carrier_code = invoice.carrier_code and invoice.cust_no = customer.cust_no and carrier.carrier_code = delivdays.carrier_code and customer.city = delivdays.city and customer.state = delivdays.state and customer.country = delivdays.country group by carrier.carrier_code, carrier_name, delivdays.country having avg((deliver_date - ship_date) - deliver_days) > 1;
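The two-view approach in the model answer for question 15 (count all orders per country, count late orders per country, then divide) can be tried out on a small scale. The following is a minimal sketch only, not the experiment's Oracle environment: it uses Python's built-in sqlite3 module, the sample rows and the function name percent_late_by_country are hypothetical, dates are stored as ISO text so that string comparison orders them correctly, and 100.0 is used because SQLite would otherwise perform integer division.

```python
import sqlite3

# Hypothetical sample data; only the columns the query needs are modelled.
CUSTOMERS = [("C0001", "Japan"), ("C0002", "USA")]
INVOICES = [
    # (invoice_no, cust_no, want_date, deliver_date)
    ("I000001", "C0001", "1999-07-10", "1999-07-12"),  # delivered late
    ("I000002", "C0001", "1999-07-10", "1999-07-09"),  # delivered on time
    ("I000003", "C0002", "1999-07-01", "1999-07-05"),  # delivered late
]


def percent_late_by_country(customers, invoices):
    """Return (country, percent of late orders) rows, ordered by country."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        create table customer (cust_no text primary key, country text);
        create table invoice (invoice_no text primary key, cust_no text,
                              want_date text, deliver_date text);
        create view totalorders as
            select country, count(*) as total_orders
            from customer, invoice
            where customer.cust_no = invoice.cust_no
            group by country;
        create view lateorders as
            select country, count(*) as late_orders
            from customer, invoice
            where customer.cust_no = invoice.cust_no
              and deliver_date > want_date
            group by country;
    """)
    con.executemany("insert into customer values (?, ?)", customers)
    con.executemany("insert into invoice values (?, ?, ?, ?)", invoices)
    rows = con.execute("""
        select totalorders.country,
               100.0 * late_orders / total_orders as percent_late_orders
        from lateorders, totalorders
        where totalorders.country = lateorders.country
        order by totalorders.country
    """).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    for country, percent in percent_late_by_country(CUSTOMERS, INVOICES):
        print(country, percent)
```

As in the model answer, a country with no late orders has no row in the late-orders view and therefore drops out of the joined result.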


Appendix B: Experiment Instruction Sheet

INSTRUCTIONS

This laboratory session requires you to execute command files and query a database. Please follow the instructions carefully.


Part 1 - Scenario

George Harford Wine Merchant distributes wines throughout the world. They predominantly trade with customers in France, Japan, the USA, and the UK. Customers place orders for wines, which employees process, pack, and ship to the customers via an appropriate carrier. The packers attach an invoice created by the Accounts Receivable department to the goods when shipped. These invoices contain all relevant information generated from the invoice and inventory databases. The data structures for the relevant tables are attached.


Part 2 - SQL Syntax Reminder

The SQL syntax for SELECT commands follows. Items in square brackets [ ] are optional, and items in braces { } can be repeated zero or more times:

SELECT [DISTINCT] * | (((table. | view.)column | expression) [alias] {, ((table. | view.)column | expression) [alias]})
FROM (table | view) [alias] {, (table | view) [alias]}
WHERE condition {, condition}
[GROUP BY expression {, expression} [HAVING condition {, condition}]]
[(UNION | UNION ALL | INTERSECT | MINUS) SELECT command]
[ORDER BY (expression | position) [DESC] {, (expression | position) [DESC]}];

Only under highly unusual circumstances should you formulate a select command that contains more than one table in the FROM clause without a join in the WHERE clause. As a general rule, the number of joins should equal the number of foreign key attributes. Except for extremely rare queries that usually produce only summary results (such as counting the number of records in a table), all SQL queries, even those involving only one table, should include WHERE conditions.

You may need to use some of the following keywords:

AND, AVG, COUNT, DISTINCT, IN, MAX, MIN, NOT, NULL, OR, STDDEV, SUM, SYSDATE, UNIQUE, VARIANCE, (+) (outer join)

The SQL syntax for VIEW commands follows:

CREATE VIEW viewname AS (SELECT command);

When you create a view with the same name as an already-existing view (for example, you rerun your query), you will need to drop the already-existing view:

DROP VIEW viewname;

Reminders:
• Aliases for columns in views should not be enclosed in quotes.
• If you have multiple join conditions, i.e., more than one foreign key or a concatenated foreign key, you may need to put the outer join symbol on other join conditions.


Part 3 - Getting started

Log into your area on valinor. For the purposes of assessment, everything you do in this laboratory session needs to be recorded and sent to the instructor. Follow the instructions carefully. In particular, please refrain from running more than one session on valinor because running more than one session will mean that all your query attempts will not be recorded. To begin this quiz, type the following at the valinor prompt:

valinor> ksh
valinor> /home/staff/bowen/startqz199b

Follow the instructions given by the program carefully. You can attempt each query as many times as you wish. You should note that once you accept a query, you cannot return to the question again.


Part 3 - Getting started

Log into your area on valinor. For the purposes of assessment, everything you do in this laboratory session needs to be recorded and sent to the instructor. Follow the instructions carefully. In particular, please refrain from running more than one session on valinor because running more than one session will mean that all your query attempts will not be recorded. To begin this quiz, type the following at the valinor prompt:

valinor> ksh
valinor> /home/staff/bowen/startqz199a

Follow the instructions given by the program carefully. You can attempt each query as many times as you wish. You should note that once you accept a query, you cannot return to the question again.


Part 4 - Your Mission

You are an internal auditor at George Harford. On 16 August 1999, your supervisor approaches you with a list of questions. Some questions were designed by the supervisor, who knows SQL well. Your supervisor was also given questions from management, who do not know SQL all that well. Your task is to formulate and execute SQL queries to answer these questions.

Your supervisor is gone for the day and getting answers for these questions is urgent. Therefore, you need to make your best interpretation of the questions from management. You can discuss with your supervisor the assumptions you made after she returns. However, she will be most annoyed if you do not make an attempt to answer as many of the questions as you can prior to her return.

The questions have been structured so that easier questions appear first and then become progressively more difficult. Your supervisor wants to see the complete SQL queries that you use. When the question is phrased asking for a name, your query should use criteria that include that name, i.e., you should not look up the code to avoid joining to the table that contains the name.


Appendix C: Command Interpreter Unix Shell Script

Two Unix Shell Scripts were used to operate the experiment. The two scripts were essentially identical except that they used different source data depending on the treatment initially received by the different experimental groups (the variable $quizfile). This script has been developed, modified, and enhanced from previous experiments undertaken within the Faculty of Commerce at the University of Queensland (Borthick et al. 1997; Borthick et al. 2000). The interface source code had been previously developed by Mr Andrew Jones.


#!/bin/ksh
## /\ndy. 28/08/98.    version 0.02

## NB. this script requires ksh because it uses "read -u".
## The rest of it should run in any sh-compatible shell (sh, bash, ksh etc)

## DoLog() - A utility function to append a message to our log file.
## As it stands, each line contains the username, process ID, date,
## time, and a message eg.
##   [jones] <4268> 28/08 11:41:09: Displaying question 3
##   [jones] <4298> 28/08 11:41:12: Attempting question 3 Attempt number 1
DoLog()
{
    ## %a = day, %e = date, %m = month. %T = time.
    now=`date +"[$username] <$$> %e/%m %T:"`
    echo "$now $*" >> $logfile
}

## Obtain the username of the person running this program, for the log file.
## No need to change this.
###username=${USER:-$LOGNAME}
username=`whoami`

## CONFIGURE THIS:
## "quizfile" is a variable which contains the name of the file with
## the questions you wish to present to the students. You should edit
## this script to set this variable to the appropriate value.
## If this variable is null, then the program will expect a single
## command-line argument, which will be the filename of the question file.
## The question file should contain questions, one per line.
## Note that the user running this program must have access privs to
## the question file and the directories above it...
## eg. quizfile="/home/staff/bowen/questions"
quizfile="/home/staff/bowen/questions99qz1b"

## CONFIGURE THIS:
## Location of the log file to record what people do.
## You can reset this to whatever you like, but make sure that
## everyone can append to it. Also note that files in /tmp disappear
## when valinor is restarted. /var/tmp might be safer, but who knows.
## Probably best if you make a logfile directory in your home dir,
## chmod it to mode 1777 and put the log files in there...
## Note: If the log file does not already exist, this program will
## now create it. This better allows per-user log files to work.
## However, if you are using only one log file, it is a better idea
## if you create and chmod it yourself...


#logfile="/var/tmp/sql.log"              # one log for all users..
#logfile="/var/tmp/sql.$username.log"    # one log per user...
logfile="/home/staff/bowen/logfile/qz199/$username.log"

## Editor to use. pico is the easiest.. esp if we run it in "tool" mode...
editor="pico -t"

## temporary filenames.
tmp="/tmp/qn-$username.$$"
attfile="$HOME/answer.$$"
qnum=1      # question number
attnum=0    # attempt number

## Set up a clean up routine to clean up after ourselves in case we die..
trap 'rm -f "$attfile" "$tmp"; exit 1' 1 3 15 8

## "echo -n" is supposed to print without a newline.
## This little hack ensures it will on valinor...
PATH=/usr/ucb:${PATH}

## ---------------------------------------------------------------------
## End of configuration section: Start of program.

## Create the log file if it doesn't exist...
if [ ! -f "$logfile" ]
then
    > $logfile
    chmod 666 $logfile
    DoLog "StartUp: Created this Log file."
fi

if [ -z "$quizfile" ]
then
    ## No $quizfile, so we expect a question file command-line argument.
    if [ $# != 1 ]
    then
        echo "Usage: `basename $0` file-with-questions"
        DoLog "Error: No quizfile and no cmd line argument."
        exit 1
    fi
    quizfile="$1"
fi

## Make sure we can read the file. NB. this requires some permissions on the
## directory containing the file, and that directory's parent, and ...
if [ ! -f "$quizfile" ]
then
    echo "Error: Unable to read file: \"$quizfile\"."
    DoLog "Error: Can't open file $quizfile (pwd=`pwd`)"
    exit 2
fi

## Splash screen telling them what will happen.
DoLog "Startup: Showing splash screen."
clear
cat <<ENDOFBLURB
        CO365 DATABASE MANAGEMENT SYSTEMS IN BUSINESS
        QUIZ ONE

In this exercise, you will be presented with a series of problems.
The first problem will be displayed, and then the system will wait
for you to hit the <RETURN> (aka the <ENTER>) key. This gives you
time to read and absorb the problem.

After you hit the <ENTER> key, you will be taken into the
user-friendly editor "pico", where you can compose a solution.
When you are satisfied, quit the editor with the Control-X command.
Your solution will be run, and any output will be displayed on your
screen.

You will then be asked whether you are happy with your solution.
If you are not, then you can re-edit your first attempt and try again.
Otherwise, you will be asked to rank your confidence in your solution.
You then continue on to the second problem, and so on...
ENDOFBLURB

echo -n "Hit the <RETURN> key to continue."
read junk
echo
echo
clear
DoLog "Startup: Finished showing splash screen."

exec 3<"$quizfile"
qnum=1

## This is the main loop of the program.
while read -u3 question
do
    ## if we are between questions, make the screen tidier.
    if [ "$qnum" -gt 1 ]
    then
        clear
        ## echo
        echo "Ok. Onto the next question."
        echo
    fi

    thisattmpt="retry"
    attnum=0    # attempt number
    > $attfile


    ## attempt the current question.
    while [ "$thisattmpt" != "accept" ]
    do
        attnum=`expr $attnum + 1`
        clear
        echo "Question #$qnum:"
        echo
        echo "$question"
        echo
        if [ $attnum = 1 ]
        then
            echo
            echo "--------------------------------------------------"
            echo "When you are finished reading the question, hit the <ENTER> key, to start"
            echo -n "using an editor to create your solution. "
            DoLog "Displaying question $qnum"
        else
            echo
            echo "--------------------------------------------------"
            echo "Your current solution is ..."
            sed -e 's/^/| /' < $attfile
            echo
            echo -n "Hit the <ENTER> key to re-edit this... "
        fi

        # pause here until they hit RETURN
        read junk

        DoLog " Attempting question $qnum Attempt number $attnum"
        $editor $attfile
        ## cp $attfile $username.sql
        ## echo "quit" >> $username.sql
        echo
        echo "Ok. Now testing this solution..."
        echo

        ## FIXME: Need to make sure that the Oracle environment
        ## is properly set up so that they can run sqlplus...
        ## Plus, the /dev/null thing is crude, but probably enough to
        ## prevent them getting into an interactive oracle session...
        sqlplus / @$attfile < /dev/null

        ## Reformat of output allows users to use data more
        ## interactively. Micheal Axelsen 1999.
        ## Disabled since they can then end up in a cartesian
        ## product join.
        ## echo "Attempting Question: $qnum" > $username.lst
        ## echo "" >> $username.lst
        ## cat "$question" >> output_screen
        ## echo "" >> $username.lst
        ## echo "Your SQL Query:" >> $username.lst
        ## echo >> $username.lst
        ## cat $attfile >> $username.lst
        ## echo "" >> $username.lst
        ## echo "Results:" >> $username.lst
        ## sqlplus / @$username.sql >> $username.lst
        ## $editor $username.lst

        ## Should we pipe output into less for them to see?
        echo

        ## Should we capture their attempt?
        DoLog " The attempt was ..."
        sed -e "s/^/[$username] <$$> Qn: $qnum Att: $attnum /" < $attfile >> $logfile

        ## ask if happy with this attempt or not
        echo "Are you happy with this attempt, or do you want to try again?"
        PS3="Choice: "
        select thisattmpt in retry accept
        do
            if [ -n "$thisattmpt" ]
            then
                echo "Ok."
                break
            fi
            echo "Invalid response. Try again."
        done
        echo
    done

    DoLog "Completed question $qnum Number of attempts was $attnum"
    ## DoLog "The final solution was ..."
    ## sed -e 's/^/| /' < $attfile >> $logfile

    ## Ask here how confident they are...
    echo "How confident are you about your solution?"
    PS3="Confidence? "
    select conf in "85-100%" "70-85%" "55-70%" "40-55%" "25-40%" "10-25%" "<10%"
    do
        if [ -n "$conf" ]
        then
            echo "Ok."
            break
        fi
    done
    DoLog "Confidence for question $qnum was $conf"

    echo
    echo "Ok. Now what?"
    PS3="What now? "
    select whatnow in "Continue to next question" "Quit"
    do
        if [ -n "$whatnow" ]
        then
            break
        fi
    done

    if [ "$whatnow" = "Quit" ]
    then
        echo
        echo "Are you sure you want to quit?"
        PS3="Confirm quit: "
        select confirm in yes no
        do
            if [ -n "$confirm" ]
            then
                break
            fi
        done
        if [ "$confirm" = "yes" ]
        then
            echo "Ok. Quitting now."
            break
        else
            echo "Ok. Not quitting."
        fi
    fi

    ## NB. It's more efficient to use the shell's built in arithmetic...
    qnum=`expr $qnum + 1`
done

DoLog "Quitting."
rm -f "$attfile" "$tmp"
echo "Bye..."
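The log lines written by DoLog() have a regular shape (username, process ID, day/month, time, message), so they can be read back mechanically when collating attempts and confidence ratings. The following is a minimal sketch of such a reader, written in Python rather than ksh. It is not part of the experiment's tooling: the names LOG_LINE and parse_log_line are hypothetical, and the pattern is inferred only from the two example lines in the script's comments.

```python
import re

# Pattern inferred from the DoLog() examples, e.g.
#   [jones] <4268> 28/08 11:41:09: Displaying question 3
# date's %e pads single-digit days with a space, so whitespace is matched loosely.
LOG_LINE = re.compile(
    r"^\[(?P<user>[^\]]+)\]\s+<(?P<pid>\d+)>\s+"
    r"(?P<day>\d{1,2})/(?P<month>\d{2})\s+"
    r"(?P<time>\d{2}:\d{2}:\d{2}):\s*(?P<message>.*)$"
)


def parse_log_line(line):
    """Split one DoLog() line into its fields, or return None if it does not match.

    Lines holding the captured SQL text (prefixed with "Qn: ... Att: ...")
    carry no timestamp and are therefore reported as None.
    """
    match = LOG_LINE.match(line.rstrip("\n"))
    if match is None:
        return None
    fields = match.groupdict()
    for key in ("pid", "day", "month"):
        fields[key] = int(fields[key])
    return fields


if __name__ == "__main__":
    print(parse_log_line("[jones] <4268> 28/08 11:41:09: Displaying question 3"))
```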


Appendix D: Experiment Entity-Relationship Diagram

Entities and attributes shown on the diagram:

Delivdays: Carrier_code+, City+, State+, Country+, Deliver_days
Carrier: Carrier_code+, Carrier_name, Carrier_type
Customer: Cust_no+, Cust_name, Phone_no, Street, City, State, Post_code, Country, Credit_limit, Outstanding_bal, Pref_carrier_code
Fob: Fob_code+, Fob_name
Invoice: Invoice_no+, Order_date, Cust_no, Ship_date, Want_date, Deliver_date, Paid_date, Fob_code, Disc_pct, Disc_days, Currency_code, Amt_paid, Carrier_code, Emp_no
Employee: Emp_no+, Emp_name
Currency: Currency_code+, Currency_name, Currency_date+, Currency_rate
Inventory: Item_no+, Item_name, Item_maker, Item_package, Item_year, Type_of_alc, Alc_category, Alc_content, Avg_unit_cost, Unit_meas, Avg_unit_price, Qty_hand, Qty_ordered
Invoiceitem: Invoice_no+, Item_no+, Unit_meas, Quoted_unit_price, Agreed_unit_price, Qty_shipped, Qty_accepted, Diff_cause

Foreign key labels on the relationship lines: Carrier_code (two relationships), Cust_no, Fob_code, Emp_no, Currency_code + [Appropriate Dates], Invoice_no, Item_no.

Legend: FK = Foreign Key; + = Primary Key

Abbreviation        Type      Description

Table: Invoice
Invoice_no          Char(7)   Invoice number
Order_date          Date      Date the order was placed
Cust_no             Char(5)   Customer number
Ship_date           Date      Date the order was shipped
Want_date           Date      Date the order was wanted by the customer
Deliver_date        Date      Date the order was delivered
Paid_date           Date      Date the invoice was paid
Fob_code            Char(1)   FOB code {1,2}
Disc_pct            Number    Discount percent, e.g. 1, 1.5, 2, 2.25
Disc_days           Number    Discount days - start day depends on FOB
Currency_code       Char(1)   Settlement currency code
Amt_paid            Number    Amount paid in Australian dollars
Carrier_code        Char(5)   Carrier code of carrier that delivered the order
Emp_no              Char(4)   Employee number of person who packed the order

Table: Customer
Cust_no             Char(5)   Customer number
Cust_name           Char(20)  Customer's name
Phone_no            Char(15)  Customer's telephone number
Street              Char(30)  Customer's street address
City                Char(20)  Customer's city
State               Char(20)  Customer's state
Post_code           Char(10)  Customer's post code
Country             Char(20)  Customer's country
Credit_limit        Number    Customer's credit limit
Outstanding_bal     Number    Customer's outstanding balance (amount owing)
Pref_carrier_code   Char(5)   Customer's preferred carrier

Table: Carrier
Carrier_code        Char(5)   Carrier code
Carrier_name        Char(20)  Carrier's name
Carrier_type        Char(8)   Type of carrier {air, surface}

Table: Currency
Currency_code       Char(1)   Currency code
Currency_name       Char(15)  Name of currency
Currency_date       Date      Date for which the currency rate applies
Currency_rate       Number    Currency rate as of the currency date, i.e. the number of units of the currency that one Australian dollar will purchase, e.g., one Australian dollar can currently be exchanged for approximately 0.65 US dollars.

Table: Delivdays
Carrier_code        Char(5)   Carrier code
City                Char(20)  Deliver to city
State               Char(20)  Deliver to state
Country             Char(20)  Deliver to country
Deliver_days        Number    Expected number of calendar days for the carrier to deliver merchandise to the city, state, and country, i.e., the carrier's estimate of the time required to deliver an order to the destination described by city, state, and country.

Table: Employee
Emp_no              Char(4)   Employee number
Emp_name            Char(20)  Employee's name

Table: Invoiceitem
Invoice_no          Char(7)   Invoice number
Item_no             Char(7)   Inventory item number
Unit_meas           Char(5)   Unit of measure for item {case, each}
Quoted_unit_price   Number    Quoted unit cost of the item in Australian dollars
Agreed_unit_price   Number    Agreed unit cost of the item in Australian dollars
Qty_shipped         Number    Quantity of the item shipped to the customer
Qty_accepted        Number    Quantity of the item accepted by the customer
Diff_cause          Char(15)  Reason for differences in costs or quantities {broken bottle, damaged cork, late delivery, no diff, shortage, sugary, vinegary}

Table: Inventory
Item_no             Char(7)   Inventory item number
Item_name           Char(20)  Name or description of the item
Item_maker          Char(20)  Maker of the item, e.g. the vintner
Item_package        Char(15)  How each component of the item is packaged {bottle, can, cardboard box}
Item_year           Number    Year the item was produced
Type_of_alc         Char(5)   Type of alcohol {beer, wine}
Alc_category        Char(15)  Alcohol category {dark, dry, full strength, light, mid-strength, red, sparkling, white}
Alc_content         Number    Alcohol content, e.g. full strength beers are typically about 5.0 (percent) and wines are typically between 12 and 14 (percent)
Avg_unit_cost       Number    Average price per unit at which the item was purchased from the item maker
Unit_meas           Char(5)   Unit of measure for item {case, each}
Avg_unit_price      Number    Average price per unit at which the item is sold to customers
Qty_hand            Number    Quantity of the item on hand
Qty_ordered         Number    Quantity of the item ordered in the last 12 months


Appendix E: Experimental Design

Stratification Into Group A and Group B

To control for a testing effect (Huck et al. 1974), and to ensure even representation of skill sets across Group A and Group B, participants were stratified into classes according to their previous subject enrolments. Participants within each strata class were then ranked according to their current enrolment subject, their performance in earlier subjects, and their experience with database query languages. Thirteen groups were used to classify participants. Table 11 shows the final strata class ordering and the number of participants in each strata class.

This process resulted in a ranked listing of participants from one to sixty-six. The experimental treatment, manager-English (ambiguous) or pseudo-SQL (clear), was assigned randomly to the first student on this list and then alternately to each student thereafter. This resulted in two student groups with equal participant counts: Group A and Group B. Group A's first question formulation was ambiguous, and then alternately clear and ambiguous thereafter. Group B's first question formulation was clear, and then alternately ambiguous and clear thereafter.
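The assignment rule described above (a random choice for the first ranked participant, then strict alternation down the list) can be sketched as follows. This is an illustrative sketch only, not the procedure's original implementation; the function name assign_groups and the optional rng parameter are hypothetical.

```python
import random


def assign_groups(ranked_participants, rng=None):
    """Assign Group A or B by alternating down a ranked list.

    The group of the first participant is chosen at random; every later
    participant receives the opposite group to the one ranked just above.
    """
    if rng is None:
        rng = random.Random()
    first_group = rng.choice(["A", "B"])
    other_group = "B" if first_group == "A" else "A"
    return {
        participant: (first_group if position % 2 == 0 else other_group)
        for position, participant in enumerate(ranked_participants)
    }


if __name__ == "__main__":
    # Ranks 1..66 stand in for the sixty-six ranked participants.
    groups = assign_groups(range(1, 67))
    print(sum(1 for g in groups.values() if g == "A"), "participants in Group A")
```

Because adjacent ranks always land in different groups, each strata class is split as evenly as its size allows, which is the balancing effect the stratification is intended to achieve.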


Table 11 Participant Strata Classes

Strata Class   Participant Count   Description
865(1)         4                   Students in the postgraduate Database Design subject who had previously participated in more than one similar experiment.
365(1)         1                   Students in the undergraduate Database Design subject who had previously participated in more than one similar experiment.
365(2)         1                   Computer Science students in the undergraduate Database Design subject who had previously participated in a similar experiment.
865(2)         15                  Students who had undertaken a database design course previously and enrolled in the postgraduate database design subject.
365(3)         10                  Students who had undertaken a database design course previously and undertaking the undergraduate database design course.
865(3)         2                   Students who had undertaken a database design course previously (but not at University of Queensland) and undertaking the postgraduate database design course.
365(4)         13                  Students who had undertaken advanced information systems courses previously and undertaking the undergraduate database design course.
865(4)         3                   Students who had undertaken information systems courses previously and undertaking the postgraduate database design course.
365(5)         6                   Students who had undertaken introductory computer courses previously and undertaking the undergraduate database design course.
365(6)         3                   Students who had undertaken no information system or computer courses previously and undertaking the undergraduate database design course.
865(5)         6                   Students undertaking the postgraduate database design course with no available academic history.
365(7)         2                   Students undertaking the undergraduate database design course with no available academic history.


The Experiment

The experiment was held over two days during the fourth week of instruction. Students undertook a two-hour closed-book (no reference material allowed) experiment on computer, with no perusal time, in their normal classes. The random assignment of membership to Group A and Group B had the purpose and effect of ensuring an even representation of Group A and Group B in each class.

Participants knew before the experiment that questions increased in complexity, that there were sixteen questions in total, and that, once a question had been completed, they could not return to their answer. Participants were also aware that the number of attempts they made on the question did not affect their mark.

An instruction sheet was provided to participants (refer to Appendix B), depending on the treatment group (A or B) to which the participant had been previously assigned. The only point of difference between the two groups' instruction sheets was the name of the Unix command script file to use: startqz199a for Group A and startqz199b for Group B. The instruction sheet contained an overview of SQL syntax as a reference for participants. Further, an entity-relationship diagram was provided to describe the database being used, as reproduced in Appendix D.

Participants could make reference notes on working paper if they required. Participants returned these materials to the examiner at the end of the experiment. The question formulations used in the experiment and model answers are reproduced in Appendix A.


There were two examiners present (the course lecturer and the researcher). Assistance was provided to participants in the operation of the experimental program (the Unix command script). Assistance was also provided on some technical aspects of SQL on request.

User Interface and Query Development Process

Appendix C contains an example of the Unix command interpreter script used by participants to enter information using the relatively easy-to-use Pico editor, with which they were familiar. The command interpreter presented the question to the participant. On the completion of an attempt, the SQL result set was displayed. If the participant did not consider the results presented to be their final response, the participant could return to the SQL formulation. If the participant considered the result satisfactory, the participant would be prompted to rank their confidence in the solution, and proceed to the next question. Hence, the participant was able to interactively build and test their response until they were confident in their answer. This confidence was self-assigned on the following scale: 85-100%, 70-85%, 55-70%, 40-55%, 25-40%, 10-25% and <10%.

The questions were only available electronically. The questions were presented alternately ambiguous (natural language) and clear (pseudo-SQL). A participant in Group A received an ambiguous formulation for Question One, clear for Question Two, ambiguous for Question Three, and so on. A participant in Group B had clear for Question One, ambiguous for Question Two, clear for Question Three, and so on. The required answer was identical for both formulations of the same question.
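The alternation of formulations across the sixteen questions can be expressed as a small lookup. The sketch below is illustrative only; the function name formulation is hypothetical and was not part of the experimental scripts, which instead used a separate question file for each group.

```python
def formulation(group, question_number):
    """Return which formulation a group saw for a given question.

    Group A starts with an ambiguous formulation, Group B with a clear one,
    and both alternate on every subsequent question.
    """
    if group not in ("A", "B"):
        raise ValueError("group must be 'A' or 'B'")
    if question_number < 1:
        raise ValueError("question numbers start at 1")
    odd_question = question_number % 2 == 1
    if group == "A":
        return "ambiguous" if odd_question else "clear"
    return "clear" if odd_question else "ambiguous"


if __name__ == "__main__":
    for number in range(1, 17):
        print(number, formulation("A", number), formulation("B", number))
```

Under this schedule every question is seen in its ambiguous form by one group and in its clear form by the other, and each participant meets eight formulations of each kind.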


Appendix F: Error Marking Sheets

Semantic Error Counting Form

User Name:            Question Number:            Attempts:
Confidence:           Duration:

MICRO ERRORS

Each error type below is counted separately for each SQL component:

Error type             View   Select   From   Where Join   Where Cond   Group by   Having   Order by
Keywords
Symbols
Logical Operators
Relational Operators
Tables
Attributes
Values

Error type             Where   Union   Intersect   Minus
Set Operators

MACRO ERRORS

Columns:
Rows:
Aggregation:


SQL Challenge Error Counting Form

User Name:            Question Number:            Attempts:
Confidence:

SQL Challenge Expression                      Present   Challenge Response   Comment
Distinct Keyword in Select Clause             P/A       1 2 3 4 5 6 7
Built-in Function (Avg, Sum, Std Dev, etc)    P/A       1 2 3 4 5 6 7
Mathematical Expression in Select Clause      P/A       1 2 3 4 5 6 7
Mathematical Expression in Where Clause       P/A       1 2 3 4 5 6 7
Mathematical Expression in Having Clause      P/A       1 2 3 4 5 6 7
ERD (Join not shown on ERD)                   P/A       1 2 3 4 5 6 7
Join                                          P/A       1 2 3 4 5 6 7
Outer Join                                    P/A       1 2 3 4 5 6 7
Subquery                                      P/A       1 2 3 4 5 6 7
Or (Where or Having)                          P/A       1 2 3 4 5 6 7
Between                                       P/A       1 2 3 4 5 6 7
Not Equal                                     P/A       1 2 3 4 5 6 7
Group By                                      P/A       1 2 3 4 5 6 7
Having                                        P/A       1 2 3 4 5 6 7
View                                          P/A       1 2 3 4 5 6 7


Intermediate Error Counting Form

User Name:            Question Number:            Attempts:
Confidence:

Column Errors: Missing / Extra / Wrong (in contrast with missing & extra columns)
Table Errors: Missing / Extra / Wrong
Row Restriction: Missing / Extra / Wrong / Logical Operator
Join Restrictions: Missing / Extra / Wrong
Aggregation Level (Group by/Aggregation in Select): Missing / Extra / Wrong
Aggregation Restriction (Having): Missing / Extra / Wrong
Sort/Order by: Missing / Wrong
Attribute Order: Wrong
Direction (ascending, descending): Wrong

Appendix G: Annotated Corrected Participant Response

This appendix provides an annotated example of the process used to correct participant responses according to the model answer. This question was chosen to provide a flavour of the methodology used to determine and classify errors. The response shown here is the fifth participant's response (in order of assessment) to the third question.

Model Answer:

Select cust_no, cust_name, country, credit_limit from customer where credit_limit > 15000 or country = 'Japan';

Actual Response:

Select cust_no, cust_name, country, credit_limit from customer where credit_limit > 15000 and country = 'japan';

Annotated Response:

Select cust_no, cust_name, country, credit_limit from customer where credit_limit > 15000 and(1) or(2) country = 'j(3)J(4)apan';

In this annotated response, the superscript number in brackets indicates the error count. In this response there are four micro errors.


Micro Error Sheet:

Errors (1) and (2) result in a total of two logical operator errors in the WHERE COND clause. Errors (3) and (4) result in a total of two value errors in the WHERE COND clause.

Macro Error Sheet

There are two row errors here, as there are two errors in the WHERE COND clause.

SQL Challenge Sheet

The SQL Challenge presented in this question is the "Or (Where or Having)" challenge. The challenge is present, and the participant's response to the challenge was poor, resulting in a "1" assessment.

Intermediate Error Counting Sheet

In this response there are two row restriction errors, one "wrong" row restriction and one "logical operator" error.
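The count of four micro errors in this example treats each correction as one deletion plus one insertion ("and" removed, "or" added; "j" removed, "J" added). A rough way to reproduce that count mechanically is to compare the response with the model answer token by token. The sketch below is an assumption-laden illustration, not the marking method used in the thesis (responses were corrected by hand): it splits on whitespace, so a changed value such as 'japan' is counted as one deleted token and one inserted token, and the function name count_token_edits is hypothetical.

```python
import difflib


def count_token_edits(model_answer, response):
    """Count deleted plus inserted whitespace-delimited tokens.

    Every token that must be removed from the response, and every token
    that must be added to reach the model answer, counts as one edit.
    """
    model_tokens = model_answer.split()
    response_tokens = response.split()
    matcher = difflib.SequenceMatcher(a=response_tokens, b=model_tokens, autojunk=False)
    edits = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            edits += (i2 - i1) + (j2 - j1)
    return edits


MODEL = ("Select cust_no, cust_name, country, credit_limit from customer "
         "where credit_limit > 15000 or country = 'Japan';")
RESPONSE = ("Select cust_no, cust_name, country, credit_limit from customer "
            "where credit_limit > 15000 and country = 'japan';")

if __name__ == "__main__":
    print(count_token_edits(MODEL, RESPONSE))
```

A token-level comparison of this kind cannot classify errors by SQL component or error type, which is why the marking sheets in Appendix F are still needed for the analysis itself.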


Ambiguity one-sided p

GPA

Suggestive

Emphatic

Extraneous

Pragmatic

Inflective

Syntactical

Lexical

Total Errors

Duration

1.0000 0.0000

Complexity

-0.0330 1.0000

one-sided p

0.2488 0.0000

Attempts

0.1247 0.3312 1.0000

one-sided p

Confidence

Attempts

Complexity

Ambiguity

Appendix H: Pearson Correlation Matrix of Variables

0.0050 0.0000 0.0000

Confidence

-0.0961 -0.2463 -0.4242 1.0000

one-sided p

0.0239 0.0000 0.0000 0.0000

Duration

0.1729 0.2932 0.6905 -0.4282 1.0000

one-sided p

0.0002 0.0000 0.0000 0.0000 0.0000

Total Errors

0.2421 0.4783 0.2742 -0.3241 0.3653 1.0000

one-sided p

0.0000 0.0000 0.0000 0.0000 0.0000 0.0000

Lexical

0.7169 -0.0593 0.0847 -0.1213 0.2241 0.2165 1.0000

one-sided p

0.0000 0.1114 0.0406 0.0062 0.0000 0.0000 0.0000

Syntactical

0.6103 -0.1196 0.0532 0.0153 -0.0122 -0.0491 0.0855 1.0000

one-sided p

0.0000 0.0068 0.1367 0.3769 0.4007 0.1564 0.0391 0.0000

Inflective

0.3957 -0.0219 -0.0602 0.0698 0.0118 0.2534 0.2816 0.1606 1.0000

one-sided p

0.0000 0.3266 0.1079 0.0754 0.4045 0.0000 0.0000 0.0004 0.0000

Pragmatic

0.4735 -0.1131 0.0877 -0.0403 0.1057 0.2521 0.4378 0.1257 0.2299 1.0000

one-sided p

0.0000 0.0098 0.0354 0.2035 0.0146 0.0000 0.0000 0.0048 0.0000 0.0000

Extraneous

0.1855 0.3333 0.1410 -0.0223 0.2183 0.5764 0.2616 -0.2611 0.5837 0.3314 1.0000

one-sided p

0.0001 0.0000 0.0018 0.3234 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000

Emphatic

0.7173 0.1914 0.1886 -0.1490 0.2482 0.3588 0.7100 0.2746 0.1177 0.2486 0.2870 1.0000

one-sided p

0.0000 0.0000 0.0000 0.0010 0.0000 0.0000 0.0000 0.0000 0.0076 0.0000 0.0000 0.0000

Suggestive

0.4930 0.2863 0.1432 -0.0270 0.1927 0.5611 0.3881 0.1127 0.5723 0.4139 0.8347 0.4058 1.0000

one-sided p

0.0000 0.0000 0.0015 0.2893 0.0000 0.0000 0.0000 0.0101 0.0000 0.0000 0.0000 0.0000 0.0000

GPA (n=420)

0.0000 0.1256 -0.0842 0.1764 -0.1256 -0.1313 -0.0282 0.0336 0.0099 0.0079 -0.0013 0.0010 0.0275 1.0000

one-sided p

0.4999 0.0050 0.0424 0.0001 0.0050 0.0035 0.2820 0.2463 0.4196 0.4358 0.4891 0.4919 0.2869 0.0000
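The one-sided p values in Appendix H can be related to the correlations with a short calculation. The sketch below is illustrative only: it assumes the usual t test of a zero correlation, a sample of n = 425 (the number of analysed responses reported in Appendix L), and a normal approximation to the t distribution. The function name `one_sided_p` is hypothetical, and the thesis does not state which procedure or software produced the reported values.

```python
import math
from statistics import NormalDist


def one_sided_p(r: float, n: int) -> float:
    """Approximate one-sided p value for a Pearson correlation r from n pairs."""
    # t statistic for the null hypothesis of zero correlation (n - 2 d.f.)
    t = r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)
    # Normal approximation to the t distribution; reasonable for large n
    return 1.0 - NormalDist().cdf(abs(t))


n = 425  # assumption: analysed responses reported in Appendix L
for label, r in [("Ambiguity vs Complexity", -0.0330),
                 ("Ambiguity vs Attempts", 0.1247)]:
    print(f"{label}: r = {r:+.4f}, one-sided p ~ {one_sided_p(r, n):.4f}")
```

Under these assumptions the two approximations fall close to the tabulated values of 0.2488 and 0.0050.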


Appendix I: Analysis of Ambiguity's Effect On Error Type

In each table, rows are error types and columns are SQL components by formulation (A = ambiguous, C = clear).

Question One

Columns: View A, View C, Select A, Select C, From A, From C, Where Join A, Where Join C, Where Cond A, Where Cond C, Group By A, Group By C, Having A, Having C, Order By A, Order By C, Total A, Total C

Keywords: 0.000 0.000 0.156 0.091 0.000 0.000 0.031 0.030 0.031 0.000 0.000 0.000 0.000 0.000 0.000 0.030 0.219 0.152
Symbols: 0.000 0.000 0.406 0.273 0.000 0.061 0.063 0.061 0.063 0.000 0.000 0.030 0.000 0.000 0.000 0.000 0.531 0.424
Logical Operators: 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Relational Operators: 0.000 0.000 0.000 0.000 0.000 0.000 0.031 0.030 0.031 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.063 0.030
Tables: 0.000 0.000 0.000 0.000 0.250 0.091 0.063 0.061 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.313 0.152
Attributes: 0.000 0.000 0.344 0.182 0.000 0.000 0.063 0.061 0.031 0.000 0.000 0.030 0.000 0.000 0.000 0.030 0.438 0.303
Values: 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.031 0.061 0.000 0.000 0.000 0.000 0.000 0.000 0.031 0.061
Total: 0.000 0.000 0.906 0.545 0.250 0.152 0.250 0.242 0.188 0.061 0.000 0.061 0.000 0.000 0.000 0.061 1.594 1.121

Set Operators (Where A, Where C, Union A, Union C, Intersect A, Intersect C, Minus A, Minus C, Total A, Total C): 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

Summary (Ambiguous, Clear): Error Average 1.594, 1.121; Response Count 32, 33

Question Two

Columns: View A, View C, Select A, Select C, From A, From C, Where Join A, Where Join C, Where Cond A, Where Cond C, Group By A, Group By C, Having A, Having C, Order By A, Order By C, Total A, Total C

Keywords: 0.000 0.000 0.091 0.000 0.061 0.000 0.000 0.000 0.182 0.030 0.000 0.000 0.000 0.000 0.000 0.000 0.333 0.030
Symbols: 0.000 0.000 1.394 0.061 0.030 0.000 0.000 0.000 0.364 0.152 0.000 0.000 0.000 0.000 0.000 0.000 1.788 0.212
Logical Operators: 0.000 0.000 0.030 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.030 0.000
Relational Operators: 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.091 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.091 0.000
Tables: 0.000 0.000 0.061 0.000 0.121 0.030 0.000 0.000 0.121 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.303 0.030
Attributes: 0.000 0.000 1.212 0.000 0.000 0.000 0.000 0.000 0.364 0.030 0.000 0.000 0.000 0.000 0.000 0.000 1.576 0.030
Values: 0.000 0.000 0.030 0.000 0.000 0.000 0.000 0.000 0.121 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.152 0.000
Total: 0.000 0.000 2.818 0.061 0.212 0.030 0.000 0.000 1.242 0.212 0.000 0.000 0.000 0.000 0.000 0.000 4.273 0.303

Set Operators (Where A, Where C, Union A, Union C, Intersect A, Intersect C, Minus A, Minus C, Total A, Total C): 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

Summary (Ambiguous, Clear): Error Average 4.273, 0.303; Response Count 33, 33

Question Three

Rows are error types; columns are SQL components by formulation (A = ambiguous, C = clear).

Columns: View A, View C, Select A, Select C, From A, From C, Where Join A, Where Join C, Where Cond A, Where Cond C, Group By A, Group By C, Having A, Having C, Order By A, Order By C, Total A, Total C

Keywords: 0.000 0.000 0.061 0.000 0.030 0.000 0.000 0.000 0.061 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.152 0.000
Symbols: 0.000 0.000 1.182 0.212 0.000 0.000 0.000 0.000 0.182 0.152 0.000 0.000 0.000 0.000 0.000 0.000 1.364 0.364
Logical Operators: 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.788 0.152 0.000 0.000 0.000 0.000 0.000 0.000 0.788 0.152
Relational Operators: 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.091 0.030 0.000 0.000 0.000 0.000 0.000 0.000 0.091 0.030
Tables: 0.000 0.000 0.000 0.000 0.030 0.061 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.030 0.061
Attributes: 0.000 0.000 1.182 0.152 0.000 0.000 0.000 0.000 0.030 0.091 0.000 0.000 0.000 0.000 0.000 0.030 1.212 0.273
Values: 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.303 0.152 0.000 0.000 0.000 0.000 0.000 0.000 0.303 0.152
Total: 0.000 0.000 2.424 0.364 0.061 0.061 0.000 0.000 1.455 0.576 0.000 0.000 0.000 0.000 0.000 0.030 3.939 1.030

Set Operators (Where A, Where C, Union A, Union C, Intersect A, Intersect C, Minus A, Minus C, Total A, Total C): 0.000 0.000 0.030 0.000 0.000 0.000 0.000 0.000 0.030 0.000

Summary (Ambiguous, Clear): Error Average 3.970, 1.030; Response Count 33, 33

Question Four

Columns: View A, View C, Select A, Select C, From A, From C, Where Join A, Where Join C, Where Cond A, Where Cond C, Group By A, Group By C, Having A, Having C, Order By A, Order By C, Total A, Total C

Keywords: 0.031 0.000 0.813 0.121 0.063 0.000 0.000 0.000 0.094 0.061 0.313 0.030 0.000 0.000 0.031 0.030 1.344 0.242
Symbols: 0.000 0.000 1.031 0.182 0.094 0.000 0.000 0.000 0.000 0.061 0.125 0.000 0.000 0.000 0.063 0.000 1.313 0.242
Logical Operators: 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.031 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.031 0.000
Relational Operators: 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Tables: 0.000 0.000 0.094 0.000 0.219 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.313 0.000
Attributes: 0.000 0.000 0.500 0.121 0.000 0.000 0.000 0.000 0.000 0.061 0.438 0.000 0.000 0.000 0.094 0.000 1.031 0.182
Values: 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Total: 0.031 0.000 2.438 0.424 0.375 0.000 0.000 0.000 0.125 0.182 0.875 0.030 0.000 0.000 0.188 0.030 4.031 0.667

Set Operators (Where A, Where C, Union A, Union C, Intersect A, Intersect C, Minus A, Minus C, Total A, Total C): 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

Summary (Ambiguous, Clear): Error Average 4.031, 0.667; Response Count 32, 33

Question Five

Rows are error types; columns are SQL components by formulation (A = ambiguous, C = clear).

Columns: View A, View C, Select A, Select C, From A, From C, Where Join A, Where Join C, Where Cond A, Where Cond C, Group By A, Group By C, Having A, Having C, Order By A, Order By C, Total A, Total C

Keywords: 0.091 0.000 0.121 0.033 0.030 0.033 0.030 0.033 0.030 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.303 0.100
Symbols: 0.000 0.000 1.273 0.267 0.303 0.233 1.212 0.700 0.212 0.233 0.000 0.000 0.000 0.000 0.000 0.000 3.000 1.433
Logical Operators: 0.000 0.000 0.000 0.000 0.000 0.000 0.333 0.200 0.091 0.100 0.000 0.000 0.000 0.000 0.000 0.000 0.424 0.300
Relational Operators: 0.000 0.000 0.000 0.000 0.000 0.000 0.515 0.233 0.212 0.200 0.000 0.000 0.000 0.000 0.000 0.000 0.727 0.433
Tables: 0.030 0.000 0.030 0.067 0.333 0.333 1.273 0.667 0.000 0.067 0.000 0.000 0.000 0.000 0.000 0.000 1.667 1.133
Attributes: 0.000 0.000 1.273 0.333 0.000 0.000 1.303 0.733 0.273 0.433 0.000 0.000 0.000 0.000 0.000 0.000 2.848 1.500
Values: 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.364 0.267 0.000 0.000 0.000 0.000 0.000 0.000 0.364 0.267
Total: 0.121 0.000 2.697 0.700 0.667 0.600 4.667 2.567 1.182 1.300 0.000 0.000 0.000 0.000 0.000 0.000 9.333 5.167

Set Operators (Where A, Where C, Union A, Union C, Intersect A, Intersect C, Minus A, Minus C, Total A, Total C): 0.091 0.033 0.000 0.000 0.000 0.000 0.000 0.000 0.091 0.033

Summary (Ambiguous, Clear): Error Average 9.424, 5.200; Response Count 33, 30

Question Six

Columns: View A, View C, Select A, Select C, From A, From C, Where Join A, Where Join C, Where Cond A, Where Cond C, Group By A, Group By C, Having A, Having C, Order By A, Order By C, Total A, Total C

Keywords: 0.000 0.043 2.235 0.174 0.000 0.000 0.235 0.000 0.176 0.000 0.706 0.261 0.000 0.000 0.000 0.000 3.353 0.478
Symbols: 0.000 0.000 5.529 1.435 0.471 0.087 1.353 0.391 2.118 0.217 2.059 1.130 0.000 0.000 0.000 0.000 11.529 3.261
Logical Operators: 0.000 0.000 0.000 0.000 0.000 0.000 0.176 0.174 0.941 0.130 0.000 0.000 0.000 0.000 0.000 0.000 1.118 0.304
Relational Operators: 0.000 0.000 0.000 0.000 0.000 0.000 0.647 0.174 1.647 0.130 0.000 0.000 0.000 0.000 0.000 0.000 2.294 0.304
Tables: 0.000 0.000 0.765 0.217 0.588 0.087 1.294 0.391 0.294 0.000 0.647 0.391 0.000 0.000 0.000 0.000 3.588 1.087
Attributes: 0.000 0.000 3.412 0.652 0.000 0.000 1.294 0.522 2.118 0.130 2.353 1.087 0.000 0.000 0.000 0.000 9.176 2.391
Values: 0.000 0.000 0.353 0.043 0.000 0.000 0.000 0.000 1.471 0.130 0.000 0.000 0.000 0.000 0.000 0.000 1.824 0.174
Total: 0.000 0.043 12.294 2.522 1.059 0.174 5.000 1.652 8.765 0.739 5.765 2.870 0.000 0.000 0.000 0.000 32.882 8.000

Set Operators (Where A, Where C, Union A, Union C, Intersect A, Intersect C, Minus A, Minus C, Total A, Total C): 0.059 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.059 0.000

Summary (Ambiguous, Clear): Error Average 32.941, 8.000; Response Count 17, 23

Question Seven

Rows are error types; columns are SQL components by formulation (A = ambiguous, C = clear).

Columns: View A, View C, Select A, Select C, From A, From C, Where Join A, Where Join C, Where Cond A, Where Cond C, Group By A, Group By C, Having A, Having C, Order By A, Order By C, Total A, Total C

Keywords: 0.200 0.000 0.533 0.733 0.133 0.000 0.200 0.067 0.000 0.000 0.267 0.133 0.000 0.067 0.000 0.000 1.333 1.000
Symbols: 0.000 0.133 1.200 0.800 0.067 0.067 1.667 1.067 0.000 0.067 0.400 0.467 0.000 0.133 0.000 0.000 3.333 2.733
Logical Operators: 0.000 0.000 0.000 0.000 0.000 0.000 0.067 0.067 0.000 0.067 0.000 0.000 0.000 0.000 0.000 0.000 0.067 0.133
Relational Operators: 0.000 0.000 0.000 0.000 0.000 0.000 0.133 0.067 0.000 0.067 0.000 0.000 0.000 0.067 0.000 0.000 0.133 0.200
Tables: 0.067 0.000 0.333 0.133 0.200 0.067 0.133 0.133 0.000 0.067 0.200 0.200 0.000 0.133 0.000 0.000 0.933 0.733
Attributes: 0.000 0.000 0.600 0.533 0.000 0.000 0.133 0.133 0.000 0.067 0.467 0.400 0.000 0.133 0.267 0.000 1.467 1.267
Values: 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.067 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.067
Total: 0.267 0.133 2.667 2.200 0.400 0.133 2.333 1.533 0.000 0.400 1.333 1.200 0.000 0.533 0.267 0.000 7.267 6.133

Set Operators (Where A, Where C, Union A, Union C, Intersect A, Intersect C, Minus A, Minus C, Total A, Total C): 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

Summary (Ambiguous, Clear): Error Average 7.267, 6.133; Response Count 15, 15

Question Eight

Columns: View A, View C, Select A, Select C, From A, From C, Where Join A, Where Join C, Where Cond A, Where Cond C, Group By A, Group By C, Having A, Having C, Order By A, Order By C, Total A, Total C

Keywords: 0.000 0.000 0.167 0.200 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.100 0.000 0.400 0.000 0.000 0.167 0.700
Symbols: 0.000 0.000 0.667 0.400 0.000 0.000 0.333 0.600 0.000 0.000 0.000 0.400 0.000 0.600 0.000 0.000 1.000 2.000
Logical Operators: 0.000 0.000 0.000 0.000 0.000 0.000 0.167 0.300 0.000 0.100 0.000 0.000 0.000 0.000 0.000 0.000 0.167 0.400
Relational Operators: 0.000 0.000 0.000 0.000 0.000 0.000 0.167 0.300 0.000 0.000 0.000 0.000 0.000 0.100 0.000 0.000 0.167 0.400
Tables: 0.000 0.000 0.000 0.000 0.000 0.000 0.333 0.600 0.000 0.000 0.000 0.100 0.000 0.200 0.000 0.000 0.333 0.900
Attributes: 0.000 0.000 0.167 0.100 0.000 0.000 0.333 0.800 0.000 0.000 0.000 0.400 0.000 0.600 0.000 0.000 0.500 1.900
Values: 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.100 0.000 0.000 0.000 0.100
Total: 0.000 0.000 1.000 0.700 0.000 0.000 1.333 2.600 0.000 0.100 0.000 1.000 0.000 2.000 0.000 0.000 2.333 6.400

Set Operators (Where A, Where C, Union A, Union C, Intersect A, Intersect C, Minus A, Minus C, Total A, Total C): 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

Summary (Ambiguous, Clear): Error Average 2.333, 6.400; Response Count 6, 10

Question Nine

Rows are error types; columns are SQL components by formulation (A = ambiguous, C = clear).

Columns: View A, View C, Select A, Select C, From A, From C, Where Join A, Where Join C, Where Cond A, Where Cond C, Group By A, Group By C, Having A, Having C, Order By A, Order By C, Total A, Total C

Keywords: 0.000 0.000 2.000 1.500 0.000 0.000 0.000 0.000 0.000 0.000 0.333 0.000 0.000 0.000 1.000 0.000 3.333 1.500
Symbols: 0.000 0.000 1.000 1.000 0.333 0.500 0.667 1.000 1.333 1.000 1.333 1.000 0.000 0.000 0.333 0.000 5.000 4.500
Logical Operators: 0.000 0.000 0.000 0.000 0.000 0.000 0.333 0.500 1.333 1.000 0.000 0.000 0.000 0.000 0.000 0.000 1.667 1.500
Relational Operators: 0.000 0.000 0.000 0.000 0.000 0.000 0.333 0.500 0.667 0.500 0.000 0.000 0.000 0.000 0.000 0.000 1.000 1.000
Tables: 0.000 0.000 0.000 0.000 0.333 0.500 0.667 1.000 0.000 0.000 0.333 0.000 0.000 0.000 0.000 0.000 1.333 1.500
Attributes: 0.000 0.000 1.000 2.000 0.000 0.000 0.667 1.000 0.667 0.500 1.333 1.000 0.000 0.000 0.667 0.000 4.333 4.500
Values: 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.333 1.000 0.000 0.000 0.000 0.000 0.000 0.000 1.333 1.000
Total: 0.000 0.000 4.000 4.500 0.667 1.000 2.667 4.000 5.333 4.000 3.333 2.000 0.000 0.000 2.000 0.000 18.000 15.500

Set Operators (Where A, Where C, Union A, Union C, Intersect A, Intersect C, Minus A, Minus C, Total A, Total C): 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

Summary (Ambiguous, Clear): Error Average 18.000, 15.500; Response Count 3, 2

Question Ten

Columns: View A, View C, Select A, Select C, From A, From C, Where Join A, Where Join C, Where Cond A, Where Cond C, Group By A, Group By C, Having A, Having C, Order By A, Order By C, Total A, Total C

Keywords: 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.000 0.000
Symbols: 0.000 0.000 0.000 0.000 1.000 0.000 2.000 0.500 3.000 0.500 0.000 0.500 0.000 0.000 0.000 0.000 6.000 1.500
Logical Operators: 0.000 0.000 0.000 0.000 0.000 0.000 1.000 0.250 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.000 0.250
Relational Operators: 0.000 0.000 0.000 0.000 0.000 0.000 1.000 0.250 2.000 0.500 0.000 0.000 0.000 0.000 0.000 0.000 3.000 0.750
Tables: 0.000 0.000 0.000 0.000 1.000 0.000 2.000 0.500 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 3.000 0.500
Attributes: 0.000 0.000 0.000 0.000 0.000 0.000 2.000 1.000 2.000 0.000 0.000 0.500 0.000 0.000 0.000 0.000 4.000 1.500
Values: 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 2.000 0.500 0.000 0.000 0.000 0.000 0.000 0.000 2.000 0.500
Total: 0.000 0.000 0.000 0.000 2.000 0.000 8.000 2.500 10.000 1.500 0.000 1.000 0.000 0.000 0.000 0.000 20.000 5.000

Set Operators (Where A, Where C, Union A, Union C, Intersect A, Intersect C, Minus A, Minus C, Total A, Total C): 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

Summary (Ambiguous, Clear): Error Average 20.000, 5.000; Response Count 1, 4

Question Eleven

Rows are error types; columns are SQL components by formulation (A = ambiguous, C = clear).

Columns: View A, View C, Select A, Select C, From A, From C, Where Join A, Where Join C, Where Cond A, Where Cond C, Group By A, Group By C, Having A, Having C, Order By A, Order By C, Total A, Total C

Keywords: 0.000 0.000 2.500 1.000 0.500 0.000 0.000 0.000 0.000 0.000 0.500 0.000 3.000 1.000 0.500 0.000 7.000 2.000
Symbols: 0.000 0.000 4.000 1.000 0.000 0.000 0.000 0.000 0.000 0.000 1.000 1.000 2.000 1.000 0.000 0.000 7.000 3.000
Logical Operators: 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Relational Operators: 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.000 0.000 0.000 0.000 1.000 0.000
Tables: 0.000 0.000 0.500 0.000 0.500 0.000 0.000 0.000 0.000 0.000 0.500 0.000 0.000 0.000 0.000 0.000 1.500 0.000
Attributes: 0.000 0.000 2.500 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.000 1.000 2.000 0.000 0.000 0.000 5.500 1.000
Values: 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 2.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 2.000
Total: 0.000 0.000 9.500 2.000 1.000 0.000 0.000 0.000 0.000 2.000 3.000 2.000 8.000 2.000 0.500 0.000 22.000 8.000

Set Operators (Where A, Where C, Union A, Union C, Intersect A, Intersect C, Minus A, Minus C, Total A, Total C): 0.000 0.000 0.000 0.000 0.000 0.000 0.500 0.000 0.500 0.000

Summary (Ambiguous, Clear): Error Average 22.500, 8.000; Response Count 2, 1

Question Twelve

Values are reported for one formulation only.

Columns: View, Select, From, Where Join, Where Cond, Group By, Having, Order By, Total

Keywords: 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Symbols: 0.000 2.000 0.000 0.000 0.000 1.000 2.000 0.000 5.000
Logical Operators: 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Relational Operators: 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Tables: 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Attributes: 0.000 0.000 0.000 0.000 0.000 1.000 0.000 0.000 1.000
Values: 0.000 0.000 0.000 0.000 2.000 0.000 0.000 0.000 2.000
Total: 0.000 2.000 0.000 0.000 2.000 2.000 2.000 0.000 8.000

Set Operators (Where, Union, Intersect, Minus, Total): 0.000 0.000 0.000 0.000 0.000

Summary: Error Average 8.000; Response Count 1

Appendix J: Seven Ambiguity Types Question Assessment Ratings

This table displays the average of the ambiguity assessments provided by two independent non-researchers. The scale used to assess the presence of each type of ambiguity is: 0 None, 1 A little, 2 Some, 3 Much, 4 A Great Deal.

Question  Formulation  Lexical  Syntactical  Inflective  Pragmatic  Extraneous  Emphatic  Suggestive
1         Ambiguous    1.5      2            0.5         1          0.5         0.5       0.5
1         Clear        0.5      0.5          0           0          0           0         0
2         Ambiguous    2        1            0           1.5        0.5         1.5       0.5
2         Clear        1        0.5          0           1          0           0         0
3         Ambiguous    0.5      3.5          0           1          0           0.5       0.5
3         Clear        0.5      0            0           0          0.5         0.5       0
4         Ambiguous    1.5      1            0           3          0           0.5       0
4         Clear        0.5      2            0           2          0           0         0
5         Ambiguous    1.5      2.5          0           0.5        0           2         0
5         Clear        1        0.5          0           0          0.5         0         0
6         Ambiguous    1.5      0.5          0.5         3          3.5         1.5       2.5
6         Clear        0.5      0.5          0           0.5        0.5         0         0
7         Ambiguous    1.5      2.5          0           0.5        0           1         1
7         Clear        0.5      0.5          0           0          0           0         0
8         Ambiguous    0.5      3.5          0.5         0.5        0           0         0
8         Clear        0.5      0.5          0           1          0           0         0
9         Ambiguous    1.5      0.5          0           2.5        0           3         0
9         Clear        0.5      0.5          0           0.5        0           0         0.5
10        Ambiguous    2        0            0           2          0.5         1         1.5
10        Clear        0.5      0            0           0          0           0         0
11        Ambiguous    1.5      1            0           2          1           1         1.5
11        Clear        0.5      0            0           0          0           0         0
12        Clear        0.5      0            0           0.5        0.5         0         0

Appendix K: Ambiguity Assessment Instrument

Ambiguity Measurement Questionnaire

Type: Lexical
Information Request: A report of our clients for our marketing brochure mail-out.
The word "report" may have several meanings, independent of its context. There is: a gunshot report echoing through the hillside; the Lieutenant reported to the Captain; I dropped the heavy report on my toe, etc. Although the context may make the meaning clear, the lexical ambiguity that is present adds to cognitive effort and contributes to ambiguity overall in that manner.

Type: Syntactical
Information Request: A report of clients in Brisbane and on our Gold list.
The natural language "and" does not map well to its Boolean equivalent. A legitimate interpretation would be to assume that this request is for clients that satisfy both conditions (Brisbane-based and on the Gold List), or for clients that satisfy either condition (Brisbane-based or on the Gold list). Another formulation is Bob hit the man with a stick. It is not clear, syntactically, whether it was the man with a stick, that was hit, or whether the man was hit with a stick by Bob.

Type: Inflective
Information Request: A report that is the product of our last marketing campaign regarding sales of our accounting software product in the last month.
Inflective ambiguity here derives from the use of the word "product" with two different meanings in the one information request. Inflective ambiguity is where the same word is used in the one grammatical structure (paragraph, sentence, phrase) with different meanings. Natural writing tends to avoid this.

Type: Pragmatic
Information Request: A report of all the clients for a department.
The ambiguity here is that the department has not been specified. It would be legitimate to prepare a report for any department, although it is likely that this will not address the needs of the person making the information request. Further information is needed to resolve this actual ambiguity.

Type: Extraneous
Information Request: A report of all clients (and their names and addresses only) for the Tax and Business Services department. Some of those clients are our biggest earners, you know.
The last sentence is extraneous - unlike pragmatic ambiguity, it contains information that is redundant, uninformative, or not necessary to meet the needs of the question or task asked in the statement. It is "noise" in the communication - where more words are used than are necessary to make the statement.

Type: Emphatic
Information Request: A report of our good clients.
Ambiguity here could derive from the lack of ability to provide the verbal emphasis of the words in its written form. Depending on the emphasis used, "good clients" could be legitimately interpreted to be clients that pay on time, clients that have the most dollar-value sales, our very best clients (a much shorter list than if based on dollar-value), or even, with the correct sarcastic or ironic emphasis on the spoken word, our worst clients - those that do not pay.

Type: Suggestive
Information Request: A report of the clients of this accounting practice that have lodged taxation returns in the past five years in accordance with the requirements of the Australian Taxation Office.
The request for information is quite clear until the phrase "in accordance with the requirements of the Australian Taxation Office". By definition, all taxation returns should be lodged in accordance with these requirements. The extra phrase introduces suggestive ambiguity into the information request by suggesting that the report will not necessarily consist of all taxation clients.
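The two readings of the syntactical example ("clients in Brisbane and on our Gold list") can be made concrete as two different queries. The following is a minimal sketch only: the `client` table, its columns, and the four sample rows are hypothetical and are not taken from the experimental database.

```python
import sqlite3

# Hypothetical schema and data, invented purely for illustration
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE client (name TEXT, city TEXT, gold_list INTEGER)")
conn.executemany("INSERT INTO client VALUES (?, ?, ?)", [
    ("Avery", "Brisbane", 1),
    ("Blake", "Brisbane", 0),
    ("Casey", "Sydney", 1),
    ("Drew", "Sydney", 0),
])

# Reading 1: clients that satisfy both conditions (Boolean AND)
both = [row[0] for row in conn.execute(
    "SELECT name FROM client "
    "WHERE city = 'Brisbane' AND gold_list = 1 ORDER BY name")]

# Reading 2: clients that satisfy either condition (Boolean OR)
either = [row[0] for row in conn.execute(
    "SELECT name FROM client "
    "WHERE city = 'Brisbane' OR gold_list = 1 ORDER BY name")]

print(both)    # one client under the "both conditions" reading
print(either)  # three clients under the "either condition" reading
```

The same request therefore yields result sets of different sizes depending on which reading the end user adopts.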


Mark all Information Requests in Accordance with the Following Scale: 0 None, 1 A little, 2 Some, 3 Much, 4 A Great Deal

Columns: No.; Information Request; Ambiguity Type (Scale)

1. Management wants a list of each of our suppliers with no duplicates in the list.
Lexical 0 1 2 3 4; Syntactical 0 1 2 3 4; Inflective 0 1 2 3 4; Pragmatic 0 1 2 3 4; Extraneous 0 1 2 3 4; Emphatic 0 1 2 3 4; Suggestive 0 1 2 3 4

List the distinct suppliers of the items we stock.
Lexical 0 1 2 3 4; Syntactical 0 1 2 3 4; Inflective 0 1 2 3 4; Pragmatic 0 1 2 3 4; Extraneous 0 1 2 3 4; Emphatic 0 1 2 3 4; Suggestive 0 1 2 3 4

2. Produce a report that lists the inventory items where the quantity on hand is much larger, on a percentage basis, than the quantity ordered.
Lexical 0 1 2 3 4; Syntactical 0 1 2 3 4; Inflective 0 1 2 3 4; Pragmatic 0 1 2 3 4; Extraneous 0 1 2 3 4; Emphatic 0 1 2 3 4; Suggestive 0 1 2 3 4

List item number, item name, quantity on hand, quantity on order where quantity on hand is greater than 2 * quantity ordered.
Lexical 0 1 2 3 4; Syntactical 0 1 2 3 4; Inflective 0 1 2 3 4; Pragmatic 0 1 2 3 4; Extraneous 0 1 2 3 4; Emphatic 0 1 2 3 4; Suggestive 0 1 2 3 4

3. Management wants a list of all Japanese customers and customers with credit limits over $15,000.
Lexical 0 1 2 3 4; Syntactical 0 1 2 3 4; Inflective 0 1 2 3 4; Pragmatic 0 1 2 3 4; Extraneous 0 1 2 3 4; Emphatic 0 1 2 3 4; Suggestive 0 1 2 3 4

List customer numbers, customer names, country, and credit limit of customers with credit limits greater than $15,000 or of customers in Japan.
Lexical 0 1 2 3 4; Syntactical 0 1 2 3 4; Inflective 0 1 2 3 4; Pragmatic 0 1 2 3 4; Extraneous 0 1 2 3 4; Emphatic 0 1 2 3 4; Suggestive 0 1 2 3 4

4. Produce a report that statistically compares the credit limits for customers in different countries.
Lexical 0 1 2 3 4; Syntactical 0 1 2 3 4; Inflective 0 1 2 3 4; Pragmatic 0 1 2 3 4; Extraneous 0 1 2 3 4; Emphatic 0 1 2 3 4; Suggestive 0 1 2 3 4

List country, average credit limit, and standard deviation of customer credit limit grouped by country.
Lexical 0 1 2 3 4; Syntactical 0 1 2 3 4; Inflective 0 1 2 3 4; Pragmatic 0 1 2 3 4; Extraneous 0 1 2 3 4; Emphatic 0 1 2 3 4; Suggestive 0 1 2 3 4

5. Produce a report of clients that prefer the Speedair carrier and addresses.
Lexical 0 1 2 3 4; Syntactical 0 1 2 3 4; Inflective 0 1 2 3 4; Pragmatic 0 1 2 3 4; Extraneous 0 1 2 3 4; Emphatic 0 1 2 3 4; Suggestive 0 1 2 3 4

List customer number, customer name, street, city, post code, and country where the customer's preferred carrier is Speedair.
Lexical 0 1 2 3 4; Syntactical 0 1 2 3 4; Inflective 0 1 2 3 4; Pragmatic 0 1 2 3 4; Extraneous 0 1 2 3 4; Emphatic 0 1 2 3 4; Suggestive 0 1 2 3 4

6. We're wondering if some of our winemakers are using poor quality packaging and bottles - we've had a few complaints. Can you get us a report that gives us some sort of idea about what items we are shipping compared to what the customers are taking delivery of? It would probably be a good idea while you're at it to give a comparative percentage of the stuff shipped that doesn't make it just so the vintners won't try and weasel their way out of it, you understand, they're good at that.
Lexical 0 1 2 3 4; Syntactical 0 1 2 3 4; Inflective 0 1 2 3 4; Pragmatic 0 1 2 3 4; Extraneous 0 1 2 3 4; Emphatic 0 1 2 3 4; Suggestive 0 1 2 3 4

List item maker, item number, item name, and 100 * (sum of quantity shipped less sum of quantity accepted) / (sum of quantity shipped) where the type of alcohol is wine.
Lexical 0 1 2 3 4; Syntactical 0 1 2 3 4; Inflective 0 1 2 3 4; Pragmatic 0 1 2 3 4; Extraneous 0 1 2 3 4; Emphatic 0 1 2 3 4; Suggestive 0 1 2 3 4

7. Prepare a report that provides *all* customer's details and indicates the number of different products they have ordered from us.
Lexical 0 1 2 3 4; Syntactical 0 1 2 3 4; Inflective 0 1 2 3 4; Pragmatic 0 1 2 3 4; Extraneous 0 1 2 3 4; Emphatic 0 1 2 3 4; Suggestive 0 1 2 3 4

List customer number, and customer name for *all* customers, and, if they have ordered anything, a count of unique items ordered.
Lexical 0 1 2 3 4; Syntactical 0 1 2 3 4; Inflective 0 1 2 3 4; Pragmatic 0 1 2 3 4; Extraneous 0 1 2 3 4; Emphatic 0 1 2 3 4; Suggestive 0 1 2 3 4

8. Management wants to know which customers we've shipped goods more than 10 times to them by the shipper that they requested.
Lexical 0 1 2 3 4; Syntactical 0 1 2 3 4; Inflective 0 1 2 3 4; Pragmatic 0 1 2 3 4; Extraneous 0 1 2 3 4; Emphatic 0 1 2 3 4; Suggestive 0 1 2 3 4

List customer number, name, and count of invoices, where the actual carrier is the same as the customer's preferred carrier, having more than 10 shipments.
Lexical 0 1 2 3 4; Syntactical 0 1 2 3 4; Inflective 0 1 2 3 4; Pragmatic 0 1 2 3 4; Extraneous 0 1 2 3 4; Emphatic 0 1 2 3 4; Suggestive 0 1 2 3 4

9. Produce a report, with best items first, on the gross contribution to profitability of each inventory item for July 1999.
Lexical 0 1 2 3 4; Syntactical 0 1 2 3 4; Inflective 0 1 2 3 4; Pragmatic 0 1 2 3 4; Extraneous 0 1 2 3 4; Emphatic 0 1 2 3 4; Suggestive 0 1 2 3 4

List item number, item description, and (unit price less unit cost) multiplied by units sold in July 1999. Sort your output by descending gross contribution to profitability.
Lexical 0 1 2 3 4; Syntactical 0 1 2 3 4; Inflective 0 1 2 3 4; Pragmatic 0 1 2 3 4; Extraneous 0 1 2 3 4; Emphatic 0 1 2 3 4; Suggestive 0 1 2 3 4

10. Produce a report with the relevant customer details that gives us an idea of how much of our business is exposed to foreign currency fluctuations.
Lexical 0 1 2 3 4; Syntactical 0 1 2 3 4; Inflective 0 1 2 3 4; Pragmatic 0 1 2 3 4; Extraneous 0 1 2 3 4; Emphatic 0 1 2 3 4; Suggestive 0 1 2 3 4

List customer number, customer name, customer country, and a total of the amount paid where the settlement currency code for the invoice is not equal to the currency code for Australian dollars. Group results by customer number.
Lexical 0 1 2 3 4; Syntactical 0 1 2 3 4; Inflective 0 1 2 3 4; Pragmatic 0 1 2 3 4; Extraneous 0 1 2 3 4; Emphatic 0 1 2 3 4; Suggestive 0 1 2 3 4

11. Management is concerned about current slow-moving inventory items, based on shipments since 1 June 1999. Produce a report of the items that they might be most concerned about.
Lexical 0 1 2 3 4; Syntactical 0 1 2 3 4; Inflective 0 1 2 3 4; Pragmatic 0 1 2 3 4; Extraneous 0 1 2 3 4; Emphatic 0 1 2 3 4; Suggestive 0 1 2 3 4

List inventory item number, item description, quantity on hand, and sum(quantity shipped) with ship dates greater than 1 June 1999 that have sums of the quantity shipped less than the sums of the quantity on hand.
Lexical 0 1 2 3 4; Syntactical 0 1 2 3 4; Inflective 0 1 2 3 4; Pragmatic 0 1 2 3 4; Extraneous 0 1 2 3 4; Emphatic 0 1 2 3 4; Suggestive 0 1 2 3 4

12. Produce a report that gives some idea about our best USA export items where the amount since March is bigger than $5,000.
Lexical 0 1 2 3 4; Syntactical 0 1 2 3 4; Inflective 0 1 2 3 4; Pragmatic 0 1 2 3 4; Extraneous 0 1 2 3 4; Emphatic 0 1 2 3 4; Suggestive 0 1 2 3 4

List item numbers, item descriptions and the total accepted quantity times agreed price of each item for items shipped to US customers since 1 March 1999 and having a total accepted quantity times agreed price greater than $5,000.
Lexical 0 1 2 3 4; Syntactical 0 1 2 3 4; Inflective 0 1 2 3 4; Pragmatic 0 1 2 3 4; Extraneous 0 1 2 3 4; Emphatic 0 1 2 3 4; Suggestive 0 1 2 3 4
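The clear formulation of request 6 spells out its calculation explicitly, so it translates almost directly into a query. The sketch below illustrates one possible translation; the `shipment` table, its column names, and the sample rows are hypothetical and do not reproduce the schema used in the experiment.

```python
import sqlite3

# Hypothetical table and data, invented purely for illustration
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE shipment (item_maker TEXT, item_no INTEGER, item_name TEXT, "
    "alcohol_type TEXT, qty_shipped INTEGER, qty_accepted INTEGER)")
conn.executemany("INSERT INTO shipment VALUES (?, ?, ?, ?, ?, ?)", [
    ("Maker A", 1, "Shiraz", "wine", 100, 90),
    ("Maker A", 1, "Shiraz", "wine", 100, 100),
    ("Maker B", 2, "Lager", "beer", 50, 40),
    ("Maker C", 3, "Merlot", "wine", 40, 30),
])

# 100 * (sum shipped less sum accepted) / (sum shipped), for wine items only
rows = conn.execute(
    "SELECT item_maker, item_no, item_name, "
    "       100.0 * (SUM(qty_shipped) - SUM(qty_accepted)) / SUM(qty_shipped) "
    "FROM shipment "
    "WHERE alcohol_type = 'wine' "
    "GROUP BY item_maker, item_no, item_name "
    "ORDER BY item_no").fetchall()

for maker, number, name, pct_not_accepted in rows:
    print(maker, number, name, pct_not_accepted)
```

By contrast, the ambiguous formulation of the same request leaves the end user to decide what "a comparative percentage" means and which items are in scope.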


Appendix L: Internal Validity of the Experiment

A full explanation of the seven recognised threats to the internal validity of experiments is contained in Huck et al. (1974). The comments made below are based on the discussion presented there.

History

The history threat to internal validity arises where an event outside the domain of the experiment occurs that may affect the dependent variable. As the experiment took place over a two-hour period in a controlled setting, across two days of experimental testing, a history threat to internal validity is not considered to exist for this experiment.

Maturation

Maturation occurs where the participants mature, grow, and learn during the course of the experiment, so that the passage of time itself increases the recorded end user query performance. Any maturation effect is adequately controlled for in this instance, as the experiment was two hours in duration, homogeneous groups were used, and each tutorial group tested contained both Group A and Group B participants. Further, both groups received the ambiguity treatment on alternate questions. Any residual maturation effect (such as learning the use of the SQL experimental tool or increased proficiency in SQL during the experiment) applies equally to the clear and ambiguous treatment effects.


Testing

Testing occurs where individuals score higher on a later sitting of a test than on their first sitting. Within this experiment, the possibility exists that participants learned more about the use of the experimental tools and process (the SQL editor). Later questions (for example, question six compared to question one) might show superior performance (particularly in time for completion) due to the testing effect. Due to the factors cited for the maturation effect, any experimental testing effect - should there be any - applies equally to both the clear and ambiguous formulations of the question. Additionally, participants who had undertaken similar experiments previously were stratified into separate classes. Group A and Group B were homogeneous in this respect. Therefore, any testing effect that exists in this experiment, whether arising within the experiment or from previous experiments, applies equally to both treatment effects.

Instrumentation

Instrumentation is identified by Huck et al. (1974) as the effect of any change in the observational technique accounting for any experimentally observed difference. This could arise in the current experiment with a maturation change in the assessors over the time taken to assess student responses. Assessors could correct later participant responses differently to earlier participant responses.

This effect is controlled for in several ways. Firstly, when assessing responses, assessors had no means to identify participant responses by student name, only student number. This avoided assessors' preconceptions about students' performance. The use of two independent assessors controlled for some differences in marking strategies, as did the use of diary notes to ensure consistency of marking over time. An exhaustive cross-checking and data correctness procedure also mitigates this effect.

Responses were assessed student by student, in no particular order. Group A and Group B participant responses were evenly distributed in the marking order, with a calculated non-parametric runs test z statistic of 0.9924 (Newbold 1984). This weak z statistic (a two-tailed p-value of approximately 0.32, well short of conventional significance levels) gives no evidence that the marking order was non-random, and implies that any residual instrumentation effect, should it exist, applies equally to either question formulation. Overall, the threat of instrumentation to experimental results in this regard is controlled for.
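
As an illustration only, the runs test described above can be sketched in Python under the usual large-sample normal approximation. The function names `runs_test_z` and `two_tailed_p` and the example marking order are assumptions made for this sketch; they are not the original analysis or its data. The final lines convert the z statistic of 0.9924 reported above into its two-tailed p-value.

```python
import math


def runs_test_z(sequence):
    """Runs test z statistic for a two-valued sequence.

    Uses the large-sample normal approximation. `sequence` is the
    marking order, e.g. a string of 'A' and 'B' group labels.
    """
    labels = sorted(set(sequence))
    if len(labels) != 2:
        raise ValueError("sequence must contain exactly two distinct labels")

    n1 = sum(1 for item in sequence if item == labels[0])
    n2 = len(sequence) - n1
    n = n1 + n2

    # A new run starts wherever adjacent labels differ.
    runs = 1 + sum(1 for a, b in zip(sequence, sequence[1:]) if a != b)

    # Mean and variance of the number of runs under random ordering.
    mean = 2 * n1 * n2 / n + 1
    variance = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))
    return (runs - mean) / math.sqrt(variance)


def two_tailed_p(z):
    """Two-tailed p-value for a standard normal z statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical marking order (NOT the thesis data), for illustration.
example_order = "ABBABAABABBABAAB"
example_z = runs_test_z(example_order)

# The z statistic reported in the text gives p of approximately 0.32.
reported_p = two_tailed_p(0.9924)
```

A p-value near 0.32 means that an ordering at least this far from the expected number of runs would arise by chance in roughly one random ordering in three, which is consistent with Group A and Group B responses being evenly mixed in the marking order.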

Statistical Regression

Statistical regression occurs where the analysis of the experiment is on extreme scores, such that subsequent tests tend to regress to the mean (Huck et al. 1974). The current experiment is not exposed to this threat to internal validity, as extreme scores are not the focus of the experiment. Furthermore, the experimental design and assessment process used adequately control for this threat to internal validity, as previously described.

Mortality

Mortality occurs where participants drop out of the experiment during its course. As this experiment is short in duration (two hours), participant mortality did not occur during the experiment. In addition, all sixty-six students enrolled in the subjects participated in the experiment. The mortality effect is of some concern, however, in that incomplete participant responses were removed from the analysis. There were 506 participant responses, of which 425 responses were completed and statistically analysed in the experiment.

The effect of this acknowledged experimental bias is to reduce the total number of responses examined and, in general, to remove from analysis those responses with a significant number of errors. As this bias tends to work against the direction of the hypotheses made in this paper, any conclusions drawn in this regard are strengthened, and mortality is therefore less of an internal validity issue for the current experiment.

Selection Bias

The selection process resulted in two homogeneous groups, Group A and Group B, drawn from the entire student population of two information systems subjects. There is no evident selection bias between Group A and Group B. In any case, both Group A and Group B received the treatment effect of ambiguity on alternate questions, further mitigating concerns about the effect of a selection bias on experimental results.
