Guide to Writing Objective Tests
A guide to writing selected response questions and creating objective tests. Produced as part of the E-Only project.
Version 1.2 April 2007
INTRODUCTION TO SELECTED RESPONSE QUESTIONS

This guide was written as part of the E-Only project, which sought to develop SQA's first (fully) online qualification. A package of support materials was developed as part of that project – and this document is part of that package. It was written after an extensive literature review of UK and international (particularly US) publications relating to objective testing.

It is not a procedural guide for SQA appointees. Specific sectors and subjects will have their own procedures for producing objective tests, and this guide does not seek to replace that guidance. However, it does aim to describe best practice in the construction of objective tests from a generic perspective.

This document has gone through a number of revisions since the first draft was written in July 2006. Special thanks to everyone who took the time to contribute through SQA Academy (http://www.sqaacademy.com). The document is revised frequently. An online forum is available to discuss its contents at the following URL, where the latest version of the Guide can be found: http://groups.google.com/group/objectivetesting

Bobby Elliott ([email protected])
April 2007
PURPOSE OF THIS GUIDE

Although most SQA units employ "conventional" assessment, some subject areas (mostly Science-related) have a tradition of using objective tests. For example, Biology uses multiple choice questions at Intermediate, Higher and Advanced Higher levels; and HNC Computing uses an objective test as part of the Graded Unit. More recently, there has been greater emphasis on objective testing due to its suitability for computer-based assessment; as a result, an increasing number of unit specifications (at both National and Higher National levels) involve an element of objective testing.

This guide will be of assistance to any SQA Officer (or appointee) involved in creating objective tests. It has three objectives:
1. to provide advice about the construction of objective questions
2. to explain how to combine questions into an objective test
3. to provide guidance on authoring items.

A subsidiary objective is to standardise our vocabulary. Objective testing is a technical area with lots of jargon – some of which is used inconsistently. This guide is the result of a wide-ranging literature review and seeks to harmonise our terminology with that used internationally.
Although some topics (such as item banking) overlap with computer-assisted assessment (CAA), this guide focuses on the production of paper-based objective tests – although much of the advice is directly transferable to a CAA environment. The document has eight sections:

Section 1  Introduction to selected response questions
Section 2  Types of selected response questions
Section 3  Choosing selected response questions
Section 4  Writing multiple choice questions
Section 5  Writing questions for higher level skills
Section 6  Item analysis
Section 7  Constructing tests
Section 8  Dealing with guessing
While the focus of this guide is objective questions, it does not seek to promote one type of assessment over another. Traditional forms of assessment remain as valid today as they have ever been – but, where appropriate, objective approaches have a role to play too. Neither does it seek to explain what you already know. Most SQA staff have a good knowledge of objective testing – this guide simply seeks to provide a single source of advice for busy Officers and appointees.

There are no rules for writing objective tests; there's only advice. Do whatever you think is right for your particular test. Assessment is an art – not a science. There is no substitute for human judgement.

MISCONCEPTIONS ABOUT OBJECTIVE TESTING

Although this document does not seek to promote one type of assessment over another, it does aim to dispel commonly-held, but inaccurate, views about objective testing. Some of the most common misconceptions are rehearsed below.

1. Objective tests dumb-down education; objective tests are easy. Objective tests are as "dumb" or "smart" as you choose to make them. Many high stakes tests (such as university medical examinations in the UK and the SAT in the United States) use objective tests.
2. Objective tests can only be used to assess basic knowledge. While this is largely true in practice, there is nothing inherent in the design of objective tests to make them unsuitable for assessing high level skills.
3. Objective tests encourage guessing. The problem of guessing can be resolved through one of a number of recognised techniques.
4. Writing an objective test is easy. While most teachers can create simple objective tests, the construction of high quality objective questions is highly skilled and requires significant knowledge and experience.
5. Objective testing is only fashionable because of e-assessment. It's true that objective tests are well suited to computer-assisted assessment – but they are also valid and reliable forms of assessment in their own right.
6. Objective tests aren't appropriate for my subject. While objective tests have traditionally been used in the physical and social sciences (such as Physics and Psychology), they can be used in any subject.

QUESTION TYPES

SQA has traditionally employed a variety of question types within Unit and Course assessment. These question types can be categorised under two headings:
• constructed response questions
• selected response questions.
Note: Some of the terminology in this guide might not be familiar to you. It has been used because it is widely employed in the international testing literature, and it was considered best to use "industry standard" nomenclature rather than "Scottish" terminology.

CONSTRUCTED RESPONSE QUESTIONS

Constructed response questions (also known as "open-ended" questions) are questions that require the candidate to create ("construct") an answer. Examples of constructed response questions (CRQs) include short answer questions and essays.
Example 1 ~ Constructed response question
Translate "Good morning mother" into Spanish.
Write here:

CRQs can be sub-divided into two sub-categories:
• restricted response questions
• extended response questions.
A restricted response question (RRQ) is a question whose answer is limited to a few words. Examples of RRQs include complete-the-sentence, missing word and short answer questions (see Example 1 above).
Figure 1 - RRQs and ERQs
An extended response question (ERQ) is one whose answer requires the candidate to write longer responses, normally consisting of two or more paragraphs. Examples of ERQs include reports, essays and dissertations. There is no hard-and-fast rule about where a restricted response question ends and an extended response question begins. Note that many SQA assessments use a combination of restricted response questions and extended response questions. Some question papers have two sections, one employing RRQs and the other using ERQs.

SELECTED RESPONSE QUESTIONS

A selected response question (SRQ) is a question whose answer is pre-determined and involves the candidate choosing ("selecting") the response from a list of options. Because the answer is pre-determined and there is only one correct answer, these types of questions are often referred to as "objective" questions. Examples of SRQs include true/false, multiple choice and matching questions.
Example 2 ~ Selected response question
The capital of the United States is New York.
True/False
SQA's question papers typically consist of constructed response questions. Lower levels (up to SCQF level 4/5) generally use restricted response questions and higher levels (SCQF level 5 and up) generally employ extended response questions (although sometimes a mixed approach is used). A limited number of subjects employ selected response questions. This guide focuses on SRQs, which are becoming increasingly popular for a variety of reasons.

ADVANTAGES OF SRQS

1. SRQs take less time to answer – reducing the amount of time that candidates spend on assessment and increasing learning time.
2. SRQs are quick to mark – reducing the time teachers spend on assessment and increasing teaching time.
3. SRQs are well suited to formative assessment – since candidates' responses can be analysed and used to provide detailed feedback.
4. SRQs are good for assessing breadth of knowledge – they are ideal for assessing a broad range of topics in a short time.
5. SRQs are more reliable than CRQs – because they get around some of the marking problems associated with written answers.
6. SRQs are well suited to computer-assisted assessment – and facilitate item banking.

The low writing load of SRQs means that the focus is on the candidate's knowledge rather than the candidate's writing or language skills – which is a common problem with constructed response questions. Also, the speed of answering SRQs addresses another common criticism of assessment – that it takes up too much time for both students and teachers.

Research into the marking of CRQs and SRQs has shown significant differences in the reliability of the two approaches – with objective tests proving to be significantly more reliable than written tests. This has been the major reason for the widespread adoption of objective tests in the United States, where testing organisations operate in a more litigious environment.

The compatibility of objective tests with computer-assisted assessment is a major driver for the renewed popularity of objective testing. SQA, along with other awarding bodies, is in the process of building banks of questions ("item banks") which can be computerised and delivered to candidates over the Internet.

DISADVANTAGES OF SRQS

1. SRQs are not suitable for assessing certain abilities, such as communication skills or creativity. They are also not appropriate when candidates are required to construct an argument or provide an original response.
2. SRQs may be less valid than CRQs and suffer from low professional credibility.
3. SRQs that assess higher order skills are difficult (and time consuming) to produce.
4. SRQs can be wordy and require high order reading skills.

The first and second disadvantages are linked. There is nothing inherent in the design of SRQs to make them less valid than CRQs – but because they have often been used inappropriately (to measure skills that cannot be properly measured by this style of question) they have established a reputation for being invalid among some practitioners.

Most teachers are comfortable with using SRQs to assess low order skills (such as factual recall, typified by Example 2). They are less comfortable with their use in assessing deeper knowledge and understanding. Most currently available examples of SRQs re-affirm this view by focussing on the assessment of surface knowledge; even examples of SRQs that are meant to assess deeper knowledge often only assess surface knowledge – albeit less well known surface knowledge!

Traditionally, the costs of carrying out assessment come at the end of the process – the setting of the question paper is relatively speedy; the time consuming part comes when the papers have to be marked. Objective tests reverse this model – the time consuming part is the production of the questions, with marking taking very little time.
It is, therefore, something of a culture shock to move from traditional assessment to objective testing.

Another criticism of SRQs is that they can atomise teaching and learning, encouraging "teaching to the test" and surface learning. This, combined with their efficiency in assessing large numbers of students in short periods of time, has resulted in them acquiring a reputation as "weapons of mass instruction", with poor standing among many educationalists.

USES OF SELECTED RESPONSE QUESTIONS

As previously mentioned, objective tests are used in a number of SQA summative assessments (such as Higher Physics and some HN units). This style of assessment is well suited to rapid, focussed assessment and is traditionally employed to assess factual recall and basic understanding. It is less commonly used to assess deeper knowledge and understanding, and there are few examples (within SQA or elsewhere) of objective tests being used to assess higher level skills.

When used summatively, objective testing tends to be used for low-stakes assessment rather than high-stakes assessment, which largely remains the preserve of constructed response questions. However, some subjects (such as Advanced Higher Biology) do employ objective testing, and Higher Education has a long tradition of using objective testing for high-stakes summative purposes in some fields (such as Medicine).

Objective testing is well suited to formative assessment since it is quick to administer and assess (lack of time is often cited as the main reason for not using formative assessment). It is particularly suited to diagnostic assessment since it can be used to identify specific misunderstandings or weaknesses.

Historically, objective testing has been widely used for psychometric testing (testing of intellect and attitudes) and, more recently, it has been widely applied to job competence testing. It is also used in entry examinations for some professional bodies (such as ACCA). Objective tests are widely used internationally – including in high stakes assessments such as the SAT in the United States, which is used for university entry. They are also widely used within vendor examinations (such as Microsoft's global certification programme).

Awarding bodies in every country are focusing on computer-assisted assessment, which has resulted in a renewed interest in objective testing. These organisations share the view that the increasing popularity of e-learning will drive demand for e-assessment – which will be underpinned by item banks consisting of large numbers of selected response questions.
TYPES OF SELECTED RESPONSE QUESTIONS

There are several types of selected response questions (SRQs). Although they share some common characteristics, they each have unique features and applications. But they all share a fundamental characteristic – they have one unambiguously correct answer.

There are seven types of SRQ. These are:
1. true/false questions
2. matching questions
3. multiple choice questions (MCQ)
4. multiple response questions (MRQ)
5. ranking/sequencing questions
6. assertion/reason questions
7. Likert scale questions.

Each type of SRQ is now described and exemplified.

Note: This section simply introduces each type of question. It does not aim to explain how or when to use them.

TRUE/FALSE QUESTIONS

A true/false question (T/F) is a statement (not a question!) that is either true or false. The candidate must select one of two possible responses – "true" or "false".
Example 3 ~ True/false question
(x + 1) is a factor of x² + 2x - 3
True/False
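For reference, a worked check (added here for illustration; it is not part of the original example): the quadratic factorises as x² + 2x - 3 = (x + 3)(x - 1), so (x + 1) is not a factor and the correct response is False.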
Because candidates have a 50/50 chance of answering these questions correctly, this type of question is considered “easy” and is associated with low order knowledge. However, true/false questions can assess higher order skills; and setting an appropriate pass mark can eliminate the effects of guessing.
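To illustrate the pass-mark point with some purely illustrative arithmetic (the figures are not SQA policy): on a 20-item true/false test, a candidate guessing every answer at random scores on average 20 × 0.5 = 10 marks (50%), so a pass mark set at, say, 70% (14 out of 20) demands performance well above what guessing alone is likely to produce.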
Note: Any question that has one of two possible answers is considered a true/false question (for example, the responses might be "yes" or "no" rather than "true" or "false"). These questions are also known as "alternative response" items.

MATCHING QUESTIONS

This type of question requires candidates to match an object with one or more associated characteristics.
Example 4 ~ Matching question
Match the list of storage technologies on the left with the list of memory characteristics on the right. Match each technology (A, B, C or D) with one characteristic (1, 2, 3 or 4) only.

A. Hard disk          1. Non-volatile
B. Flash memory       2. Volatile
C. RAM                3. High capacity
D. ROM                4. Low cost

A. ___   B. ___   C. ___   D. ___

The objects on the left are called "stimulators" and the matching statements on the right are called "responses". No more than seven stimulators should be included in any one question. This type of question is often used to assess candidates' knowledge of the characteristics of certain objects. It is particularly well suited to computer-based assessment since it can be implemented as drag-and-drop (dragging each response onto an associated stimulator).
MULTIPLE CHOICE QUESTIONS

A multiple choice question (MCQ) consists of a question (or incomplete statement) followed by a list of possible responses from which candidates must select one. There are normally three to five options, with four being the most common.
Example 5 ~ Multiple choice question
In psychiatry, holding two contradictory views about the same thing is called:
A  cognitive dissonance
B  dementia
C  dissociative disorder
D  factitious disorder
Note that a multiple choice question with two options is effectively a true/false question. Or, to put it more accurately, a T/F question is a multiple choice question with two options. MCQs are the most common type of selected response question – and the one that this guide focuses on in later sections.

MULTIPLE RESPONSE QUESTIONS

A multiple response question (MRQ) is similar to a multiple choice question (MCQ) but has two or more correct responses (as opposed to an MCQ's single correct response).
Example 6 ~ Multiple response question
Which of the following statements about earthquakes is/are true?
A  An earthquake generates seismic waves.
B  The boundary of tectonic plates is called the fault plane.
C  The point of origin of seismic waves is called its epicentre.
D  The severity of an earthquake is measured by its magnitude and intensity.
There are some misconceptions about MRQs. They are not necessarily more difficult than MCQs; they are as hard or as easy as you choose to make them. There is no need to indicate the number of correct options; this only encourages guessing. And there is nothing wrong with making every option correct; in fact, prohibiting this possibility reduces the reliability of MRQs.

Note that MCQs normally begin: "Which one of the following…", and MRQs usually begin: "Which of the following…".

RANKING QUESTIONS

A ranking question involves ordering the options in some defined sequence. The sequence can be an ordered list of numbers, a chronological sequence or a series of events.
Example 7 ~ Ranking question
Rank the following countries in order of their population densities (lowest density first).
I    France
II   Germany
III  Spain
IV   United Kingdom

Ranking questions are easily implemented by computers using drag-and-drop.

ASSERTION/REASON QUESTIONS

This type of question consists of a statement (assertion) and a possible explanation (reason). Candidates must decide if the assertion and reason are true, and whether the reason is a correct explanation of the assertion.
Example 8 ~ Assertion-reason question
The following assertion and reason relate to World War II. Read the assertion and associated reason and then choose a corresponding letter (A–E) to indicate whether the assertion and/or reason is/are true.

Assertion: Japan's lack of raw materials was a cause of World War II in Asia.
Reason: Japan lacked natural raw materials except for small deposits of coal and iron.

A  The assertion is true and the reason is true, and the reason is a correct explanation of the assertion.
B  The assertion is true and the reason is true, but the reason is not a correct explanation of the assertion.
C  The assertion is true but the reason is false.
D  The assertion is false but the reason is true.
E  The assertion is false and the reason is false.
Assertion-reason questions are similar to multiple true/false questions.

LIKERT SCALE QUESTIONS

This type of SRQ was named after Rensis Likert, who invented the scale in 1932. It is widely used within questionnaires to gauge respondents' attitudes. The classic Likert scale consists of five possible responses:
1. Strongly disagree
2. Disagree
3. Neither agree nor disagree
4. Agree
5. Strongly agree

Some psychometricians add or remove options (the neutral option – "neither agree nor disagree" – is often removed).
Example 9 ~ Likert scale question
My manager supports me when necessary but otherwise allows me to work without interference.
A  Strongly disagree.
B  Disagree.
C  Neither agree nor disagree.
D  Agree.
E  Strongly agree.
This type of SRQ is almost exclusively used for attitudinal assessments and is rarely employed within formal SQA assessments. It is not discussed further in this guide.

BEST ANSWER AND EXCEPTIONS

Although the existence of a single, unambiguous, correct response is a fundamental feature of SRQs, the usefulness of SRQs can be extended through "best answer" and "exception" type questions. These techniques increase the flexibility of SRQs at the expense of some of their objectivity.

BEST ANSWER QUESTIONS

A "best answer" question is one whose answer is the closest ("best") answer selected from a list of possible answers, of which more than one may be true. Used carefully, best answer questions can be almost as objective as standard SRQs.
Example 10 ~ Best answer question
A user wishes to use a search engine to look for information relating to Celtic music that originated in Scotland. Which one of the following queries is likely to produce the best results?
A  Celtic music Scotland
B  "Celtic music" Scotland -football
C  Scotland +celtic +music +originate
D  "Celtic music that originated in Scotland"
Note that more than one of the responses is correct (in fact, they are all more-or-less correct). But only one option is the best answer (B). The use of best answer questions is particularly appropriate to the social sciences and arts subjects, which tend not to have a definitive body of knowledge like the physical sciences. Best answer questions can also be used to assess some higher order skills since they frequently require an element of judgment.

EXCEPTION QUESTIONS

An exception question is one where all of the options are correct except one of the possible responses. This type of question effectively reverses the logic of the standard SRQ.
Example 11 ~ Exception question
Smoking is a contributory factor in the following conditions EXCEPT:
A  diabetes.
B  heart disease.
C  lung cancer.
D  Parkinson's disease.
A question that includes "not" in the stem is effectively an exception question. For example, the above question could be re-phrased: "Which one of the following conditions is NOT caused by smoking?". Exception (and negative) questions are not ideal – but they should not be completely avoided, since their use can simplify questions and/or increase the number of questions that can be asked.

VARIANTS & CLONES

A question that assesses the same content as another question is known as a variant. The stems of variants are worded differently and the options may be different – but, fundamentally, variants assess the same learning objective.

A question that is (almost) identical to another question is known as a clone. Clones differ only in their variables. For example, the question below is a clone of Example 3, the only difference being the proposed factor that candidates are asked to verify.
Example 12 ~ Clone
(x + 2) is a factor of x² + 2x - 3.
True/False
Variants and clones have significant implications for e-assessment since they provide a quick and simple way of populating an item bank.
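As a purely illustrative sketch of how clones might be generated automatically for an item bank (the function and item format below are hypothetical, not part of any SQA system), a few lines of Python can produce true/false clones of Example 3 by varying the proposed factor:

    import random

    def make_factor_clone():
        """Generate a true/false clone of Example 3 by varying the proposed factor.

        x^2 + 2x - 3 factorises as (x + 3)(x - 1), so a proposed factor (x + a)
        makes the statement TRUE only when a is 3 or -1.
        """
        a = random.choice([-3, -2, -1, 1, 2, 3])   # value used in the proposed factor (x + a)
        factor = f"(x + {a})" if a >= 0 else f"(x - {abs(a)})"
        return {
            "stem": f"{factor} is a factor of x^2 + 2x - 3.",
            "options": ["True", "False"],
            "key": "True" if a in (3, -1) else "False",
        }

    # Example: print three clones together with their keys
    for item in (make_factor_clone() for _ in range(3)):
        print(item["stem"], "->", item["key"])

The key point is that the correct answer is computed from the same rule that generates the stem, so every clone is guaranteed to have a single, known key.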
Each type of SRQ has its strengths and weaknesses, and each has its best uses. The next section looks at choosing SRQs for different purposes.
CHOOSING SELECTED RESPONSE QUESTIONS

The previous section explored the characteristics of different types of selected response question. This section looks at how each type is best used.

TAXONOMIES OF LEARNING

One of the key determinants in the selection of SRQs is the kind of knowledge or understanding that you are seeking to assess. For example, factual recall can be adequately assessed using true/false questions; deeper understanding may require more complex question types such as multiple response questions. As a starting point, we need a method of classifying knowledge and understanding. The most widely used classification system is Bloom's Taxonomy.

BLOOM'S TAXONOMY

Benjamin Bloom wrote Taxonomy of Educational Objectives, Book 1: Cognitive Domain in 1956 in an attempt to standardise the terminology used by teachers to describe academic abilities. Until the publication of this book, different people used different words to describe the same thing; or, worse, used the same words to describe different things. His book described a classification system that could be used to categorise cognitive abilities. The taxonomy (which became known as Bloom's Taxonomy) is widely used within the educational community.

Note: Bloom's Taxonomy is not the only way to classify academic abilities. There are many alternative methods – some linked to Bloom's (but more up-to-date) and some entirely different. But Bloom's Taxonomy remains the most widely used classification system.

Bloom's Taxonomy classifies academic abilities into six categories:
1. Knowledge
2. Comprehension
3. Application
4. Analysis
5. Synthesis
6. Evaluation.

A brief description of each cognitive skill follows.
Knowledge – Knowledge involves the recall of specific facts and figures, or the recall of specific methods and processes. Knowledge is at the bottom of Bloom's Taxonomy but underpins the higher order abilities. There are three types of knowledge: knowledge of specifics, knowledge of methods, and knowledge of universals. At the higher levels (knowledge of methods and universals) it can be intellectually demanding. This category includes: knowledge of terminology, knowledge of specific facts, knowledge of conventions, knowledge of trends and sequences, knowledge of classifications, knowledge of criteria, knowledge of methodology, knowledge of principles and generalisations, and knowledge of theories and structures.

Comprehension – Comprehension differs from knowledge in that it relates to the mental processes of organising and re-organising information for a particular purpose. It includes: translation, interpretation and extrapolation. Translation relates to the ability to translate (or decode) a communication from one format (or language) to another. Interpretation involves the explanation or summarisation of a communication. Whereas translation involves a mechanistic, part-for-part rendering of a communication, interpretation involves a more holistic re-ordering or rearrangement of the information. Extrapolation involves extending trends or sequences beyond the given data to infer consequences or corollaries.

Application – Application involves the use of knowledge and comprehension in specific situations. For example, knowledge of computing terminology and procedures, combined with an understanding of the principles of computer hardware and software, can be applied to the assembly of a computer system.

Analysis – Analysis involves the breakdown of a communication into its constituent parts so that the relationship between the elements is made clear. Analysis is intended to clarify or explain communications or processes. This cognitive skill includes the ability to: (1) analyse elements (identification of the components of the communication); (2) analyse relationships (the ability to check the consistency or accuracy of a hypothesis, and skill in comprehending the inter-relationships among different ideas or concepts); and (3) analyse organisational principles (the ability to recognise form and pattern in a communication, and the ability to recognise general techniques used within a subject area).

Synthesis – Synthesis involves combining the parts so as to form a whole. It involves combining and arranging parts or pieces of a communication to create something new. It may involve: (1) the production of a unique communication; (2) the production of a plan; and (3) the derivation of a set of abstract relations to represent physical phenomena.

Evaluation – Evaluation involves making judgements about the value of particular phenomena for given purposes. Evaluation is carried out using criteria and involves qualitative and quantitative judgements based on those criteria. The criteria may be given or created. Evaluation includes measuring the internal consistency of a communication using criteria such as quality of writing, accuracy of the information it contains, and consistency of argument; and measuring its external consistency, which requires the evaluator to have a detailed knowledge of the type of phenomenon under review, since it will be evaluated against the general criteria applied to phenomena of that type.
Table 1 - Bloom's Taxonomy
Bloom's Taxonomy is a hierarchy in that each category builds on the one below. For example, application depends on comprehension, which in turn depends on knowledge. Or, to put it more simply: you can't apply something until you understand it; and you can't understand something until you know about it. Figure 2 illustrates this hierarchy, with knowledge at the bottom and evaluation at the top.
Figure 2 - Bloom's hierarchy
It is worth noting that, in practice, every level of Bloom's Taxonomy can be reduced to knowledge if the candidate answers the question through rote learning. The most sophisticated evaluation can be answered correctly if the candidate has studied that specific scenario and learned the correct response. Or, to put it another way, one person's evaluation is another person's knowledge.

IDENTIFYING THE LEVEL OF A QUESTION

Bloom's Taxonomy can be used to categorise the cognitive demands of a question. For example, a question asking a candidate to "describe" something is normally associated with the knowledge domain; a question asking the candidate to "explain" something is normally associated with analysis. In fact, the verb in the question can provide a clue to the question's intellectual demands. Table 2 associates some verbs with the levels within Bloom's Taxonomy.
Level           Verbs
Knowledge       define, describe, label, list, name, recall, show, who, when, where
Comprehension   compare, discuss, distinguish, estimate, interpret, predict, summarise
Application     apply, calculate, demonstrate, illustrate, relate, show, solve
Analysis        analyse, arrange, categorise, compare, connect, explain, infer, order, separate
Synthesis       arrange, combine, compose, create, design, formulate, hypothesize, integrate, invent, modify, plan
Evaluation      assess, compare, decide, defend, discriminate, evaluate, judge, justify, measure, rank, recommend

Table 2 - Verbs associated with Bloom's Taxonomy
So, for example, a question that commences "Define…" is likely to assess basic knowledge; a question that begins "Compare…" is likely to assess analytical or evaluative skills.

Note: SQA does not formally use a recognised taxonomy for assessments. However, when one is employed by Officers or appointees, it is usually Bloom's. Some SQA question papers fall foul of Bloom's Taxonomy, asking candidates to "explain" something but actually awarding marks for descriptions (or vice-versa).

DIFFICULTY AND DEMAND

Bloom's Taxonomy provides an indication of the demand of a question – it does not define its difficulty. A question's demand is a measure of its intellectual requirements; its difficulty is "how hard" it is. Although difficulty and demand are related (most demanding questions are difficult), a question can have high demand and low difficulty – or low demand and high difficulty.
Example 13 ~ Low demand, high difficulty question
Describe the main processes that take place during nuclear fusion.

This question has low demand (relating to factual recall) but high difficulty because it relates to a complex topic (nuclear fusion). Similarly, crossing the road involves evaluation skills (Is the road clear? Is it safe to cross? How far away is that car?),
which are at the top of Bloom's hierarchy – but it is not a difficult task for most people. So, merely climbing Bloom's Taxonomy is no guarantee of difficulty.

The concept of difficulty and demand has important implications for question setting. Most SQA tests employ low difficulty/low demand questions; but even the apparently "more demanding" questions may not be – they might simply assess knowledge in a more difficult way (for example, by assessing little-known knowledge).

QUESTION TYPES AND DEMANDS

Each question type can be related to one or more levels in Bloom's Taxonomy. While it's possible to use any one of the question types for almost any of Bloom's levels, some are better than others for specific levels, as the following table describes.
True/False – While mostly used to assess knowledge, T/F questions can, in fact, be used to assess the knowledge, comprehension and application levels.

Matching – Again, mostly used to assess basic knowledge, but can be used to assess knowledge and comprehension.

MCQ – MCQs are the most flexible type of SRQ and can assess all levels; they are particularly suitable for knowledge, comprehension, application and analysis.

MRQ – MRQs can assess the same range of levels as MCQs – but have the potential to create more difficult questions within each category.

Ranking – Ranking questions are well suited to assessing application and analysis.

Assertion – Suitable for knowledge, comprehension and analysis.
Table 3 - Question types and demand
So, in theory, SRQs can assess all of Bloom’s levels. However, in practice, it is uncommon to come across SRQs that assess anything other than knowledge and comprehension. But this is not an inherent limitation in their design. Assessing higher order skills can be done – but it is a time consuming and skilled task to do so.
ADVANTAGES & DISADVANTAGES OF QUESTION TYPES

As stated previously, each question type has its unique characteristics and uses. The applications of each type are determined by its strengths and weaknesses.
True/False
Advantages: Well suited to basic knowledge. Easy to write. Rapid to mark. Suited to dichotomous knowledge. Good for formative assessment – especially diagnostic assessment.
Disadvantages: Limited applications (best suited to dichotomous knowledge).

Matching
Advantages: Relatively easy to write. Quick to mark. Good for assessing knowledge of characteristics/features or relationships between variables. Well suited to computerisation (drag-and-drop).
Disadvantages: Limited to knowledge and comprehension. Best used for homogeneous content, i.e. classifying types.

MCQ/MRQ
Advantages: Can assess a wide range of cognitive abilities (up to analysis). Scenario-based questions can assess higher order skills. Well suited to diagnostic assessment (distractors can target learning difficulties). Item analysis provides detailed feedback (to assessors and candidates). Simple MCQs are quick and easy to construct. High re-usability of items.
Disadvantages: Good MCQs (at any level) are difficult and time consuming to construct. MCQs that assess high level abilities require skilled authors. Unsuitable for assessing synthesis and evaluative skills.

Assertion
Advantages: Well suited to assessing relationships between variables. Well suited to assessing understanding of cause-and-effect. Good for constructing demanding items.
Disadvantages: Difficult to construct. Limited applications (compared to MCQs). Wordy – difficult to read and understand.

Table 4 - Advantages and disadvantages of question types
In practice, the main barrier to constructing high quality items is the skill and experience of the authors. A talent for writing traditional question papers does not necessarily translate into writing SRQs – so experienced setting teams may struggle to create high quality item banks. Even after training, some writers don't "get" SRQs – while others are veritable question factories.

Note: If a unit writer wishes to use objective testing, s/he should not prescribe the particular type of SRQ in the unit specification itself. It's better to simply state that selected response questions may be used – and leave the choice of SRQ to the assessment writers (although the Support Notes may suggest specific forms of SRQ).
WRITING MULTIPLE CHOICE QUESTIONS

This section focuses on the construction of a specific type of selected response question (SRQ) – the multiple choice question (MCQ). However, much of the advice is transferable to other forms of SRQs. Multiple choice questions are the most common type of SRQ; they're also the most flexible and the most difficult to construct. MCQs are used in all types of objective testing (including high stakes assessment) and are the most common form of SRQ employed by SQA.

ANATOMY OF AN MCQ

A single, complete multiple-choice question is called an item. It poses a question and allows a candidate to select the correct answer from a list of possible options. An MCQ has the following structure:
Figure 3 - Anatomy of an MCQ
Stem (or stimulus): the question or problem.
Options (or responses or alternatives): the list of possible answers.
Key: the correct (or best) answer.
Distractors: the incorrect alternatives to the key.

Note the spelling of "distractor" – this is the US-English spelling rather than the International English spelling ("distracter").

WRITING MULTIPLE CHOICE QUESTIONS

There is no formula for constructing high quality items. However, there is some guidance that aids their construction.
THE ITEM

The key to writing good items ("authoring") is to ensure that the question directly relates to the underlying Arrangements, is clearly presented, and is free from unnecessary detail. A question should not be a test of reading ability; the focus must be on the knowledge or skill that it is seeking to assess.

• Ensure that each item is relevant to the course/unit outcomes.
• Ensure that the level of language is appropriate to the target cohort.
• Assess one thing at a time (unless you intend to ask an integrative question).
• One correct answer only.
• Don't write questions in isolation.
• Don't include unnecessary words.
• Pre-test items whenever possible.

The most difficult part of writing an item is to ensure that there is only one correct answer. Having more than one potentially correct answer is the most common complaint from teachers and candidates. It's a challenge to write items with one clearly correct answer – at least non-trivial items. It's easy to be subjective or context dependent (i.e. the key is correct in some circumstances but not others). One solution is to spell out the context – but this may make the item clumsy or wordy, or give clues to the correct answer. Another option is to use words and phrases like "best" or "most likely" in the stem (it's easier to argue that the key is the most likely answer rather than the only answer).

Although the initial construction of questions has to be the work of an individual, it's vital that items are reviewed prior to being used operationally. It's impossible for a single author to both write and review items independently.

SRQs are well suited to pre-testing – trying them out on students before using them operationally. Pre-testing will confirm the item's suitability (or not). It also generates valuable data about the question that can be used in item analysis (see Section 6).

STYLE GUIDE

Each item should follow an agreed house-style to provide guidance on language use. A style guide for item writing would normally include advice about:
• spelling
• punctuation
• use of emphasis
• prose style
• language.
For example, spelling advice would include the treatment of numbers (spelled in words or written as digits?); punctuation advice would include information on the punctuation to use within options (should they end with a period or without any form of punctuation?); emphasis rules would include the use of bold and italics; prose style and language would provide general advice about the type and level of language to be used.

THE STEM

It's best to phrase the stem as a self-contained question rather than a partial statement – although the latter approach is neither uncommon nor invalid.
• Try to phrase the stem as a complete question (unless this is too contrived – when an incomplete statement may be used).
• Use clear, straight-forward language – suitable for the target cohort in terms of level of language.
• Place necessary wording in the stem – not in each of the options.
• Avoid irrelevant or unnecessary information.
• Avoid negative wording if possible – or use negatives sparingly.
• Specify any standards implied.
• Avoid the use of personal pronouns ("I", "You" etc.).
• Avoid subjectivity e.g. "Which one of the following do you think is…" (what the candidate "thinks" is subjective – and her response cannot be wrong).
• Any words that would be repeated in each of the options should be included in the stem. Options should not begin or end with identical words and phrases.
Example 14 ~ Repeated text
If the pressure of a certain amount of gas is held constant, what will happen if its volume is increased?
A  The temperature of the gas will decrease.
B  The temperature of the gas will increase.
C  The temperature of the gas will remain the same.

Example 15 ~ Repeated text removed
If the pressure of a certain amount of gas is held constant, what will happen to the temperature if its volume is increased?
A  Decrease.
B  Increase.
C  Remain the same.
Avoid words like "could" and "would". For example, a question asking "What would you do…?" cannot be answered incorrectly (since only the candidate can know what she would do in any given circumstance) – instead write "What should you do…?". The following example illustrates a poor question.
Example 16 ~ Using subjective wording
A computer is running slowly. What could be responsible?
A  Insufficient memory
B  Over-heating
C  Small hard drive
D  Virus
The author intends D to be the correct answer – but any of the options could be correct. Here is an improved version.
Example 17 ~ Subjective wording removed
A computer suddenly runs slowly without any changes to its configuration. What is most likely to be responsible?
A  Insufficient memory
B  Over-heating
C  Small hard drive
D  Virus
Notice the added contextual information in the stem to improve the clarity of the question – and the replacement of "could" with "most likely".

Specify any standards implied. If an item calls for a judgment, specify the authority or standard upon which the correct answer is based.
Example 18 ~ Standards specified
According to the American Medical Association, the diet of the average American provides vitamins in amounts that are what?
A  Adequate for normal consumption.
B  Inadequate for normal consumption.
C  In excess of normal requirements.
D  Variable in relation to individual requirements.
The key to good stem construction is to keep the question (or statement) as short as possible – consistent with providing sufficient information to clearly pose the question. But don’t be tempted to reduce the length of the stem by moving information into each of the options; this complicates the question and increases the candidate’s reading time. Negative wording is not prohibited but it’s better to word a question positively when this is possible. Double negatives should be completely avoided i.e. two negatives in the stem or a negative in the stem and a negative in the options. However, some
questions can be made unnecessarily complex by avoiding a single negative – in which case, use negatives. When negatives are used, emphasise "NOT" (or whatever construct is used) in the stem (or the options).

THE OPTIONS
• Provide between three and five options – four options is most common.
• Options should be internally consistent (e.g. all people's names, not three names and a measurement).
• All of the options should be plausible.
• All of the options should be of equivalent quality.
• The ordering of the options should follow a consistent and logical sequence.
• The length of the options should be comparable.
• Options should be mutually exclusive.
• Only one correct (or best) answer.
• The one correct answer (key) should be actually correct.
• The key should not be worded in a way that would make it likely to change over time.
• Ensure that none of the distractors is conditionally correct (depending on circumstances or context – unless these are defined in the stem).
• Do not create distractors that are too close to the key.
• Don't use words such as "not", "never" or "always" to make an option incorrect.
• Avoid the use of "All of the above".
• "None of the above" should be used sparingly (and, when used, should be the correct answer some of the time).
• Avoid pejorative language (such as "bad", "low", "ignore" etc.).
• Avoid syllogistic reasoning e.g. "Both A and B are correct".

Some of the advice is conflicting – such as "Stems should be short and simple" and "Move information to the stem rather than repeat it in each option". Dealing with these tensions is the art of item construction!
The advice about pejorative language is quite subtle. Any option that uses words such as "bad", "low" and "ignore" is usually a distractor – authors rarely use such words in the key.

At higher levels of understanding, it can be difficult to construct questions with one objectively correct answer, and it is a common error in such questions to offer options that include more than one potentially correct answer. Careful wording ("Which one of the following is likely to be the best answer…") can get round this potential problem.

SEQUENCING OPTIONS

The ordering of the options within an item should follow a logical sequence. If the options are numbers or dates, they should be displayed numerically or chronologically, in ascending or descending order (normally ascending). Text answers should normally be sorted alphabetically unless there is a "natural" sequence to the options, in which case the natural sequence should be used in preference to alphabetical order.

Do not order the options to try to evenly distribute the answers (i.e. to ensure each option – A, B, C and D – is used approximately the same number of times), nor attempt to avoid clustering keys (e.g. A-B-B-B-C), since both of these strategies reduce the randomness of the test.

USE OF "NONE OF THE ABOVE"

The option "None of the above" should be used sparingly. It is preferable to avoid the use of "None of the above" as well as "All of the above": studies have shown that they decrease item discrimination and test score reliability (see Section 6). However, "None of the above" can be used if authors ensure that:
• it is used in several items in a test
• it is sometimes the correct option (but not always)
• it is not used after a negative stem
• it is not used as "padding" (because you are short of other options).
"None of the above" may be particularly useful in questions that require candidates to carry out calculations, since this option effectively mops up a large range of potential errors. But, if it's used, it must sometimes be the key.
Example 19 ~ Good use of "None of the above"
Which one of the following is the solution for x in the equation 5(x - 1) = 10?
A  0
B  2
C  4
D  None of the above
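For reference, a worked solution (added for illustration; it is not part of the original example): 5(x - 1) = 10 gives x - 1 = 2, so x = 3, which does not appear among options A to C – hence the key is D, "None of the above".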
ADVICE ON WRITING DISTRACTORS

The quality of distractors has a huge impact on the quality of the question. Distractors have a particularly important role to play in formative assessment since their careful selection can provide a wealth of diagnostic information about the candidate's present understanding. In summative assessment, carefully selected distractors can catch out unprepared (or under-prepared) candidates. Writing distractors, therefore, requires as much thought as writing the key. Distractors should be as plausible as the key; do not use unrealistic or humorous distractors, as this effectively reduces the number of (real) options.

• Distractors should be as plausible as the key – no silly distractors – although some can be relatively weak.
• Common misunderstandings make good distractors.
• Incorrect paraphrasing of the question makes for good distractors.
• Correct-sounding distractors are good for the poorly prepared candidate.
• True statements that do not answer the question are good distractors.
There is a balance to be struck between writing good distractors and trying to dupe candidates. Distractors should not “entrap” candidates – that is, catch out candidates through clever wording, very fine distinctions or tricks-of-the-trade. If you want to write a difficult question then do so through the knowledge and skills required to answer it – not by tricking the candidate into giving the wrong answer.
ADVICE ON AVOIDING CUEING

"Cueing" is the tendency for the stem (or the options) to give away the key. It is a common problem with SRQs. In the following question, only one option (A) is grammatically correct (the stem ends with "an" and only option A begins with a vowel).
Example 20 ~ Cueing
A word used to describe a noun is called an:
A  adjective
B  conjunction
C  pronoun
D  verb
• The wording in the stem should not provide obvious clues to the correct answer.
• Don't give clues to the correct answer through grammar: ensure that all options flow from the stem, are in the same format and tense, and are grammatically correct.
• Don't allow the wording of the options to provide obvious clues to the correct answer.
• Avoid the use of "always" and "never" in the options since these responses are rarely correct.
• Avoid the use of "sometimes" and "often" in the options since these responses are often correct.
• Avoid using stereotypical language that could give away the answer.
• Avoid using phrases from textbooks.
• Avoid pejorative wording ("bad", "low" etc.) since these words are rarely used in the key.
• Avoid absolute language such as "always", "never", etc. since these are rarely correct.
• Avoid complex language in one option compared with the other options (such an option tends to be the correct answer).
• Avoid similar language in the stem and the options since the option with the most similar language is most likely to be the key.
• Avoid visual cueing i.e. one option being much longer or standing out in some other way from the other options – this one is likely to be the key.
The length of options should be similar. An option that stands out from the others can indicate to a student that it is the right answer. If different lengths are unavoidable then use two long options adjacent to each other and two short options adjacent to each other. The following example illustrates some of this guidance.
Example 21 ~ Advice in context
"Shakespeare wrote plays and they reflect both the depth of human emotion and the complexity of human society."
Which one of the following phrases improves the wording of the underlined fragment?
A  "Shakespeare wrote plays who reflect…"
B  "Shakespeare wrote plays that reflect…"
C  "Shakespeare wrote plays which reflect…"
D  "Shakespeare wrote plays being that they reflect…"
The question appears to be a valid assessment of candidates’ knowledge of English grammar (presuming that this is what the author intended to assess) – although a more familiar context could have assessed the same knowledge (the mere mention of Shakespeare can disorientate candidates). The question is clearly worded – although some of the language in the stem is unnecessarily complex (words such as “fragment” could confuse candidates). The options look homogenous, with none standing out (no visual cueing). They have been ordered in a logical sequence (sentence length). They are all plausible to the under-studied candidate. There is some repeated text in the options that a rewording of the stem may avoid (but maybe not without making the question less clear). The distractors have been chosen to reflect common misunderstandings among candidates with respect to the use of “that”, “which” and “who”. And there is one unambiguously correct option (B).
All in all, a reasonable (albeit imperfect) question.

DISCLOSERS

A concept associated with cueing is disclosing. A discloser is a question that contains the answer to another question. Unless otherwise intended, every question should be independent of every other question and should contain the minimum information required to answer it. However, it can happen that the stem or options in one question inadvertently help candidates to answer another question. Disclosure is a particular problem in item banking, when it is impossible to predict which items will be included in a particular instance of a test (such tests are usually dynamically generated by a computer – and a computer is unlikely to spot the subtleties of disclosure).
A checklist, summarising the advice for item construction, is provided in the appendices.
WRITING QUESTIONS FOR HIGHER LEVEL SKILLS

Multiple choice questions (MCQs) have gained a reputation for being a quick-and-dirty way of assessing low level knowledge. However, they can also be used to assess higher level skills – but this requires a great deal more effort on the part of the writer. This section explores the potential of MCQs to assess higher level skills.

As has been previously stated, MCQs can be used to assess all of the levels within Bloom's Taxonomy – although they are more suited to the lower levels. This section explores a couple of techniques for writing higher order questions and exemplifies them against levels in Bloom's Taxonomy.

Writing MCQs to assess higher order skills frequently contradicts some of the previous advice about writing good items. For example, such questions often involve long stems; complex language is frequently used; standards are often omitted (or the question becomes one of knowledge of the standard); and they often require an element of judgement on the part of the candidate (and, as a consequence, are less objective).

Note: There is a fundamental distinction between writing questions that assess higher level skills and writing items that assess lower order skills in a "difficult" way. A question that assesses some esoteric piece of knowledge is not a higher order item – although few candidates will answer it correctly, it is still only assessing a low level ability, albeit in a difficult way (see the previous discussion of difficulty and demand).

TECHNIQUES FOR WRITING HIGHER ORDER QUESTIONS

Writing higher level questions is easier in some subjects than others. Some fields, such as mathematics, are problem-solving based, and in such subjects it is relatively straight-forward to produce questions that assess more than knowledge and comprehension (see Example 3 for a straight-forward application level question in Maths). In other subjects it's not so easy. However, there are a few techniques that can be used to help authors produce more demanding MCQs. We will look at two:
1. scenario questions
2. passage-based reading.

Before we do, there is a very simple technique that can be used to transform a simple knowledge question into one that is more demanding. Instead of asking "What…?", ask "Why…?". For example, in a Geography test, instead of asking "Which one of the following cities is the capital of the United States?" (which assesses basic knowledge), ask why Washington is the capital of the US (which requires an explanation).
Example 22 ~ Upgrading questions
Why is Washington DC the capital of the United States?
A  It is a planned city, capital by design.
B  It is the largest city in the United States.
C  It is located beside a large river and manufacturing base.
D  It is located in a position safe from British troops during the American Revolution.
This is a quick-and-dirty technique for generating more demanding questions, upgrading basic knowledge questions to comprehension or analysis levels.

SCENARIO QUESTIONS
The main method of writing demanding items is to present a scenario to candidates and then pose one or more related questions. The scenario can be anything from a paragraph to a page (although a very long scenario really requires a number of follow-on questions to justify its length). The associated question(s) may involve a range of cognitive abilities including interpretation (comprehension), prediction (comprehension), calculation (application), problem solving (application), explanation (analysis), inference (analysis), categorisation (analysis) and decision making (analysis and evaluation).

Scenarios can be used in all subjects but are particularly suitable in the social sciences. Science subjects are inherently suited to problem solving and it is easier in these areas to pose demanding questions without the need for lengthy scenarios.

The examples provided in this section are given without detailed comment. You are encouraged to critically appraise each question yourself. When you do, you will appreciate that no (non-trivial) question is without its weaknesses.

A scenario question has a straight-forward construction. It consists of some text, which may be illustrated with a diagram or photograph, and one or more associated questions. The scenario can take one of a number of forms including:
• a description of a specific environment
• a description of a specific situation
• a description of a principle or theorem
• a description of a problem
• an explanation of an event
• the results of an experiment (or the results of research).
Most scenario questions involve an element of interpretation on the part of the candidate. The candidate will take more time to process a scenario question as it often requires a high level of reading skills. This should be taken into account when determining the duration of a test (see Section 7).
Example 23 ~ Application skills
Julie is 14 years old and frequently uses an online community called MyParty, which is a social network used by many of her friends. However, the service is open to any member of the public. She has become very friendly with Jamie, another user of the service, whom she has never met. Jamie's profile reports that he is 16 years old and attends a nearby school. Julie and Jamie share many common interests, and Jamie has asked to meet Julie, who wants to meet him.
Which one of the following is Julie's best course of action?
A  Refuse to meet with him.
B  Agree to meet with him but accompanied by a responsible adult.
C  Agree to meet with him but accompanied by a friend.
D  Agree to meet with him.
This question uses a specific situation to ask a question that involves application skills. Any question that uses a scenario that the candidate is unfamiliar with is, in effect, assessing application skills.
Example 24 ~ Application and analysis skills
A user is having problems reading files from a flash drive. While most files work correctly, any attempt to access a few specific files results in an operating system error message: "Cannot read file. Storage device may be corrupt."
Which one of the following is normally the best course of action in such circumstances?
A  Copy the readable files from the device and do not re-use the device.
B  Copy the readable files from the device, reformat it and recopy the files to the device.
C  Ignore the error and continue to use the part of the device that is usable.
D  Reformat the device and re-use it.
Note that this question is an example of problem solving. Note also that there are at least two weaknesses. The key (B) “looks” correct (it is the longest and most detailed option); at least one of the distractors is weak (C) and uses pejorative language (“ignore”). But it has its strengths too. The key is clearly the best answer (not always an easy task when writing demanding questions) and it’s a challenging question (admittedly made easier by the options). And the author didn’t resort to “None of the above” as a final option! It is a moot point whether this item can be “fixed” or whether it has to be discarded. The following example uses a single scenario and a number of linked questions of increasing demand.
Example 25 ~ Application and analysis skills
Raj and Sophie, who have never been married, have two children – Ben aged 8 and Shazia aged 2. Raj and Sophie's relationship has ended, and Sophie has married Carlton. Raj has agreed that the children can live with Sophie and Carlton for the time being.
For questions 1-4, the options are:
A  Raj and Sophie.
B  Raj, Sophie and Carlton.
C  Sophie and Carlton.
D  Sophie only.
E  Raj only.
1  Who has parental responsibility for the children at present?
2  If Section 8 orders are required in respect of the children, who could apply as of right (without leave) for any Section 8 order?
3  Who would be able to apply as of right (without leave) for a residence or contact order?
4  If Raj obtained a contact order to see the children every week, who would have parental responsibility for the children?
PASSAGE-BASED READING
A second technique to aid the writing of demanding questions is to use passage-based items. This involves presenting a passage of around 100 to 800 words and asking one or more linked questions about it. The passage can be narrative, argumentative or expository in nature. The questions can ask candidates about the meaning of words in the passage (vocabulary in context); ask about significant information the passage is seeking to impart (literal comprehension); or measure candidates' ability to analyse information and to evaluate the assumptions made and the techniques used by the author (extended reasoning).
Example 26 ~ Passage-based question

1   "Psychoanalysis has been criticised on a variety of grounds by Karl Popper,
2   Adolf Grünbaum, Mario Bunge, Hans Eysenck, L. Ron Hubbard and others.
3   Popper argues that it is not scientific because it is not falsifiable.
4   Grünbaum argues that it is falsifiable, and in fact turns out to be false.
5   The other schools of psychology have produced alternative methods of
6   psychotherapy, including behaviour therapy, cognitive therapy, primal
7   therapy and person-centred psychotherapy.

8   An important consequence of the wide variety of psychoanalytic theories is
9   that psychoanalysis is difficult to criticise as a whole. Many critics have
10  attempted to offer criticisms of psychoanalysis that were in fact only
11  criticisms of specific ideas present in one or more theories, rather than in
12  all of psychoanalysis. For example, it is common for critics of
13  psychoanalysis to focus on Freud's ideas, even though only a fraction of
14  contemporary analysts still hold to Freud's major theses." (Wikipedia)
A number of linked questions could be asked about this passage. For example, a vocabulary-in-context question could ask about the meaning of a word (or term) such as “falsifiable” (line 3) or “cognitive therapy” (line 6); a literal comprehension question could ask about the candidate’s understanding of this passage (such as asking her to choose the best (one line) summary of the passage); and a number of extended reasoning questions could be posed (such as one asking about criticisms of Freudian psychoanalysis). Passage-based reading can also be used to measure evaluation skills by asking candidates to judge the logical consistency of written material, the validity of experimental results, the interpretation of data, or the quality of writing.
Example 27 ~ Evaluation skills
The Fibonacci sequence of numbers can be defined by the following mathematical recurrence relation:

F(0) = 1
F(1) = 1
F(n) = F(n-1) + F(n-2) for n > 1

The following Java method (i.e. function) specifies an implementation of this recurrence relation.

public static int fibonacci(int n) {
    if (n == 0 || n == 1) {
        return 1;
    } else {
        return fibonacci(n - 1) + fibonacci(n - 2);
    }
}

Which one of the following statements best evaluates this function?
A  The algorithm will produce the correct result and is efficient.
B  The algorithm will produce the correct result but is inefficient.
C  The algorithm will not produce the correct result.
D  The algorithm will fail.
ITEM ANALYSIS
One of the major advantages of selected response questions (SRQs) is that they can be easily analysed. Item analysis permits a more scientific approach to assessment. If you know the properties of each question (for example, how difficult it is or how well it separates candidates of differing abilities) then you can construct a better test. This section explores two classical ways of analysing items: (1) measuring their difficulty; and (2) measuring how well they separate candidates. The next section explains how these measures can be used to construct tests.

FACILITY VALUE
The facility value (FV) of an item is a measure of its difficulty – or, more accurately, its "easiness". It represents the proportion of candidates who answer the item correctly and is expressed as a decimal fraction between zero and one. A FV of zero means that no-one answered the question correctly; a FV of one means that everyone answered the question correctly; and a FV of 0.6 means that 60% of the test takers answered it correctly. The lower the FV, the more difficult the item; the higher the FV, the easier the item (hence, it is better thought of as an "easiness index"). A very easy item might have a FV of 0.9 (meaning that 90% of candidates are expected to answer it correctly) and a very difficult item might have a FV of 0.1 (meaning that 10% of candidates are expected to answer it correctly).

Note: In a competency-based system (such as SQA's), the FV measures the probability of a minimally competent candidate answering the question correctly – not a typical candidate.

Facility values are best assigned during pre-testing. Once a sample group of students has attempted the item (assuming that this sample is representative of the target cohort), an initial FV can be assigned. If pre-testing is not possible (or, more likely, not feasible) a predicted facility value (PFV) can be assigned by the test authors. Predicted FVs are assigned by subject matter experts (SMEs) and represent the "best guess" of two or more SMEs. This initial estimate can be re-calibrated once the item is used operationally.

Note that a FV is a relative measure of an item's difficulty – relative to the target cohort's age and stage. For example, a simple addition question might have a low FV for Primary 2 pupils but a high FV for Primary 4 pupils.

Note also that, in theory, any SRQ will have a minimum FV greater than zero. For example, any true/false question will have a minimum FV of 0.5 (which represents the 50-50 chance of guessing the answer correctly) and any MCQ (with four options) will have a minimum FV of 0.25 (no matter how difficult it is). However, in practice, some FVs will be lower than this due to the way the item has been constructed – with a badly constructed distractor attracting more than its fair share of candidates and the key attracting very few. It is recommended that items with FVs greater than 0.9 are discarded (too easy); similarly, FVs lower than 0.1 should be avoided (too difficult).

DISCRIMINATION INDEX
The discrimination index (DI) of an item is a measure of how well that item separates candidates. It relates each candidate's test score to his/her performance on a specific item, and then compares the top candidates with the bottom candidates. For example, if 30 candidates attempt an item, the DI compares the performance of the top third (top 10) of candidates with the bottom third (bottom 10) of candidates (based on final test scores). If eight of the top ten answered the item correctly and two of the bottom ten answered it correctly then the item's DI is:

DI = (8 - 2)/10 = 6/10 = 0.6

DI values range from +1 (all of the top candidates answered it correctly and none of the bottom candidates) to -1 (all of the bottom candidates answered it correctly and none of the top candidates!); a DI of zero means that the same number of top and bottom students answered it correctly. A positive DI is essential (it shows some discrimination). If an item yields a zero or negative DI, discard it. The above example illustrates good discrimination. It is recommended that an item has a DI of at least 0.2; items with DI values of 0.4 and above are considered to have good discrimination.

Discrimination indices cannot be predicted. They must be derived through pre-testing or operational use.

There is a link between a question's facility value and its discrimination index. A "good" question that is designed to be difficult will have a low facility value and high discrimination. But not all questions with low FVs will have high DIs. A poorly designed question that is difficult to answer due to lack of clarity or inappropriate language may have a low FV and low discrimination (since few candidates can answer it – and poor candidates are as likely to get it right as good candidates).

The following example illustrates the facility value and discrimination index for a specific question. The item was designed to assess the mathematical knowledge of S2 candidates. It was pre-tested on 60 candidates, of whom 18 answered it correctly: 15 in the top third and three in the bottom third. This gave the following item analysis:

FV = 0.30
DI = 0.60
Example 28 ~ Item analysis
If the radius of a circle is increased by 20%, which one of the following represents the corresponding increase in the circle's area?
A  40%
B  44%
C  120%
D  144%
This item is difficult. Given that blind guessing would produce a one-in-four chance of answering it correctly (FV = 0.25), the recorded FV of 0.30 (representing 30% of the sample) is very low. It also discriminates well, meaning that it is likely to separate candidates and aid grading. It is worth noting that this item is slightly cued: "44%" appears twice in the options (in B and within D's "144%") – which might encourage some candidates to assume that one of these options is correct (which would be a correct assumption – the key is B). This could have been avoided by selecting a different value for D (such as 160%).

OTHER METRICS
There is a range of other metrics that can be calculated for SRQs. Most are complex and, unlike facility values and discrimination indices, have no "real" interpretation. However, the distractor pattern provides useful information about which of the options candidates choose. For example, the following distractor pattern illustrates the choices made by 100 candidates for Example 28 (above).
Option    Frequency of selection
A         15
B         40
C         10
D         35
This distribution suggests that distractors A and C are under-performing and need to be strengthened or replaced. It might also indicate that distractor D is too strong and may require weakening. It would appear that this question comes down to a straight choice between options B and D for most candidates. There is no perfect distribution for the options – but options that are rarely selected, or a distractor that is more popular than the key, warrant attention.
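The calculations described above are simple enough to automate. The following sketch is illustrative only – the class, method and variable names are assumptions, not part of any SQA system. It computes the facility value, discrimination index and distractor pattern for a single item from each candidate's chosen option and total test score.

import java.util.*;

public class ItemAnalysis {

    // responses[i] holds candidate i's chosen option for this item ('A'-'D'),
    // scores[i] holds the same candidate's total test score, key is the correct option.
    static void analyse(char[] responses, double[] scores, char key) {
        int n = responses.length;

        // Facility value: the proportion of candidates who answered correctly.
        int correct = 0;
        for (char r : responses) if (r == key) correct++;
        double fv = (double) correct / n;

        // Discrimination index: compare the top third with the bottom third,
        // ranked by total test score.
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) order[i] = i;
        Arrays.sort(order, (a, b) -> Double.compare(scores[b], scores[a])); // best first
        int third = n / 3;
        int topCorrect = 0, bottomCorrect = 0;
        for (int i = 0; i < third; i++) {
            if (responses[order[i]] == key) topCorrect++;
            if (responses[order[n - 1 - i]] == key) bottomCorrect++;
        }
        double di = (double) (topCorrect - bottomCorrect) / third;

        // Distractor pattern: how often each option was selected.
        Map<Character, Integer> pattern = new TreeMap<>();
        for (char r : responses) pattern.merge(r, 1, Integer::sum);

        System.out.printf("FV = %.2f, DI = %.2f, pattern = %s%n", fv, di, pattern);
    }
}

Run against the 60-candidate pre-test described earlier, such a routine would report FV = 0.30 and DI = 0.60.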
Item analysis provides a means of evolving item banks by identifying under-performing ("weak") items – and eliminating them ("survival of the fittest"). The initial calibration of items can be done formally (through field testing items prior to their use) or informally (using predicted facility values, for example) and these initial values can be re-calibrated once the items are used in earnest. However, to be effective, item bank evolution (like biological evolution) needs a mechanism to identify weak items and replace them with stronger ones.
CONSTRUCTING TESTS

AUTHORING TESTS
This section looks at the process of combining questions into a test. The following diagram illustrates the test generation procedure.
Figure 4 - Test generation procedure
TEST SPECIFICATION
The test specification is the document (or "blueprint") that defines the precise nature of the test. It is normally created by the Principal Assessor (or equivalent) under advice from the SQA Officer. The test specification will include the following information:
• description (including links with source unit(s) and outcome(s))
• question format(s)
• number of questions
• duration
• rubric (including the marking scheme)
• pass mark (including grade boundaries where applicable)
• conditions of assessment.
A sample test specification is provided in the appendices.

The description of the test must (at a minimum) define the learning objectives that the test is seeking to measure. In the context of SQA, this means the unit(s) and outcome(s) that the assessment is testing (its "domain").

The question format defines the type of question that the test will employ. This might be true/false, matching, multiple choice or multiple response – or a mix of these types. For example, a test might use 15 MCQs and 5 MRQs – the test specification should spell this out.

The number of questions is self-evident, but note that where more than one question type is employed, the specification should state the number of each type.

The duration of the test will depend on the number and complexity of the questions. Simplistic formulas for the duration of a test ("two minutes per question") should be avoided. Scenario questions, in particular, take time to read, assimilate and answer. The duration should be based on a typical test undertaken by a typical candidate. If in doubt, err on the side of generosity – unless speed of response is a critical aspect of the assessment.

The rubric defines the marking scheme and provides instructions to candidates. Setters may adopt a simple marking scheme (one mark per question) or more complex schemes (involving one, two or more marks for each item depending on its importance or complexity). Simple marking schemes are recommended. This section should also provide any special instructions for candidates.

The pass mark (or cut score) is the minimum mark that candidates must gain in order to achieve a pass in the test. There are a number of techniques for setting pass marks, some of which are discussed later in this section. But pulling a figure out of thin air is not one of them. And 50% is rarely a suitable cut score for an objective test (due to the effects of guessing – see below).

If a test is graded (beyond the basic pass/fail threshold), the grade boundaries must be defined. The grade boundaries define the marks required to gain an A, B or C pass. For example, a C pass might require a total score between 60% and 74%, a B pass between 75% and 89%, and an A pass 90% or more.

Finally, the test specification should describe any special conditions that have not already been described elsewhere in the specification. Examples include access to reference material (Is the assessment open book? Or open web?) and permitted materials (such as calculators or special instruments).

ASSEMBLING THE TEST TEAM
The test team is responsible for constructing the test, using the test specification as a blueprint. This team will normally consist of an SQA Officer and a number of setters – or, in testing terminology, a test expert (the SQA Officer) and a number of subject matter experts (the setters). The SMEs should have prior knowledge and experience of writing SRQs.

The size of the team will depend on a number of factors, such as the number of items required and the time available to write them. The more items required and the less time available, the greater the number of SMEs needed.

Subject matter experts may need training in the construction of selected response questions. This can be done at the authoring event (see below) or beforehand, at a specific training event.
AUTHORING EVENT
Due to the collaborative nature of item writing, it is recommended that questions are produced over a short period of intensive activity rather than through the more traditional SQA approach to question setting. For example, a team of four SMEs might be asked to produce 200 items over an intensive working weekend. A suggested workflow for the authoring event is provided below: learning outcomes are allocated to SMEs and targets agreed; each SME writes items and adds them to a batch; when a batch is complete it is passed to a reviewer, who checks each item; accepted items are added to the item bank, while rejected items are revised or discarded.

Figure 5 - Authoring event workflow
Authors need to be crystal clear about the learning objectives (outcomes) that they are to assess. Where more than one outcome is to be covered by an individual SME, the number of questions for each outcome should be agreed. Each author's targets should also include the types of question and the number of each type (for example, "20 multiple choice questions and 10 multiple response questions"), the average facility value for their set of questions (see below), and the expected productivity rate (for example, five items per hour).

Writing items is a solitary activity. Although authors may seek advice when they write questions, the act of putting pen to paper (or, more likely, finger to keyboard) is an individual task. Authors should be provided with a question template before commencing. This template (which is normally a Word document) defines the precise format of the question and will include metadata about the item (such as the associated keywords and its predicted facility value). A sample template is provided in the appendices.

If the items are being written for a test with a known pass mark, authors will need to know the target facility value (FV) to aim for. For example, if the writers are producing items for a test with a pass mark of 15/20 then the target FV will be 0.75, and each author should ensure that each batch of questions has an average FV of 0.75 (so that the overall item bank has a "correct" FV).

Authors should batch items before passing a group of questions to a designated reviewer for checking. The reviewer will then review each item and do one of three things: (1) accept it without change; (2) accept it with revisions; or (3) reject it. While it is unlikely that the author and reviewer cannot reach a compromise about a disputed item, in such cases the Principal Assessor should make a final decision. Reviewing is best done blind (i.e. without knowing the identity of the author) to prevent personality conflicts from interfering with the process. While group reviewing is a good means of training writers and reviewers, it is an inefficient way to create large numbers of items.

The output from the authoring event will be an item bank of approved and calibrated items. The SQA Officer will play a crucial role in maintaining workflow and ensuring a productive event. Target setting and regular milestones will play an important part in ensuring a successful outcome. At various points during the event, the Officer should convene review meetings where progress can be measured, and problems or bottlenecks can be collectively identified and addressed.

DETERMINING TEST LENGTH
Determining the number of questions to include in a test is an important decision. The length of a test has a direct relationship with the test's reliability – the longer the test (and, by implication, the more questions in the test), the more reliable that test will be as a measure of the candidate's ability. There are a number of factors that affect test length, including:
• the importance of the test
• the size of the domain being assessed
• the range of knowledge and skills contained within the domain
• the time available.
A high stakes test needs to be more reliable than a low stakes test – and therefore needs to be longer. However, the improvement in reliability levels off beyond a certain number of questions. The number of learning objectives being assessed also has a bearing on the size of the test. A test that assesses several outcomes (or one large outcome) will obviously require more items than one that assesses fewer outcomes (or smaller outcomes). However, even a test that assesses a single outcome may require many questions if that outcome covers a broad range of knowledge and skills.
And, finally, the time available needs to be considered. There is no point in designing a test with 60 questions, requiring two hours to complete, if this is disruptive to centres. For example, most Scottish schools operate a 50 minute period, and tests that last longer than this can be difficult to administer.

There is no formula for test length. Criticality, domain size and practical considerations need to be balanced. However, in most instances of unit assessment it is best to keep tests as short as possible to reduce the assessment burden on centres (and candidates).

TECHNIQUES FOR SETTING PASS MARKS IN OBJECTIVE TESTS
There are a number of ways to set a pass mark. We will look at three methods:
1. informed judgement
2. Angoff method
3. contrasting groups.
Some are more "scientific" than others but, no matter which method is used, none of them replaces the need for human judgement.

INFORMED JUDGEMENT
This technique involves the most human judgement and, as a consequence, is the most subjective way of setting pass marks (it is also the method most similar to the way that SQA sets cut scores). At its most basic level, informed judgement involves the opinion of the members of the setting team. These subject matter experts (SMEs) agree a sensible pass mark based on their expert judgement and the following considerations:
• the minimum mark achievable through guessing
• the criticality of the judgement being made about candidates
• the complexity of the subject domain
• the difficulty of the test items
• the age and stage of the candidates.
No matter how little a candidate knows, s/he is unlikely to score zero marks in an objective test due to the effects of guessing. For example, in an objective test consisting of 100 multiple choice items, each with four options, blind guessing should produce a minimum mark of 25% (representing the one in four chance of guessing the correct answer to each question). For this reason, the pass mark in an objective test is usually higher than 50%.

The importance of the assessment also has a bearing on the pass mark. For example, an assessment that grants a licence to practise as a surgeon is more important than an assessment that confers a pass in a unit. Where it is critical that candidates possess particular competences, both the test length (see above) and the pass mark (see below) should be increased.

If there is an existing item bank, the difficulty of the items in the bank can be used to determine the pass mark. For example, if we know that an item bank contains difficult questions then that would result in a lower pass mark; conversely, a simple item bank would lead to a higher pass mark. Associated with this is the complexity of the subject domain. For example, a test on nuclear physics might have a lower pass mark than one on multiplication tables – although this is dependent on the age and stage of the candidates.

In practice, the informed judgement would be based on all of these considerations – some of which may drive the pass mark up and some of which may push it down. For example, an undergraduate true/false test for medical students would have a significantly higher pass mark than a multiple response test for a low level unit.

The initial judgement may be refined after further consultation or pre-testing. For example, practising teachers may be asked for their views on the proposed pass mark; and/or the assessment may be field-tested and the pass mark adjusted in the light of the resulting scores.

ANGOFF METHOD
This method of determining the pass mark is less subjective than the informed judgement approach. It involves aggregating the facility values (FVs) for each item and estimating the pass mark from this figure. The following example illustrates this method.
Question      FV
1             0.8
2             0.6
3             0.6
4             0.3
5             0.4
Total         2.7
Pass mark     3/5
Table 5 - Setting pass marks using Angoff
Recall that the facility value is a measure of the probability (between 0 and 1) of a minimally competent candidate answering the question correctly. For example, based on the above table, there is an 80% probability that candidates will answer question one correctly (FV = 0.8). Adding the FVs for each question therefore provides an indication of the total score that a minimally competent candidate should achieve (in this case 2.7). Subject matter experts would then round this value up or down using their professional judgement (in this case the aggregate FV was rounded up). The resulting pass mark for this test is three out of five.

In practice, pass marks are defined in the test specification, and the task therefore becomes one of selecting questions with FVs that aggregate to this pass mark. We effectively reverse engineer the Angoff method. For example, if the test specification defines a pass mark of 7/10 then the test should consist of questions whose FVs add to seven (give or take a decimal place). This is a very simple task for a computer.
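The following sketch illustrates one way a computer might make that selection. It is illustrative only: the greedy heuristic, class name and sample bank of predicted facility values are assumptions, not a description of any SQA software.

import java.util.*;

public class AngoffSelector {

    // Greedy heuristic: for each remaining slot in the test, choose the unused
    // item whose FV is closest to the average FV still needed to hit the target.
    static List<Integer> selectItems(double[] fv, int testLength, double targetTotal) {
        boolean[] used = new boolean[fv.length];
        List<Integer> chosen = new ArrayList<>();
        double remaining = targetTotal;

        for (int slot = testLength; slot > 0; slot--) {
            double want = remaining / slot;   // ideal FV for this slot
            int best = -1;
            for (int i = 0; i < fv.length; i++) {
                if (!used[i] && (best == -1 || Math.abs(fv[i] - want) < Math.abs(fv[best] - want))) {
                    best = i;
                }
            }
            used[best] = true;
            chosen.add(best);
            remaining -= fv[best];
        }
        return chosen;
    }

    public static void main(String[] args) {
        // A hypothetical bank of predicted facility values.
        double[] bank = {0.9, 0.85, 0.8, 0.8, 0.75, 0.75, 0.7, 0.7, 0.65, 0.65,
                         0.6, 0.6, 0.55, 0.5, 0.45, 0.4, 0.35, 0.3, 0.25, 0.2};
        // A pass mark of 7/10: select 10 items whose FVs sum to roughly 7.
        List<Integer> test = selectItems(bank, 10, 7.0);
        double total = 0;
        for (int i : test) total += bank[i];
        System.out.printf("Selected items %s, aggregate FV = %.2f%n", test, total);
    }
}

A real selection routine would also need to respect the test specification's distribution of questions across outcomes and question types; the sketch only balances facility values.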
CONTRASTING GROUPS
This method, unlike the previous ones, requires pre-testing. The test is issued to two groups of students – one group who are expected to pass and one group who are expected to fail. The actual scores are then plotted on a chart and the intersection of the graphs provides an initial pass mark. This initial pass mark is then refined using the SMEs' expert judgement. The graph below illustrates the results for two groups of students – one group (the blue line) expected to fail and one group (the red line) expected to pass. The chart plots the number of candidates (vertical axis) against marks from 0 to 100 (horizontal axis).
Figure 6 - Setting pass marks using contrasting groups
The initial cut score would be around 55% (the approximate intersection of the two lines). Raising this to 60% would reduce the number of "incompetent" students who would pass the test – but increase the number of "competent" students who would fail. Conversely, decreasing the pass mark to 50% would reduce the number of "false fails" but increase the number of "false passes". The final decision is based on the professional judgement of the SMEs.
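Locating the initial cut score from contrasting groups data can also be automated. The sketch below is illustrative only: it assumes the scores of each group have been tallied into arrays indexed by mark, and it uses a crude approximation to the intersection of the two curves (the first mark at which the "expected to pass" group outnumbers the "expected to fail" group).

public class ContrastingGroups {

    // failCounts[m] and passCounts[m] hold the number of candidates in the
    // "expected to fail" and "expected to pass" groups who scored mark m (as a %),
    // so both arrays are assumed to have 101 entries (marks 0 to 100).
    static int initialCutScore(int[] failCounts, int[] passCounts) {
        for (int mark = 0; mark <= 100; mark++) {
            if (passCounts[mark] > 0 && passCounts[mark] >= failCounts[mark]) {
                return mark;   // approximate crossover of the two distributions
            }
        }
        return 100;   // no crossover found: every mark is dominated by the fail group
    }
}

The value returned is only a starting point; as described above, the SMEs would then adjust it up or down, trading "false passes" against "false fails".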
These methods can be used alone or in combination. They all provide some scientific basis to the process of setting the pass mark. The alternative – pulling a pass mark from thin air – is not an option.
DEALING WITH GUESSING
Guessing is often cited as a major problem with selected response questions, and it is true that blind guessing can produce relatively high marks for candidates in an objective test. For example, blind guessing in a true/false test should produce a result of approximately 50%. However, there are well established ways of dealing with guessing. These are: pass mark setting, negative marking and correction-for-guessing.

SETTING AN APPROPRIATE PASS MARK
The simplest way of dealing with guessing is to adjust the pass mark accordingly. Instead of the "traditional" 50% pass mark, the cut score can be made higher to compensate for the effects of guessing. For example, a multiple choice test that has a pass mark of 75% is unlikely to be passed by blind guessing. We have already seen three ways of determining the pass mark for an objective test (informed judgement, the Angoff method and contrasting groups). Any of these methods will eliminate (or greatly reduce) the effects of guessing.

NEGATIVE MARKING
Negative marking involves deducting marks for incorrect answers. For example, the following table illustrates a candidate's scoring pattern in a five item test where one mark is awarded for a correct answer, zero marks where a question is not answered, and one mark is deducted for an incorrect answer.
Question      Mark
1             1
2             1
3             0
4             -1
5             1
Total         2
The main problem with negative marking is that it penalises partial knowledge. Selecting a "good" distractor is better than choosing a "bad" distractor – but both choices will result in the loss of a mark.
CORRECTION-FOR-GUESSING
This technique involves deducting a certain number of marks from every candidate to compensate for the effects of guessing. The number of marks deducted can be worked out in a number of ways, ranging from the crude (a fixed number of marks deducted from every candidate) to the more sophisticated (where the number of marks deducted is not fixed and is based on an estimate of how many guesses each candidate has made). An example of the second approach follows.

In a 50 item test, where each item is a multiple choice question consisting of four options (a key and three distractors), a candidate scores 38/50. The number of marks deducted is based on the number of incorrect answers (which are assumed to be guesses) and is worked out as follows:

No. of marks deducted = No. of wrong answers × (1 / No. of distractors)

In this case:

No. of marks deducted = 12 × 1/3 = 4 marks.

So, four marks would be deducted from this candidate, giving her an adjusted score of 34. While less crude than negative marking, this method suffers from similar problems – it penalises partial knowledge as much as no knowledge, and it disproportionately affects low risk takers, who will choose not to attempt a question rather than answer it for fear of losing marks, resulting in many unanswered questions – and deflated marks. (A short sketch showing how both adjustments could be calculated appears at the end of this section.)

SEQUENCING QUESTIONS
When deciding the order of items in a test, bear in mind that tests should begin with relatively simple questions and progress to more complex questions. It is also advisable to group item types together – for example, all true/false items and all MCQs. So, in most cases, it is advisable to begin with straight-forward, low difficulty true/false questions and progress to more complex, higher difficulty MRQ or assertion/reason items.
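As promised above, here is a small sketch of the two marking adjustments. It is illustrative only: the class and method names, and the scoring conventions (one mark per item, '\0' for an omitted answer), are assumptions rather than SQA practice.

public class GuessingAdjustments {

    // Negative marking: +1 for a correct answer, 0 for an omission, -1 for an
    // incorrect answer. 'answers' uses '\0' to represent an unanswered item.
    static int negativeMark(char[] answers, char[] keys) {
        int total = 0;
        for (int i = 0; i < keys.length; i++) {
            if (answers[i] == '\0') continue;            // not attempted: no change
            total += (answers[i] == keys[i]) ? 1 : -1;   // reward or penalise
        }
        return total;
    }

    // Correction-for-guessing: deduct wrong answers x (1 / number of distractors)
    // from a conventionally marked raw score.
    static double correctForGuessing(int rawScore, int wrongAnswers, int distractors) {
        return rawScore - wrongAnswers * (1.0 / distractors);
    }

    public static void main(String[] args) {
        // The worked example above: 38/50 with 12 wrong answers and 3 distractors.
        System.out.println(correctForGuessing(38, 12, 3));   // prints 34.0
    }
}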
APPENDIX 1 – SAMPLE TEST SPECIFICATION

Source unit
Provide details about the unit, outcomes and performance criteria that the test is assessing.

  Title: Internet Safety
  Ref. No.: 10 1234
  SCQF level: 4

Outcome(s) and Performance Criteria

  Outcome   Performance criteria   No. of questions
  1         All                    9
  2         d, e                   9
  3         a, b, c                7

Test details
Provide details about the test.

  No. of questions: 25
  Duration: 50 min.

  Question format(s)
  Type   Number   Additional info.
  MCQ    20       4 options for each question
  MRQ    5        4 options for each question

Selection of questions
Explain selection criteria for questions.

  There must be a fixed number of questions for each outcome (see the distribution above). The question types (MCQ and MRQ) can be distributed between outcomes as desired.

Pass mark(s)
Including grading thresholds where applicable.

  16/25

Rubric
Marking instructions, instructions to candidates, assessment conditions etc.

  Marking instructions: One mark per question.
  Assessment conditions (such as reference materials, location, authentication): No access to reference material (paper or web). Candidate authentication is required.
  Instructions to candidates: No special instructions.

Author: Bobby Elliott          Date: 12 May 2006
APPENDIX 2 – SAMPLE TEMPLATE FOR MCQS

Item
  Stem:
  Options:
    Key:         1
    Distractors: 2
                 3

Metadata
  Outcome:
  PC(s):
  PFV:
  Tags:

Workflow
  Writing:    Writer:    Date:    Time:
  Reviewing:  Reviewer:  Date:    Time:
  Banking:    Banker:    Date:    Time:
APPENDIX 3 – CHECKLIST FOR MULTIPLE CHOICE QUESTIONS

Test ID:                     Item ID:

ITEM
• The question relates to learning outcome(s) and performance criteria.
• The level of language is appropriate to the target candidates.
• The question is set at an appropriate level of difficulty.
• There is one unambiguously correct answer.
• Cueing is avoided.

STEM
• The stem is phrased as a question.
• Unnecessary information is not included.
• Necessary standards are specified.
• Negative wording is avoided.
• Personal pronouns ("you", "we", etc.) are avoided.
• Subjective wording is not used, e.g. "What do you think…".

OPTIONS
• Options are sequenced in a definite order.
• The length of the options is similar.
• Options are mutually exclusive.
• The key is not distinctive in terms of length, wording etc.
• Distractors are correct in every context (unless a specific context is given).
• Definitive wording ("never", "always", etc.) is not used.
• Pejorative wording is avoided, e.g. "bad", "little", etc.
• "None of the above" is used sparingly.
• "All of the above" is not used.

COMMENTS
Reviewer: