Computer Adaptive Testing in Higher Education: A case study

Mariana Lilley (1), Trevor Barker (2), Carol Britton (3)

(1) University of Hertfordshire, [email protected]
(2) University of Hertfordshire, [email protected]
(3) University of Hertfordshire, [email protected]

ABSTRACT

At the University of Hertfordshire we have developed a computer-adaptive test (CAT) prototype. The prototype was designed to select the questions presented to individual learners based upon their ability. Earlier work by the authors over the last five years has shown benefits of the CAT approach, such as increased learner motivation. It was therefore important to investigate the fairness of this assessment method. In the study reported here, statistical analysis of test scores from 320 participants shows that in all cases CAT scores were highly correlated with scores from the other assessment methods (p<0.05). This was taken to indicate that learners of all abilities were not disadvantaged by our CAT approach.

KEYWORDS

Student assessment, Computer-adaptive test, Assessment fairness

INTRODUCTION

The past ten to fifteen years have witnessed a significant increase in the use of computer-assisted assessment in Higher Education. Hardware developments and the subsequent proliferation of computer technology, in conjunction with ever-increasing student numbers, are amongst the main reasons for this trend (Freeman and Lewis, 1998; O'Reilly and Morgan, 1999; Wainer, 2000; Joy et al., 2002). Computer-assisted assessments are applications that support student testing, from actual test administration to scoring and student performance reporting. The benefits of these computerised tools over traditional paper-and-pencil tests are well reported in the relevant literature and range from accuracy of marking to the potential to quickly assess large groups of students (Pritchett, 1999; Harvey and Mogey, 1999; De Angelis, 2000; Mason et al., 2001).

A significant number of the computer-assisted assessments currently being used in Higher Education are the so-called computer-based tests (CBTs). CBTs are traditionally not tailored towards individual students, as the same fixed set of questions is administered to all students regardless of their ability within the subject domain. Conventional CBTs differ from computer-adaptive tests (CATs) primarily in the way that the questions administered during a given assessment session are selected. In a CAT, one question is administered at a time and the selection of the next question is dependent on the response to the previous one. In summary, whilst CBTs mimic aspects of a paper-and-pencil test, CATs mimic aspects of an oral interview (Freedle & Duran, 1987; Syang & Dale, 1993). To this end, the first question administered within a CAT is typically one of average difficulty. A correct response will cause a more difficult question to follow. Conversely, an incorrect response will cause an easier question to be administered next. By dynamically matching the sequence and level of difficulty of the administered questions to each individual student's proficiency level, the CAT approach has the potential to offer higher levels of interaction and individualisation than those offered by its CBT counterpart. This can, in itself, lead to increased student motivation (Lilley et al., 2004).

Because of individual differences in ability levels within the subject domain being tested, the static CBT approach often poses problems for some students. For example, a given question might be too easy and thus uninteresting for one student, yet too difficult and therefore bewildering for another. More importantly, questions that are too difficult or too easy provide tutors with little information regarding student ability. We argue that it is only by asking questions at the boundary of what a student understands that we can obtain useful information about what he or she has learned. By adapting the level of difficulty of the questions to match the ability of the test-taker, questions that provide little information about a given student can be avoided.

Despite the predicted benefits of computerised adaptive testing, the approach has received relatively little attention from British Higher Education institutions (Joy et al., 2002). The use of computer-adaptive tests (CATs) in Higher Education, as a means of enhancing student assessment as well as fully exploiting the computer technology already available, is the focus of ongoing research at the University of Hertfordshire. To this end, a CAT application based on the Three-Parameter Logistic Model from Item Response Theory (IRT) was designed, developed and evaluated. Earlier work by the authors showed the efficacy of the approach in the domain of English as a second language (Lilley & Barker, 2002; Lilley et al., 2004). Our current focus of research is the use of computerised adaptive testing within the Computer Science domain.

In the next section of this paper, the reader is provided with an overview of IRT and computerised adaptive testing. We then report on the findings of an empirical study, in which over 300 students participated in summative assessment sessions using our CAT application. Potential benefits and limitations of the CAT approach, in addition to our views on how the work described here can be developed further, are presented in the final section of this paper.


COMPUTER-ADAPTIVE TESTS

Computer-adaptive tests (CATs) are typically based on Item Response Theory (IRT) (Wainer, 2000). IRT is a family of mathematical functions that attempt to predict the probability of a test-taker answering a given question correctly. Since we aimed to develop a computer-adaptive test based on the use of objective questions such as multiple-choice and multiple-response, only IRT models capable of evaluating questions that are dichotomously scored were considered appropriate. The Three-Parameter Logistic (3-PL) Model was chosen over its counterparts, the One-Parameter and Two-Parameter Logistic Models, as it takes into consideration the question's discrimination and the probability of a student answering a question correctly by guessing.

Equation 1 shows the mathematical function from the 3-PL model used to evaluate the probability P of a student with an unknown ability θ correctly answering a question of difficulty b, discrimination a and pseudo-chance c. In order to evaluate the probability Q of a student with an unknown ability θ incorrectly answering a question, the function Q(θ) = 1 − P(θ) is used (Lord, 1980). Within a CAT, the question to be administered next, as well as the final score obtained by any given student, is computed based on the set of previous responses. This score is obtained using the mathematical function shown in Equation 2 (Lord, 1980).

Equation 1: Three-Parameter Logistic Model

P(θ) = c + (1 − c) / (1 + e^(−1.7a(θ − b)))

Equation 2: Response Likelihood Function

L(u1, u2, ..., un | θ) = ∏ (j = 1 to n) Pj^(uj) · Qj^(1 − uj)
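As an illustration of how Equations 1 and 2 translate into code, a minimal sketch is given below. It is written in Python purely for this discussion; the paper does not describe the prototype at this level of detail, and the tuple-based question representation is our own simplification.

```python
import math

def p_correct(theta, a, b, c):
    """Equation 1: 3-PL probability of a correct response, for ability theta,
    discrimination a, difficulty b and pseudo-chance c."""
    return c + (1.0 - c) / (1.0 + math.exp(-1.7 * a * (theta - b)))

def response_likelihood(theta, responses):
    """Equation 2: likelihood of an observed response pattern at ability theta.
    `responses` is a list of (u, a, b, c) tuples, where u is 1 for a correct
    and 0 for an incorrect answer."""
    likelihood = 1.0
    for u, a, b, c in responses:
        p = p_correct(theta, a, b, c)
        q = 1.0 - p                      # Q(theta) = 1 - P(theta)
        likelihood *= (p ** u) * (q ** (1 - u))
    return likelihood

# The ICC of Figure 1 (a=1.5, b=0, c=0.1) evaluated at average ability:
print(round(p_correct(0.0, 1.5, 0.0, 0.1), 2))   # 0.55
```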

By applying the formula shown in Equation 1, it is possible to plot an Item Characteristic Curve (ICC) for any given question.

Figure 1: 3-PL ICC for a correct response where a=1.5, b=0 and c=0.1 (probability of a correct response plotted against ability θ, from −2 to +2)


Figure 2: 3-PL ICC for a correct response where a=1.7, b=1.48 and c=0.25 (probability of a correct response plotted against ability θ, from −2 to +2)

Figure 3: 3-PL ICC for an incorrect response where a=1.5, b=0 and c=0.1 (probability of an incorrect response plotted against ability θ, from −2 to +2)
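The ICCs in Figures 1 to 3 follow directly from Equation 1 and the stated parameters. The short matplotlib sketch below reproduces the curves of Figures 1 and 3; it is our own illustration, not part of the original prototype.

```python
import numpy as np
import matplotlib.pyplot as plt

def p_correct(theta, a, b, c):
    # Equation 1: 3-PL probability of a correct response
    return c + (1.0 - c) / (1.0 + np.exp(-1.7 * a * (theta - b)))

theta = np.linspace(-2, 2, 200)            # ability range used in Figures 1 to 4
p = p_correct(theta, a=1.5, b=0.0, c=0.1)  # parameters of Figures 1 and 3

plt.plot(theta, p, label="P(θ): correct response (Figure 1)")
plt.plot(theta, 1 - p, label="Q(θ) = 1 − P(θ): incorrect response (Figure 3)")
plt.xlabel("Ability θ")
plt.ylabel("Probability")
plt.ylim(0, 1)
plt.legend()
plt.show()
```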

Figure 4: The response likelihood curve assumes a bell shape when at least one correct and one incorrect response are entered (likelihood plotted against ability θ, from −2 to +2)

Questions within the Three-Parameter Logistic Model are dichotomously scored. As an example, consider a student who answered a set of three questions, in which the first and second responses were incorrect and the third response was correct, that is, u1 = 0, u2 = 0 and u3 = 1. The likelihood function for this example is L(u1, u2, u3 | θ) = (P1^0 Q1^1)(P2^0 Q2^1)(P3^1 Q3^0), or more concisely L(u1, u2, u3 | θ) = Q1 Q2 P3. In the event of a student entering at least one correct and one incorrect response, the response likelihood curve (see Equation 2) assumes a bell shape, as shown in Figure 4.


IRT suggests that the peak of this curve is the most likely value for the student's ability estimate θ. An extensive discussion of IRT is beyond the scope of this paper; the reader interested in investigating this topic in more depth is referred to Lord (1980), Hambleton (1991) and Wainer (2000). In the next section we describe the empirical study that is the main focus of this paper.
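To make the estimation and question-selection steps concrete, the sketch below takes the ability estimate to be the θ that maximises Equation 2 over a coarse grid, and then administers the unanswered question whose difficulty b is closest to that estimate. This is an illustrative reading only: the paper does not state the numerical method or the exact selection criterion used by the prototype, and the item parameters in the example are invented.

```python
import math

def p_correct(theta, a, b, c):
    # Equation 1: 3-PL probability of a correct response
    return c + (1.0 - c) / (1.0 + math.exp(-1.7 * a * (theta - b)))

def likelihood(theta, answered):
    # Equation 2 over the responses entered so far; each entry is (u, a, b, c)
    value = 1.0
    for u, a, b, c in answered:
        p = p_correct(theta, a, b, c)
        value *= (p ** u) * ((1.0 - p) ** (1 - u))
    return value

def estimate_ability(answered):
    # Peak of the likelihood curve, searched on a 0.01 grid over the
    # ability range -2..+2 used throughout the paper
    grid = [i / 100.0 - 2.0 for i in range(401)]
    return max(grid, key=lambda theta: likelihood(theta, answered))

def next_question(theta, remaining):
    # Illustrative rule: pick the unanswered question whose difficulty b is
    # closest to the current ability estimate (a common CAT heuristic; the
    # prototype's exact criterion is not given in the paper).
    return min(remaining, key=lambda q: abs(q["b"] - theta))

# Example response pattern: correct, incorrect, correct (item parameters invented)
answered = [(1, 1.5, 0.0, 0.1), (0, 1.7, 1.48, 0.25), (1, 1.2, -0.5, 0.2)]
theta_hat = estimate_ability(answered)
pool = [{"id": 4, "a": 1.7, "b": 1.0, "c": 0.25},
        {"id": 5, "a": 1.5, "b": -1.0, "c": 0.1}]
# Prints an ability estimate of roughly +0.8 and question 4 as the next item
print(round(theta_hat, 2), next_question(theta_hat, pool)["id"])
```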

THE STUDY

The study described here involved participants drawn from three different second-year programming modules at the University of Hertfordshire. For simplicity, these groups will be referred to as CS1, HND1 and HND2. The CS1 group comprised 96 students enrolled in a Visual Basic programming module of the Bachelor of Science (BSc) programme in Computer Science. Both the HND1 and HND2 groups consisted of students enrolled in a Visual Basic programming module of the Higher National Diploma (HND) programme in Computer Science. The HND1 and HND2 groups had, respectively, 133 and 81 participants.

All subject groups participated in three different types of summative assessment, namely computer-based test (CBT), computer-adaptive test (CAT) and practical exam. All assessment sessions took place in computer laboratories, under supervised conditions. The main characteristics of these assessment sessions are summarised in Table 1. Table 2 provides an overview of the topics covered by each session of computer-assisted assessment.

Table 1: Summary of summative assessment methods

Assessment                       | Week | Description
Computer-Assisted Assessment (1) | 7    | 10 CBT questions followed by 10 CAT ones. Time limit of 30 minutes.
Computer-Assisted Assessment (2) | 10   | 10 CBT questions followed by 20 CAT ones. Time limit of 40 minutes.
Practical exam                   | 18   | Each individual student had to create a working program based on a set of specifications provided on the day. Time limit of 2 hours.

As can be seen from Table 1, the computer-assisted assessments (1) and (2) comprised both non-adaptive (i.e. CBT) and adaptive (i.e. CAT) components. This was deemed necessary to ensure that students would not be disadvantaged by the adaptive approach, in addition to providing the authors with useful data for comparative purposes. The students, however, were unaware of the existence of the adaptive component until the end of the study.


Table 2: Topics covered by Computer-Assisted Assessments (1) and (2)

Topic                                                       | Number of questions, Computer-Assisted Assessment (1) | Number of questions, Computer-Assisted Assessment (2)
Data Types and variable declaration                         | 4 | 5
Arithmetic, Comparison, Concatenation and Logical operators | 4 | 5
Built-in/Intrinsic functions                                | 4 | 5
Program flow                                                | 4 | 5
Standard controls: properties, methods and events           | 4 | 5
Professional controls: properties, methods and events       | 0 | 5

The CAT application used in this study comprised an adaptive algorithm based on the 3-PL Model, a question database and a Graphical User Interface (GUI). The GUI is illustrated in Figure 5.

Figure 5: Screenshot of a question regarding the MsgBox function

The question database contained information on each question, such as stem, options, key answer and IRT parameters. There are two main approaches to the calibration of questions with no historical data. The first approach would be the use of statistical simulations, commonly known as "virtual students" (Conejo et al., 2000). The second approach would be the use of experts in the subject domain to grade the difficulty of the questions (Fernandez, 2003). An important characteristic of our approach was that experts used Bloom's taxonomy of cognitive skills (Pritchett, 1999; Anderson & Krathwohl, 2001) in order to perform the calibration of questions. In this work, questions were first classified according to the cognitive skill being assessed. After this initial classification, questions were then ranked according to difficulty within each cognitive level. Table 3 summarises the three levels of cognitive skills covered by the question database and their difficulty ranges. It can be seen from Table 3 that knowledge was the lowest level of cognitive skill and application was the highest. An important assumption of our work is that each higher-level cognitive skill includes all lower-level skills. As an example, a question classified as application is assumed to embrace both comprehension and knowledge.

Table 3: Level of difficulty of questions

Difficulty b    | Cognitive skill | Skill involved
−2 ≤ b < −0.6   | Knowledge       | Ability to recall taught material
−0.6 ≤ b < 0.8  | Comprehension   | Ability to interpret and/or translate taught material
0.8 ≤ b ≤ 2     | Application     | Ability to apply taught material to novel situations
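The paper lists the information held for each question (stem, options, key answer and the IRT parameters) together with the Bloom classification used for calibration. One possible record structure is sketched below; the field names and the example item are illustrative only, loosely modelled on the MsgBox question shown in Figure 5, and are not taken from the prototype's actual database.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Question:
    """One entry in the question database, as described in the paper: stem,
    options, key answer, the 3-PL parameters and the Bloom level used for the
    initial expert calibration (field names are illustrative)."""
    stem: str
    options: List[str]
    key: int                 # index of the correct option
    a: float                 # discrimination
    b: float                 # difficulty, -2 <= b <= 2 (see Table 3)
    c: float                 # pseudo-chance
    cognitive_skill: str     # "knowledge", "comprehension" or "application"

example = Question(
    stem="Which built-in function displays a message box to the user?",
    options=["MsgBox", "InputBox", "Print", "Format"],
    key=0,
    a=1.5, b=-1.0, c=0.25,
    cognitive_skill="knowledge",   # b = -1.0 falls in the Knowledge band of Table 3
)
```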

At the end of each assessment session, questions were re-calibrated using response data obtained by all participants who attended the session. In general terms, questions that were answered correctly by many test takers had their difficulty levels lowered and questions that were answered incorrectly by many test takers had their difficulty levels increased. Both sessions of computer-assisted assessment started with the non-adaptive questions, followed by the adaptive ones. In the CBT section of the assessment, questions were sorted by topic and then by ascending difficulty. In the CAT section of the assessment, questions were grouped by topic and the level of difficulty of the question to be administered next was based on each individual set of previous responses (see Equation 2). The scores obtained by the subject groups in the three assessments were subjected to statistical analysis. The results of this statistical analysis are presented in the next section of this paper.
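The re-calibration is described only qualitatively (difficulty is lowered for questions most candidates answered correctly, and raised for questions most answered incorrectly). A minimal sketch of one such rule is given below; the proportion thresholds and step size are assumptions for illustration, not the prototype's actual procedure.

```python
def recalibrate_difficulty(b, proportion_correct, step=0.2,
                           easy_threshold=0.7, hard_threshold=0.3):
    """Nudge a question's difficulty b after an assessment session.

    proportion_correct is the fraction of participants who answered the
    question correctly. The thresholds and step size are illustrative only;
    the paper does not state the exact re-calibration formula used.
    """
    if proportion_correct >= easy_threshold:
        b -= step          # answered correctly by many: lower its difficulty
    elif proportion_correct <= hard_threshold:
        b += step          # answered incorrectly by many: raise its difficulty
    return max(-2.0, min(2.0, b))   # keep b inside the -2..+2 range used throughout

print(round(recalibrate_difficulty(0.8, 0.85), 2))   # 0.6
```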

RESULTS

Table 4 shows the mean scores obtained by participants in the three Visual Basic assessments undertaken.

Table 4: Mean scores obtained by the participants in the three assessments undertaken (N=320)

Group | N   | CBT (1) Mean Score | CAT (1) Mean Ability | CBT (2) Mean Score | CAT (2) Mean Ability | Practical exam Mean Score
CS1   | 96  | 52.7%              | -0.90                | 27.8%              | -0.89                | 48.2%
HND1  | 133 | 51.5%              | -0.83                | 42.3%              | -0.91                | 49.7%
HND2  | 91  | 50.0%              | -0.81                | 24.3%              | -1.45                | 47.9%


Note that the scores for the adaptive and the non-adaptive components of both computer-assisted assessments are shown in separate columns. The "Mean Ability" columns represent the mean value of the estimated student ability for the adaptive component of the computer-assisted assessment, which ranged from −2 (lowest) to +2 (highest). The scores for the non-adaptive (i.e. CBT) component of the computer-assisted assessment and for the practical exam ranged from 0 (lowest score) to 100 (highest score).

It was important to investigate whether or not participants were disadvantaged by our approach to adaptive testing. To this end, the data summarised in Table 4 were subjected to a Pearson's Product Moment correlation using the SPSS software package, which tests the significance of any relationship between the scores obtained by individuals in the three assessments examined as part of this work. The Pearson's Product Moment correlations for the CS1, HND1 and HND2 groups are shown in Tables 5, 6 and 7 respectively.

Table 5: Pearson's Product Moment correlations between the scores obtained by CS1 students in the three assessments undertaken (N=96)
** Correlation is significant at the 0.01 level (2-tailed)

             | CAT2 Ability | CBT1 Score | CBT2 Score | Practical exam Score
CAT1 Ability | .580(**)     | .390(**)   | .522(**)   | .575(**)
CAT2 Ability |              | .488(**)   | .758(**)   | .640(**)
CBT1 Score   |              |            | .390(**)   | .502(**)
CBT2 Score   |              |            |            | .535(**)

Sig. (2-tailed) = .000 and N = 96 for all pairs.


Table 6: Pearson's Product Moment correlations between the scores obtained by HND1 students in the three assessments undertaken (N=133)
** Correlation is significant at the 0.01 level (2-tailed)

             | CAT2 Ability | CBT1 Score | CBT2 Score | Practical exam Score
CAT1 Ability | .617(**)     | .849(**)   | .548(**)   | .552(**)
CAT2 Ability |              | .552(**)   | .816(**)   | .571(**)
CBT1 Score   |              |            | .467(**)   | .445(**)
CBT2 Score   |              |            |            | .527(**)

Sig. (2-tailed) = .000 and N = 133 for all pairs.

Table 7: Pearson's Product Moment correlations between the scores obtained by HND2 students in the three assessments undertaken (N=91)
** Correlation is significant at the 0.01 level (2-tailed)
* Correlation is significant at the 0.05 level (2-tailed)

             | CAT2 Ability | CBT1 Score | CBT2 Score | Practical exam Score
CAT1 Ability | .521(**)     | .421(**)   | .449(**)   | .394(**)
CAT2 Ability |              | .411(**)   | .488(**)   | .350(**)
CBT1 Score   |              |            | .412(**)   | .289(**)
CBT2 Score   |              |            |            | .236(*)

N = 81 for all pairs. Sig. (2-tailed) = .000 for all pairs except CAT2 Ability/Practical exam (.001), CBT1 Score/Practical exam (.009) and CBT2 Score/Practical exam (.034).
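The correlations above were computed in SPSS; the same pairwise test can be reproduced with standard libraries. The sketch below uses scipy.stats.pearsonr on made-up score vectors purely to show the mechanics, not the study's data.

```python
from scipy.stats import pearsonr

# Illustrative score vectors only; the study's per-student data are not published here.
cat_ability = [-1.2, -0.8, -0.4, 0.1, 0.6, 1.1]       # CAT ability estimates (-2..+2)
cbt_score   = [35.0, 42.0, 55.0, 58.0, 70.0, 82.0]    # CBT scores (0..100)

r, p = pearsonr(cat_ability, cbt_score)
print(f"Pearson r = {r:.3f}, two-tailed p = {p:.3f}")
```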

The results of the statistical analysis of learners' performance are interpreted as supporting the view that the CAT method is a fair and reliable method of assessment. Those learners performing well on the CBT component also performed well on the CAT component of the assessment.


It is well reported in the relevant literature that objective questions such as multiple-choice and multiple-response questions are useful when assessing the lowest levels of cognitive skills, namely knowledge, comprehension and application. On the other hand, practical examinations support the assessment of higher cognitive skills, namely analysis, synthesis and evaluation (Pritchett, 1999). Thus, due to the different nature of the skills being evaluated in each assessment, it was expected that the correlation between practical examinations and computer-assisted assessments would be low but nonetheless significant.

DISCUSSION AND FUTURE WORK

The research presented here was performed in the context of the increased use of computer technology as an enabling electronic tool for student assessment in Higher Education (HE). At the present time, many HE institutions are performing summative assessment using online CBT methods. It is important that learners and tutors are aware of the implications of this increasing trend, and it is hoped that our research will be useful in understanding the opportunities as well as the limitations of such methods. By increasing learner motivation, providing better and reproducible estimates of learner ability and improving the efficiency of testing, it is hoped that both tutors and learners may benefit from the CAT approach. This is only likely to be useful if tutors and learners have confidence in the fairness of the approach. Our research to date suggests that the CAT prototype we developed did not negatively affect student performance (Barker & Lilley, 2003; Lilley & Barker, 2002; Lilley & Barker, 2003; Lilley et al., 2004). It is hoped that our work provides some evidence that CATs reflect learner ability at least as well as CBTs and off-computer tests.

The main benefit of the CAT approach over its CBT counterpart would be higher levels of individualisation and interaction. In a typical CBT, the same fixed set of questions is presented to all students participating in the assessment session, regardless of their performance during the test. The questions within this predefined set are typically selected in such a way that various ability levels, ranging from low to advanced, are considered (Pritchett, 1999). A consequence of this configuration is that high-performance students are presented with one or more questions that are below their level of ability. Similarly, low-performance students are presented with questions that are above their level of ability. The adaptive algorithm within CATs makes it possible to administer questions that are appropriate for each individual student's level of ability. In so doing, a tailored test is dynamically created for each individual test-taker. Individualising testing can benefit student assessment in two ways:

• academic staff can be provided with more significant information regarding student performance;
• test-takers can be provided with a more motivating assessment experience, as they will not be presented with questions that are too easy and therefore unchallenging, or too difficult and thus bewildering.


Some participants from the CS1 group corroborated the view that CATs can support a more motivating assessment experience, and reported that using our CAT application was challenging and not "boring" like other programming tests they had taken in the past. Interestingly, they reported that they liked being assessed using our application because they felt challenged rather than expected to answer "silly" test questions. These comments were unprompted and provided by high-performing students. In previous work by the authors (Lilley et al., 2004), participants in a focus group reported that one of the benefits of the CAT approach was that it allowed not only the most proficient but also the less able students to demonstrate what they know.

Although CATs are more difficult to implement than CBTs, due to the need for an adaptive algorithm and a larger, differentiated question database, the findings from our quantitative and qualitative evaluations of the CAT approach to date encourage further research. In other research we have shown the benefit of the CAT approach in the context of formative rather than summative assessment. The utilisation of the adaptive approach for formative purposes has also been successful in other projects such as ASAM (Yong & Higgins, 2004) and SIETTE (Conejo et al., 2000). It is our view that the inherent characteristics of the CAT approach (for example, the tailoring of the difficulty of the task to each individual student) contributed to the success of such projects.

At the University of Hertfordshire we are currently engaged in extending the work presented here to provide students with personalised feedback on test performance. The use of a differentiated database of questions that are separated into topic areas and cognitive skill levels has made it possible to provide individualised feedback on student performance. We have shown this approach to the provision of feedback to be fast and effective, well received by students of all abilities and valued by teaching staff (Lilley & Barker, 2006; Barker & Lilley, 2006). We argue that this has been made easier and more informative by the CAT approach (Lilley & Barker, 2006; Barker et al., 2006). An important recent application of the CAT approach is its use in a student model for the presentation and configuration of learning objects (Barker, 2006). Although this research with CATs is in its early stages, we have gathered evidence of the benefits of the approach for more than ten years. We are currently engaged in implementing a computer application that is capable of using CAT scores to structure and differentiate the presentation of podcasts for learners, based upon their ability and their needs.

REFERENCES

ANDERSON, L. W. & KRATHWOHL, D. R. (Eds.) (2001) A Taxonomy for Learning, Teaching, and Assessing: A Revision of Bloom's Taxonomy of Educational Objectives. New York: Longman.


BARKER, T. & LILLEY, M. (2006) Measuring staff attitude to an automated feedback system based on a Computer Adaptive Test. Proceedings of CAA 2006 Conference, Loughborough University, July 2006.

BARKER, T. (2006) Attending to Individual Students: How student modelling can be used in designing personalised blended learning objects. Journal for the Enhancement of Learning and Teaching, ISSN 1743-3932, Vol. 3 (2), pp. 38-48.

BARKER, T. & LILLEY, M. (2003) Are Individual Learners Disadvantaged by the Use of Computer-Adaptive Testing? In Proceedings of the 8th Learning Styles Conference. University of Hull, European Learning Styles Information Network (ELSIN), pp. 30-39.

BARKER, T., LILLEY, M. & BRITTON, C. (2006) A student model based on computer adaptive testing to provide automated feedback: The calibration of questions. Presented at Association for Learning Technology, ALT 2006, Heriot-Watt University, September 4-7, 2006.

BARKER, T., LILLEY, M. & BRITTON, C. (2006) Computer Adaptive Assessment and its use in the development of a student model for blended learning. Annual Blended Learning Conference, University of Hertfordshire, July 2006.

CONEJO, R., MILLAN, E., PEREZ-DE-LA-CRUZ, J. L. & TRELLA, M. (2000) An Empirical Approach to On-Line Learning in SIETTE. Lecture Notes in Computer Science 1839, pp. 605-614.

DE ANGELIS, S. (2000) Equivalency of Computer-Based and Paper-and-Pencil Testing. Journal of Allied Health 29(3), pp. 161-164.

FERNANDEZ, G. (2003) Cognitive Scaffolding for a Web-Based Adaptive Learning Environment. Lecture Notes in Computer Science 2783, pp. 12-20.

FREEDLE, R. O. & DURAN, R. P. (1987) Cognitive and Linguistic Analyses of Test Performance. New Jersey: Ablex.

FREEMAN, R. & LEWIS, R. (1998) Planning and Implementing Assessment. London: Kogan Page.

HAMBLETON, R. K. (1991) Fundamentals of Item Response Theory. California: Sage Publications.

HARVEY, J. & MOGEY, N. (1999) Pragmatic issues when integrating technology into the assessment of students. In S. Brown, P. Race & J. Bull (Eds.), Computer-Assisted Assessment in Higher Education. London: Kogan Page.

JOY, M., MUZYANTSKII, B., RAWLES, S. & EVANS, M. (2002) An Infrastructure for Web-Based Computer-Assisted Learning. ACM Journal of Educational Resources 2(4), December 2002, pp. 1-19.

LILLEY, M. & BARKER, T. (2006) Student attitude to adaptive testing. Proceedings of HCI 2006 Conference, Queen Mary, University of London, 11-15 September 2006.


LILLEY, M. & BARKER, T. (2006) Students' perceived usefulness of formative feedback for a computer-adaptive test. Proceedings of ECEL 2006: The European Conference on e-Learning, University of Winchester, 11-12 September 2006.

LILLEY, M. & BARKER, T. (2002) The Development and Evaluation of a Computer-Adaptive Testing Application for English Language. In Proceedings of the 6th Computer-Assisted Assessment Conference. Loughborough University, United Kingdom, pp. 169-184.

LILLEY, M. & BARKER, T. (2003) Comparison between computer-adaptive testing and other assessment methods: An empirical study. In Research Proceedings of the 10th Association for Learning and Teaching Conference. The University of Sheffield and Sheffield Hallam University, United Kingdom, pp. 249-258.

LILLEY, M., BARKER, T. & BRITTON, C. (2004) The development and evaluation of a software prototype for computer adaptive testing. Computers & Education 43(1-2), pp. 109-123.

LORD, F. M. (1980) Applications of Item Response Theory to practical testing problems. New Jersey: Lawrence Erlbaum Associates.

MASON, B. J., PATRY, M. & BERNSTEIN, D. J. (2001) An Examination of the Equivalence Between Non-Adaptive Computer-Based and Traditional Testing. Journal of Educational Computing Research 24(1), pp. 29-39.

O'REILLY, M. & MORGAN, C. (1999) Online Assessment: creating communities and opportunities. In S. Brown, P. Race & J. Bull (Eds.), Computer-Assisted Assessment in Higher Education. London: Kogan Page.

PRITCHETT, N. (1999) Effective question design. In S. Brown, P. Race & J. Bull (Eds.), Computer-Assisted Assessment in Higher Education. London: Kogan Page.

SYANG, A. & DALE, N. B. (1993) Computerized adaptive testing in Computer Science: assessing student programming abilities. ACM SIGCSE Bulletin 25(1), March 1993, pp. 53-57.

WAINER, H. (2000) Computerized Adaptive Testing: A Primer. Lawrence Erlbaum Associates Inc.

YONG, C. F. & HIGGINS, C. A. (2004) Self-assessing with adaptive exercises. In Proceedings of the 8th Computer-Assisted Assessment Conference. Loughborough University, United Kingdom, pp. 463-469.

