Assessment, Learning and Judgement in Higher Education
Gordon Joughin Editor
Assessment, Learning and Judgement in Higher Education
Editor Gordon Joughin University of Wollongong Centre for Educational Development & Interactive Resources (CEDIR) Wollongong NSW 2522 Australia
[email protected]
ISBN: 978-1-4020-8904-6
e-ISBN: 978-1-4020-8905-3
DOI: 10.1007/978-1-4020-8905-3
Library of Congress Control Number: 2008933305
© Springer Science+Business Media B.V. 2009
No part of this work may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, microfilming, recording or otherwise, without written permission from the Publisher, with the exception of any material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work.
Printed on acid-free paper
springer.com
In memory of Peter Knight
Preface
There has been a remarkable growth of interest in the assessment of student learning and its relation to the process of learning in higher education over the past ten years. This interest has been expressed in various ways – through large scale research projects, international conferences, the development of principles of assessment that supports learning, a growing awareness of the role of feedback as an integral part of the learning process, and the publication of exemplary assessment practices. At the same time, more limited attention has been given to the underlying nature of assessment, to the concerns that arise when assessment is construed as a measurement process, and to the role of judgement in evaluating the quality of students' work. It is now timely to take stock of some of the critical concepts that underpin our understanding of the multifarious relationships between assessment and learning, and to explicate the nature of assessment as judgement. Despite the recent growth in interest noted above, assessment in higher education remains under-conceptualized. This book seeks to make a significant contribution to conceptualizing key aspects of assessment, learning and judgement.

The book arose from the Learning-oriented Assessment Project (LOAP) funded by the Hong Kong University Grants Committee, led by a team from The Hong Kong Institute of Education and involving all ten of the higher education institutions in Hong Kong between 2003 and 2006. LOAP initially focused on assessment practices, with the goal of documenting and disseminating practices that served explicitly to promote student learning. This goal was achieved through conferences, symposia, and the publication of two collections of learning-oriented assessment practices in English¹ (Carless, Joughin, Liu, & Associates, 2006) and Chinese² (Leung & Berry, 2007). Along with this goal, the project sought to reconceptualize the relationship between assessment and learning, building on research conducted in the UK, the USA, Europe, Asia and Australia, and drawing on leading assessment theorists in
¹ Carless, D., Joughin, G., Liu, N-F., & Associates. (2006). How assessment supports learning: Learning-oriented assessment in action. Hong Kong: Hong Kong University Press.
² Leung, P., & Berry, R. (Eds.). (2007). Learning-oriented assessment: Useful practices. Hong Kong: Hong Kong University Press. (Published in Chinese)
higher education. The initial outcome of this was a Special Issue of the journal, Assessment and Evaluation in Higher Education, one of the leading scholarly publications in this field (Vol 31, Issue 4: "Learning-oriented assessment: Principles and practice").

In the final phase of the project, eight experts on assessment and learning in higher education visited Hong Kong to deliver a series of invited lectures at The Hong Kong Institute of Education. These experts, Tim Riordan from the USA, David Boud and Royce Sadler from Australia, Filip Dochy from Belgium, and Jude Carroll, Kathryn Ecclestone, Ranald Macdonald and Peter Knight from the UK, also agreed to contribute a chapter to this book, following the themes established in their lectures. Georgine Loacker subsequently joined Tim Riordan as a co-author of his chapter, while Linda Suskie agreed to provide an additional contribution from the USA. This phase of the project saw its scope expand to include innovative thinking about the nature of judgement in assessment, a theme particularly addressed in the contributions of Boud, Sadler and Knight.

The sudden untimely death of Peter Knight in April 2007 was a great shock to his many colleagues and friends around the world and a great loss to all concerned with the improvement of assessment in higher education. Peter had been a prolific and stimulating writer and a generous colleague. His colleague and sometime collaborator, Mantz Yorke, generously agreed to provide the chapter which Peter would have written. Mantz's chapter appropriately draws strongly on Peter's work and effectively conveys much of the spirit of Peter's thinking, while providing Mantz's own unique perspective.

My former colleagues at the Hong Kong Institute of Education, David Carless, and Paul Morris, the then President of the Institute, while not directly involved in the development of this book, provided the fertile ground for its development through their initial leadership of LOAP. I am also indebted to Julie Joughin who patiently completed the original preparation and formatting of the manuscript.

Wollongong, February 2008
Gordon Joughin
Contents
1 Introduction: Refocusing Assessment . . . . . . . . . . . . . . . 1
Gordon Joughin

2 Assessment, Learning and Judgement in Higher Education: A Critical Review . . . . . . . . . . . . . . . 13
Gordon Joughin

3 How Can Practice Reshape Assessment? . . . . . . . . . . . . . . . 29
David Boud

4 Transforming Holistic Assessment and Grading into a Vehicle for Complex Learning . . . . . . . . . . . . . . . 45
D. Royce Sadler

5 Faulty Signals? Inadequacies of Grading Systems and a Possible Response . . . . . . . . . . . . . . . 65
Mantz Yorke

6 The Edumetric Quality of New Modes of Assessment: Some Issues and Prospects . . . . . . . . . . . . . . . 85
Filip Dochy

7 Plagiarism as a Threat to Learning: An Educational Response . . . . . . . . . . . . . . . 115
Jude Carroll

8 Using Assessment Results to Inform Teaching Practice and Promote Lasting Learning . . . . . . . . . . . . . . . 133
Linda Suskie

9 Instrumental or Sustainable Learning? The Impact of Learning Cultures on Formative Assessment in Vocational Education . . . . . . . . . . . . . . . 153
Kathryn Ecclestone

10 Collaborative and Systemic Assessment of Student Learning: From Principles to Practice . . . . . . . . . . . . . . . 175
Tim Riordan and Georgine Loacker

11 Changing Assessment in Higher Education: A Model in Support of Institution-Wide Improvement . . . . . . . . . . . . . . . 193
Ranald Macdonald and Gordon Joughin

12 Assessment, Learning and Judgement: Emerging Directions . . . . . . . . . . . . . . . 215
Gordon Joughin

Author Index . . . . . . . . . . . . . . . 223

Subject Index . . . . . . . . . . . . . . . 229
Contributors
About The Editor

Gordon Joughin is Coordinator of Academic Development in the Centre for Educational Development and Interactive Resources at the University of Wollongong, Australia. He has worked in several Australian universities as well as the Hong Kong Institute of Education, where he directed the final phase of the Hong Kong-wide Learning Oriented Assessment Project. His research and recent writing have focused on the relationship between learning and assessment, with a special emphasis on oral assessment. His most recent book (with David Carless, Ngar-Fun Liu and Associates) is How Assessment Supports Learning: Learning-oriented Assessment in Action (Hong Kong University Press). He is a member of the Executive Committee of the Higher Education Research and Development Society of Australasia and a former President of the Association's Hong Kong Branch.
About The Authors

David Boud is Dean of the University Graduate School and Professor of Adult Education at the University of Technology, Sydney. He has been President of the Higher Education Research and Development Society of Australasia and is a Carrick Senior Fellow. He has written extensively on teaching, learning and assessment in higher and professional education, and more recently on workplace learning. In the area of assessment he has been a pioneer in developing learning-centred approaches to assessment and has particularly focused on self-assessment (Enhancing Learning through Self Assessment, Kogan Page, 1995) and building assessment skills for long-term learning (Rethinking Assessment in Higher Education: Learning for the Longer Term, Routledge, 2007). His work can be accessed at www.davidboud.com.

Jude Carroll is the author of A Handbook for Deterring Plagiarism in Higher Education (Oxford Centre for Staff and Learning Development, 2007, 2nd edition) and works at Oxford Brookes University in the UK.
She is Deputy Director of the Assessment Standards Knowledge Exchange (ASKe), a Centre for Excellence in Learning and Teaching focused on ensuring students understand assessment standards. She researches, writes and delivers workshops about student plagiarism and about teaching international students in ways that improve learning for all students.

Filip Dochy is Professor of Research on Teaching and Training and Corporate Training, based jointly in the Centre for Educational Research on Lifelong Learning and Participation and the Centre for Research on Teaching and Training, at the University of Leuven, Belgium. He is a past president of EARLI, the European Association for Research on Learning and Instruction, and the Editor of Educational Research Review, the official journal of EARLI. His current research focuses on training, team learning, teacher education and higher education, assessment, and corporate learning, and he has published extensively on these themes.

Kathryn Ecclestone is Professor in Education at the Westminster Institute of Education at Oxford Brookes University. She has worked in post-compulsory education for the past 20 years as a practitioner in youth employment schemes and further education and as a researcher specialising in the principles, politics and practices of assessment and its links to learning, motivation and autonomy. She has a particular interest in socio-cultural approaches to understanding the interplay between policy, practice and attitudes to learning and assessment. Kathryn is a member of the Assessment Reform Group and the Access to Higher Education Assessment working group for the Quality Assurance Agency. She is on the Editorial Board of Studies in the Education of Adults and is books review editor for the Journal of Further and Higher Education. She has published a number of books and articles on assessment in post-compulsory education.

Georgine Loacker is Senior Assessment Scholar at Alverno College, Milwaukee, USA. While heading the English Department, she participated in the development of the ability-based education and student assessment process begun there in 1973 and now internationally recognized. She continues to contribute to assessment theory through her writing and research on assessment of the individual student. She has conducted workshops and seminars on teaching for and assessing student learning outcomes throughout the United States and at institutions in the UK, Canada, South Africa, Costa Rica and in Australia and New Zealand as a Visiting Fellow of the Higher Education Research and Development Society of Australasia.

Ranald Macdonald is Professor of Academic Development and Head of Strategic Development in the Learning and Teaching Institute at Sheffield Hallam University. He has been responsible for leading on policy and practice aspects of assessment, plagiarism, learning and teaching. A previous Co-chair of the UK's Staff and Educational Development Association, Ranald is currently Chair of its Research Committee and a SEDA Fellowship holder. He was awarded a National Teaching Fellowship in 2005 and was a Visiting
Fellow at Otago University, New Zealand in 2007. Ranald's current research and development interests include scholarship and identity in academic development, enquiry-focused learning and teaching, assessment and plagiarism, and the nature of change in complex organisations. He is a keen orienteer, gardener and traveller, and combines these with work as much as possible!

Tim Riordan is Associate Vice President for Academic Affairs and Professor of Philosophy at Alverno College. He has been at Alverno since 1976 and in addition to his teaching has been heavily involved in developing programs and processes for teaching improvement and curriculum development. In addition to his work at the college, he has participated in initiatives on the scholarship of teaching, including the American Association for Higher Education Forum on Exemplary Teaching and the Association of American Colleges and Universities' Preparing Future Faculty Project. He has regularly presented at national and international conferences, consulted with a wide variety of institutions, and written extensively on teaching and learning. He is co-editor of the 2004 Stylus publication, Disciplines as Frameworks for Student Learning: Teaching the Practice of the Disciplines. He was named the Marquette University School of Education Alumnus of the Year in 2002, and he received the 2001 Virginia B. Smith Leadership Award sponsored by the Council for Adult and Experiential Learning and the National Center for Public Policy and Higher Education.

D. Royce Sadler is a Professor of Higher Education at the Griffith Institute for Higher Education, Griffith University in Brisbane, Australia. He researches the assessment of student achievement, particularly in higher education. Specific interests include assessment theory, methodology and policy; university grading policies and practice; formative assessment; academic achievement standards and standards-referenced assessment; testing and measurement; and assessment ethics. His research on formative assessment and the nature of criteria and achievement standards is widely cited. He engages in consultancies for Australian and other universities on assessment and related issues. A member of the Editorial Advisory Boards for the journals Assessment in Education, and Assessment and Evaluation in Higher Education, he also reviews manuscripts on assessment for other journals.

Linda Suskie is a Vice President at the Middle States Commission on Higher Education, an accreditor of colleges and universities in the mid-Atlantic region of the United States. Prior positions include serving as Director of the American Association for Higher Education's Assessment Forum. Her over 30 years of experience in college and university administration include work in assessment, institutional research, strategic planning, and quality management. Ms. Suskie is an internationally recognized speaker, writer, and consultant on a broad variety of higher education assessment topics. Her latest book is Assessment of Student Learning: A Common Sense Guide (Jossey-Bass Anker Series, 2004).
Mantz Yorke is currently Visiting Professor in the Department of Educational Research, Lancaster University. He spent nine years in schools and four in teacher education at Manchester Polytechnic before moving into staff development and educational research. He then spent six years as a senior manager at Liverpool Polytechnic, after which he spent two years on secondment as Director of Quality Enhancement at the Higher Education Quality Council. He returned to his institution as Director of the Centre for Higher Education Development, with a brief to research aspects of institutional performance, in particular that relating to ‘the student experience’. He has published widely on higher education, his recent work focusing on employability, student success, and assessment.
Chapter 1
Introduction: Refocusing Assessment

Gordon Joughin
The complexity of assessment, both as an area of scholarly research and as a field of practice, has become increasingly apparent over the past 20 years. Within the broad domain of assessment, assessment and learning has emerged as a prominent strand of research and development at all levels of education. It is perhaps no coincidence that, as the role of assessment in learning has moved to the foreground of our thinking about assessment, a parallel shift has occurred towards the conceptualisation of assessment as the exercise of professional judgement and away from its conceptualisation as a form of quasi-measurement. Assessment, learning and judgement have thus become central themes in higher education. This book represents an attempt to add clarity to the discussion, research and emerging practices relating to these themes.

In seeking to develop our understanding of assessment, learning and judgement, the emphasis of this book is conceptual: its aim is to challenge and extend our understanding of some critical issues in assessment whilst remaining mindful of how assessment principles are enacted in practice. However, understanding, while a worthwhile end in itself, becomes useful only when it serves the purpose of improvement. Hence an underlying thread of the book is change. Each chapter proposes changes to the ways in which we think about assessment, and each suggests, explicitly or by implication, important changes to our practices, whether at the subject or course level, the level of the university, or the level of our higher education systems as a whole.

The concepts of assessment, learning and judgement draw together the three core functions of assessment. While assessment can fulfil many functions,¹ three predominate: supporting the process of learning; judging students' achievement in relation to course requirements; and maintaining the standards of the profession
or discipline for which students are being prepared. Each of these is important, with each having particular imperatives and each giving rise to particular issues of conceptualisation and implementation.

¹ Brown, Bull and Pendlebury (1997), for example, list 17.
Assessment and Learning

There are numerous ways in which assessment may be seen to promote learning and which have now been well rehearsed in the higher education literature (see, for example, Carless, Joughin, Liu, & Associates, 2006; Gibbs & Simpson, 2004–2005). Four of these are particularly pertinent to this book, with authors offering insights into aspects of assessment and learning that are especially problematic.

The first is through the design of assessment tasks as learning tasks, so that the process of meeting assessment requirements requires students to engage in processes expected to lead to lasting learning of a worthwhile kind. Sitting for an examination or completing a multiple-choice test will not lead to such learning; working on a project over a period of time, engaging in field work, or producing a carefully constructed essay can well be learning processes which also result in products that provide information about the student's knowledge and skills. Assessment that engages students in the process of learning needs to involve tasks for which responses need to be created by the student, rather than responses that fall to hand from existing texts, a quick search of the Internet, or from the work of fellow students.

The second and most commonly referred to way in which assessment can promote learning is through feedback, where feedback is defined as a process of identifying gaps between actual and desired performance, noting ways of bridging those gaps, and then having students take action to bridge the gaps. Feedback thus conceived is a moderately complex process of learning which is often difficult to enact to the satisfaction of students or teachers.

The third way in which assessment can support learning is through the development of students' capacity to evaluate the quality of their own work while they are undertaking assessment tasks, a process most clearly articulated by Sadler (particularly influentially in Sadler (1989) and interestingly extended in this book) and Boud (2007 and Chapter 3 of this book). Assessment thus functions to help students learn about assessment itself, particularly as it pertains to their own work, since evaluating and improving one's work becomes an essential requirement of their successful future practice.

A fourth way involves the use of assessment results to inform teaching, and thus, indirectly, to improve student learning. While this is a function of assessment that is often acknowledged in higher education, how it can be operationalised is rarely addressed.
Assessment and Judgement

Assessment as judging achievement draws attention to the nature of assessment as the exercise of professional judgment, standing in contrast to misplaced notions of assessment as measurement. As Peter Knight pointed out, "measurement theory expects real and stable objects, not fluctuating and contested social constructs" (Knight, 2007a, p. 77). One of the contributors to this book, David Boud, previously has argued persuasively that assessment is best reframed in terms of informing judgement, that is, "the capacity to evaluate evidence, appraise situations and circumstances astutely, to draw sound conclusions and act in accordance with this analysis" (Boud, 2007, p. 19). Elsewhere Knight (2007b) noted a set of increasingly important learning outcomes that not only defy measurement but are extraordinarily difficult to judge, namely the wicked competencies associated with graduate attributes or generic learning outcomes.

The process of making judgements, and the bases on which they are made, are often considered to be unproblematic: experienced teachers apply predetermined criteria based on required course outcomes to allocate marks and determine grades. This treatment of student work is strongly challenged in this book, particularly by Sadler when he addresses the nature of expert judgement as essentially holistic.

When assessment's purpose is defined as judging achievement, the achievement referred to is usually internally referent, that is, it is concerned with performance associated with the content and goals of the university course, often defined as learning objectives or intended learning outcomes. With this internal reference point, the responsibility of the academic to his or her department and its programmes is to the fore.

Assessment as maintaining professional or disciplinary standards moves the focus of assessment outside the course and locates it in the world of professional or disciplinary practice. When assessment is construed as a means of protecting and promoting practice standards, assessment methods need to move towards the forms of ongoing assessment that are found in the workplace. This is a challenge for many university teachers, since it locates assessment beyond the sphere of academia. Where a course has a significant practical placement, this step may be only slightly less difficult. With a focus on professional standards, the identification of the academic with his or her discipline or profession is to the fore, while responsibility to the institution moves to the background.

The maintenance of standards also raises questions about the quality standards of assessment practices themselves. Employers and professional bodies, as well as universities, need confidence in the abilities of graduates as certified by their awards. When assessment has been conceived of as an act of measurement with an almost exclusive purpose of determining student achievement, traditional
standards of testing and measurement have applied. High stakes assessment practices in the final year or years of high school often find their way into university, with standards for reliability, validity and fairness, such as the Standards for Educational and Psychological Testing developed by the American Educational Research Association, the American Psychological Association and the National Council on Measurement in Education (1999) being seen as essential if assessment is to be credible to its various stakeholders. Such standards have been developed within a measurement paradigm, with an almost exclusive emphasis on measuring achievement such that not only is learning not seen as a function of assessment, but little or no concession is made to the consequential effects of assessment on how students go about learning. Assessment practices designed to promote learning, while seeking to balance assessment’s learning and judging functions, call not for the simple abandonment of traditional standards from the measurement paradigm but for the rethinking of standards that address assessment’s multiple functions in a more compelling way.
Assessment and Change

We talk and write about assessment because we seek change, yet the process of change is rarely addressed in work on assessment. Projects to improve assessment in universities will typically include references to dissemination or embedding of project outcomes, yet real improvements are often limited to the work of enthusiastic individuals or to pioneering courses with energetic and highly motivated staff. Examples of large scale successful initiatives in assessment are difficult to find. On the other hand, the exemplary assessment processes of Alverno College continue to inspire (and senior Alverno staff have contributed to this book) and the developing conceptualisation of universities as complex adaptive systems promises new ways of understanding and bringing about realistic change across a university system.

Assessment is far from being a technical matter, and the improvement of assessment requires more than a redevelopment of curricula. Assessment occurs in the context of complex intra- and interpersonal factors, including teachers' conceptions of and approaches to teaching; students' conceptions of and approaches to learning; students' and teachers' past experiences of assessment; and the varying conceptions of assessment held by teachers and students alike. Assessment is also impacted by students' varying motives for entering higher education and the varying demands and priorities of teachers who typically juggle research, administration and teaching within increasingly demanding workloads. Pervading assessment are issues of power and complex social relationships.
Re-Focusing Assessment

This book does not propose a radical re-thinking of assessment, but it does call for a re-focusing on educationally critical aspects of assessment in response to the problematic issues raised across these themes by the contributors to this book. The following propositions which emerge from the book's various chapters illustrate the nature and scope of this refocusing task:
• Frequently cited research linking assessment and learning is often misunderstood and may not provide the sure foundations for action that are frequently claimed.
• Assessment is principally a matter of judgement, not measurement.
• The locus of assessment should lie beyond the course and in the world of practice.
• Criteria-based approaches can distort judgement; the use of holistic approaches to expert judgement needs to be reconsidered.
• Students need to become responsible for more aspects of their own assessment, from evaluating and improving assignments to making claims for their acquisition of complex generic skills across their whole degree.
• Traditional standards of validity, reliability and fairness break down in the context of new modes of assessment that support learning; new approaches to quality standards for assessment are required.
• Plagiarism is a learning issue; assessment tasks that can be completed without active engagement in learning cannot demonstrate that learning has occurred.
• Students entering higher education come from distinctive and powerful learning and assessment cultures in their prior schools and colleges. These need to be acknowledged and accommodated as students are introduced to the new learning and assessment cultures of higher education.
• Universities are complex adaptive systems and need to be understood as such if change is to be effective. Moreover, large scale change depends on generating consensus on principles rather than prescribing specific practices.
The Structure of the Book

The themes noted above of assessment, learning, judgement, and the imperatives for change associated with them, are addressed in different ways throughout the book, sometimes directly and sometimes through related issues. Moreover, these themes are closely intertwined, so that, while some chapters of this book focus on a single theme, many chapters address several of them, often in quite complex ways.

In Chapter 2, Joughin provides a short review of four matters that are at the heart of our understanding of assessment, learning and judgement and which for various reasons are seen as highly problematic. The first of these concerns how assessment is often defined in ways that either under-represent or conflate
the construct, leading to the suggestion that a simple definition of assessment may facilitate our understanding of assessment itself and its relationship to other constructs. The chapter then addresses the empirical basis of two propositions about assessment and learning that have become almost articles of belief in assessment literature: that assessment drives student learning and that students' approaches to learning can be improved by changing assessment. Finally, research on students' experience of feedback is contrasted with the prominence accorded to feedback in learning and assessment theory.

Three chapters – those of Boud, Sadler and Yorke – address problematic issues of judgement and the role of students in developing the capacity to evaluate their own work. Boud's chapter, How Can Practice Reshape Assessment, begins this theme. Boud seeks to move our point of reference for assessment from the world of teaching and courses to the world of work where assessment is part of day-to-day practice as we make judgements about the quality of our own work, typically in a collegial context where our judgements are validated. Boud argues that focusing on what our graduates do once they commence work, specifically the kinds of judgements they make and the contexts in which these occur, can inform how assessment is practised within the educational institution.

Boud's contribution is far more than a plea for authentic assessment. Firstly, it represents a new perspective on the purpose of assessment. As noted earlier in this chapter, assessment is usually considered to have three primary functions – to judge achievement against (usually preset) educational standards; to promote learning; and to maintain standards for entry into professional and other fields of practice – as well as a number of subsidiary ones. Boud foregrounds another purpose, namely to develop students' future capacity to assess their own work beyond graduation. We are well accustomed to the notion of learning to learn within courses as a basis for lifelong learning in the workplace and beyond. Boud has extended this in a remarkable contribution to our understanding. Along the way he provides useful insights into the nature of practice, argues for conceptualising assessment as informing judgement, and highlights apprenticeship as a prototype of the integration of assessment and feedback in daily work. He concludes with a challenge to address 10 assessment issues arising from the practice perspective.

Sadler, in his chapter on Transforming Holistic Assessment and Grading into a Vehicle for Complex Learning, issues a provocative but timely challenge to one of the current orthodoxies of assessment, namely the use of pre-determined criteria to evaluate the quality of students' work. Sadler brings a strong sense of history to bear, tracing the progressive development of analytic approaches to Edmund Burke in 1759, noting literature in many fields, and specifically educational work over at least forty years. This chapter represents a significant extension of Sadler's previous seminal work on assessment (see especially Sadler, 1983, 1989, the latter being one of the most frequently cited works on formative assessment). In his earlier work, Sadler established the need for
students to develop a capacity to judge the quality of their work similar to that of their teacher, using this capacity as a basis to improve their work while it is under development. Sadler's support for this parallels Boud's argument regarding the need for practice to inform assessment, namely that this attribute is essential for students to perform adequately in the world of their future work. Sadler builds on his earlier argument that this entails students developing a conception of quality as a generalized attribute, since this is how experts, including experienced teachers, make judgements involving multiple criteria. Consequently he argues here that students need to learn to judge work holistically rather than analytically. This argument is based in part on the limitations of analytic approaches, in part on the nature of expert judgement, and it certainly requires students to see their work as an integrated whole. Fortunately Sadler not only argues the case for students' learning to make holistic judgements but presents a detailed proposal for how this can be facilitated.

Yorke's chapter, Faulty Signals? Inadequacies of Grading Systems and a Possible Response, is a provocative critique of one of the bases of assessment practices around the world, namely the use of grades to summarise student achievement in assessment tasks. Yorke raises important questions about the nature of judgement involved in assessment, the nature of the inferences made on the basis of this judgement, the distortion which occurs when these inferences come to be expressed in terms of grades, and the limited capacity of grades to convey to others the complexity of student achievement. Yorke points to a number of problems with grades, including the basing of grades on students' performance in relation to a selective sampling of the curriculum, the wide disparity across disciplines and universities in types of grading scales and how they are used, the often marked differences in distributions of grades under criterion- and norm-referenced regimes, and the questionable legitimacy of combining grades derived from disparate types of assessment and where marks are wrongly treated as lying on an interval scale. Grading becomes even more problematic when what Knight (2007a, b) termed wicked competencies are involved – when real-life problems are the focus of assessment and generic abilities or broad-based learning outcomes are being demonstrated. Yorke proposes a radical solution – making students responsible for preparing a claim for their award based on evidence of their achievements from a variety of sources, including achievement in various curricular components, work placements and other learning experiences. Such a process would inevitably make students active players in learning to judge their own achievements and regulate their own learning accordingly.

When assessment practices are designed to promote learning, certain consequences ensue. Dochy, Carroll and Suskie examine three of these. Dochy, in The Edumetric Quality of New Modes of Assessment: Some Issues and Prospects, notes that when assessment focuses on learning and moves away from a measurement paradigm, the criteria used to evaluate the quality of the assessment practices need to change. Carroll links learning directly to the need to design assessment that will deter plagiarism. Suskie describes how the careful
analysis of assessment outcomes can be a powerful tool for further learning through improved teaching.

Dochy begins his chapter by overviewing the negative effects that traditional forms of high stakes assessment can have on students' motivation and self-esteem, on teachers' professionalism, and, perhaps most importantly, on limiting the kinds of learning experiences to which students are exposed. He quickly moves on to describe the current assessment culture in higher education which he depicts as strongly learning oriented, emphasizing the integration of assessment and instruction and using assessment as a tool for learning. This is a culture in which students are critically involved, sharing responsibility for such things as the development of assessment criteria. Most importantly for Dochy is the basing of new modes of assessment on complex, real life problems or authentic replications of such problems. Such forms of assessment challenge traditional ways of characterising the quality of assessment in terms of validity and reliability, so that these constructs need to be reframed. He argues that classic reliability theory should be replaced by "generalisability theory" which is concerned with how far judgements of competence in relation to one task can be generalised to other tasks. Dochy draws on Messick's well known conceptualization of validity in terms of content, substantive, structural, external, generalizability, and consequential aspects, interpreting and expanding these by considering transparency, fairness, cognitive complexity, authenticity, and directness of assessment based on immediate holistic judgements. It is this reconceptualisation of criteria for evaluating new modes of assessment that leads to Dochy's emphasis on the edumetric rather than the psychometric qualities of assessment.

Carroll's chapter on Plagiarism as a Threat to Learning: An Educational Response explores the relationship between plagiarism and learning, arguing from a constructivist perspective the need for students to transform, use or apply information, rather than reproduce information, if learning is to occur – plagiarised work fails to show that students have done this work of making meaning and consequently indicates that the process of learning has been bypassed. Interestingly, Carroll links students' capacity to understand the problem with plagiarism to Perry's stages of intellectual and ethical development (Perry, 1970), with students at a dualistic stage having particular difficulty in seeing the need to make ideas their own rather than simply repeating the authoritative words of others, while students at the relativistic stage are more likely to appreciate the nature of others' work and the requirements for dealing with this according to accepted academic principles. The chapter concludes with a series of questions to be addressed in setting assessment tasks that will tend to engage students in productive learning processes while also inhibiting them from simply finding an answer somewhere and thus avoiding learning.

In her chapter on Using Assessment Results to Inform Teaching Practice and Promote Lasting Learning, Suskie lists thirteen conditions under which students learn most effectively. These then become the basis for elucidating the ways in which assessment can promote what she terms "deep, lasting learning". Given
the limitations to assessment as a means of positively influencing students' approaches to learning noted in Chapter 1, Suskie's location of assessment principles in broader pedagogical considerations is an important qualification. The use of assessment to improve teaching is often suggested as an important function of assessment, but how this can occur in practice is rarely explained. Suskie suggests that a useful starting point is to identify the decisions about teaching that an analysis of assessment results might inform, then clarify the frame of reference within which this analysis will occur. The latter could be based on identifying students' strengths and weaknesses, the improvement in results over the period of the class or course, how current students' performance compares with previous cohorts, or how current results compare to a given set of standards. Suskie then elucidates a range of practical methods for summarising and then interpreting results in ways that will suggest foci for teacher action.

While most of the chapters in this book are strongly student oriented, Ecclestone's chapter provides a particularly sharp focus on students' experience of assessment before they enter university and the consequences this may have for their engagement in university study and assessment. Ecclestone notes that how students in higher education respond to assessment practices explicitly aimed at encouraging engagement with learning and developing motivation will be influenced by the learning cultures they have experienced before entering higher education. Consequently university teachers seeking to use assessment to promote learning need to understand the expectations, attitudes and practices regarding assessment that students bring with them based on their experience of assessment and learning in schools or in the vocational education sector. Ecclestone's chapter provides critical insights into motivation and the concept of a learning culture before exploring the learning cultures of two vocational courses, with emphases on formative assessment and motivation. The research reported by Ecclestone supports the distinction between the spirit and the letter of assessment for learning, with the former encouraging learner engagement and what Ecclestone terms sustainable learning, while the latter is associated with a more constrained and instrumental approach to learning.

Two chapters deal with institution level change – Riordan and Loacker through an analysis of system level processes at Alverno College, and Macdonald and Joughin through proposing a particular conceptual understanding of university systems. While Yorke raises serious issues regarding the assessment of complex learning outcomes, Riordan and Loacker approach this without reservation in their chapter on Collaborative and Systemic Assessment of Student Learning: From Principles to Practice. Perhaps buoyed by a long experience of successfully developing an approach to education grounded in Alverno's ability-based curriculum, they present a set of six tightly integrated principles that have come to underpin the Alverno approach to assessment. Some of these principles are clearly articulated statements that reinforce the assertions of other contributors to this book. Thus, along with Sadler and Boud, they emphasize the
importance of self-assessment at college as a precursor to self-assessment as a practicing graduate. And like Dochy, they emphasize the importance of performance assessment based on the kinds of contexts students will face later in their working lives. However, in contrast to Sadler, they locate criteria at the heart of the process of feedback, using it to help students develop their sense of their learning in action and to develop a language for the discussion of their performance.

Each of the preceding chapters has proposed change. In suggesting new ways of thinking about assessment, learning and judgement, they have also argued for new practices. In most cases the changes advocated are not incremental but rather call for significant changes at the level of programmes, institutions or even more broadly across the higher education sector. Macdonald and Joughin's chapter on Changing Assessment in Higher Education: A Model in Support of Institution-Wide Improvement consequently addresses change in the context of universities as complex adaptive systems, drawing on the insights of organisational and systems theorists to propose a model of universities that will contribute to the support of institution-wide change in assessment, based on participation, conversation, unpredictability, uncertainty and paradox.

The concluding chapter outlines a series of progressions in our thinking about assessment which arise from the preceding chapters. These progressions in turn give rise to a set of challenges, each far from trivial, which could well set a constructive agenda for theorizing, researching, and acting to improve assessment. This agenda would include the ongoing exploration of central tenets of assessment; renewing research into taken-for-granted beliefs about the influence of assessment on learning; realigning students' and teachers' roles in assessment; and applying emerging understandings of universities as complex adaptive systems to the process of improving assessment across our higher education institutions.
References

American Educational Research Association, the American Psychological Association, & the National Council on Measurement in Education (1999). Standards for Educational and Psychological Testing. Washington DC: American Educational Research Association.
Angelo, T. (1999). Doing assessment as if learning matters most. AAHE Bulletin, 51(9), 3–6.
Boud, D. (2007). Reframing assessment as if learning were important. In D. Boud & N. Falchikov (Eds.), Rethinking assessment for higher education: Learning for the longer term (pp. 14–25). London: Routledge.
Brown, G., Bull, J., & Pendlebury, M. (1997). Assessing student learning in higher education. London: Routledge.
Carless, D., Joughin, G., Liu, N.F., & Associates (2006). How assessment supports learning: Learning-oriented assessment in action. Hong Kong: Hong Kong University Press.
Gibbs, G., & Simpson, C. (2004–5). Conditions under which assessment supports students' learning. Learning and Teaching in Higher Education, 1, 3–31.
Joughin, G. (2004, November). Learning oriented assessment: A conceptual framework. Paper presented at Effective Teaching and Learning Conference, Brisbane, Australia.
Joughin, G., & Macdonald, R. (2004). A model of assessment in higher education institutions. The Higher Education Academy. Retrieved September 11, 2007, from http://www.heacademy.ac.uk/resources/detail/id588_model_of_assessment_in_heis
Knight, P. (2007a). Grading, classifying and future learning. In D. Boud & N. Falchikov (Eds.), Rethinking assessment in higher education (pp. 72–86). Abingdon and New York: Routledge.
Knight, P. (2007b). Fostering and assessing 'wicked' competencies. The Open University. Retrieved November 5, 2007, from http://www.open.ac.uk/cetl-workspace/cetlcontent/documents/460d1d1481d0f.pdf
Perry, W. (1970). Forms of intellectual and ethical development in the college years: A scheme. New York: Holt.
Sadler, R. (1989). Formative assessment and the design of instructional systems. Instructional Science, 18(2), 119–144.
Chapter 2
Assessment, Learning and Judgement in Higher Education: A Critical Review

Gordon Joughin
Introduction

The literature on assessment in higher education is now so vast that a comprehensive review of it would be ambitious under any circumstances. The goals of this chapter are therefore modest: it offers a brief review of four central concepts regarding assessment, learning and judgement that are considered problematic because they are subject to conceptual confusion, lack the clear empirical support which is often attributed to them, or give rise to contradictions between their theoretical explication and actual practice. The first of these concepts concerns how assessment is defined – the chapter thus begins by proposing a reversion to a simple definition of assessment and noting the relationships between assessment and learning in light of this definition. The second concept concerns the axiom that assessment drives student learning. The third concept concerns the widely held view that students' approaches to learning can be improved by changing assessment. The fourth and final matter addressed is the contradiction between the prominence of feedback in theories of learning and its relatively impoverished application in practice.
Towards a Simple Definition of Assessment

Any discussion of assessment in higher education is complicated by two factors. Firstly, assessment is a term that is replete with emotional associations, so that any mention of assessment can quickly become clouded with reactions that inhibit profitable discussion. The statement that "assessment often engenders strong emotions" (Carless, Joughin, Liu & Associates, 2006, p. 2) finds particular support in the work of Boud who has pointed out that "assessment probably provokes more anxiety among students and irritation among staff than any other
feature of higher education" (Boud, 2006, p. xvii), while Falchikov and Boud's recent work on assessment and emotion has provided graphic descriptions of the impact of prior assessment experiences on a group of teachers completing a masters degree in adult education (Falchikov & Boud, 2007). In biographical essays, these students reported few instances of positive experiences of assessment in their past education, but many examples of bad experiences which had major impacts on their learning and self-esteem. Teachers in higher education will be affected not only by their own past experience of assessment as students, but also by their perceptions of assessment as part of their current teaching role. Assessment in the latter case is often associated with high marking loads, anxiety-inducing deadlines as examiners board meetings approach, and the stress of dealing with disappointed and sometimes irate students.

There is a second difficulty in coming to a clear understanding of assessment. As Rowntree has noted, "it is easy to get fixated on the trappings and outcomes of assessment – the tests and exams, the questions and marking criteria, the grades and degree results – and lose sight of what lies at the heart of it all" (Rowntree, 2007, para. 6). Thus at an individual, course team, department or even university level, we come to see assessment in terms of our own immediate contexts, including how we view the purpose of assessment, the roles we and our students play, institutional requirements, and particular strategies that have become part of our taken-for-granted practice. Our understanding of assessment, and thus how we come to define it, may be skewed by our particular contexts.

Space does not permit the exploration of this through a survey of the literature and usage of the term across universities. Two definitions will suffice to illustrate the issue and highlight the need to move towards a simple definition of assessment. Firstly, an example from a university policy document. The University of Queensland, Australia, in its statement of assessment policy and practice, defines assessment thus:

Assessment means work (e.g., examination, assignment, practical, performance) that a student is required to complete for any one or a combination of the following reasons: the fulfillment of educational purposes (for example, to motivate learning, to provide feedback); to provide a basis for an official record of achievement or certification of competence; and/or to permit grading of the student. (The University of Queensland, 2007)
Here we might note that (a) assessment is equated with the student's work; (b) there is no reference to the role of the assessor or to what is done in relation to the work; and (c) the various purposes of assessment are incorporated into its definition.

Secondly, a definition from a frequently cited text. Rowntree defines assessment thus:

... assessment can be thought of as occurring whenever one person, in some kind of interaction, direct or indirect, with another, is conscious of obtaining and interpreting information about the knowledge and understanding, or abilities and attitudes of that other person. To some extent or other it is an attempt to know that person. (Rowntree, 1987, p. 4)
In this instance we might note the reference to "interaction" with its implied reciprocity, and that assessment is depicted as the act of one person in relation to another, thereby unintentionally excluding self-assessment and the student's monitoring of the quality of his or her own work.

Both of these definitions have been developed thoughtfully, reflect important aspects of the assessment process, and have been rightly influential in their respective contexts of institutional policy and international scholarship. Each, however, is problematic as a general definition of assessment, either by going beyond the meaning of assessment per se, or by not going far enough to specify the nature of the act of assessment. Like many, perhaps most, definitions of assessment, they incorporate or omit elements in a way that reflects particular contextual perspectives, giving rise to the following question: Is it possible to posit a definition of assessment that encapsulates the essential components of assessment without introducing superfluous or ancillary concepts? The remainder of this section attempts to do this.

One difficulty with assessment as a term in educational contexts is that its usage often departs from how the term is understood in everyday usage. The Oxford English Dictionary (2002-) is instructive here through its location of educational assessment within the broader usage of the term. Thus it defines "to assess" as "to evaluate (a person or thing); to estimate (the quality, value, or extent of), to gauge or judge" and it defines assessment in education as "the process or means of evaluating academic work". These two definitions neatly encompass two principal models of assessment: where assessment is conceived of quantitatively in terms of "gauging" the "extent of" learning, assessment follows a measurement model; where it is construed in terms of "evaluation", "quality", and "judgement", it follows a judgement model.

Hager and Butler (1996) have explicated the distinctions between these paradigms very clearly (see also Boud in Chapter 3). They describe a scientific measurement model in which knowledge is seen as objective and context-free and in which assessment tests well-established knowledge that stands apart from practice. In this measurement model, assessment utilises closed problems with definite answers. In contrast, the judgement model integrates theory and practice, sees knowledge as provisional, subjective and context-dependent, and uses practice-like assessment which includes open problems with indefinite answers. Knight (2007) more recently highlighted the importance of the distinction between measurement and judgement by pointing out the common mistake of applying measurement to achievements that are not, in an epistemological sense, measurable and noting that different kinds of judgement are required once we move beyond the simplest forms of knowledge. Boud (2007) has taken the further step of arguing for assessment that not merely applies judgement to students' work but serves actively to inform students' own judgement of their work, a skill seen to be essential in their future practice.

Assessment as judgement therefore seems to be at the core of assessment, and its immediate object is a student's work. However, one further step seems
needed. Is assessment merely about particular pieces of work or does the object of assessment go beyond the work? Two other definitions are instructive. Firstly, in the highly influential work of the Committee on the Foundations of Assessment, Knowing What Students Know: The Science and Design of Educational Assessment, assessment is defined as "a process by which educators use students' responses to specially created or naturally occurring stimuli to draw inferences about the students' knowledge and skills" (Committee on the Foundations of Assessment, 2001, p. 20). Secondly, Sadler (in a private communication) has incorporated these elements in a simple, three-stage definition: "The act of assessment consists of appraising the quality of what students have done in response to a set task so that we can infer what students can do¹, from which we can draw an inference about what students know."

From these definitions, the irreducible core of assessment can be limited to (a) students' work, (b) judgements about the quality of this work, and (c) inferences drawn from this about what students know. Judgement and inference are thus at the core of assessment, leading to this simple definition: To assess is to make judgements about students' work, inferring from this what they have the capacity to do in the assessed domain, and thus what they know, value, or are capable of doing.

This definition does not assume the purpose(s) of assessment, who assesses, when assessment occurs or how it is done. It does, however, provide a basis for considering these matters clearly and aids the discussion of the relationship between assessment and learning.
Assessment, Judgement and Learning
What then is the relationship between assessment, understood as judgement about student work and consequent inferences about what they know, and the process of learning? The following three propositions encapsulate some central beliefs about this relationship as expressed repeatedly in the literature on assessment and learning over the past forty years:
What students focus on in their study is driven by the work they believe they will be required to produce.
Students’ adapt their approaches to learning to meet assessment requirements, so that assessment tasks can be designed to encourage deep approaches to learning. Students can use judgements about their work to improve their consequent learning. The following three sections address each of these propositions in turn. 1
¹ Since what they have done is just one of many possible responses, and the task itself was also just one of many possible tasks that could have been set.
Assessment as Driving Student Learning
The belief that students focus their study on what they believe will be assessed is embedded in the literature of teaching and learning in higher education. Derek Rowntree begins his frequently cited text on assessment in higher education by stating that ''if we wish to discover the truth about an educational system, we must look into its assessment procedures'' (Rowntree, 1977, p. 1). This dictum has been echoed by some of the most prominent writers on assessment in recent times. Ramsden, for example, states that ''from our students' point of view, assessment always defines the actual curriculum'' (Ramsden, 2003, p. 182); Biggs notes that ''students learn what they think they will be tested on'' (Biggs, 2003, p. 182); Gibbs asserts that ''assessment frames learning, creates learning activity and orients all aspects of learning behaviour'' (Gibbs, 2006, p. 23); and Bryan and Clegg begin their work on innovative assessment in higher education with the unqualified statement that ''research of the last twenty years provides evidence that students adopt strategic, cue-seeking tactics in relation to assessed work'' (Bryan & Clegg, 2006, p. 1).
Three works from the late 1960s and early 1970s are regularly cited to support this view that assessment determines the direction of student learning: Miller and Parlett's Up to the Mark: A Study of the Examination Game (1974), Snyder's The Hidden Curriculum (1971) and Becker, Geer and Hughes' Making the Grade (1968; 1995). So influential has this view become, and so frequently are these works cited to support it, that a revisiting of these studies seems essential if we are to understand this position correctly.
Up to the Mark: A Study of the Examination Game is well known for introducing the terms cue-conscious and cue-seeking into the higher education vocabulary. Cue-conscious students, in the study reported in this book, ''talked about a need to be perceptive and receptive to 'cues' sent out by staff – things like picking up hints about exam topics, noticing which aspects of the subject the staff favoured, noticing whether they were making a good impression in a tutorial and so on'' (Miller & Parlett, 1974, p. 52). Cue-seekers took a more active approach, seeking out staff about exam questions and discovering the interests of their oral examiners. The third group in this study, termed cue-deaf, simply worked hard to succeed without seeking hints on examinations.
While the terms ''cue-conscious'' and ''cue-seeking'' have entered our collective consciousness, we may be less aware of other aspects of Miller and Parlett's study. Firstly, the study was based on a very particular kind of student and context – final year honours students in a single department of a Scottish university. Secondly, the study's sample was small – 30. Thirdly, and perhaps most importantly, the study's quantitative findings do not support the conclusion that students' learning is dominated by assessment: only five of the sample were categorized as cue-seekers, 11 were cue-conscious, while just under half (14), constituting the largest group of students, were cue-deaf.
One point of departure for Miller and Parlett's study was Snyder's equally influential study, The Hidden Curriculum, based on first-year students at the Massachusetts Institute of Technology and Wellesley College. Snyder concluded that students were dominated by the desire to achieve high grades, and that ''each student figures out what is actually expected as opposed to what is formally required'' (Snyder, 1971, p. 9). In short, Snyder concluded that all students in his study were cue-seekers.
Given Miller and Parlett's finding that only some students were cue-seekers while more were cue-conscious and even more were cue-deaf, Snyder's conclusion that all students adopted the same stance regarding assessment seems unlikely. The credibility of this conclusion also suffers in light of what we now know about variation in how students can view and respond to the same context, making the singular pursuit of grades as a universal phenomenon unconvincing. Snyder's methodology is not articulated, but it does not seem to have included a search for counter-examples.
The third classic work in this vein is Making the Grade (Becker, Geer & Hughes, 1968; 1995), a participant observation study of students at the University of Kansas. The authors described student life as characterized by ''the grade point perspective'' according to which ''grades are the major institutionalized valuable of the campus'' (1995, p. 55) and thus the focus of attention for almost all students. Unlike Snyder, these researchers did seek contrary evidence, though with little success – only a small minority of students ignored the grade point perspective and valued learning for its own sake. However, the authors are adamant that this perspective represents an institutionalization process characteristic of the particular university studied and that they would not expect all campuses to be like it. Moreover, they state that ''on a campus where something other than grades was the major institutionalized form of value, we would not expect the GPA perspective to exist at all'' (Becker, Geer & Hughes, 1995, p. 122).
Let us be clear ... about what we mean. We do not mean that all students invariably undertake the actions we have described, or that students never have any other motive than getting good grades. We particularly do not mean that the perspective is the only possible way for students to deal with the academic side of campus life. (Becker, Geer & Hughes, 1995, p. 121)
Yet many who have cited this study have ignored this emphatic qualification by its authors, preferring to cite this work in support of the proposition that students in all contexts, at all times, are driven in their academic endeavours by assessment tasks and the desire to achieve well in them. When Gibbs (2006, p. 23) states that ''students are strategic as never before'', we should be aware (as Gibbs himself is) that students could hardly be more strategic than Miller and Parlett's cue-seekers, Snyder's freshmen in hot pursuit of grades, or Becker, Geer and Hughes' students seeking to optimize their GPAs. However, we must also be acutely aware that we do not know the extent of this behaviour, the forms it may take in different contexts, or the extent to
which these findings can be applied across cultures, disciplines and the thirty or more years since they were first articulated.
One strand of research has not simply accepted the studies cited in the previous section but has sought to extend this research through detailed empirical investigation. This work has been reported by Gibbs and his associates in relation to what they have termed ''the conditions under which assessment supports student learning'' (Gibbs & Simpson, 2004). The Assessment Experience Questionnaire (AEQ) developed as part of their research is designed to measure, amongst other things, the consistency of student effort over a semester and whether assessment serves to focus students' attention on particular parts of the syllabus (Gibbs & Simpson, 2003). Initial testing of the AEQ (Dunbar-Goddett, Gibbs, Law & Rust, 2006; Gibbs, 2006) supports the proposition that assessment does influence students' distribution of effort and their coverage of the syllabus, though the strength of this influence and whether it applies to some students more than to others is unclear.
One research study, using a Chinese version of the AEQ and involving 108 education students in Hong Kong, produced contradictory findings, leaving it unclear whether students tended to believe that assessment allowed them to be selective in what they studied or required them to cover the entire syllabus (Joughin, 2006). For example, while 58% of the students surveyed agreed or strongly agreed with the statement that ''it was possible to be quite strategic about which topics you could afford not to study'', 62% agreed or strongly agreed with the apparently contradictory position that ''you had to study the entire syllabus to do well in the assessment''.
While the AEQ-based research and associated studies are still embryonic and not widely reported, they do promise to offer useful insights into how students' patterns and foci of study develop in light of their assessment. At present, however, findings are equivocal and do not permit us to make blanket statements – the influence of assessment on students' study patterns and foci may well vary significantly from student to student.
Assessment and Students' Approaches to Learning
Numerous writers have asserted that assessment not only determines what students focus on in their learning, but that it exercises a determinative influence on whether students adopt a deep approach to learning in which they seek to understand the underlying meaning of what they are studying, or a surface approach based on becoming able to reproduce what they are studying without necessarily understanding it (Marton & Säljö, 1997). Nightingale and her colleagues expressed this view unambiguously when they claimed that ''student learning research has repeatedly demonstrated the impact of assessment on students' approaches to learning'' (Nightingale & O'Neil, 1994, quoted by Nightingale et al., 1996, p. 6), while numerous authors have cited with approval Elton and Laurillard's aphorism that ''the quickest way to change student
learning is to change the assessment system'' (Elton & Laurillard, 1979, p. 100). Boud, as noted previously, took a more circumspect view, stating that ''Assessment activities . . . influence approaches to learning that students take'' (Boud, 2006, p. 21, emphasis added), rather than determining such approaches. However, it remains a widespread view that the process of student learning can be powerfully and positively influenced by assessment. A review of the evidence is called for.
The tenor of the case in favour of assessment directing students' learning processes was set in an early study by Terry, who compared how students studied for essays and for ''objective tests'' (including true/false, multiple choice, completion and simple recall tests). He found that students preparing for objective tests tended to focus on small units of content – words, phrases and sentences – thereby risking what he terms ''the vice of shallowness or superficiality'' (Terry, 1933, p. 597). On the other hand, students preparing for essay-based tests emphasized the importance of focusing on large units of content – main ideas, summaries and related ideas. He noted that while some students discriminated between test types in their preparation, others reported no difference in how they prepared for essay and objective tests.
Meyer (1934) compared what students remembered after studying for an essay-type test and for an examination which included multiple-choice, true/false and completion questions, concluding that the former led to more complete mastery and that other forms of testing should be used only in exceptional circumstances. He subsequently asked these students to describe how they studied for the type of exam they were expecting, and how this differed from how they would have studied for a different type of exam. He concluded that students expecting an essay exam attempted to develop a general view of the material, while students expecting the other types of exam focused on detail. Students certainly reported being selective in how they studied, rather than simply following a particular way of studying regardless of assessment format (Meyer, 1935). The conclusion to which these two studies inevitably lead is summarized by Meyer as follows:
The kind of test to be given, if the students know it in advance, determines in large measure both what and how they study. The behaviour of students in this habitual way places greater powers in the teacher's hands than many realize. By the selection of suitable types of tests, the teacher can cause large numbers of his students to study, to a considerable extent at least, in the ways he deems best for a given unit of subject-matter. Whether he interests himself in the question or not, most of his students will probably use the methods of study which they consider best adapted to the particular types of tests customarily employed. (Meyer, 1934, pp. 642–643)
This is a powerful conclusion which seems to be shared by many contemporary writers on assessment 70 years later. But has it been supported by subsequent research? There is certainly a strong case to be made that it has not. In the context of the considerable work done in the student approaches to learning tradition, the greatest hope for assessment and learning would be the prospect
of being able to design assessment tasks that will induce students to adopt deep approaches to learning. Thirty years of studies have yielded equivocal results.
Early studies by Marton and Säljö (and summarized in Marton & Säljö, 1997) sought to see if students' approaches to learning could be changed by manipulating assessment tasks. In one study, Marton sought to induce a deep approach by getting students to respond to questions embedded within a text they were reading and which invited them to engage in the sort of internal dialogue associated with a deep approach. The result was the opposite, with students adapting to the task by responding to the questions without engaging deeply with them. In another study, different groups of students were asked different kinds of questions after reading the same text. The set of questions for one group of students focused on facts and listing ideas while those for another group focused on reasoning. The first group adopted a surface approach – the outcome that was expected. In the second group, however, approximately half of the sample interpreted the task as intended while the other half technified the task, responding in a superficial way that they believed would yet meet the requirements. Marton and Säljö's conclusion more than 20 years after their original experiments is a sobering one:
It is obviously quite easy to induce a surface approach and enhance the tendency to take a reproductive attitude when learning from texts. However, when attempting to induce a deep approach the difficulties seem quite profound. (Marton & Säljö, 1997, p. 53)
More than 30 years after these experiments, Struyven, Dochy, Janssens and Gielen (in press) have come to the same conclusion. In a quantitative study of 790 first-year education students subjected to different assessment methods (and, admittedly, different teaching methods), they concluded that ''students' approaches to learning were not deepened as expected by the student-activating teaching/learning environment, nor by the new assessment methods such as case based evaluation, peer and portfolio assessment'' (pp. 9–10). They saw this large-scale study, based in genuine teaching contexts, as confirming the experimental findings of Marton and Säljö (1997) and concluded that ''although it seems relatively easy to influence the approach students adopt when learning, it also appears very difficult'' (p. 13).
One strand of studies appears to challenge this finding, or at least make us think twice about it. Several comparative studies, in contexts of authentic learning in actual subjects, have considered the proposition that different forms of assessment can lead to different kinds of studying. Silvey (1951) compared essay tests and objective tests, finding that students studied general principles for the former and focused on details for the latter. Scouller (1998) found that students were more likely to employ surface strategies when preparing for a multiple choice test than when preparing for an assignment essay. Tang (1992, 1994) found students applied ''low level'' strategies such as memorization when preparing for a short answer test but both low level and high level strategies were found amongst students preparing for the assignments. Thomas and Bain (1982) compared essays and objective tests, finding
that students tended to use either deep or surface strategies irrespective of assessment type (essay or objective test), though a subsequent study found that ''transformation'' approaches increased and ''reproductive'' approaches decreased with a move from multiple-choice exams to open-ended assignments (Thomas & Bain, 1984). Sambell and McDowell (1998) reported that a move from a traditional unseen exam to an open book exam led to a shift towards a deep approach to learning. Finally, in my own study of oral assessment (Joughin, 2007), many students described adopting a deep approach to learning in relation to oral assessment while taking a more reproductive approach to written assignments.
The interpretation of the results of this strand of research is equivocal. While the studies noted above could be seen to support the proposition that certain kinds of assessment can tend to induce students to adopt a deep approach, they are also consistent with the conclusion that (a) students who have the capacity and inclination to adopt a deep approach will do so when this is appropriate to the assessment task, but that they can also adopt a surface approach when this seems appropriate, while (b) other students will tend to consistently adopt a surface approach, regardless of the nature of the task. Consequently, the influence of assessment on approaches to learning may not be that more appropriate forms of assessment can induce a deep approach to learning, but rather that inappropriate forms of assessment can induce a surface approach. Thus Haggis (2003) concluded that ''despite frequent claims to the contrary, it may be that it is almost impossible to 'induce' a deep approach if it is not 'already there''' (Haggis, 2003, p. 104), adding a degree of pessimism to Ramsden's earlier conclusion that ''what still remains unclear, however, is how to encourage deep approaches by attention to assessment methods'' (Ramsden, 1997, p. 204).
While this section has highlighted limits to improving learning through changing assessment and contradicted any simplistic notions of inducing students to adopt deep approaches to learning, it nevertheless reinforces the vital importance of designing assessment tasks that call on students to adopt a deep approach. To do otherwise is to impoverish the learning of many students. Certainly where current assessment tasks lend themselves to surface approaches, Elton and Laurillard's dictum referred to previously remains true: ''the quickest way to change student learning is to change the assessment system'' (Elton & Laurillard, 1979, p. 100). However, the change may apply only to some students, while the learning of others remains largely unaffected.
Improving Learning and Developing Judgement: Contradictions Between Feedback Theory and Practice
The final two interactions between assessment and learning noted in the opening section of this chapter are concerned with the use of judgement to shape learning and the development of students' capacity to judge the quality of their
own work. Feedback is at the centre of both of these processes and has received considerable attention in the assessment literature over the past two decades.
There is widespread agreement that effective feedback is central to learning. Feedback figures prominently in innumerable theories of effective teaching. Ramsden (2003) includes appropriate assessment and feedback as one of his six key principles of learning, noting strong research evidence that the quality of feedback is the most salient factor in differentiating between the best and worst courses. Using feedback (whether extrinsic or intrinsic) and reflecting on the goals-action-feedback process are central to the frequently cited ''conversational framework'' of learning as presented by Laurillard (2002). Rowntree (1987, p. 24) refers to feedback as ''the lifeblood of learning''. ''Providing feedback about performance'' constitutes one of Gagne's equally well-known conditions of learning (Gagne, 1985).
Black and Wiliam, in their definitive meta-analysis of assessment and classroom learning research (Black & Wiliam, 1998), clearly established that feedback can have a powerful effect on learning, though noting that this effect can sometimes be negative and that positive effects depend on the quality of the feedback. Importantly, they followed Sadler's definition of feedback (Sadler, 1989), noting that feedback only serves a formative function when it indicates how the gap between actual and desired levels of performance can be bridged and leads to some closure of this gap. Their study suggested a number of aspects of feedback associated with learning, including feedback that focuses on the task and not the student. While their study focused on school level studies along with a small number of tertiary level studies, their findings have been widely accepted within the higher education literature.
The work of Gibbs and Simpson noted earlier is located exclusively in the context of higher education and nominates seven conditions required for feedback to be effective, including its quantity and timing; its quality in focusing on learning, being linked to the assignment criteria, and being able to be understood by students; and the requirement that students take notice of the feedback and act on it to improve their work and learning (Gibbs & Simpson, 2004). Nicol and Macfarlane-Dick (2006) also posit seven principles of feedback, highlighting feedback as a moderately complex learning process centred on self-regulation. Based on the work of Sadler, Black and Wiliam and others, their principles include encouraging teacher and peer dialogue and self-esteem, along with the expected notions of clarifying the nature of good performance, self-assessment, and opportunities to close the gap between current and desired performance. Most recently, Hounsell and his colleagues (Hounsell, McCune, Hounsell, & Litjens, 2008) have proposed a six-step guidance and feedback loop, based on surveys and interviews of students in first and final year bioscience courses. The loop begins with students' prior experiences of assessment, moves through preliminary guidance and ongoing clarifications through feedback on performance to supplementary support and feed-forward as enhanced understanding is applied in subsequent work.
If theory attests to the critical role of feedback in learning and recent work has suggested principles for ensuring that this role is made effective, what does empirical research tell us about actual educational practice? Four studies suggest that the provision of feedback in higher education is problematic.
Glover and Brown (2006), in a small interview study of science students, reported that students attended to feedback but often did not act on it, usually because feedback was specific to the topic covered and was not relevant to forthcoming work. The authors subsequently analysed feedback provided on 147 written assignments, noting the relative absence of ''feed-forward'' or suggestions on how to improve and excessive attention to grammar. Chanock (2000) highlighted a more basic problem – that students often simply do not understand their tutors' comments, a point supported by Ivanič, Clark and Rimmershaw (2000) in their evocatively titled paper, ''What am I supposed to make of this? The messages conveyed to students by tutors' written comments''. Higgins, Hartley and Skelton (2002), in an interview and survey study in business and humanities, found that 82% of students surveyed agreed that they paid close attention to feedback, while 80% disagreed with the statement that ''Feedback comments are not that useful'', though this study did not report if students actually utilized feedback in further work. Like the students in Hyland's study (Hyland, 2000), these students may have appreciated the feedback they received without actually using it.
It appears from the literature and research cited in this section that the conceptualisation of feedback may be considerably in advance of its application, with three kinds of problems being evident: problems that arise from the complexity of feedback as a learning process; ''structural problems'' related to the timing of feedback, its focus and quality; and what Higgins, Hartley and Skelton (2001, p. 273) refer to as ''issues of power, identity, emotion, discourse and subjectivity''. Clearly, assertions about the importance of feedback to learning stand in contrast to the findings of empirical research into students' experience of assessment, raising questions regarding both the theoretical assumptions about the centrality of feedback to learning and the frequent failure to bring feedback effectively into play as part of teaching and learning processes.
Conclusion
The literature on assessment and learning is beset with difficulties. Four of these have been noted in this chapter, beginning with problems associated with conflated definitions of assessment. Of greater concern is the reliance by many writers on foundational research that is not fully understood and frequently misinterpreted. Certainly the assertions that assessment drives learning and that students' approaches to learning can be improved simply by changing assessment methods must be treated cautiously in light of the nuanced research which is often associated with these claims. Finally, the failure of feedback in
practice to perform the pre-eminent role accorded it in formative assessment theory raises concerns about our understanding of learning and feedback's role within it.
Acknowledgment
I am grateful to Royce Sadler for his critical reading of the first draft of this chapter and his insightful suggestions for its improvement.
References
Becker, H. S., Geer, B., & Hughes, E. C. (1968; 1995). Making the grade: The academic side of college life. New Brunswick: Transaction.
Biggs, J. B. (2003). Teaching for quality learning at university (2nd ed.). Maidenhead: Open University Press.
Black, P., & Wiliam, D. (1998). Assessment and classroom learning. Assessment in Education, 5(1), 7–74.
Boud, D. (2006). Foreword. In C. Bryan & K. Clegg (Eds.), Innovative assessment in higher education (pp. xvii–xix). London and New York: Routledge.
Boud, D. (2007). Reframing assessment as if learning were important. In D. Boud & N. Falchikov (Eds.), Rethinking assessment in higher education: Learning for the longer term (pp. 14–25). London and New York: Routledge.
Bryan, C., & Clegg, K. (2006). Introduction. In C. Bryan & K. Clegg (Eds.), Innovative assessment in higher education (pp. 1–7). London and New York: Routledge.
Carless, D., Joughin, G., Liu, N-F., & Associates (2006). How assessment supports learning: Learning-oriented assessment in action. Hong Kong: Hong Kong University Press.
Chanock, K. (2000). Comments on essays: Do students understand what tutors write? Teaching in Higher Education, 5(1), 95–105.
Committee on the Foundations of Assessment; Pellegrino, J. W., Chudowsky, N., & Glaser, R. (Eds.). (2001). Knowing what students know: The science and design of educational assessment. Washington: National Academy Press.
Dunbar-Goddett, H., Gibbs, G., Law, S., & Rust, C. (2006, August–September). A methodology for evaluating the effects of programme assessment environments on student learning. Paper presented at the Third Biennial Joint Northumbria/EARLI SIG Assessment Conference, Northumbria University.
Elton, L., & Laurillard, D. M. (1979). Trends in research on student learning. Studies in Higher Education, 4, 87–102.
Falchikov, N., & Boud, D. (2007). Assessment and emotion: The impact of being assessed. In D. Boud & N. Falchikov (Eds.), Rethinking assessment in higher education: Learning for the longer term (pp. 144–155). London: Routledge.
Gagne, R. M. (1985). The conditions of learning and theory of instruction. New York: CBS College Publishing.
Gibbs, G. (2006). How assessment frames student learning. In C. Bryan & K. Clegg (Eds.), Innovative assessment in higher education (pp. 23–36). London: Routledge.
Gibbs, G., & Simpson, C. (2003, September). Measuring the response of students to assessment: The Assessment Experience Questionnaire. Paper presented at the 11th International Improving Student Learning Symposium, Hinckley.
Gibbs, G., & Simpson, C. (2004). Conditions under which assessment supports students' learning. Learning and Teaching in Higher Education, 1, 3–31.
Glover, C., & Brown, E. (2006). Written feedback for students: Too much, too detailed or too incomprehensible to be effective? Bioscience Education ejournal, 7. Retrieved 5 November from http://www.bioscience.heacademy.ac.uk/journal/vol7/beej-7-3.htm
Hager, P., & Butler, J. (1996). Two models of educational assessment. Assessment and Evaluation in Higher Education, 21(4), 367–378.
Haggis, T. (2003). Constructing images of ourselves? A critical investigation into 'approaches to learning' research in higher education. British Educational Research Journal, 29(1), 89–104.
Higgins, R., Hartley, P., & Skelton, A. (2001). Getting the message across: The problem of communicating assessment feedback. Teaching in Higher Education, 6(2), 269–274.
Higgins, R., Hartley, P., & Skelton, A. (2002). The conscientious consumer: Reconsidering the role of assessment feedback in student learning. Studies in Higher Education, 27(1), 53–64.
Hounsell, D., McCune, V., Hounsell, J., & Litjens, J. (2008). The quality of guidance and feedback to students. Higher Education Research and Development, 27(1), 55–67.
Hyland, P. (2000). Learning from feedback on assessment. In P. Hyland & A. Booth (Eds.), The practice of university history teaching (pp. 233–247). Manchester, UK: Manchester University Press.
Ivanič, R., Clark, R., & Rimmershaw, R. (2000). What am I supposed to make of this? The messages conveyed to students by tutors' written comments. In M. R. Lea & B. Stierer (Eds.), Student writing in higher education: New contexts (pp. 47–65). Buckingham, UK: SRHE & Open University Press.
Joughin, G. (2006, August–September). Students' experience of assessment in Hong Kong higher education: Some cultural considerations. Paper presented at the Third Biennial Joint Northumbria/EARLI SIG Assessment Conference, Northumbria University.
Joughin, G. (2007). Student conceptions of oral presentations. Studies in Higher Education, 32(3), 323–336.
Knight, P. (2007). Grading, classifying and future learning. In D. Boud & N. Falchikov (Eds.), Rethinking assessment in higher education (pp. 72–86). Abingdon and New York: Routledge.
Laurillard, D. (2002). Rethinking university teaching (2nd ed.). London: Routledge.
Marton, F., & Säljö, R. (1997). Approaches to learning. In F. Marton, D. Hounsell, & N. Entwistle (Eds.), The experience of learning (2nd ed., pp. 39–58). Edinburgh: Scottish Academic Press.
Meyer, G. (1934). An experimental study of the old and new types of examination: I. The effect of the examination set on memory. The Journal of Educational Psychology, 25, 641–661.
Meyer, G. (1935). An experimental study of the old and new types of examination: II. Methods of study. The Journal of Educational Psychology, 26, 30–40.
Miller, C. M. L., & Parlett, M. (1974). Up to the mark: A study of the examination game. London: Society for Research into Higher Education.
Nicol, D. J., & Macfarlane-Dick, D. (2006). Formative assessment and self-regulated learning: A model and seven principles of good feedback practice. Studies in Higher Education, 31(2), 199–218.
Nightingale, P., & O'Neil, M. (1994). Achieving quality in learning in higher education. London: Kogan Page.
Nightingale, P., Wiata, I. T., Toohey, S., Ryan, G., Hughes, C., & Magin, D. (1996). Assessing learning in universities. Sydney, Australia: University of New South Wales Press.
Oxford English Dictionary [electronic resource] (2002-). Oxford & New York: Oxford University Press, updated quarterly.
Ramsden, P. (1997). The context of learning in academic departments. In F. Marton, D. Hounsell, & N. Entwistle (Eds.), The experience of learning (2nd ed., pp. 198–216). Edinburgh: Scottish Academic Press.
Ramsden, P. (2003). Learning to teach in higher education (2nd ed.). London: Routledge.
Rowntree, D. (1977). Assessing students: How shall we know them? (1st ed.). London: Kogan Page.
Rowntree, D. (1987). Assessing students: How shall we know them? (2nd ed.). London: Kogan Page.
Rowntree, D. (2007). Designing an assessment system. Retrieved 5 November, 2007, from http://iet.open.ac.uk/pp/D.G.F.Rowntree/Assessment.html
Sadler, D. R. (1989). Formative assessment and the design of instructional systems. Instructional Science, 18, 119–144.
Sambell, K., & McDowell, L. (1998). The construction of the hidden curriculum. Assessment and Evaluation in Higher Education, 23(4), 391–402.
Scouller, K. (1998). The influence of assessment method on students' learning approaches: Multiple choice question examination versus assignment essay. Higher Education, 35, 453–472.
Silvey, G. (1951). Student reaction to the objective and essay test. School and Society, 73, 377–378.
Snyder, B. R. (1971). The hidden curriculum. New York: Knopf.
Struyven, K., Dochy, F., Janssens, S., & Gielen, S. (in press). On the dynamics of students' approaches to learning: The effects of the teaching/learning environment. Learning and Instruction.
Tang, K. C. C. (1992). Perception of task demand, strategy attributions and student learning. Eighteenth Annual Conference of the Higher Education Research and Development Society of Australasia, Monash University Gippsland Campus, Churchill, Victoria, 474–480.
Tang, K. C. C. (1994). Effects of modes of assessment on students' preparation strategies. In G. Gibbs (Ed.), Improving student learning: Theory and practice (pp. 151–170). Oxford, England: Oxford Centre for Staff Development.
Terry, P. W. (1933). How students review for objective and essay tests. The Elementary School Journal, April, 592–603.
The University of Queensland. (2007). Assessment Policy and Practices. Retrieved 5 November, 2007, from http://www.uq.edu.au/hupp/index.html?page=25109
Thomas, P. R., & Bain, J. D. (1982). Consistency in learning strategies. Higher Education, 11, 249–259.
Thomas, P. R., & Bain, J. D. (1984). Contextual dependence of learning approaches: The effects of assessments. Human Learning, 3, 227–240.
Chapter 3
How Can Practice Reshape Assessment?
David Boud
University Graduate School, University of Technology, Sydney, Australia
Introduction
Assessment in higher education is being challenged by a multiplicity of demands. The activities predominantly used – examinations, assignments and other kinds of tests – have emerged from within an educational tradition lightly influenced by ideas from psychological measurement, but mostly influenced by longstanding cultural practices in the academic disciplines. Assessment in higher education has for a long time been a process influenced more from within the university than externally. It has typically been judged in terms of how well it meets the needs of educational institutions for selection and allocation of places in later courses or research study, and whether it satisfies the expectations of the almost totally exclusive academic membership of examination committees. Within courses, it has been judged by how well it meets the needs of those teaching. In more recent times it is judged in terms of how well it addresses the learning outcomes for a course.
When we think of assessment as a feature of educational programs and construct it as part of the world of teaching and courses, our points of reference are other courses and assessment that occurs to measure knowledge acquired. Assessment is positioned as part of a world of evaluating individuals in an educational system separated from engagement in the everyday challenges of work.
In contrast, in the everyday world of work, assessments are an intrinsic part of dealing with the challenges that any form of work generates. When we learn through our lives, we necessarily engage in assessment. We make judgements about what needs to be done and whether we have done it effectively. While we may do this individually, we also do it with colleagues and others in the situations in which we find ourselves. This occurs in a social context, not in isolation from others. We also make judgements about situations and groups, not just about individuals, and when we make judgements about individuals we do so in a very particular context. Judgements are typically validated as part of
a community of practice. Indeed, the community of judgement defines what constitutes good work.
Given the increasing focus on a learning outcomes-oriented approach to education, it is useful to examine assessment from a perspective outside the immediate educational enterprise. How can it be deployed to meet the ends rather than the means of education? How can it address the needs of continuing learning? Such a focus on what graduates do when they practise after they complete their courses can enable us to look afresh at assessment within educational institutions to ensure that it is not undermined by short-term and local needs (Boud & Falchikov, 2006). This focus can be gained from looking at practice in the world for which graduates are being prepared.
This chapter investigates how an emphasis on practice might be used to examine assessment within higher education. Practice is taken pragmatically here as representing what students and graduates do when they exercise their knowledge, skills and dispositions with respect to problems and issues in the world. For some, this will mean acting as a professional practitioner and engaging in practice with a client or customer; for others it will be the practice of problem analysis and situation improvement in any work or non-work context. Acts of practice are many and varied. It is the intention here to use the perspective that they provide to question how students are prepared for practice, particularly with respect to how assessment operates to enhance and inhibit students' capacity to engage effectively in practice.
However, practice is not an unproblematic concept and what constitutes practice is not self-evident. The chapter therefore starts with an examination of practice and how it is used. In doing so, it draws on theoretical notions of practice that are currently influencing the literature. We then move from practice to examine some of the ideas emerging about assessment in higher education that can be linked to this. These include a focus on assessment for learning, the role of the learner and the need to take account of preparing for learning beyond graduation. The argument highlights a view of assessment as informing judgements in contrast to a view of assessment as measuring learning outcomes. Implications for the ways in which assessment priorities in higher education can be reframed are considered throughout.
Practice and Practice Theory
What is practice and why might it be a useful prompt for considering assessment? In one sense practice is simply the act of doing something in a particular situation, for example, analysing particular kinds of problems and applying the results to make a change. That is, it is a description of the everyday acts of practitioners, and most graduates find themselves in positions in which they can be regarded as practitioners of one kind or another even if they are not
involved in what has traditionally been regarded as professional work. However, analysing problems in the generic sense is not a useful representation of practice. Practice will involve analysing particular kinds of problems in a certain range of contexts. What is important for practice is what is done, by whom, and in what kind of setting.
However, practice has come in recent years to be more than a way of referring to acts of various kinds. It is becoming what Schatzki (2001) refers to as best naming the primary social thing. That is, it is a holistic way of bringing together the personal and social into sets of activities that can be named and referred to and which constitute the domains of the everyday world in which we operate. Professions, occupations and many other activities can be regarded as sets of practices. To talk of them as practices is to acknowledge that they are not just the exercise of the knowledge and skills of practitioners, but to see them as fulfilling particular purposes in particular social contexts. The practice is meaningful within the context in which it takes place. To abstract it from the environment in which it operates is to remove key features of the practice. Teaching, for example, may occur in the context of a school. Courses that prepare school teachers take great care to ensure that what is learned is learned in order to be exercised in this context, so that courses for vocational teachers or higher education teachers might differ markedly, not as a result of fundamental differences in how different students learn, but because of the social, institutional and cultural context of the organizations in which teaching occurs. The practices, therefore, of different kinds of teachers in different kinds of settings, differ in quite profound ways, while retaining similar family resemblances.
Practice is also a theoretical notion that provides a way of framing ways in which we can investigate the world. Schatzki (2001) has identified the practice turn in contemporary theory and Schwandt (2005) has proposed ways of modelling the practice fields, that is, those areas represented in higher education that prepare students for particular professional practices. In our present context it is important to make a distinction between practice as a location for activity and doing an activity (i.e., a hospital ward is a setting for student nurses' learning in practice) on the one hand, and practice as a theoretical construct referring to the nature of activity (i.e., practice is where skills, knowledge and dispositions come together and perform certain kinds of work), on the other (Schatzki, 2001). Considerations of practice therefore enable us to sidestep theoretical bifurcations, such as those between individual and social, structure and agency or system and lifeworld.
This theorisation of practice points to the need to consider assessment as more than individual judgements of learning. It brings in the importance of the nature of the activity that performs work and the setting in which it occurs. Practice is a holistic conception that integrates what people do, where they do it and with whom. It integrates a multifaceted range of elements into particular functions in which all the elements are needed.
Schwandt (2005) encapsulates these ideas in two models that represent different characteristic views of practice. The first of these, Model 1, is based
in scientific knowledge traditions and broadly describes the views that underpin common traditional programs. It is instrumental and based on means-end rationalities. Practice is seen as an array of techniques that can be changed, improved or learned independently of the ''contingent and temporal circumstances'' (p. 316) in which practices are embedded. The kind of knowledge generated about practice ought to be ''explicit, general, universal and systematic'' (p. 318). To achieve this, such knowledge must by definition eliminate the inherent complexity of the everyday thinking that actually occurs in practice.
The second, Model 2, draws from practical knowledge traditions. In it practice is a ''purposeful, variable engagement with the world'' (p. 321). Practices are fluid, changeable and dynamic, characterised by their 'alterability, indeterminacy and particularity' (p. 322). What is important is the specific situation in which particular instances of practice occur and hence the context-relativity of practical knowledge. In this model, knowledge must be a flexible concept, capable of attending to the important features of specific situations. Practice is understood as situated action. While Schwandt (2005) presents these models in apparent opposition to each other, for our present purposes both need to be considered, but because the first has been overly emphasised in discussions of courses and assessment, we will pay attention to Model 2.
As introduced earlier, the key features of a practice view from the practical knowledge traditions are that, firstly, practice is necessarily contextualised, that is, it cannot be discussed independently of the settings in which it occurs. It always occurs in particular locations and at particular times; it is not meaningful to consider it in isolation from the actual sites of practice. Secondly, practice is necessarily embodied, that is, it involves whole persons, including their motives, feelings and intentions. To discuss it in isolation from the persons who practise is to misunderstand practice.
Furthermore, we must also consider the changing context of professional practice (Boud, 2006). This involves, firstly, a collective rather than an individual focus on practice, that is, a greater emphasis on the performance of teams and groups of practitioners. Secondly, it involves a multidisciplinary and, increasingly, a transdisciplinary focus of practice. In this, practitioners of different specialisations come together to address problems that do not fall exclusively in the practice domain of any one discipline. It is not conducted by isolated individuals, but in a social and cultural context in which what one professional does has necessarily to link with what others do. This is even the case in those professions in which past cultural practice has been isolationist. Thirdly, there is a new emphasis on the co-production of practice and the co-construction of knowledge within it. Professionals are increasingly required to engage clients, patients and customers as colleagues who co-produce solutions to problems and necessarily co-produce the practices in which they are engaged.
Practice and practice theory point to a number of features we need to consider in assessment. The first is the notion of context – knowledge and skills used in a particular practice setting. The kinds of knowledge and skills utilised
depend on the setting. The second is the bringing together of knowledge and skills to operate in a particular context for a particular purpose. Practice involves these together, not each operating separately. Thirdly, knowledge and skills require a disposition on the part of the practitioner, a willingness to use these for the practice purpose. Fourthly, there is a need in many settings to work with other people who might have different knowledge and skills to undertake practice. And, finally, there is the need to recognise that practice needs to take account of and often involve those people who are the focus of the practice.
How well does existing assessment address these features of practice? In general, it meets these requirements very poorly. Even when there is an element of authentic practice in a work setting as part of a course, the experience is separated from other activities and assessment of course units typically occurs separately from placements, practical work and other located activities. Also, the proportion of assessment activities based upon practice work, either on campus or in a placement, or indeed any kind of working with others, is commonly quite small and is rarely the major part of assessment in higher education. When the vast bulk of assessment activities are considered, they may use illustrations and examples from the world of practice, but they do not engage with the kinds of practice features we have discussed here. Significantly, assessment in educational institutions is essentially individualistic (notwithstanding some small moves towards group assessment in some courses). All assessment is recorded against individuals and group assessments are uncommon and are often adjusted to allow for individual marks.
If university learning and assessment is to provide the foundation for students to subsequently engage in practice, then it needs to respond to the characteristics identified as being central to practice and find ways of incorporating them into common assessment events. This needs to occur whether or not the course uses placements at all or whether or not it is explicitly vocational. Of course, in many cases we do not know the specific social and organisational contexts in which students will subsequently practise. Does this mean we must reject the practice perspective? Not at all: while the specifics of any given context may not be known, it is known that the exercise of knowledge and skill will not normally take place in isolation from others, both other practitioners and people for whom a service or product is provided. It will take place in an organisational context. It will involve emotional investments on the part of the practitioner and it will involve them in planning and monitoring their own learning. These elements therefore will need to be considered in thinking about how students are assessed and examples of contexts will need to be assumed for these purposes.
Assessment Embedded in Practice
Where can we look for illustrations of assessment long embedded in practice? A well-established example is that of traditional trade apprenticeships. While they do not occur in the higher education sector, and they have been modified in
recent years through the involvement of educational institutions, they show a different view of the relationship between learning, assessment and work than is familiar from universities. Mention of apprenticeships here is not to suggest that university courses should be more like apprenticeships, but to demonstrate that assessment can be conceived of in quite different terms than is commonplace in universities and can meaningfully engage with notions of practice. In the descriptions of apprenticeship that follow, I am grateful for the work of Kvale, Tanggaard and Elmholdt and their observations about how it works and how it is accepted as normal (Kvale, 2008; Tanggaard & Elmholdt, 2008).
In an apprenticeship a learner works within an authentic site of production. The learner is immersed in the particular practice involved. It occurs all around and practice is not contrived for the purposes of learning: the baker is baking, the turner is machining metal and the hairdresser is cutting real hair for a person who wants the service.
The apprentice becomes part of a community of practice over time (Lave & Wenger, 1991). They start by taking up peripheral roles in which they perform limited and controlled aspects of the practice and move on by being given greater responsibility for whole processes. They may start by washing, but not cutting, hair; they may assemble the ingredients for bread; and so on.
The apprentice is surrounded by continual opportunities for guidance and feedback from experienced practitioners. The people they learn from are practising the operations involved as a normal part of work. They are not doing it for the benefit of the learner, but they provide on-going role models of how practice is conducted from which to learn.
When the apprentice practises, assessment is frequent, specific and standards-based. It doesn't matter whether the piece is manufactured better than can be done by other apprentices. What counts is to what tolerances the metal is machined. If the work is not good enough, it gets repeated until the standards are reached. Norm-referenced assessment is out and standards-based assessment is in.
Assessment for Learning
In considering what a practice perspective might contribute to assessment, it is necessary to explore what assessment aims to do, what it might appropriately do and what it presently emphasises.
Assessment in higher education has mainly come from the classroom tradition of teachers 'marking' students' work, overlaid with external examinations to judge performance for selection and allocation of scarce resources. The latter focus has meant that an emphasis on comparability, consistency across individuals, defensibility and reliability has dominated. Summative assessment in many respects has come to define the norm for all assessment practice and other assessment activities are often judged in relation to this, especially as summative assessment pervades all stages of courses. While formative purposes are often acknowledged in university assessment policies, these are usually
subordinate to the summative purpose of certification. There are indications that this is changing though and that an exclusive emphasis on summative purposes has a detrimental effect on learning. Measurement of performance can have negative consequences for what is learned (Boud, 2007).
The classroom assessment tradition has been reinvigorated in recent times through a renewed emphasis on research on formative assessment in education generally (Black & Wiliam, 1998), on self-assessment in higher education (Boud, 1995) and on how assessment frames learning (Gibbs, 2006). This work has drawn attention to otherwise neglected aspects of assessment practice and the effects of teachers' actions and learning activities on students' work. Sadler (1989, 1998) has for a long time stressed the importance of bridging the gap between comments from teachers, often inaccurately termed 'feedback', and what students do to effect learning. Nicol and Macfarlane-Dick (2006) have most recently developed principles to guide feedback practice that can influence students regulating their own learning. Regretfully, while there are many rich ideas to improve assessment practice to assist learning, they have been eclipsed by the greater imperatives of grading and classification.
It is worth noting in passing that it is tempting to align the need for certification and the need for learning. This, though, would be an inappropriate simplification that fails to acknowledge the contradictions between being judged and developing the capacity to make judgements. We have the difficulty of the limitations of language in having one term in most parts of the English-speaking world – assessment – to describe something with often opposing purposes. While it may be desirable to choose new terms, we are stuck with the day-to-day reality of using assessment to encompass many different and often incompatible activities.
Elsewhere (Boud, 2007), I have argued that we should reframe the notion of assessment to move away from the unhelpful polarities prompted by the formative/summative split and an emphasis on measurement as the guiding metaphor in assessment thinking. This would enable us to consider how assessment can be used to foster learning into the world of practice beyond the end of the course. The focus of this reframing is around the notion of informing judgement and of judging learning outcomes against appropriate standards. Students must necessarily be involved in assessment as assessment is a key influence in their formation and they are active subjects. Such involvement enables assessment to contribute not only to learning during the course but also to future learning beyond the end of the course through the development of students' capacity for making judgements about the quality of their work, and thus making decisions about their learning. Unless students develop the capacity to make judgements about their own learning they cannot be effective learners now or in the future.
A move away from measurement-oriented views of assessment is not new. Hager and Butler (1996), for example, have drawn attention to what they identify as two contrasting models of educational assessment. The first of these is what they term the scientific measurement model in which practice is derived from theory, knowledge is a 'given' for practical purposes, knowledge is
‘impersonal’ and context-free, assessment is discipline-driven and assessment deals with structured problems. The second they term the judgemental model in which practice and theory are loosely symbiotic, knowledge is understood as provisional, knowledge is a human construct and reflects the context in which it is generated and used, assessment is problem-driven and assessment deals with unstructured problems. Such a judgemental model places assessment within the context of practice and the use of knowledge.

The notion of informing judgement can be a more generative and integrating lens through which to view assessment than the polarising view of formative and summative assessment. It focuses on the key act of assessment that needs to be undertaken by both teachers and students in ensuring that learning has occurred. Assessment may in this light be more productively seen as a process of human judgement than as a process of scientific measurement.

Such a perspective is also helpful for a practice-based orientation to assessment. In order to improve practice we need helpful information that not only indicates what we are doing but also conveys the judgements that go with it. Interpretation is needed, as raw data alone does not lend itself to fostering change. This highlights the importance of feedback, yet as teachers with increasingly large classes, we can never provide students with as much or as detailed feedback as they need. Indeed, after the end of the course there is no teacher to offer feedback. Consequently, we must form judgements for ourselves, whether we are students or practitioners, drawing upon whatever resources are available to us. The idea of assessment as informing judgement can take us seamlessly from being a student to becoming a professional practitioner. It is an integral part of ongoing learning and of developing the capacity to be a practitioner.

There are many implications of viewing assessment as informing student judgement (Boud & Falchikov, 2007). The most fundamental is to create the circumstances that encourage students to see themselves as active agents in their own learning. Without a powerful sense that they are actively shaping themselves as persons who can exercise increasingly sophisticated kinds of judgement, they are destined to become dependent on others. These others may be the lecturer or tutor while at university, but later they may morph into experts, authority figures and employers. Of course, there is nothing wrong in respecting the judgement of such people and taking it into account. Difficulties arise, though, when judgements cannot be made independently of such people. When this occurs, substantial risks arise: calculations may not be sufficiently checked, ethical considerations may be glossed over, and implications of decisions may not be adequately considered. If someone else is believed to be the final arbiter of the quality of one’s work, then responsibility has not been accepted.

An emphasis on producing students who see themselves as active agents requires a new focus on fostering reflexivity and self-regulation through all aspects of a course, not just assessment tasks. It cannot be expected that judgement comes fully developed into play in assessment acts. What precedes assessment must also make a significant contribution. This leads to the importance of organising opportunities for developing informed judgement throughout
programs. It could be argued that structuring occasions for this is the most important educational feature of any course. Fostering reflexivity and self-regulation is not something that can take place in one course unit and be expected to benefit others. It is a fundamental attribute of programs that needs to be developed and sustained throughout. Assessment must be integrated with learning and integrated within all elements of a program over time. As soon as one unit is seen to promote dependency, it can detract from the overall goal of the program.

These are more demanding requirements than they appear at first sight. They require vigilance for all assessment acts, even the most apparently innocuous. For example, tests at the start of a course unit that assess simple recall of terminology as a pre-requisite for what follows can easily give the message that remembering the facts is what is required in this course. This does not mean that all such tests are inappropriate, but it does mean that the total experience of students needs to be considered, and perhaps tests of one kind need to be balanced with quite different activities in order to create a program that has the ultimate desired outcomes.

This points to the need to examine the consequences of all assessment acts for learning. Do they or do they not improve students’ judgement, and if so, how do they do it? If they do not, then they are insufficiently connected with their main raison d’être – education – to be justified: if assessment does not actively support the kinds of learning that are sought, then it is at least a missed opportunity, if not an act that undermines the learning outcome.
Apprenticeship as a Prototype

It can help to gain some perspective on higher education practice by considering the other situation mentioned earlier in which assessment and learning occur. As we have seen, there is a tradition of assessment that comes from the world of work. This tradition of the artisan, craftsperson and apprentice predates almost all of formal education (Tanggaard & Elmholdt, 2008). Here the emphasis has been on the formation of the person into an expert practitioner. All judgements are made in comparison to what is good work and how that can be achieved. While there is an ultimate concern with final products, what is more important is how they are achieved. The emphasis is on correct processes. These will lead to good products, but final production of complete items may be delayed for a considerable time.

What is particularly interesting in the case of apprenticeships is how robust this practice has been over time and through changing patterns of work, and how the de facto assessment activities have been able to withstand the re-regulation of assessment tasks into new competency-based frameworks. One of the main reasons for the strength of apprenticeships is that they are unambiguously preparing students for particular forms of practice that are visible to them and their workplace teachers throughout their training.
What more can we learn from the apprenticeship tradition? Firstly, it has a strong base in a community of practice of which the apprentice is gradually becoming a part. That is, the apprentice sees the practices of which he or she is becoming a part. They can imagine their involvement and vicariously, and sometimes directly, experience the joys and frustrations of competent work. The apprentice is a normal part of the work community and is accepted as a part of it. There are clear external points of reference for judgements that are made about work itself. These are readily available and are used on a regular basis. Practice is reinforced by repetition and skill is developed. In this, assessment and feedback are integrated into everyday work activities. They are not isolated from them and conducted elsewhere. Final judgements are based on what a person can do in a real setting, not on a task abstracted from it. Grades are typically not used. There is no need to extract an artificial measure when competence, or what is ‘fit-for-purpose’, is the yardstick.

Of course, we can only go so far in higher education in taking such an example. Higher education is not an apprenticeship and it is not being suggested that it should become like one. In apprenticeship, the skills developed are of a very high order, but the range of activity over which they are deployed can be more limited than in the professions and occupations for which higher education is a preparation. The challenge for higher education is to develop knowledge and skills in a context in which the opportunities for practice and the opportunities to see others practise are more restricted but where high-level competencies are required. This does not suggest that practice be neglected; it means, though, that without the apparent luxury of the everyday practice setting of the apprenticeship, it is even more important to focus on creating opportunities for practice to be considered.
Assessment for Practice

What else can we take from a practice view for considering how assessment in higher education should be conducted? The first, and most obvious, consideration is that assessment tasks must be contextualised in practice. Learning occurs for a purpose, and while not all ultimate purposes are known at the time of learning, it is clear that they will all involve applications in sites of practice. To leave this consideration out of assessment, then, is to denature assessment and turn it into an artificial construct that does not connect with the world. Test items that are referential only to ideas and events that occur in the world of exposition and education may be suitable as intermediate steps towards later assessment processes, but they are not realistic on their own. While there may be scope in applied physics, for example, for working out the forces on a weightless pulley or on an object on a frictionless surface, to stop there is to operate in a world where there is a lack of consequence. To make this assumption is to deny that real decisions will have to be made and to create the need for students to unlearn something in order to operate in the world.
The second consideration, which is related to this, is that performance should be judged by the standards of the practice itself, not an abstraction which owes little to an understanding of it. The key question to be considered is: what is the appropriate community of judgement that should be the reference point for assessment purposes? That is, from where should standards be drawn? This is an issue both for students, as they learn to discern appropriate sources of standards for their work, and for teachers and assessors. The community of judgement may vary for any given set of subject matter. What is judged to be an appropriate level and type of mathematical knowledge, for example, may vary between engineers who use mathematics, and mathematicians who may have a role in teaching it.

A focus on standards also draws attention to the problem that there are far more things to learn, know and do than can possibly be included in the assessment regime of any particular course or unit of study. Rather than attempt to squeeze an excessive number of outcomes into assessment acts, it may be necessary, as the late Peter Knight persuasively argued (Knight, 2007), to ensure that the environment of learning provides sufficient opportunities to warrant that learning has occurred, rather than to end-load course assessment with so many tasks that students have to approach them in a manner that produces overload and ensures they are dealt with in a superficial way.

Finally, recognition from practice that learning is necessarily embodied and engages the emotions and volition of learners points to the need to acknowledge that assessment has visceral effects rather than to ignore them. This implies that assessment needs to have consequences other than the grading of students. Students need to be involved in the impact of their learning on others. Part of this may be simulated, but part, as occurs when there are placements in practice settings, may be real. When they are real, as happens in the teacher education practicum or the nursing clinical placement, students are involved with real children or patients, but they are supervised and the risk of possible negative consequences is controlled. Nevertheless, such settings provide students with the social-emotional environment that enables them to experience the emotional consequences of their actions.

To draw this together, what is being argued here is not that we must move assessment into practice settings in the way that has occurred in some aspects of vocational education and training, but that an awareness of and sensitivity to practice needs to pervade the ways in which assessment is conceptualised, and to balance some of the short-term and technical considerations that have dominated the agenda. While consideration of practice can reshape the ways in which we think about assessment, practice settings can also create challenges for assessment. In considering this there is a need to distinguish between the locations available during courses for students to practise, and practice settings more widely. During courses students are exposed to a limited range of settings and may take up partial roles of practice within them. The argument here is not about the location of assessment, nor about its content, but about the overriding purpose of
involving students in making judgements so that they and others can take a view about whether learning suitable for life after courses has been achieved and what more needs to be engaged in.
Implications

Why then should we consider starting with practice as a key organiser for assessment? As we have seen, it is anchored in the professional world, not the world of educational institutions. This means that there are multiple views of practice that are available external to the educational enterprise. Practice focuses attention on work outside the artefacts of the course – there is a point of reference for decision-making beyond course-determined assessment criteria, and actions that take place have consequences beyond those of formal assessment requirements. These create possibilities for new approaches to assessment. In addition to this, judgements of those in a practice situation (professional or client) make a difference to those involved. That is, there are consequences beyond the learning of the student that frame and constrain actions. These provide a reality-check not available in an internally-referenced assessment context. These considerations raise the stakes, intensify the experience and embody the learner more thoroughly in situations that anticipate engagement as a full professional. As we see beyond our course into the world of professional practice, assessment becomes necessarily authentic: authenticity does not need to be contrived.

To sum up, a ‘practice’ perspective helps us to focus on a number of issues for assessment tasks within the mainstream of university courses. These include:

1. Locating assessment tasks in authentic contexts. These need not necessarily involve students being placed in external work settings, but involve the greater use of features of authentic contexts to frame assessment tasks. They could model or simulate key elements of authentic contexts.

2. Establishing holistic tasks rather than fragmented ones. The least authentic of assessment tasks are those taken in isolation and disembodied from the settings in which they are likely to occur. While tasks may need to be disaggregated for purposes of exposition and rehearsal of the separate elements, they need to be put back together again if students are to see knowledge as a whole.

3. Focusing on the processes required for a task rather than the product or outcome per se. Processes and ways of approaching tasks can often be applied from one situation to another whereas the particularities of products may vary markedly. Involving students in ways of framing tasks in assessment is often neglected in conventional assessment.
4. Learning from the task, not just demonstrating learning through the task. A key element of learning from assessment is the ability to identify cues from tasks themselves which indicate how they should be approached, the criteria to be used in judging performance and what constitutes successful completion.

5. Having consciousness of the need for refining the judgements of students, not just the judgement of students by others. Learning in practice involves the ability to continuously learn from the tasks that are encountered. This requires progressive refinement of judgements by the learner, which may be inhibited by the inappropriate deployment of the judgements of others when learners do not see themselves as active agents.

6. Involving others in assessment activities, away from an exclusive focus on the individual. Given that practice occurs in a social context, it is necessary that the skill of involving others is an intrinsic part of learning and assessment. Assessment with and for others needs far greater emphasis in courses.

7. Using standards appropriate to the task, not comparisons with other students. While most educational institutions have long moved from inappropriate norm-referenced assessment regimes, residues from them still exist. The most common is the use of generic rather than task-specific standards and criteria that use statements of quality not connected to the task in hand (e.g. abstract levels using terms such as adequate or superior performance, without a task-oriented anchor).

8. Moving away from an exclusive emphasis on independent assessment in each course unit towards the development of assessment tasks throughout a program and the linking of activities from different courses. The greatest fragmentation often occurs through the separate treatment of individual course units for assessment purposes. Generic student attributes can only be achieved through coordination and integration of assessment tasks across units. Most of the skills of practice discussed here develop over time, need practice over longer periods than a semester, and cannot be relegated to parts of an overall program.

9. Acknowledging student agency and initiation rather than having students always respond to the prompts of others. Designing assessment so that it always responds to the need to build student agency in learning and development is a fundamental challenge for assessment activities. This does not mean that students have to choose assessment tasks, but that tasks are constructed in ways that maximise active student involvement in them.

10. Building in an awareness of the co-production of outcomes with others. Practitioners not only work with others, but they co-produce with them. This implies that there need to be assessment tasks in which students co-construct outcomes. While this does not necessarily require group assessment as such, activities with multi-participant outcomes need to be designed into an overall regime.
The challenge the practice perspective creates is to find ways of making some of these shifts in assessment activities in a higher education context that is moving rapidly in an outcomes-oriented direction, but which embodies the cultural practices of an era deeply sceptical of the excessively vocational. The implication of taking such a perspective is not that more resources are required or that we need to scrap what we are doing and start again. It does, however, require a profound change of perspective. We need to move from privileging our own academic content and assessing students as if our part of the course were more important than anything else, to a position that is more respectful of the use of knowledge, of the program as a whole, and of the need to build the capacity of students to learn and assess for themselves once they are out of our hands.

Some of the changes required are incremental and involve no more than altering existing assessment tasks by giving them stronger contextual features. However, others create a new agenda for assessment and provoke us to find potentially quite new assessment modes that involve making judgements in the co-production of knowledge. If we are to pursue this agenda, we need to take up these challenges and operate in the ways in which our graduates are increasingly required to operate in the emerging kinds of work of the twenty-first century.
References

Black, P., & Wiliam, D. (1998). Assessment and classroom learning. Assessment in Education, 5(1), 7–74.

Boud, D. (1995). Enhancing learning through self assessment. London: Kogan Page.

Boud, D. (2006, July). Relocating reflection in the context of practice: Rehabilitation or rejection? Keynote address presented at Professional Lifelong Learning: Beyond Reflective Practice, a conference held at Trinity and All Saints College. Leeds: Institute for Lifelong Learning, University of Leeds. Retrieved 20 October, 2007, from http://www.leeds.ac.uk/educol/documents/155666.pdf

Boud, D. (2007). Reframing assessment as if learning was important. In D. Boud & N. Falchikov (Eds.), Rethinking assessment in higher education: Learning for the longer term (pp. 14–25). London: Routledge.

Boud, D., & Falchikov, N. (2006). Aligning assessment with long term learning. Assessment and Evaluation in Higher Education, 31(4), 399–413.

Boud, D., & Falchikov, N. (2007). Developing assessment for informing judgement. In D. Boud & N. Falchikov (Eds.), Rethinking assessment in higher education: Learning for the longer term (pp. 181–197). London: Routledge.

Gibbs, G. (2006). How assessment frames student learning. In K. Clegg & C. Bryan (Eds.), Innovative assessment in higher education. London: Routledge.

Hager, P., & Butler, J. (1996). Two models of educational assessment. Assessment and Evaluation in Higher Education, 21(4), 367–378.

Knight, P. (2007). Grading, classifying and future learning. In D. Boud & N. Falchikov (Eds.), Rethinking assessment in higher education: Learning for the longer term (pp. 72–86). London: Routledge.

Kvale, S. (2008). A workplace perspective on school assessment. In A. Havnes & L. McDowell (Eds.), Balancing dilemmas in assessment and learning in contemporary education (pp. 197–208). New York: Routledge.
Lave, J., & Wenger, E. (1991). Situated learning: Legitimate peripheral participation. Cambridge, UK: Cambridge University Press.

Nicol, D. J., & MacFarlane-Dick, D. (2006). Formative assessment and self-regulated learning: A model and seven principles of good feedback practice. Studies in Higher Education, 31(2), 199–218.

Sadler, D. R. (1989). Formative assessment and the design of instructional systems. Instructional Science, 18, 119–144.

Sadler, D. R. (1998). Formative assessment: Revisiting the territory. Assessment in Education, 5(1), 77–84.

Schatzki, T. R. (2001). Introduction: Practice theory. In T. Schatzki, K. Knorr Cetina, & E. von Savigny (Eds.), The practice turn in contemporary theory (pp. 1–14). London: Routledge.

Schwandt, T. (2005). On modelling our understanding of the practice fields. Pedagogy, Culture & Society, 13(3), 313–332.

Tanggaard, L., & Elmholdt, C. (2008). Assessment in practice: An inspiration from apprenticeship. Scandinavian Journal of Educational Research, 52(1), 97–116.
Chapter 4
Transforming Holistic Assessment and Grading into a Vehicle for Complex Learning

D. Royce Sadler
Introduction

One of the themes running through my work since 1980 has been that students need to develop the capacity to monitor the quality of their own work during its actual production. For this to occur, students need to appreciate what constitutes work of higher quality; to compare the quality of their emerging work with the higher quality; and to draw on a store of tactics to modify their work as necessary. In this chapter, this theme is extended in two ways. The first is an analysis of the fundamental validity of using preset criteria as a general approach to appraising quality. The second is a teaching design that enables holistic appraisals to align pedagogy with assessment.

For the purposes of this chapter, a course refers to a unit of study that forms a relatively self-contained component of a degree program. A student response to an assessment task is referred to as a work. The assessed quality of each work is represented by a numerical, literal or verbal mark or grade. Detailed feedback from the teacher may accompany the grade. For the types of works of interest in this chapter, grades are mostly produced in one of two ways. In analytic grading, the teacher makes separate qualitative judgments on a limited number of properties or criteria. These are usually preset, that is, they are nominated in advance. Each criterion is used for appraising each student’s work. The teacher may prescribe the criteria, or students and teachers may negotiate them. Alternatively, the teacher may require that students develop their own criteria as a means of deepening their involvement in the assessment process. In this chapter, how the criteria are decided is not important. After the separate judgments on the criteria are made, they are combined using a rule or formula, and converted to a grade.

Analytic grading is overtly systematic. By identifying the specific elements that contribute to the final grade, analytic grading provides the student with explicit feedback. The template used in
implementing the process may be called a rubric, or any one of scoring, marking or grading paired with scheme, guide, matrix or grid. As a group, these models are sometimes referred to as criterion-based assessment or primary trait analysis. In holistic or global grading, the teacher responds to a student’s work as a whole, then directly maps its quality to a notional point on the grade scale. Although the teacher may note specific features that stand out while appraising, arriving directly at a global judgment is foremost. Reflection on that judgment gives rise to an explanation, which necessarily refers to criteria. Holistic grading is sometimes characterised as impressionistic or intuitive. The relative merits of analytic and holistic grading have been debated for many years, at all levels of education. The most commonly used criterion for comparison has been scorer reliability. This statistic measures the degree of consistency with which grades are assigned to the same set of works by different teachers (inter-grader reliability), or by the same teacher on separate occasions (temporal reliability). Scorer reliability is undoubtedly a useful criterion, but is too narrow on its own. It does not take into account other factors such as the skills of the markers in each method, or the extent to which each method is able to capture all the dimensions that matter. The use of analytic grading schemes and templates is now firmly established in higher education. Internationally, rapid growth in popularity has occurred since about 1995. Nevertheless, the basic ideas are not new. Inductively decomposing holistic appraisals goes back at least to 1759, when Edmund Burke set out to identify the properties that characterise beautiful objects in general. In the forward direction, the theory and practice of assembling overall judgments from discrete appraisals on separate criteria has developed mostly over the last 50 years. It has given rise to an extensive literature touching many fields. Key research areas have been clinical decision making (Meehl, 1954/1996) and human expertise of various types (Chi, Glaser & Farr, 1988; Ericsson & Smith, 1991). The terminology used is diverse, and includes ‘policy capturing’ and ‘actuarial methods’. Specifically in educational assessment, Braddock, Lloyd-Jones, and Schoer (1963) reported early developmental work on analytic approaches to grading English composition, and the rationale for it; primary trait scoring is described in Lloyd-Jones (1977). Researchers in higher education assessment have explored in recent years the use of criteria and rubrics, specifically involving students in self- and peer-assessment activities (Bloxham & West, 2004; Orsmond, Merry & Reiling, 2000; Rust, Price & O’Donovan, 2003; Woolf, 2004). Many books on assessment in higher education advocate analytic grading, and provide practitioners with detailed operational guidelines. Examples are Freeman and Lewis (1998), Huba and Freed (2000), Morgan, Dunn, Parry, and O’Reilly (2004), Stevens and Levi (2004), Suskie (2004), and Walvoord and Anderson (1998). For the most part, both the underlying principles and the various methods of implementation have been accepted uncritically. In this chapter, the sufficiency of analytic grading as a general approach for relevant classes of student works is
called into question, on both theoretical and practical grounds. The basic reason is that it sets up appraisal frameworks that are, in principle, sub-optimal. Although they work adequately for some grading decisions, they do a disservice to others by unnecessarily constraining the scope of appraisals. The assumption that using preset criteria is unproblematic has had two inhibiting effects. First, teachers typically have not felt free to acknowledge, especially to students, the existence or nature of certain limitations they encounter. Second, there has been little or no imperative to explore and develop alternative ways forward. The theme of this chapter is developed around five propositions. The first four are dealt with relatively briefly; the fifth is assigned a section of its own. The driving principle is that if students are to achieve consistently high levels of performance, they need to develop a conceptualisation of what constitutes quality as a generalised attribute (Sadler, 1983). They also need to be inducted into evaluating quality, without necessarily being bound by tightly specified criteria. This approach mirrors the way multi-criterion judgments are typically made by experienced teachers. It is also an authentic representation of the ways many appraisals are made in a host of everyday contexts by experts and non-experts alike. Equipping students with evaluative insights and skills therefore contributes an important graduate skill. All five propositions are taken into account in the second half of the chapter, which outlines an approach to the assessment of complex student productions.
Applicable Types of Assessment Tasks

The types of tasks to which this chapter applies are those that require divergent or ‘open’ responses from students. Divergent tasks provide opportunities for learners to demonstrate sophisticated cognitive abilities, integration of knowledge, complex problem solving, critical reasoning, original thinking, and innovation. Producing a response requires abilities in both design and production, allowing considerable scope for creativity. There are no formal techniques or recipes which, if followed precisely, would lead to high-quality responses. There is also no single correct or best answer, result or solution. Common formats for divergent responses include field and project reports, seminar presentations, studio and design productions, specialised artefacts, professional performances, creative works, term papers, essays, and written assignments. In assessing achievement across a broad range of disciplines and professions, divergent responses predominate. Within each genre, student works may take quite different forms, yet be of comparable quality. This characteristic is regarded as highly desirable in many domains of higher education.

Determining the quality of divergent types of works requires skilled, qualitative judgments using multiple criteria. A qualitative judgment is one made directly by a person, the person’s brain being both the source and the instrument for appraisal (Sadler, 1989). The judgment cannot be reduced to a set
of measurements or formal procedures that lead to the ‘correct’ appraisal. Qualitative judgments are unavoidable in many fields of higher education, and both holistic and analytic grading are based on them. The two approaches differ primarily in their granularity. Holistic grading involves appraising student works as integrated entities; analytic grading requires criterion-by-criterion judgments. Historically, a steady swing has occurred away from holistic and towards analytic judgments, and then a further trend has occurred within analytic judgments. When scoring guides and marking schemes first became common, the focus tended to be on either the inclusion or omission of specific content, or the structure of the response. For a written piece, this structure could be Introduction, Statement of the problem, Literature review, Development of an argument or position, and Conclusion. The subsequent shift has concentrated on properties or dimensions related to quality. Regardless of focus, all analytic grading schemes introduce formal structure into the grading process, ostensibly to make it more objective and thus reduce the likelihood of favouritism or arbitrariness.
The First Four Propositions

Already referred to briefly above, Proposition 1 is that students need to develop the capacity to monitor the quality of their work during its actual production (Sadler, 1989). In relation to creating responses to assessment tasks, this capability needs to be acquired as a course proceeds. Teaching therefore needs to be designed so as to make specific provision for its development. As the learning sequence progresses, students’ understanding of quality needs not only to grow but also to become broadly consonant with that held by the teacher. This is partly because the teacher usually has a strong say in the final grade, and partly because the teacher’s feedback does not make much sense otherwise. But there are deeper implications. Ultimately, the concept of quality needs to relate to works that graduates will produce after their formal studies are completed, as they demonstrate professional expertise. This implies, therefore, that the teacher’s frame of reference about quality should reflect the conventions and expectations evident in other relevant environments such as the arts and professions, industry, and elsewhere in academia.

Self-monitoring means that students make conscious judgments on their own, without help from teachers or peers. It entails being weaned away from ongoing dependence on external feedback, irrespective of its source or character. Among other things, self-monitoring requires an appreciation of what makes a work of high quality. It also requires enough evaluative skill to compare, with considerable detachment, the quality of what the producer is creating with what would be needed for it to be of high quality. For self-monitoring to have any impact on an emerging work, the student also needs a repertoire of alternative moves upon which to draw at any pertinent point or
stage in the development. Otherwise the work cannot be improved. This in turn necessitates that the student becomes sensitive to where those ‘pertinent points’ are, as they arise, during construction.

Many of the students whose usual levels of performance are mediocre are hampered by not knowing what constitutes work of high quality. This sets an upper bound on their ability to monitor the quality of their own developing work (Sadler, 1983). Raising students’ knowledge about high quality and their capability in self-monitoring can lead to a positive chain of events. These are improved grades, increased intrinsic satisfaction, enhanced motivation and, as a consequence, higher levels of achievement. Within the scope of a single course, it is obviously not realistic to expect learners to become full connoisseurs. But as a course proceeds, the learners’ judgments about the quality of their own works should show progressively smaller margins of error. Self-monitoring raises self-awareness and increases the learner’s metacognition of what is going on. Attaining this goal is not intrinsically difficult, but it does require that a number of specific conditions be met. Not to achieve the goal, however, represents a considerable opportunity loss.

Proposition 2 is that students can develop evaluative expertise in much the same way as they develop other knowledge and skills, including the substantive content of a course. Skilled appraisal is just one of many types of expertise, although it seldom features explicitly among course objectives. Initially, a key tool for developing it is credible feedback, primarily from the teacher and peers. Feedback usually takes the form of descriptions, explanations or advice expressed in words. Preset criteria coupled with verbal feedback stem from a desire to tell or inform students. It might be thought that the act of telling would serve to raise the performance ceiling for learners, but just being told is rarely an adequate remedy for ignorance. The height of the ceiling depends on what students make of what is ‘told’ to them.

The next step along the path can be taken when relevant examples complement the verbal descriptions (Sadler, 1983, 1987, 1989, 2002). Examples provide students with concrete referents. Without them, explanatory comments remain more or less abstract, and students cannot interpret them with certainty. If the number of examples is small, they need to be chosen as judiciously as examples are for teaching. If examples are plentiful, careful selection is not as critical, provided they cover a considerable range of the quality spectrum. Even more progress can be made if teachers and learners actively discuss the descriptions and exemplars together. Reading verbal explanations, seeing pertinent exemplars, and engaging in discourse provide categorically different cognitive inputs. But the best combination of all three still does not go far enough. The remaining element is that students engage in making evaluative decisions themselves, and justifying those decisions. No amount of telling, showing or discussing is a substitute for one’s own experience (Sadler, 1980, 1989). The student must learn how to perceive works essentially through the eyes of an informed critic, eventually becoming ‘calibrated’. Learning
environments are self-limiting to the extent that they fail to make appropriate provision for students to make, and be accountable for, serious appraisals. Proposition 3 is that students’ direct evaluative experience should be relevant to their current context, not translated from another. The focus for their experience must therefore be works of a genre that is substantially similar to the one in which they are producing. Apart from learning about quality, closely and critically examining what others have produced in addressing assessment tasks expands the student’s inventory of possible moves. These then become available for drawing upon to improve students’ own work. This is one of the reasons peer assessment is so important. However, merely having students engage in peer appraisal in order to make assessment more participatory or democratic is not enough. Neither is treating students as if they were already competent assessors whose appraisals deserve equal standing with those of the teacher, and should therefore contribute to their peers’ grades. The way peer assessment is implemented should reflect the reasons for doing it. Learners need to become reasonably competent not only at assessing other students’ works but also at applying that knowledge to their own works. Proposition 4 is that the pedagogical design must function not only effectively but also efficiently for both teachers and students. There is obviously little point in advocating changes to assessment practices that are more labour intensive than prevailing procedures. Providing students with sufficient direct evaluative experience can be time consuming unless compensating changes are made in other aspects of the teaching. This aspect is taken up later in the chapter.
The Fifth Proposition

Ideally, students should learn how to appraise complex works using approaches that possess high scholarly integrity, are true to the ways in which high-quality judgments are made professionally, and have considerable practical potential for improving their own learning. Proposition 5 is that students in many higher education contexts should learn how to make judgments about the quality of emerging and finished works holistically rather than using analytic schemes. The case for this proposition is developed by first analysing the rationale for analytic judgments, and then mounting a critique of the method generally.

In recent years, analytic grading schemes using preset criteria have been advocated as superior to holistic appraisals. The rationale for this is more often implied than stated. Basically, it is that such systems:

a) improve consistency and objectivity in grading, because the appraisal process is broken down into smaller-scale judgments;

b) make transparent to students, as an ethical obligation, the key qualities that will be taken into account;
c) encourage students to attend to the assessment criteria during development of their work, so the criteria can play a product-design role which complements the assessment task specifications;

d) enable grading decisions to be made by comparing the quality of a student’s work with fixed criteria and standards rather than with the learner’s previous level of achievement, the performance of others in the class, or the teacher’s personal tastes or preferences; and

e) provide accurate feedback more efficiently, with less need for the teacher to write extensive comments.

These arguments appear sound and fair. Who would argue against an assessment system that provides more and better feedback, increases transparency, improves accountability, and achieves greater objectivity, all with no increase in workload? On the other hand, the use of preset criteria accompanied by a rule for combining judgments is not the only way to go. In the critique below, a number of issues that form the basis of Proposition 5 are set out. The claim is that no matter how comprehensive and precise the procedures are, or how meticulously they are followed, they can, and for some student works do, lead to deficient or distorted grading decisions. This is patently unfair to those students.

In the rest of this section, the case for holistic judgments is presented in considerable detail. Although any proposal to advocate holistic rather than analytic assessments might be viewed initially as taking a backward step, this chapter is underwritten by a strong commitment to students and their learning. Specifically, the aim is to equip students to work routinely with holistic appraisals; to appreciate their validity; and to use them in improving their own work.

The five clauses in the rationale above are framed specifically in terms of criteria and standards. This wording therefore presupposes an analytic model for grading. By the end of this chapter, it should become apparent how the ethical principles behind this rationale can be honoured in full through alternative means. The wording of the rationale that corresponds to this alternative would then be different.
Beginning of the Case

Whenever a specific practice becomes widespread in a field of human activity, each implementation of it contributes to its normalisation. The message that this practice is the only or preferred approach does not have to be communicated explicitly. Consistent uncritical use sends its own strong signals. In this section, two particular analytic assessment schemes, analytic rating scales and analytic rubrics, are singled out for specific attention. Both are common in higher education, and both are simple for students to understand and apply.

With analytic rating scales, multiple criteria (up to 10 or more for complex works) are first specified. Each criterion has an associated scale line defined. In
use, the appraiser makes a qualitative judgment about the ‘strength’ or level of the work on each criterion, and marks the corresponding point on the scale line. If a total score is required, the relative importance of each criterion is typically given a numerical weighting. The sub-scores on the scales are multiplied by their respective weightings. Then all the weighted scores are summed, and the aggregate either reported directly or turned into a grade using a conversion table. An analytic rubric has a different format. Using the terminology in Sadler (1987, 2005), a rubric is essentially a matrix of cross-tabulated criteria and standards or levels. (This nomenclature is not uniform. Some rubrics use qualities/criteria instead of criteria/standards as the headings.) Each standard represents a particular level on one criterion. Common practice is for the number of standards to be the same for all criteria, but this is not strictly necessary. A rubric with five criteria, each of which has four standards, has a total of 20 cells. Each cell contains a short verbal description that sets out a particular strength of the work on the corresponding criterion. This is usually expressed either as a verbal quantifier (how much), or in terms of sub-attributes of the main criterion. For each student work, the assessor identifies the single cell for each criterion that best seems to characterise the work. The rubric may also include provision for the various cell standards to carry nominated ranges of numerical values that reflect weightings. A total score can then be calculated and converted to a grade. Holistic rubrics form a less commonly used category than the two above. They associate each grade level with a reasonably full verbal description, which is intended as indicative rather than definitive or prescriptive. These descriptions do not necessarily refer to the same criteria for all grade levels. Holistic rubrics are essentially different from the other two and have a different set of limitations, particularly in relation to feedback. They are not considered further here. In most analytic schemes, there is no role for distinctly global primary appraisals. At most, there may be a criterion labelled overall assessment that enjoys essentially the same status as all the other criteria. Apart from the technical redundancy this often involves, such a concession to valuing overall quality fails to address what is required. Students therefore learn that global judgments are normally compounded from smaller-scale judgments. The learning environment offers regular reinforcement, and the stakes for the student are high. Grading methods that use preset criteria with mechanical combination produce anomalies. The two particular anomalies outlined below represent recurring patterns, and form part of a larger set that is the subject of ongoing research. These anomalies are detected by a wide range of university teachers, in a wide range of disciplines and fields, for a wide range of assessment task types and student works. The same anomalies are, however, invisible to learners, and the design of the appraisal framework keeps them that way. Therein lies the problem.
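Before turning to the anomalies this produces, the mechanics just described can be made concrete. The sketch below is purely illustrative: the criteria are those used later in this chapter for a written work, while the weightings, sub-scores and grade cut-offs are hypothetical and not drawn from the chapter. It simply shows how an analytic rating scale compounds a grade from separate criterion judgments by a fixed rule.

```python
# Illustrative only: hypothetical weightings, sub-scores and grade cut-offs.
# Each sub-score is the assessor's qualitative judgment on one criterion,
# marked on a 0-10 scale line.
sub_scores = {"relevance": 8, "coherence": 6, "presentation": 9, "support for assertions": 5}
weightings = {"relevance": 0.4, "coherence": 0.3, "presentation": 0.1, "support for assertions": 0.2}

# Multiply each sub-score by its weighting, then sum the weighted scores.
aggregate = sum(sub_scores[c] * weightings[c] for c in sub_scores)

# Convert the aggregate to a grade using a (hypothetical) conversion table.
def to_grade(score: float) -> str:
    for cut_off, grade in [(8.5, "High Distinction"), (7.5, "Distinction"), (6.5, "Credit"), (5.0, "Pass")]:
        if score >= cut_off:
            return grade
    return "Fail"

print(round(aggregate, 1), to_grade(aggregate))  # 6.9 Credit
```

An analytic rubric differs only in how each criterion judgment is reached – by selecting the cell whose description best characterises the work, with nominated cell values taking the place of sub-scores – but the grade is still assembled by the same kind of mechanical combination. It is exactly this fixed rule, operating on a preset list of criteria, that the two anomalies below call into question.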
Anomaly 1

Teachers routinely discover some works for which global impressions of their quality are categorically at odds with the outcomes produced by conscientious implementation of the analytic grading scheme. Furthermore, the teacher is at a loss to explain why. A work which the teacher would rate as ‘brilliant’ overall may not be outstanding on all the preset criteria. The whole actually amounts to more than the sum of its parts. Conversely, a work the teacher would rate as mediocre may come out extremely well on the separate criteria. For it, the whole is less than the sum of its parts. This type of mismatch is not confined to educational contexts.

Whenever a discrepancy of this type is detected, teachers who accept as authoritative the formula-based grade simply ignore their informal holistic appraisal without further ado. Other teachers react differently. They question why the analytic grade, which was painstakingly built from component judgments, fails to deliver the true assessment, which they regard as the holistic judgment. In so doing, they implicitly express more confidence in the holistic grade than in the analytic. To reconcile the two, they may adjust the reported levels on the criteria until the analytic judgment tells the story it ‘should’. At the same time, these teachers remain perplexed about what causes such anomalies. They feel especially troubled about the validity and ethics of what could be interpreted by others as fudging. For these reasons, they are generally reluctant to talk about the occurrence of these anomalies to teaching colleagues or to their students. However, many are prepared to discuss them in secure research environments.

What accounts for this anomaly? There are a number of possible contributing factors. One is that not all the knowledge a person has, or that a group of people share, can necessarily be expressed in words (Polanyi, 1962). There exists no theorem to the effect that the domain of experiential or tacit knowledge, which includes various forms of expertise, is co-extensive and isomorphic with the domain of propositional knowledge (Sadler, 1980). Another possible factor is the manner in which experts intuitively process information from multiple sources to arrive at complex decisions, as summarised in Sadler (1981). Certain holistic appraisals do not necessarily map neatly onto explicit sets of specified criteria, or simple rules for combination. Further exploration of these aspects lies outside the scope of this chapter. It would require tapping into the extensive literature on the processes of human judgment, and into the philosophy of so-called ineffable knowledge.
Anomaly 2

This anomaly is similar to the first in that a discrepancy occurs between the analytically derived grade and the assessor’s informal holistic appraisal. In this case, however, the teacher knows that the source of the problem is with a
particular criterion that is missing from the preset list. That criterion may be important enough to set the work apart, almost in a class of its own. To simplify the analysis, assume there is just one such criterion. Also assume that the teacher has checked that this criterion is not included on the specified list in some disguised form, such as an extreme level on a criterion that typically goes under another name, or some blend of several criteria. Strict adherence to the analytic grading rule would disallow this criterion from entering, formally or informally, into either the process of grading or any subsequent explanation. To admit it would breach the implicit contract between teacher and student that only specified criteria will be used. Why do sets of criteria seem comprehensive enough for adequately appraising some works but not others? Part of the explanation is that, for complex works, a specified set of criteria is almost always a selection from a larger pool (or population) of criteria. By definition, a sample does not fully represent a population. Therefore, applying a fixed sample of criteria to all student works leaves open the possibility that some works may stand out as exceptional on the basis of unspecified criteria. This applies particularly to highly divergent, creative or innovative responses. Arbitrarily restricting the set of criteria for these works introduces distortion into the grading process, lowering its validity. To illustrate this sampling phenomenon, consider a piece of written work such as an essay or term paper, for which rubrics containing at least four criteria are readily available. Suppose the teacher’s rubric prescribes the following criteria:
relevance
coherence
presentation
support for assertions.
Behind these sits a much larger pool of potentially valid criteria. An idea of the size of this pool can be obtained by analysing a large number of published sets. One such (incomplete) collection was published in Sadler (1989). In alphabetical order, the criteria were: accuracy (of facts, evidence, explanations); audience (sense of); authenticity; clarity; coherence; cohesion; completeness; compliance (with conventions of the genre); comprehensiveness; conciseness (succinctness); consistency (internal); content (substance); craftsmanship; depth (of analysis, treatment); elaboration; engagement; exemplification (use of examples or illustrations); expression; figures of speech; flair; flavour; flexibility; fluency (or smoothness); focus; global (or overall) development; grammar; handwriting (legibility); ideas; logical (or chronological) ordering (or control of ideas); mechanics; novelty; objectivity (or subjectivity, as appropriate); organization; originality (creativity, imaginativeness); paragraphing; persuasiveness; presentation (including layout); punctuation (including capitalization); readability; referencing; register; relevance (to task or topic); rhetoric (or rhetorical effectiveness); sentence structure; spelling; style; support for assertions; syntax; tone; transitions; usage; vocabulary; voice; wording.
The majority of these would be familiar to experienced teachers, but dealing with them as separate properties is not at all straightforward. In the abstract, the listed criteria may appear to represent distinct qualities. When they come to be applied, however, some seem to merge into others. The reasons are manifold. First, they are uneven in their scope, some being broad, others narrow. Second, even common criteria often lack sharp boundaries and standardised interpretations, being defined differently by different assessors. When their meanings are probed, vigorous debate usually ensues. Some interpretations are contextually dependent, being defined differently by the same teacher for different assessment tasks. Third, some are subtle or specialised. An example is flair, which is relevant to writing and many other creative arts, and appears to capture a special and valued characteristic that is hard to describe. Fourth, some criteria are effectively alternatives to, or nested within, others. In addition, one cluster of criteria may have the same coverage as another cluster, but without one-to-one correspondence.

Finally, suppose it were possible to assemble and clarify the whole population of relevant criteria. Any attempt to use them all would be unworkable for assessors and students alike. The obvious way out of this bind is to restrict the list of criteria to a manageable number of, say, the most important, or the most commonly used. In contexts similar to that of written works, such restriction necessarily leaves out the majority.
A Way Forward: Design for Teaching and Learning

Should the teacher disclose the existence of anomalies to students and, in so doing, expose the weaknesses of analytic grading? A substantial part of the rationale for prior specification is that students are entitled to have advance knowledge of the basis for their teachers’ appraisals. Traditionally, holistic grading allowed the teacher to keep the reasons more or less private. Given the anomalies above, it is ironic that strictly honouring a commitment to preset criteria can be achieved only at the expense of non-disclosure of anomalies and other limitations once the grading is completed.

The two anomalies above, along with other deficiencies not covered in this chapter, raise doubts about the sufficiency of relying on analytic frameworks that use prespecified criteria. These difficulties are structural, and cannot be dealt with by making templates more elaborate. If there is to be a way forward, it has to approach the problem from a different direction. The rest of this chapter is devoted to outlining a design that has dual characteristics. It seeks to reinstate holistic appraisals for grading a large and significant class of complex student works. It also allows for at least some criteria to be specified in advance – without grade determination being dependent on following an inflexible algorithm. Although this is a step back from grading all responses to an assessment task by fixing both the criteria and the combination formula, the commitment to validity, openness and disclosure can remain undiminished.
This goal can be achieved by shifting the agenda beyond telling, showing and discussing how appraisals of students’ works are to be made. Instead, the process is designed to induct students into the art of making appraisals in a substantive and comprehensive way. Such a pedagogical approach is integral to the development of a valid alternative not only to analytic grading but also to holistic grading as done traditionally.

This induction process can function as a fundamental principle for learning-oriented assessment. Properly implemented, it recognises fully the responsibility of the teacher to bring students into a deep knowledge of how criteria actually function in making a complex appraisal, and of the need for the assessor to supply adequate grounds for the judgment. In the process, it sets learners up for developing the capability to monitor the quality of their own work during its development. In particular, it provides for an explicit focus on the quality of how the student work is coming together – as a whole – at any stage of development.

The approach outlined below draws upon strategies that have been trialled successfully in higher education classes, along with others that are worthy of further exploration and experimentation. Major reform of teaching and learning environments requires significant changes to approaches that have become deeply embedded in practice over many years. Because the use of preset criteria has gained considerable momentum in many academic settings, transition problems are to be expected. Some ideas for identifying and managing these are also included.
Developing Expertise The general approach draws from the work of Polanyi (1962). It could be described as starting learners on the path towards becoming connoisseurs. There is a common saying that goes, ‘‘I do not know how to define quality, but I know it when I see it’’. In this statement, to know quality is to recognise, seize on, or apprehend it. To recognise quality ‘‘when I see it’’ means that I can recognise it – in particular cases. The concrete instance first gives rise to perception, then to recognition. Formally defining quality is a different matter altogether. Whether it is possible for a particular person to construct a definition depends on a number of factors, such as their powers of abstraction and articulation. There is a more general requirement, not limited to a particular person. Are there enough similarly classified cases that share enough key characteristics to allow for identification, using inductive inference, of what they have in common? The third factor is touched on in the account of the first anomaly described above. It is that some concepts appear to be, in principle, beyond the reach of formal definition. Many such concepts form essential elements of our everyday language and communication, and are by no means esoteric. Given these three factors, there is no logical reason to assume that creating a formal definition is always either worth the effort, or even possible.
On the other hand, it is known empirically that experts can recognise quality (or some other complex characteristic) independently of knowing any definition. This is evidence that recognition can function as a fundamentally valid, primary act in its own right (Dewey, 1939). It is also what makes connoisseurship a valuable phenomenon, rather than one that is just intriguing. Basically, connoisseurship is a highly developed form of competence in qualitative appraisal. In many situations, the expert is able to give a comprehensive, valid and carefully reasoned explanation for a particular appraisal, yet is unable to do so for the general case. In other situations, an explanation for even a particular case is at best partial. To accept recognition as a primary evaluative act opens the door to the development of appraisal explanations that are specifically crafted for particular cases without being constrained by predetermined criteria that apply to all cases. Holistic recognition means that the appraiser reacts or responds (Sadler, 1985), whereas building a judgment up from discrete decisions on the criteria is rational and stepwise. That is the key distinction in intellectual processing. The idea of recognising quality when it is observed immediately strikes a familiar chord with many people who work in both educational and non-educational contexts where multi-criterion judgments are required. Giving primacy to an overall assessment has its counterparts in many other fields and professions. This does not imply that grading judgments should be entirely holistic or entirely analytic, as if these were mutually exclusive categories. The two approaches are by no means incompatible, and how teachers use them is a matter of choice. To advocate that a teacher should grade solely by making global judgments without reference to any criteria is as inappropriate as requiring all grades to be compiled from components according to set rules. Experienced assessors routinely alternate between the two approaches in order to produce what they consider to be the most valid grade. This is how they detect the anomalies above, even without consciously trying. In doing this, they tend to focus initially on the overall quality of a work, rather than on its separate qualities. Among other things, these assessors switch focus between global and specific characteristics, just as the eye switches effortlessly between foreground (which is more localised and criterion-bound) and background (which is more holistic and open). Broader views allow things to be seen in perspective, often with greater realism. They also counter the atomism that arises from breaking judgments up into progressively smaller elements in a bid to attain greater precision. Inducting students into the processes of making multi-criterion judgments holistically, and only afterwards formulating valid reasons for them, requires a distinctive pedagogical environment. The end goal is to bring learners at least partly into the guild of professionals who are able to make valid and reliable appraisals of complex works using all the tools at their disposal. Among other things, students need to learn to run with dual evaluative agendas. The first involves scoping the work as a whole to get a feel for its overall quality; the second is to pay attention to its particular qualities.
In some learning contexts, students typically see only their own works. Such limited samples cannot provide a sufficient basis for generalisations about quality. For students to develop evaluative expertise requires that three interconnected conditions be satisfied. First, students need exposure to a wide variety of authentic works that are notionally within the same genre. This is specifically the same genre students are working with at the time. The most readily available source of suitable works consists of responses from other students attempting the same assessment task. As students examine other students’ works through the lens of critique, they find that peer responses can be constructed quite differently from one another, even those that the students themselves would judge to be worth the same grade. They also discover, often to their surprise, works that do not address the assessment task as it was specified. They see how some of these works could hardly be classified as valid ‘‘responses’’ to the assessment task at all, if the task specifications were to be taken literally. This phenomenon is common knowledge among higher education teachers, and is a source of considerable frustration to them. The second condition is that students need access to works that range across the full spectrum of quality. Otherwise learners have difficulty developing the concept of quality at all. To achieve sufficient variety, the teacher may need to supplement student works from the class with others from outside. These may come from different times or classes, or be specially created by the teacher or teaching colleagues. The third condition is that students need exposure to responses from a variety of assessment tasks. In a fundamental sense, a developing concept of quality cannot be entirely specific to a particular assessment task. In the process of making evaluations of successive works and constructing justifications for qualitative judgments, fresh criteria emerge naturally. The subtlety and sophistication of these criteria typically increase as evaluative expertise develops. A newly relevant criterion is drawn – on demand – from the larger pool of background or latent criteria. It is triggered or activated by some property of a particular work that is noteworthy, and then added temporarily to the working set of manifest criteria (Sadler, 1983). The work may exhibit this characteristic to a marked degree, or to a negligible degree. On either count, its relevance signals that it needs to be brought into the working set. As latent criteria come to the fore, they are used in providing feedback on the grade awarded. Latent criteria may also be shared with other students in the context of the particular work involved. Starting with a small initial pool of criteria, extending it, and becoming familiar with needs-based tapping into the growing pool, constitute key parts of the pedagogical design. This practice expands students’ personal repertoires of available criteria. It also reinforces the way criteria are translated from latent to manifest, through appraising specific works. As additional criteria need to be brought into play, a class record of them may be kept for interest, but not with a view to assembling a master list. This is because the intention is to provide students with experience in the latent-to-manifest translation process, and the limitations inherent in using fixed sets of criteria.
Students should then come to understand why this apparently fluid approach to judging quality is not unfair or some sort of aberration. It is, in a profound sense, rational, normal and professional. Although it is important that students appreciate why it is not always possible to specify all the criteria in advance, certain criteria may always be relevant. In the context of written works, for example, some criteria are nearly always important, even when not stated explicitly. Examples are grammar, punctuation, referencing style, paragraphing, and logical development, sometimes grouped as mechanics. These relate to basic communicative and structural features. They facilitate the reader’s access to the creative or substantive aspects, and form part of the craft or technique side of creative work. Properly implemented, they are enablers for an appraisal of the substance of the work. A written work is difficult to appraise if the vocabulary or textual structure is seriously deficient. In stark contrast to the student’s situation, teachers are typically equipped with personal appraisal resources that extend across all of the aspects above. Their experience is constantly refreshed by renewed exposure to a wide range of student works, in various forms and at different levels of quality (Sadler, 1998). This is so naturally accepted as a normal part of what being a teacher involves that it hardly ever calls for comment. It is nevertheless easy to overlook the importance of comparable experience for students grappling with the concept of quality. Providing direct evaluative experience efficiently is a major design element. It is labour intensive, but offsets can be made in the way teaching time is deployed. Evaluative activity can be configured to be the primary pedagogical vehicle for teaching a considerable proportion of the substantive content of a course. For example, in teaching that is structured around a lecture-tutorial format, students may create, in their own time, responses to a task that requires them to employ specific high-order intellectual skills such as extrapolating, making structural comparisons, identifying underlying assumptions, mounting counter-arguments, or integrating elements. These assessment tasks should be strictly formative, and designed so that students can respond to them successfully only as they master the basic content. Tutorial time is then spent having students make appraisals about the quality of, and providing informed feedback on, multiple works of their peers, and entering into discussions about the process. In this way, student engagement with the substance of the course takes place through a sequence of produce and appraise rather than study and learn activities. Adaptations are possible for other modes of teaching, such as studio and online.
Challenges of Transition Complex learning, regardless of the field, requires multiple attempts (practice), in a supportive, low-stakes environment, with good feedback. To make progress on the road to connoisseurship is to replace initially inconsistent degrees
of success in appraisal, and therefore self-monitoring, by progressively higher levels of expertise. The pedagogical environment in many higher education institutions is not set up in ways that facilitate this type of learning. The necessary changes cut across a number of well-established policies and practices, and therefore require careful management. Six obstacles or potential sources of resistance are identified below. The first three are outlined only; the second three are discussed in more detail. Obstacle 1 is a view held by many students and teachers, overtly or by default, that virtually every course exercise or requirement should contribute towards the course grade. Unless something counts, so the thinking goes, it is not worth doing. Once this climate is established, teachers know that students will put little effort into any exercise that does not carry credit towards the course grade. Student and teacher positions reinforce each other, setting up a credit accumulation economy. The development of evaluative expertise requires production, peer appraisal and peer feedback in a context where there is neither credit nor penalty for trial and error, experimentation, or risk taking. If these are to become normalised as legitimate processes in learning, ways have to be found to subvert the credit accumulation economy. Obstacle 2 arises with teachers who, on occasion, use grading for purposes other than reporting exclusively on each student’s achievement. Teachers may inflate a grade to reward a student who has made an outstanding effort, or who has shown a marked improvement. Such rewards compromise the meaning of grades and retard the development of evaluative expertise. Students and faculty have to be hard-nosed in their focus on quality alone, including the match between the assessment task specifications and the nature of the student responses. They should therefore rigorously exclude all non-achievement influences from the assessment environment. Obstacle 3 arises through institutional policy, grading practices in other courses, or both. If all or most other courses use rubrics, for example, students may be wary of any grading method that appears to be unsystematic, subjective or unfair. Students may also be reluctant to engage in peer appraisal unless they have a rubric in front of them. Many students have internalised the principle that using preset criteria is automatically the only or the best way to appraise complex outcomes. Such conditioning has to be explicitly unlearned (Sadler, 1998). Some ground may be made by explaining the role of appraisal in learning to produce high-quality works. To develop expertise in this domain follows the same basic principles that are used in developing other complex skills or forms of expertise. These involve gathering or receiving information (being told), processing that information in the light of actual examples (being shown), and applying the principles for oneself (doing). Obstacles 4–6 refer specifically to the appraisal processes described in this chapter, and are conceptually connected. They are expressed particularly strongly in some cultures. Obstacle 4 is the belief that appraisal or grading is a teacher role, not a student role. The teacher is perceived not only as the authoritative figure for course content but also as the only person who has
the knowledge and experience to assign grades. Student peers are simply not qualified; they may judge too harshly or too leniently, or give feedback that is superficial and lacking in credibility. Obstacle 5 has to do with the students’ perceptions of themselves. They may feel ill-equipped to grade the works of peers. This is true initially, of course, but changing both the actuality and the self-perception are two of the course goals. The rationale for using other students’ works is not just one of convenience. It ensures that the student responses, which are used as raw material, are as authentic as it is possible to get. This case may be easier to establish with students if the Course Outline includes an objective specifically to that effect, and a complementary statement along the following lines: The learning activities in this course involve self- and peer-assessment. This is an important part of how the course is taught, and how one of its key objectives will be attained. Other students will routinely be making judgments about the quality of your work, and you will make judgments about theirs. The students whose work you appraise will not necessarily be those who appraise yours. If this aspect of the teaching is likely, in principle, to cause you personal difficulty, please discuss your concern with the person in charge of the course within the first two weeks of term.
Obstacle 6 is students’ fear of exposure, loss of face or impending sense of humiliation among their peers. This may be because they lack experience, status or skill. These feelings are personal, about themselves and about how confident they are. Unless students are already familiar with engaging in peer assessment, they may appear to accept the logic behind a transition to a different pedagogy, but retain their reservations and reluctance. Such students need reassurance that just starting on this path is likely to be the hardest part. Once they become accustomed to it, they typically find it highly rewarding, and their learning improves. Learning the skills of appraisal and self-monitoring can be compared with the ways in which many other skills are learned. They are not easy for the uninitiated, and learners may feel embarrassed at their early attempts if they know others will become aware of their efforts. By contrast, when young children are learning to speak, their first bumbling attempts to say a few words are not treated with scorn. The children are encouraged and cheered on whenever they get it right. They then repeat the performance, maybe again and again. Translate this into the higher education context. As soon as students realise that they are making progress, their confidence grows. They accept new challenges and become motivated to try for more. Furthermore, they derive joy and satisfaction from the process. If that could be made a more widespread phenomenon, it is surely a goal worth striving for.
Conclusion The practice of providing students with the assessment criteria and their weightings before they respond to an assessment task is now entrenched in higher education. The rationale contains both ethical and practical elements. Most
analytic rubrics and similar templates fix the criteria and the rule for combining separate judgments on those criteria to produce the grade. This practice is rarely examined closely, but in this chapter it is shown to be problematic, in principle and in practice. To address this problem, the unique value of holistic judgments needs to be appreciated, with openness to incorporating criteria that are not on a fixed list. To maintain technical and personal integrity, a number of significant shifts in the assessment environment are necessary. The commitment to mechanistic use of preset criteria needs to be abandoned. Teachers and students need to be inducted into more open ways of making grading decisions and justifying them. Students need to be provided with extensive guided experience in making global judgments of works of the same types they produce themselves. Ultimately, the aim is for learners to become better able to engage in self-monitoring the development of their own works. Shifting practice in this direction requires a substantially different alignment of pedagogical priorities and processes. It also requires specific strategies to overcome traditions and sources of potential resistance. The aim is to turn the processes of making and explaining holistic judgments into positive enablers for student learning. Acknowledgment I am grateful to Gordon Joughin for his constant support and critical readings of different versions of this chapter during its development. His many suggestions for improvement have been invaluable.
References
Bloxham, S., & West, A. (2004). Understanding the rules of the game: marking peer assessment as a medium for developing students’ conceptions of assessment. Assessment & Evaluation in Higher Education, 29, 721–733.
Braddock, R., Lloyd-Jones, R., & Schoer, L. (1963). Research in written composition. Urbana, Ill.: National Council of Teachers of English.
Burke, E. (1759). A philosophical enquiry into the origin of our ideas of the sublime and beautiful, 2nd ed. London: Dodsley. (Facsimile edition 1971. New York: Garland).
Chi, M. T. H., Glaser, R., & Farr, M. J. (Eds.), (1988). The nature of expertise. Hillsdale, NJ: Lawrence Erlbaum.
Dewey, J. (1939). Theory of valuation. (International Encyclopedia of Unified Science, 2 (4)). Chicago: University of Chicago Press.
Ericsson, K. A., & Smith, J. (Eds.), (1991). Toward a general theory of expertise: Prospects and limits. New York: Cambridge University Press.
Freeman, R., & Lewis, R. (1998). Planning and implementing assessment. London: Kogan Page.
Huba, M. E., & Freed, J. E. (2000). Learner-centered assessment on college campuses: Shifting the focus from teaching to learning. Needham Heights, Mass: Allyn & Bacon.
Lloyd-Jones, R. (1977). Primary trait scoring. In C. R. Cooper & L. Odell (Eds.), Evaluating writing: Describing, measuring, judging. Urbana, Ill.: National Council of Teachers of English.
Meehl, P. E. (1996). Clinical versus statistical prediction: A theoretical analysis and a review of the evidence (New Preface). Lanham, Md: Rowman & Littlefield/Jason Aronson. (Original work published 1954).
Morgan, C., Dunn, L., Parry, S., & O’Reilly, M. (2004). The student assessment handbook: New directions in traditional and online assessment. London: RoutledgeFalmer.
Orsmond, P., Merry, S., & Reiling, K. (2000). The use of student derived marking criteria in peer and self-assessment. Assessment & Evaluation in Higher Education, 25, 23–38.
Polanyi, M. (1962). Personal knowledge. London: Routledge and Kegan Paul.
Rust, C., Price, M., & O’Donovan, B. (2003). Improving students’ learning by developing their understanding of assessment criteria and processes. Assessment & Evaluation in Higher Education, 28, 147–164.
Sadler, D. R. (1980). Conveying the findings of evaluative inquiry. Educational Evaluation and Policy Analysis, 2(2), 53–57.
Sadler, D. R. (1981). Intuitive data processing as a potential source of bias in naturalistic evaluations. Educational Evaluation and Policy Analysis, 3(4), 25–31.
Sadler, D. R. (1983). Evaluation and the improvement of academic learning. Journal of Higher Education, 54, 60–79.
Sadler, D. R. (1985). The origins and functions of evaluative criteria. Educational Theory, 35, 285–297.
Sadler, D. R. (1987). Specifying and promulgating achievement standards. Oxford Review of Education, 13, 191–209.
Sadler, D. R. (1989). Formative assessment and the design of instructional systems. Instructional Science, 18, 119–144.
Sadler, D. R. (1998). Formative assessment: Revisiting the territory. Assessment in Education: Principles, Policy & Practice, 5, 77–84.
Sadler, D. R. (2002). Ah! . . . So that’s ‘Quality’. In P. Schwartz & G. Webb (Eds.), Assessment: case studies, experience and practice from higher education (pp. 130–136). London: Kogan Page.
Sadler, D. R. (2005). Interpretations of criteria-based assessment and grading in higher education. Assessment and Evaluation in Higher Education, 30, 175–194.
Stevens, D. D., & Levi, A. J. (2004). Introduction to rubrics: an assessment tool to save grading time, convey effective feedback and promote student learning. Sterling, Va: Stylus Publishing.
Suskie, L. (2004). Assessing student learning: A common sense approach. Boston, Mass: Anker Publishing.
Walvoord, B. E., & Anderson, V. J. (1998). Effective grading: A tool for learning and assessment. Etobicoke, Ontario: John Wiley.
Woolf, H. (2004). Assessment criteria: reflections on current practices. Assessment & Evaluation in Higher Education, 29, 479–493.
Chapter 5
Faulty Signals? Inadequacies of Grading Systems and a Possible Response Mantz Yorke
Grades are Signals Criticism of grading is not new. Milton, Pollio and Eison (1986) made a fairly forceful case that grading was suspect, but to little avail – not surprisingly, since a long-established approach is difficult to dislodge. There is now more evidence that grading does not do what many believe it does (especially in respect of providing trustworthy indexes of student achievement), so the time may be ripe for a further challenge to existing grading practices. Grade is inherently ambiguous as a term, in that it can be used in respect of raw marks or scores, derivatives of raw marks (in the US, for example, the letter grades determined by conversion of raw percentages), and overall indexes of achievement. In this chapter the ambiguity is not entirely avoided: however, an appreciation of the context in which the term ‘grade’ is used should mitigate the problem of multiple meaning. Grades are signals of achievement which serve a number of functions including the following:
– informing students of their strengths and weaknesses;
– informing academic staff of the success or otherwise of their teaching;
– providing institutions with data that can be used in quality assurance and enhancement; and
– providing employers and others with information germane to recruitment.
For summative purposes, grading needs to satisfy a number of technical criteria, amongst which validity and reliability are prominent. (For formative purposes, the stringency of the criteria can be relaxed – a matter that will not be pursued here.) A detailed study of grading in Australia, the UK and the US (Yorke, 2008) suggests that the signals from grading are not always clear – indeed, the stance taken in this chapter is that they are generally fuzzy and lack robustness.
Threats to the Robustness of Grading Felton and Koper (2005, p. 562) refer to grades as ‘‘inherently ambiguous evaluations of performance with no absolute connection to educational achievement’’. This chapter argues that the charge holds, save in a relatively small number of circumstances where the intended learning and the assessment process can be very tightly specified (some computer-marked work falls into this category) – and even then there might be argument about the validity of the sampling from the universe of possible learning outcomes. What is the evidence for the charge?
Sampling Ideally, assessment is based on a representative sampling of content reflecting the expected learning outcomes for the curriculum. The word representative indicates that not every aspect of the learning outcomes can be accommodated – something that dawned rather slowly on those responsible for the system of National Vocational Qualifications that was promoted in the early 1990s in the UK, and which sought comprehensiveness of coverage (see Jessup, 1991). Sampling from the curriculum reflects the preferences of interested parties – of the institution and, in some instances, of stakeholders from outside. The validity of the assessment depends on the extent to which the assessment process captures the achievements expected in the curriculum – a perspective based on internal consistency. Interested parties from outside the academy are likely to be more interested in the extent to which the assessment outcomes predict performance in, say, the workplace, which is a very different matter. With increased interest by governments in the ability of graduates to demonstrate their effectiveness in the world beyond academe, the assessment of work-based and work-related learning is of increasing importance. However, such assessment raises a number of challenges, some of which are discussed later in this chapter: for now, all that is necessary is to note the complexity that this introduces in respect of sampling and validity.
Grading Scales There is considerable variation in grading scales, from the finely-grained percentage scale to the quite coarse scales of grade-points used in some Australian universities. Where percentages are used, there seem to be norms relating to both national and disciplinary levels. Regarding the national level, there is a broad gradation from the high percentages typically awarded in US institutions to the more modest percentages typical of practice in the UK, with Australian practice lying in between. In the UK and Australia, the raw percentage is
typically used in classifying performances whereas in the US percentage scores are converted into grade-points for cumulation into the grade-point average (GPA). Studies undertaken by the Student Assessment and Classification Working Group (SACWG) have drawn attention to the variation between subjects regarding the distribution of percentages: science-based subjects tend to have wider and flatter distributions than those in the humanities and social sciences, for example1. Figure 5.1, which is based on marks in first-year modules
[Figure 5.1 appears here: histograms of first-year module mark distributions for six modules – Business, Statistics, Sociology, Healthcare, Modern History and Contract Law – with marks on the horizontal axis and percentage frequency on the vertical axis.]
Fig. 5.1 Mark distributions for six first-year modules
Notes: The heights of the histogram bars are percentage frequencies, in order to facilitate comparisons. Vertical dotted lines indicate the median mark in each case. In the Contract Law module a bare pass (40%) was awarded to 26 per cent of the student cohort.
1 It is instructive to look at the statistics produced by the Higher Education Statistics Agency [HESA] regarding the profiles of honours degree classifications for different subject areas in the UK (see raw data for ‘Qualifications obtained’ at www.hesa.ac.uk/holisdocs/pubinfo/stud.htm or, more revealingly, the expression of these data in percentage format in Yorke, 2008, p. 118).
in a post-1992 university in the UK, illustrates something of the variability that can exist in the distributions of marks awarded. The three modules on the left-hand side of Fig. 5.1 evidence broadly similar marking distributions, but the three on the right-hand side are quite different (though all show some indication of the significance of the bare pass percentage of 40 for grading practice). The distribution for Statistics uses almost the full range of percentages, no doubt because it is easy to determine what is correct and what is incorrect. In social sciences, matters are much less likely to be cut and dried. The Healthcare module has attracted high marks (the influence of a nurturing approach, perhaps?). The very large proportion of bare passes in Law suggests that the approach adopted in marking is at some variance with that in other subjects. In passing, it is worth noting that an early study by Yorke et al. (1996) found student marks in Law tending to be lower than for some other subjects – a finding that is reflected in the HESA statistics on honours degree classifications, and is unlikely to be attributable to lower entry standards. Figure 5.1 illustrates a more general point that is well known by those attending examination boards involving multiple subjects: that is, that the modules and subjects chosen by a student can have a significant influence on the grades they achieve. In the US, the conversion of raw percentages to grade-points allows some mitigation of the inter-module and inter-subject variation. One of the problems with percentage grading (which has attracted some comment from external examiners) is the reluctance in subjects in the arts, humanities and social sciences to award very high percentages. The issue is most apparent in modular schemes where examination boards have to deal with performances from a variety of subject areas. Some institutions in the UK have adopted grading scales of moderate length (typically of between 16 and 25 scale-points) in order to encourage assessors to use the top grades (where appropriate) and hence avoid the psychological ‘set’ of assuming that 100 per cent implies absolute perfection. The issue is, of course, what signifies an excellent performance for a student at the particular stage they have reached in their programme of study. Whilst there is some evidence that scales of moderate length do encourage greater use of the highest grades, it is unclear why this does not carry over into the honours degree classification (Yorke et al., 2002). A factor likely to exert some influence on the distribution of grades is the extent to which assessment is norm- or criterion-referenced.
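The ‘mitigation’ offered by converting raw percentages to grade-points, mentioned above, can be seen in a small sketch. The thresholds below follow a common US-style pattern chosen purely for illustration; the chapter does not endorse any particular scale.

```python
# Illustrative only: these percentage-to-grade-point thresholds are a common
# US-style pattern, not a scale specified in the chapter.

BANDS = [(90, 4.0), (80, 3.0), (70, 2.0), (60, 1.0)]  # (minimum %, grade-points)


def grade_points(percentage: float) -> float:
    """Map a raw percentage onto a coarse grade-point scale."""
    for floor, points in BANDS:
        if percentage >= floor:
            return points
    return 0.0


# Modules marked generously and severely can differ by several raw percentage
# points yet convert to the same grade-point, which is one way the coarser
# scale dampens inter-module differences in marking habits.
print(grade_points(81), grade_points(88))  # both 3.0
print(grade_points(92), grade_points(97))  # both 4.0
```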
Norm- and Criterion-Referencing The distinction between norm- and criterion-referenced assessment is often presented as being fairly clear-cut. Conceptually, it is. It is when actual assessment practice is examined that the sharpness of the distinction becomes a blur. An outcomes-based approach to curriculum and pedagogy
is a modern, softer restatement of the ‘instructional objectives’ approach that was most clearly expressed in the work of Mager (1962) and which led to the promotion of mastery or competence-based learning. Under criterion-referencing, every student could achieve the stated objectives to a high standard: the consequence would be that grades could bunch at the top end of the scale. The contrast with a grading system that judges a student’s achievement against those of their peers, rather than in absolute terms, is clear. There should be no surprise when norm-referencing and criterion-referencing produce radically different distributions of grades. However, the ‘upper-ended’ distributions possible under criterion-referencing do not fit with sedimented expectations of a normal distribution of grades, and can lead to unwarranted accusations of grade inflation. The blurring of the boundary between norm- and criterion-referencing has a number of possible causes, the first two of which are implicitly illustrated in Exhibit 5.1 below:
– Criteria are often stated loosely.
– Criteria may be combined in variable ways.
– Stated criterion levels can be exceeded to varying extents by students.
– A grade distribution skewed towards the upper end may be seen as implausible and hence may be scaled back to something resembling a norm-referenced distribution (for example, this can happen in the US when a raw percentage is converted into a letter grade).
– Assessors’ grading behaviour is tacitly influenced by norm-referencing.
– Different subject disciplines (or components of disciplines) adopt different approaches to grading.
– Individuals may vary in the approaches they bring to grading (see Ekstrom & Villegas, 1994, and Yorke, Bridges, & Woolf, 2000, for evidence on this point).
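To make the conceptual contrast at the start of this section concrete, here is a minimal sketch; the marks, grade boundaries and quartile-based allocation are all invented for illustration and do not represent any institution’s practice.

```python
# Hypothetical cohort marks and grading rules, for illustration only.
from statistics import quantiles

marks = [42, 48, 55, 58, 61, 64, 66, 71, 74, 83]


def criterion_grade(mark: float) -> str:
    """Criterion-referencing: each mark is judged against fixed standards,
    so in principle every student could reach the top band."""
    if mark >= 70:
        return "distinction"
    if mark >= 60:
        return "merit"
    if mark >= 40:
        return "pass"
    return "fail"


def norm_grades(cohort: list[float]) -> list[str]:
    """Norm-referencing: grades are allocated by position in the cohort,
    so the shape of the grade distribution is fixed in advance."""
    q1, q2, q3 = quantiles(cohort, n=4)
    return ["distinction" if m >= q3 else "merit" if m >= q2
            else "pass" if m >= q1 else "fail" for m in cohort]


print([criterion_grade(m) for m in marks])
print(norm_grades(marks))  # same marks, different distribution of grades
```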
Approaches to Grading Some assessors approach grading analytically, specifying components of a performance and building up a total mark from marks awarded for the individual components. The components can be quite tightly specified (particularly in science-based curricula) or more general. Hornby (2003) refers to this approach as ‘menu marking’. Others adopt a holistic approach, judging the overall merit of the work before seeking to ascribe values (numerical or verbal) to the component parts. It is likely that assessment practice is not often as polarised as this, and that it is a matter of tendency rather than absoluteness. One problem is that assessment based on the two polar approaches can give significantly discrepant marks, and there is some evidence that assessors juggle their marking until they arrive at a mark or grade with which they feel comfortable (Baume & Yorke, with Coffey, 2004; Hawe, 2003; Sadler, Chapter 4). Assessors sometimes find that the learning
objectives or expected outcomes specified for the work being marked do not cover everything that they would like to reward (see Sadler, Chapter 4 and, for empirical evidence, Webster, Pepper & Jenkins, 2000, who found assessors using unarticulated criteria when assessing dissertations). Unbounded tasks, such as creative work of varying kinds, are particularly problematic in this respect. Some would like to add ‘bonus marks’ for unspecified or collateral aspects of achievement. Walvoord and Anderson (1998), for example, suggest holding back a ‘‘fudge factor’’ of 10 percent or so that you can award to students whose work shows a major improvement over the semester. Or you may simply announce in the syllabus and orally to the class that you reserve the right to raise a grade when the student’s work shows great improvement over the course of the semester. (p. 99)
This suggestion confuses two purposes that assessment is expected to serve – to record actual achievement and to encourage students. An extension of the general point is the reluctance to fail students, often because the assessor wants to give the student another chance to demonstrate that they really can fulfil the expectations of the course (see, for example, Brandon & Davies, 1979; Hawe, 2003). This may be more prevalent in subject areas relating to public service (teaching, nursing and social work, for example) in which there tends to be an underlying philosophy of nurturing, and in which an espoused commitment to public service is regarded as a positive attribute. Assessors’ approaches to grading are also influenced by the circumstances of the assessment, such as the number of items to be marked, the time available, whether all the submitted work is marked in one timeslot or spread out over a number of slots, personal fatigue, and so on.
The Relationship Between Grade and Meaning The relationship between the grade and its meaning is unclear. In general, a mark or grade summarises a multidimensional performance, as Exhibit 5.1 indicates.
Exhibit 5.1 An extract from a list of descriptors of levels of student achievement
First Class
It is recognised in all marking schemes that there are several different ways of obtaining a first class mark. First class answers are ones that are exceptionally good for an undergraduate, and which excel in at least one and probably several of the following criteria:
– comprehensive and accurate coverage of area;
– critical evaluation;
– clarity of argument and expression;
– integration of a range of materials;
– depth of insight into theoretical issues;
– originality of exposition or treatment.
Excellence in one or more of these areas should be in addition to the qualities expected of an upper second.
Upper second class
Upper second class answers are a little easier to define since there is less variation between them. Such answers are clearly highly competent and a typical one would possess the following qualities:
– generally accurate and well-informed;
– reasonably comprehensive;
– well-organised and structured;
– displaying some evidence of general reading;
– evaluation of material, though these evaluations may be derivative;
– demonstrating good understanding of the material;
– clearly presented.
Source: Higher Education Quality Council (1997a, p. 27).
It is noticeable that the expansiveness of the criteria tails off as the level of performance decreases, reflecting an excellence minus approach to grading that often appears in lists of this type. The criteria for the lower levels of passing are inflected with negativities: for example, the criteria for third class in this example are all stated in terms of inadequacy, such as ‘‘misses key points of information’’. An outsider looking at these criteria could be forgiven for assuming that third class was equivalent to failure. In contrast, what might a threshold plus approach look like? Performances tend not to fit neatly into the prescribed ‘boxes’ of criteria, often exceeding the expectations established in terms of some criteria and failing to reach others. Whilst labelling some criteria as essential and others as desirable helps, it does not solve all the problems of assigning a grade to a piece of work. Some have thought to apply fuzzy set theory to assessment, on the grounds that this allows a piece of work to be treated as having membership, with differing levels of intensity, of the various grading levels available (see relatively small scale research studies by Biswas, 1995; and Echauz & Vachtsevanos, 1995, for example). Whilst this has conceptual attractiveness, the implementation of the approach in practical settings offers challenges that are very unlikely to be met. A further issue has echoes of Humpty Dumpty’s claim regarding language in Through the Looking-Glass – words varying in meaning according to the user. Webster et al. (2000) found evidence of the different meanings given to terms in general assessment usage, exemplifying the variation in academics’ use of analysis and evaluation. There is a further point which has an obvious relevance to student learning – that the recipients of the words may not appreciate the meaning that is intended by an assessor (e.g., Chanock, 2000). If one of the aims of higher education is to assist students to internalise standards, feedback comments may not be as successful as the assessor may imagine.
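As a toy illustration of the fuzzy-set idea mentioned above (the membership values and grade labels are invented, and the cited studies are not being reproduced here), a piece of work might be described as belonging, to differing degrees, to several grading levels at once:

```python
# A single piece of work described by partial membership of several grading
# levels, rather than by membership of exactly one (values are invented).
membership = {"first": 0.2, "upper_second": 0.7, "lower_second": 0.1}

# One of many possible ways to collapse the fuzzy profile into a reportable
# grade: take the level with the greatest membership.
reported = max(membership, key=membership.get)
print(reported)  # 'upper_second'

# What the collapse discards is exactly what the fuzzy description preserves:
# the work is also, to some degree, first-class.
print(sorted(membership.items(), key=lambda kv: -kv[1]))
```

The practical challenges noted above lie in obtaining defensible membership values in the first place and in agreeing how, if at all, they should be collapsed into a single reported grade.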
More generally, a number of writers (e.g. Sadler, 1987, 2005; Tan & Prosser, 2004; Webster et al., 2000; Woolf, 2004) have pointed to the fuzziness inherent in constructs relating to criteria and standards, which suggests that, collectively, conceptions of standards of achievement may be less secure than many would prefer.
The Nature of the Achievement A grade by itself says nothing about the kind of performance for which it was awarded – an examination, an essay-type assignment, a presentation, and so on. Bridges et al. (2002) presented evidence to support Elton’s (1998) contention that rising grades in the UK were to some extent associated with a shift in curricular demand from examinations towards coursework. Further, as Knight and Yorke (2003, pp. 70–1) observe, a grade tells nothing of the conditions under which the student’s performance was achieved. For some, the performance might have been structured and ‘scaffolded’ in such a way that the student was given a lot of guidance as to how to approach the task. Other students might have achieved a similar level of achievement, but without much in the way of support. Without an appreciation of the circumstances under which the performance was achieved, the meaning – and hence the predictive validity – of the grade is problematic. The problem for the receiver of the signal is that they have little or no information about how the grading was performed, and hence are unable to do other than come to an appreciation of a student’s performance that is influenced by their own – possibly faulty – presuppositions about grading.
The Cumulation of Grades Many set considerable store by the overall grade attained by a student, whether in the form of a GPA or an honours degree classification. As noted above, what is often overlooked is that the profile of results obtained by a student may contain performances of very different character: for example, in Business Studies there may well be a mixture of essay-type assessments, analyses of case study material, exercises based on quantitative methods, and presentations of group work. Performances can vary considerably according to the nature of the task. The overall grade is often determined by averaging the percentages or grades awarded for the curriculum components that have been taken. This makes the assumption that these scores can be treated as if they were consistent with an interval scale, on which Dalziel (1998) has cast considerable doubt. Some institutions in the UK avoid the metrical difficulties, noted by Dalziel, of averaging or summation by using ‘mapping rules’ for converting the profile of module grades into an overall honours degree classification – for example, by stipulating that a majority of module grades must reach the level
of the classification under consideration whilst none of the remainder falls below a specified level. In addition to the metrical problems, other factors weaken the validity of the overall index of performance. Astute students can build into their programmes the occasional ‘easy’ module of study (see Rothblatt’s, 1991, p. 136, comment on this), or – in the US – can opt to take some modules on a pass/fail assessment basis, knowing that undifferentiated pass grades are not counted in the computation of GPA. The differences between subjects regarding the awarding of grades were noted earlier. Suppose a student were to take a joint honours programme in computing and sociology, where the former is associated with wide mark spreads and the latter with narrow spreads. Assume that in one subject the student performs well whereas in the other the student produces a middling performance. If they do well in computing, the ‘leverage’ on the overall grade of the computing marks may be sufficient to tip the balance in favour of a higher classification, whereas if the stronger performance is in sociology the leverage of the better performance may be insufficient to tip the balance in the same way.2 Lastly, institutions vary considerably in the general methodologies they use when cumulating achievements into a single index. Whilst most institutions in the US use the GPA, those in Australia and the UK have varying approaches.3 Below the level of the general methodology, sources of variation include
– the modules that ‘count’ in the determining of an overall index of achievement;
– rules for the inclusion of marks/grades for retaken modules;
– weighting of performances at different levels of the programme; and
– approaches to dealing with claims from students for discretionary treatment because of personal circumstances that may have affected their performance.4
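A small sketch may help to make concrete the difference, noted above, between averaging module marks and applying a profile-based ‘mapping rule’. The module marks, boundaries and the rule itself are invented for illustration; institutional rules vary widely.

```python
# Hypothetical profile of module marks for one student (percentages).
modules = {"computing_1": 72, "computing_2": 71, "sociology_1": 71, "sociology_2": 62}
marks = list(modules.values())

# Approach 1: averaging, which treats percentages as if they lay on an
# interval scale.
mean_mark = sum(marks) / len(marks)
by_mean = "first" if mean_mark >= 70 else "upper second" if mean_mark >= 60 else "lower second"

# Approach 2: a profile-based 'mapping rule' - for example, a majority of
# module marks at or above the class boundary and none below a stated floor.
def satisfies_first(profile: list[int], boundary: int = 70, floor: int = 60) -> bool:
    return sum(m >= boundary for m in profile) > len(profile) / 2 and min(profile) >= floor

print(mean_mark, by_mean)      # 69.0 upper second
print(satisfies_first(marks))  # True: the profile meets the invented mapping rule
```

The same profile can thus attract different classifications depending on which cumulation method is in force, which is part of what weakens the comparability of overall indexes.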
2 If the comparison involves a poor and a middling performance, then correspondingly the subject with the wider spread will exert the stronger effect on the overall classification.
3 See AVCC (2002) in the case of Australia and also Yorke (2008) for details of the UK and a summary of practice in Australia and the US.
4 The relevance of these depends on the system being considered: all do not apply to all systems. Brumfield (2004) demonstrates in some detail the variation within higher education in the US.
Grade Inflation
Grade inflation is widely perceived to be a problem, though the oft-expressed concern has to be tempered by the kind of cool analysis provided by Adelman (forthcoming). Adelman’s analysis of original transcript data suggests that commentators may be allowing themselves to be led by data from elite institutions, whereas data from other institutions (which make up the vast bulk of US higher education) are much less amenable to the charge of grade inflation. A similar conclusion was reached by Yorke (2008) for institutions in England, Wales and Northern Ireland. Whatever the merits of the argument regarding grade inflation, the problem is that those outside the system find difficulty in interpreting the signals that grading sends. Rising grades are not necessarily an indication of grade inflation. Space considerations preclude a detailed discussion of this issue, which is treated at length in Yorke (2008). Table 5.1 summarises distinctions between causes of rising grades that may or may not be inflationary, some of which have already been flagged. Note that student choice appears in both columns, depending on the view one takes regarding students’ exercise of choice.
Table 5.1 Possible causes of rising grades
Causes of grade inflation:
– Grading practices
– Students making ‘strategic’ choices regarding their programmes of study
– Easing of standards
– Avoidance of awarding low grades
– Giving students a ‘helping hand’
– Student evaluations of teaching
– Political and economic considerations at the level of the institution
– Maintaining an institution’s position vis-à-vis others
Causes of non-inflationary rises in grades:
– Curriculum design (especially the approach to assessment)
– Students making ‘strategic’ choices regarding their programmes of study
– Improved teaching
– Improved student motivation and/or learning
– Changes in participation in higher education
The Political Context
This brief critique of grading has, up to this point, largely ignored the context of contemporary higher education. Governments around the world, drawing on human capital theory, emphasise the links between higher education and economic success, although there are differences in the extent to which they press higher education institutions to incorporate an explicit focus on workforce development or employability. Following the Dearing Report (National Committee of Inquiry into Higher Education, 1997) in the UK there has been increased political interest in the preparation of graduates to play a part in the national economic life.5 This can be seen in exhortations to institutions to involve students in work-based learning, and to encourage students to document and reflect on their developing capabilities in the two spheres of academic study and work environments. Whilst institutions have responded, some within them find the direct link with employment to be unpalatable, seeing it as some sort of surrender by higher education to the demands of employers or as part of a narrowly-conceived ‘skills agenda’. However, the Enhancing Student Employability Co-ordination Team in the UK (ESECT) and others have made a specific case for relating employability to the kind of ‘good learning’ that would generally be seen as appropriate in higher education.6 There has been some shift of emphasis in many first degree programs in the UK in the direction of achievements that are not disciplinary-specific but, in a lifelong learning perspective, this is arguably of less significance than may be believed (Yorke, 2003).
5 This is not new, since the point was made in the Robbins Report of 1963 (Committee on Higher Education, 1963).
Rebalancing the Demand
This blurring of the distinction between academic study and the workplace implies a rebalancing of the demands on students, especially in respect of bachelors degrees but also in the more overtly work-based foundation degrees7 recently developed in some institutions in the UK. The late Peter Knight referred to the work arena8 as demanding wicked competences which are ‘‘achievements that cannot be neatly pre-specified, take time to develop and resist measurement-based approaches to assessment’’ (Knight, 2007a, p. 2). As Knight pointed out, a number of aspects of employability could be described as wicked competences (see the list in Yorke & Knight, 2004/06, p. 8), as could the components of ‘graduateness’ listed by the Higher Education Quality Council (1997b, p. 86). Whilst the rebalancing of content and process is contentious to some, the greater (but perhaps less recognised) challenge lies in the area of assessment. Assessment of performance in the workplace is particularly demanding since it is typically multidimensional, involving the integration of performances. Eraut (2004) writes, in the context of medical education but clearly with wider relevance:
treating [required competences] as separate bundles of knowledge and skills for assessment purposes fails to recognize that complex professional actions require more than several different areas of knowledge and skills. They all have to be integrated together in larger, more complex chunks of behaviour. (p. 804)
6 There is a developing body of literature on this theme, including the collection of resources available on the website of the Higher Education Academy (see www.heacademy.ac.uk/resources/publications/learningandemployability) and Knight and Yorke (2003, 2004).
7 Two years full time equivalent, with a substantial amount of the programme being based in the workplace (see Higher Education Funding Council for England, 2000).
8 He also pressed the relevance of ‘wicked’ competences to studying in higher education, but this will not be pursued here.
It is perhaps not unreasonable to see in Eraut’s comment an assessment-oriented analogue of the ‘Mode 2’ production of knowledge via multi-disciplinary problem-solving, which is contrasted with ‘Mode 1’ in which problems are approached with reference to the established theoretical and empirical knowledge base of a single discipline (for an elaboration, see Gibbons et al., 1994). Knight (2007a) investigated perceptions of informants in six subject areas (Accounting; Early Years Teaching; Nursing; Secondary Teaching; Social Work and Youth Work) regarding the following, all of which are characterised by their complexity:
– developing supportive relationships;
– emotional intelligence;
– group work;
– listening and assimilating;
– oral communication;
– professional subject knowledge;
– relating to clients;
– self-management (confidence and effectiveness); and
– ‘taking it onwards’ – acting on diagnoses (in social work).
Knight was surprised that the general perception of his informants was that these were not difficult to assess. Subsequent interviews with a sub-sample did not allow Knight to rule out the possibility that the methodology he used was inadequate, but the more likely explanation is that the respondents believed (rather unquestioningly) that the assessment methods they used were adequate for their purposes. It is, after all, difficult to place long-established practices under critical scrutiny unless something fairly drastic happens to precipitate action. Assessment of work-based learning cannot be subject to the same degree of control as assessment of discipline-specific studies. Expected learning outcomes for work-based activity can be stated in only general terms since workplace situations vary considerably. Hence there is a requirement for the assessor to exercise professional judgement regarding the student’s performance in the workplace, and the potential tension in tutor-student relationships is acknowledged. The assessment situation may become more fraught if workplace assessment involves a person in authority over the placement student, and especially so if they double up the role of mentor and assessor. One of Knight’s recommendations following his study of wicked competences is that Any interventions to enhance the assessment of ‘wicked’ competences should begin by helping colleagues to appreciate the inadequacies of current practices that are typically – and wrongly – assumed to be ‘good enough’. This is a double challenge for innovators. Not only does assessment practice have to be improved, but colleagues need to be convinced of the need to improve it in the first place. (Knight, 2007a, p. 3)
The argument above regarding the necessity of assessing performances in the workplace on a post hoc basis, but with reference to broad criteria, has a much
wider applicability. Those involved in assessing creative achievements have long acknowledged the point, with Eisner’s (1979) work on ‘connoisseurship’ being influential. However, when assessments in higher education are analysed, at their heart is generally found some sort of professional judgement. Marks and grades are typically signals of professional judgement, rather than measurements. In agglomerating varied aspects of performance in a single index, such as the mark or grade for a module, or for a collection of modules, they conceal rather than reveal, and are of only limited value in conveying where the student has been particularly successful and where less successful. This poses a problem for the world outside the academy: how might it become better informed as to graduates’ achievements?
What Can be Warranted?
Knight (2006) made a powerful case that summative assessments were ‘local’ in character, meaning that they related to the particular circumstances under which the student’s performance was achieved. Hence any warranting by the institution would not necessarily be generalisable (pp. 443–4). He also argued that there were some aspects of performance that the institution would not be in any position to warrant, though it might be able to attest that the student did undertake the relevant activity (for example, a placement).9 Part of the problem of warranting achievement arises from a widely-held, but often unacknowledged, misperception that assessments are tantamount to measurements of student achievement. This is particularly detectable in the insouciant way in which marks or grades are treated as steps on interval scales when overall indices of achievement are being constructed. However, as Knight (2006, p. 438) tartly observes, ‘‘True measurement carries invariant meanings’’. The local character of grades and marks, on his analysis, demolishes any pretence that assessments are measurements. Knight is prepared to concede that, in subjects based on science or mathematics, it is possible to determine with accuracy some aspects of student performance (the kinds of achievement at the lower end of the Bloom, 1956; and Anderson & Krathwohl, 2001, taxonomies, for instance). However, as soon as these achievements become applied to real-life problems, wicked competences enter the frame, and the assessment of achievement moves from what Knight characterises as ‘quasi-measurement’ to judgement. Where the subject matter is less susceptible to quasi-measurement, judgement is perforce to the fore, even though there may be some weak quasi-measurement based on marking schemes or weighted criteria.
9 See also Knight and Yorke (2003, p. 56) on what can be warranted by an institution, and what not.
Knight’s (2007a, 2007b) line regarding summative assessment is basically that, because of the difficulties in warranting achievement in a manner that allows the warrant some transfer-value to different contexts, attention should be concentrated on the learning environment of the student. A learning environment supportive of the development of wicked competences could be accorded greater value by employers than one that was less supportive. Knight’s argument is, at root, probabilistic, focusing on the chances of students developing the desired competences in the light of knowledge of the learning environment.
A Summary of the Position so Far
1. Marks and grades are, for a variety of reasons, fuzzier signals than many believe them to be.
2. They signal judgements more than they act as measures.
3. Whilst judgements may have validity, they will often lack precision.
4. Judgements may be the best that assessors can achieve in practical situations.
5. If assessments generally cannot be precise, then it is necessary to rethink the approach to summative assessment.
Imagining an Alternative
The Honours Degree Classification in the UK
Over the past couple of years, debate has taken place in the UK about the utility of the honours classification system that has long been in use for bachelor's degrees (Universities UK & GuildHE, 2006; Universities UK & Standing Conference of Principals, 2004, 2005). This has led to suggestions that the honours degree classification is no longer appropriate to a massified system of higher education, and that it might be replaced by a pass/fail dichotomisation, with the addition of a transcript of the student's achievement in curricular components (this is broadly similar to the Diploma Supplement adopted across the European Union).10 However, the variously constituted groups considering the matter have not concentrated their attention on the assessments that are cumulated into the classification.
10 ‘Pass/not pass’ is probably a better distinction, since students who do not gain the required number of credits for an honours degree would in all probability have gained a lesser number of credits (which is a more positive outcome than would be signified by ‘fail’) There has subsequently been a retreat from the conviction that the classification was no longer fit for purpose (see Universities UK & GuildHE, 2007).
Abandoning Single Indexes of Achievement
Imagine that there were a commission of inquiry into summative assessment practice which concluded that the kinds of summative assessment typically used in higher education were for various reasons11 insufficiently robust for the purposes to which they are put and, further, that increases in robustness could not be obtained without a level of cost that would be prohibitive. In the light of this hypothetical review, it would be readily apparent that not much would be gained by tinkering with existing assessment methodologies and that a radically different approach might offer the prospect of progress. Whilst Knight's (2007a, 2007b) line on assessment – that the focus should be on the learning context rather than straining at the gnat of warranting individuals' achievements – makes sense at the conceptual level, its adoption would probably not satisfy external stakeholders. Unless there were a close link between stakeholder and institution (likely in only some specific cases), the stakeholders might be ill-informed about the learning environment and hence revert to using the public reputation of the institution as the criterion, thereby unquestioningly reinforcing the reputational status quo. In practice, they are likely to call for something 'closer to the action' of student performance. Three sources of information about student performance are:
assessments of various kinds that have been undertaken within the institution;
judgements made by assessors regarding workplace performance (the assessors might come from the institution and/or the workplace); and
information from students themselves regarding their achievements, backed up by evidence.
Whilst all of these will be fallible in their different ways, it may be possible for an employer (say) to develop an appreciation of an applicant's strengths and weaknesses through qualitative triangulation. This would be a more costly and time-consuming process than making an initial sifting based on simplistic parameters such as overall grading and/or institution attended, followed by a finer sifting and interview. Hence the proposition might be given an abrupt rejection on the grounds of impracticability. However, those with a wider perspective might take a different view, seeing that the costs could be offset in two ways:
the employer would probably develop a more appropriate shortlist from which to make the final selection, and
the chances of making an (expensive) inappropriate selection would be decreased.
A couple of other considerations need to be mentioned. First, professional body accreditation might be an issue. However, if the profile of assessments (both in curricular design and student achievement) demonstrably met the relevant requirements, then accreditation should in theory not be a difficulty. Second, with globalisation leading to greater mobility of students, the transfer-value of achievements could be a problem. However, there are problems already. National systems have very varied approaches to grading (Karran, 2005). The European Credit Transfer and Accumulation System (ECTS) bases comparability on norm-referencing, whereas some institutions adopt a predominantly criterion-referenced approach to assessment. Further, international conversion tables prepared by World Education Services may mislead12 (as an aside, a passing grade of D from the US does not figure in these tables). It might make more sense, for the purpose of international transfer, to adopt a pass/not pass approach supported by transcript evidence of the person's particular strengths instead of trying to align grade levels from different systems.
11 Some of these have been mentioned above. A fuller analysis can be found in Yorke (2008).
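The divergence between norm-referenced and criterion-referenced grading can be illustrated with a small sketch. All of the thresholds, percentile cut-offs and marks below are invented for the purpose of the example and do not represent ECTS or any national scale.

# Hypothetical sketch of why reading a grade straight across from one system
# to another may mislead: the same raw mark attracts different grades when the
# scale is criterion-referenced (fixed thresholds) rather than norm-referenced
# (position within the cohort). All cut-offs here are invented.

def criterion_grade(mark):
    # Fixed thresholds, applied regardless of how the cohort performs.
    for threshold, grade in [(70, "A"), (60, "B"), (50, "C"), (40, "D")]:
        if mark >= threshold:
            return grade
    return "F"

def norm_grade(mark, cohort):
    # Grade determined by the proportion of the cohort scoring higher
    # (top 10% receive an A, the next 25% a B, and so on).
    proportion_above = sum(1 for m in cohort if m > mark) / len(cohort)
    for cutoff, grade in [(0.10, "A"), (0.35, "B"), (0.65, "C"), (0.90, "D")]:
        if proportion_above <= cutoff:
            return grade
    return "F"

strong_cohort = [55, 58, 62, 64, 65, 67, 68, 71, 74, 78]

print(criterion_grade(65))            # "B" against fixed thresholds
print(norm_grade(65, strong_cohort))  # "C" relative to a strong cohort

A conversion table that simply maps 'B' in one system onto 'B' in another ignores the difference in what each grade is reporting, which is the sense in which simplistic mathematical conversions may mislead.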
Claims-Making
Following Recommendation 20 of the Dearing Report (National Committee of Inquiry into Higher Education, 1997, p. 141), students in the UK are expected to undertake personal development planning [PDP] and to build up a portfolio of achievements to which they can refer when applying for jobs. At present, uptake has been patchy, partly because academics and students tend to see this as 'just another chore', and because the latter see no direct relationship to learning and assessment; hence the benefit is seen as slender when compared with the effort required. (One student, responding to a survey of the first year experience in the UK, wrote: "I . . . felt the PDP compulsory meetings were a total waste of time – sorry!") If, however, students were required to make a claim for their award, rather than have the award determined by some computational algorithm, PDP would gain in potency. The requirement would encourage the metacognitive activities of reflection and self-monitoring (see Sadler, Chapter 4). Claims-making would also restore to summative assessment a programme-wide perspective on learning and achievement that has tended to get lost in the unitisation of modular schemes in the UK. Requiring students to claim for their award would, in effect, ask the student to answer the question: "How have you satisfied, through your work, the aims stated for your particular programme of study?" (Yorke, 1998, p. 181)13.
12 See the tables available by navigating from www.wes.org/gradeconversionguide/ (retrieved August 8, 2007). Haug (1997) argues, rather as Knight (2006) does in respect of 'local' assessment, that a grade has to be understood in the context of the original assessment system and the understanding has to be carried into the receiving system. This requires more than a 'reading off' of a grade awarded on a scale in one system against the scale of the other, since simplistic mathematical conversions may mislead.
13 The argument is elaborated in Knight and Yorke (2003, pp. 159ff).
The multidimensionality of 'graduateness' as depicted by the Higher Education Quality Council (1997b) and in the literature on employability suggests that students from the same cohort might make quite different cases for their award whilst fulfilling the broad expectations set out for it. For example, one might centre the claim on a developed capacity to relate the disciplinary content to practical situations whereas another might opt to make a case based on high levels of academic achievement. A student's claim could be required not only to include their record of achievements in curricular components,14 but also to incorporate evidence from learning experiences such as those generated through work placement. Wicked competences could be more clearly brought into the picture. The preparation of a claim would assist the student in making applications for jobs (as most will want to do), and the institution in preparing supporting references. When the claim is used prospectively, as in applying for a job, relevant extra-curricular experience could also be brought into play since this could indicate to a potential employer some attributes and achievements that it might value but which are not highlighted in the higher education experience. The claims-making approach is not limited to students who enter higher education straight from school, since it can be adapted to the needs of older students who bring greater life-experience to their studies. A mind-set focused on graduates aged around 21 needs to be avoided, particularly in the UK since demographic data suggest a fairly sharp decline in the number of people aged 18 from the year 2011 (Bekhradnia, 2006), and hence imply the possibility of a system-wide entry cohort of rising age, with a greater number bringing to their studies experience of employment that is more than casual in character. One point (similar to that which Sadler makes in Chapter 4) needs to be made here – that the formalisation of claims-making would need to be no more onerous in toto than the assessment approach it replaced. The claims procedure would need to be streamlined so that assessors were not required, unless there were a particular reason to do so, to work through a heap of evidence in a personal portfolio. Examination boards might require less time to conduct their business, and the role of external examiners (where these exist) would need to be reviewed.
Claims-Making and Learning
It is only now that this tributary of argument joins the main flow of this book. Summative assessment is very often focused on telling students how well they have achieved in respect of curricular intentions – in effect, this is a judgement from on high, usually summarised in a grade or profile of grades.
14 In the interests of making progress, a number of quite significant reservations regarding the robustness of summative assessment of academic achievement would need to be set aside.
Claims-making would bring students more into the picture, in that they would have to consider, on an ongoing basis and at the end of their programmes, the range of their achievements (some – perhaps many – of which will as a matter of course be grades awarded by academics). They would have to reflect on what they have learned (or not), the levels of achievement that they have reached, and on what these might imply for their futures. Knowing that an activity of this sort was 'on the curricular agenda' would prompt the exercise of reflectiveness and self-regulation, which are of enduring value. In a context of lifelong learning, would not the involvement of students in claims-making be more advantageous to all involved than students merely being recipients of ex cathedra summative judgements?
References
Adelman, C. (forthcoming). Undergraduate grades: A more complex story than 'inflation'. In L. H. Hunt (Ed.), Grade inflation and academic standards. Albany: State University of New York Press.
Anderson, L. W., & Krathwohl, D. R. (2001). A taxonomy for learning, teaching and assessment. New York: Addison Wesley Longman.
AVCC [Australian Vice Chancellors' Committee]. (2002). Grades for honours programs (concurrent with pass degree), 2002. Retrieved October 10, 2007, from http://www.avcc.edu.au/documents/universities/key_survey_summaries/Grades_for_Degree_Subjects_Jun02.xls
Baume, D., & Yorke, M., with Coffey, M. (2004). What is happening when we assess, and how can we use our understanding of this to improve assessment? Assessment and Evaluation in Higher Education, 29(4), 451–77.
Bekhradnia, B. (2006). Demand for higher education to 2020. Retrieved October 14, 2007, from http://www.hepi.ac.uk/downloads/22DemandforHEto2020.pdf
Biswas, R. (1995). An application of fuzzy sets in students' evaluation. Fuzzy Sets and Systems, 74(2), 187–94.
Bloom, B. S. (1956). Taxonomy of educational objectives, Handbook 1: Cognitive domain. London: Longman.
Brandon, J., & Davies, M. (1979). The limits of competence in social work: The assessment of marginal work in social work education. British Journal of Social Work, 9(3), 295–347.
Bridges, P., Cooper, A., Evanson, P., Haines, C., Jenkins, D., Scurry, D., Woolf, H., & Yorke, M. (2002). Coursework marks high, examination marks low: Discuss. Assessment and Evaluation in Higher Education, 27(1), 35–48.
Brumfield, C. (2004). Current trends in grades and grading practices in higher education: Results of the 2004 AACRAO survey. Washington, DC: American Association of Collegiate Registrars and Admissions Officers.
Chanock, K. (2000). Comments on essays: Do students understand what tutors write? Teaching in Higher Education, 5(1), 95–105.
Committee on Higher Education. (1963). Higher education [Report of the Committee appointed by the Prime Minister under the chairmanship of Lord Robbins, 1961–63]. London: Her Majesty's Stationery Office.
Dalziel, J. (1998). Using marks to assess student performance: Some problems and alternatives. Assessment and Evaluation in Higher Education, 23(4), 351–66.
Echauz, J. R., & Vachtsevanos, G. J. (1995). Fuzzy grading system. IEEE Transactions on Education, 38(2), 158–65.
Eisner, E. W. (1979). The educational imagination: On the design and evaluation of school programs. New York: Macmillan.
Ekstrom, R. B., & Villegas, A. M. (1994). College grades: An exploratory study of policies and practices. New York: College Entrance Examination Board.
Elton, L. (1998). Are UK degree standards going up, down or sideways? Studies in Higher Education, 23(1), 35–42.
Eraut, M. (2004). A wider perspective on assessment. Medical Education, 38(8), 803–4.
Felton, J., & Koper, P. T. (2005). Nominal GPA and real GPA: A simple adjustment that compensates for grade inflation. Assessment and Evaluation in Higher Education, 30(6), 561–69.
Gibbons, M., Limoges, C., Nowotny, H., Schwartzman, S., Scott, P., & Trow, M. (1994). The new production of knowledge: The dynamics of science and research in contemporary societies. London: Sage.
Haug, G. (1997). Capturing the message conveyed by grades: Interpreting foreign grades. World Education News and Reviews, 10(2), 12–17.
Hawe, E. (2003). 'It's pretty difficult to fail': The reluctance of lecturers to award a failing grade. Assessment and Evaluation in Higher Education, 28(4), 371–82.
Higher Education Funding Council for England. (2000). Foundation degree prospectus. Bristol: Higher Education Funding Council for England. Retrieved October 14, 2007, from http://www.hefce.ac.uk/pubs/hefce/2000/00_27.pdf
Higher Education Quality Council. (1997a). Assessment in higher education and the role of 'graduateness'. London: Higher Education Quality Council.
Higher Education Quality Council. (1997b). Graduate standards programme: Final report (2 vols). London: Higher Education Quality Council.
Hornby, W. (2003). Assessing using grade-related criteria: A single currency for universities? Assessment and Evaluation in Higher Education, 28(4), 435–54.
Jessup, G. (1991). Outcomes: NVQs and the emerging model of education and training. London: Falmer.
Karran, T. (2005). Pan-European grading scales: Lessons from national systems and the ECTS. Higher Education in Europe, 30(1), 5–22.
Knight, P. (2006). The local practices of assessment. Assessment and Evaluation in Higher Education, 31(4), 435–52.
Knight, P. (2007a). Fostering and assessing 'wicked' competences. Retrieved October 10, 2007, from http://www.open.ac.uk/cetl-workspace/cetlcontent/documents/460d1d1481d0f.pdf
Knight, P. T. (2007b). Grading, classifying and future learning. In D. Boud & N. Falchikov (Eds.), Rethinking assessment in higher education: Learning for the longer term (pp. 72–86). Abingdon, UK: Routledge.
Knight, P. T., & Yorke, M. (2003). Assessment, learning and employability. Maidenhead, UK: Society for Research in Higher Education and the Open University Press.
Knight, P., & Yorke, M. (2004). Learning, curriculum and employability in higher education. London: RoutledgeFalmer.
Mager, R. F. (1962). Preparing objectives for programmed instruction. Belmont, CA: Fearon.
Milton, O., Pollio, H. R., & Eison, J. (1986). Making sense of college grades. San Francisco, CA: Jossey-Bass.
National Committee of Inquiry into Higher Education. (1997). Higher education in the learning society. Middlesex: NCIHE Publications.
Rothblatt, S. (1991). The American modular system. In R. O. Berdahl, G. C. Moodie, & I. J. Spitzberg, Jr. (Eds.), Quality and access in higher education: Comparing Britain and the United States (pp. 129–141). Buckingham, England: SRHE and Open University Press.
Sadler, D. R. (1987). Specifying and promulgating achievement standards. Oxford Review of Education, 13(2), 191–209.
Sadler, D. R. (2005). Interpretations of criteria-based assessment and grading in higher education. Assessment and Evaluation in Higher Education, 30(2), 176–94.
Tan, K. H. K., & Prosser, M. (2004). Qualitatively different ways of differentiating student achievement: A phenomenographic study of academics' conceptions of grade descriptors. Assessment and Evaluation in Higher Education, 29(3), 267–81.
Universities UK & GuildHE. (2006). The UK honours degree: Provision of information – second consultation. London: Universities UK & GuildHE. Retrieved October 6, 2007, from http://www.universitiesuk.ac.uk/consultations/universitiesuk/
Universities UK & GuildHE. (2007). Beyond the honours degree classification: The Burgess Group final report. London: Universities UK and GuildHE. Retrieved October 17, 2007, from http://bookshop.universitiesuk.ac.uk/downloads/Burgess_final.pdf
Universities UK & Standing Conference of Principals. (2004). Measuring and recording student achievement. London: Universities UK & Standing Conference of Principals. Retrieved October 5, 2007, from http://bookshop.universitiesuk.ac.uk/downloads/measuringachievement.pdf
Universities UK & Standing Conference of Principals. (2005). The UK honours degree: Provision of information. London: Universities UK & Standing Conference of Principals. Retrieved October 5, 2007, from http://www.universitiesuk.ac.uk/consultations/universitiesuk/
Walvoord, B. E., & Anderson, V. J. (1998). Effective grading: A tool for learning and assessment. San Francisco: Jossey-Bass.
Webster, F., Pepper, D., & Jenkins, A. (2000). Assessing the undergraduate dissertation. Assessment and Evaluation in Higher Education, 25(1), 71–80.
Woolf, H. (2004). Assessment criteria: Reflections on current practices. Assessment and Evaluation in Higher Education, 29(4), 479–93.
Yorke, M. (1998). Assessing capability. In J. Stephenson & M. Yorke (Eds.), Capability and quality in higher education (pp. 174–191). London: Kogan Page.
Yorke, M. (2003). Going with the flow? First cycle higher education in a lifelong learning context. Tertiary Education and Management, 9(2), 117–30.
Yorke, M. (2008). Grading student achievement: Signals and shortcomings. Abingdon, UK: RoutledgeFalmer.
Yorke, M., Barnett, G., Bridges, P., Evanson, P., Haines, C., Jenkins, D., Knight, P., Scurry, D., Stowell, M., & Woolf, H. (2002). Does grading method influence honours degree classification? Assessment and Evaluation in Higher Education, 27(3), 269–79.
Yorke, M., Bridges, P., & Woolf, H. (2000). Mark distributions and marking practices in UK higher education. Active Learning in Higher Education, 1(1), 7–27.
Yorke, M., Cooper, A., Fox, W., Haines, C., McHugh, P., Turner, D., & Woolf, H. (1996). Module mark distributions in eight subject areas and some issues they raise. In N. Jackson (Ed.), Modular higher education in the UK (pp. 105–7). London: Higher Education Quality Council.
Yorke, M., & Knight, P. T. (2004/06). Embedding employability into the curriculum. York, England: The Higher Education Academy. Retrieved October 14, 2007, from http://www.heacademy.ac.uk/assets/York/documents/ourwork/tla/employability/id460_embedding_employability_into_the_curriculum_338.pdf
Chapter 6
The Edumetric Quality of New Modes of Assessment: Some Issues and Prospects
Filip Dochy
Introduction
Assessment has played a crucial role in education and training since formal education commenced. Certainly, assessment of learning has been seen as the cornerstone of the learning process since it reveals whether the learning process results in success or not. For many decades, teachers, trainers and assessment institutes were the only partners seen as crucial in the assessment event. Students were seen as subjects who were to be tested without having any influence on any other aspect of the assessment process. Recently, several authors have called our attention to what is often termed 'new modes of assessment' and 'assessment for learning'. They stress that assessment can be used as a means to reinforce learning, to drive learning and to support learning – preferably when assessment is not perceived by students as a threat, an event they have to fear, the sword of Damocles. These authors also emphasize that the way we assess students should be congruent with the way we teach and the way students learn within a specific learning environment. As such, the 'new assessment culture' makes a plea for integrating instruction and assessment. Some go even further: students can play a role in the construction of assessment tasks, the development of assessment criteria, and the scoring of performance can be shared amongst students and teachers. New modes of assessment that arise from such thinking are, for example, 90° or 180° feedback, writing samples, exhibitions, portfolio assessments, peer- and co-assessment, project and product assessments, observations, text- and curriculum-embedded questions, interviews, and performance assessments. It is widely accepted that these new modes of assessment lead to a number of benefits in terms of the learning process: encouraging thinking, increasing learning and increasing students' confidence (Falchikov, 1986, 1995).
F. Dochy Centre for Educational Research on Lifelong Learning and Participation, Centre for Research on Teaching and Training, University of Leuven, Leuven, Belgium e-mail:
[email protected]
G. Joughin (ed.), Assessment, Learning and Judgement in Higher Education, DOI: 10.1007/978-1-4020-8905-3_6, © Springer Science+Business Media B.V. 2009
However, the scientific measurement perspectives of both teachers and researchers, stemming from the earlier, highly consistent framework in which assessment was separate from instruction and needed to be uniformly administered, can form a serious hindrance to a wider introduction of these new assessment methods. As Shepard (1991) and Segers, Dochy and Cascallar (2003) indicated, instruction derives from the emergent constructivist paradigm, while testing has its roots in older paradigms. So, we argue, the traditional criteria for evaluating the quality of assessment need to be critically revised. The question that needs to be asked is whether we can maintain traditional concepts such as reliability and validity, or whether these concepts need to be considered more broadly, in harmony with the development of the new assessment contexts. In this chapter, attention is paid to this recent evolution within assessment and especially to the consequences of the new approach for screening the edumetric quality of educational assessment.
About Testing: What has Gone Wrong?
A quick review of the research findings related to traditional testing and its effects shows the following. First of all, a focus on small-scale classroom assessment is needed instead of a focus on large-scale, high-stakes testing. Gulliksen (1985), after a long career in measurement, stated that this differentiation is essential: "I am beginning to believe that the failure to make this distinction is responsible for there having been no improvement, and perhaps even a decline, in the quality of teacher-made classroom tests . . ." (p. 4). Many investigations have now pointed to the massive disadvantages of large-scale testing (Amrein & Berliner, 2002; Rigsby & DeMulder, 2003): students learn to answer the questions better, but may not learn more; a lot of time goes into preparing such tests; learning experiences are reduced; it leads to the de-professionalisation of teachers; changes in content are not accompanied by changes in teaching strategies; teachers do not reflect on their teaching to develop more effective practices; the effort and motivation of teachers decrease; teachers question whether this is a fair assessment for all students; and the standards don't make sense to committed teachers or skilled practitioners.
Such external tests also influence teachers' assessments. Teachers often emulate large-scale tests on the assumption that this represents good assessment practice. As a consequence, the effect of feedback is to teach the weaker student that he lacks ability, so that he loses confidence in his own learning (Black & Wiliam, 1998). One of my British colleagues insinuated: "A-levels are made for those who can't reach it, so they realise how stupid they are". In the past decade, research evidence has shown that the use of summative tests squeezes out assessment for learning and has a negative impact on motivation for learning. Moreover, the latter effect is greater for the less successful students and widens the gap between high and low achievers (Harlen & Deakin Crick, 2003; Leonard & Davey, 2001).
Our research also shows convincingly that students' perceptions of the causes of success and failure are of central importance in the development of motivation for learning (Struyven, Dochy, & Janssens, 2005; Struyven, Gielen, & Dochy, 2003). Moreover, too much summative testing affects not only students' motivation, but also that of their teachers. High-stakes tests result in educational activities directed towards the content of the tests. As a consequence, the diversity of learning experiences for students is reduced and teachers use a small range of instructional strategies. The latter leads to the de-professionalisation of teachers (Rigsby & DeMulder, 2003). Firestone and Mayrowitz (2000) state: "What was missing . . . was the structures and opportunities to help teachers reflect on their teaching and develop more effective practices" (p. 745). Recently, the finding that assessment steers learning has gained a lot of attention within educational research (Dochy, 2005; Dochy, Segers, Gijbels, & Struyven, 2006; Segers et al., 2003). Motivation, too, has been investigated as one of the factors that is in many cases strongly influenced by the assessment or the assessment system being used.
Characteristics of New Assessment Modes
The current assessment culture can be characterized as follows (Dochy, 2001; Dochy & Gijbels, 2006). There is a strong emphasis on the integration of assessment and instruction. Many assessment specialists take the position that appropriately used educational assessments can be seen as tools that enhance the instructional process. Additionally, there is strong support for representing assessment as a tool for learning. The position of the student is that of an active participant who shares responsibility in the process, practices self-assessment, reflection and collaboration, and conducts a continuous dialogue with the teacher. Students participate in the development of the criteria and the standards for evaluating their performance. Both the product and the process are assessed. The assessment takes many forms, all of which could generally be referred to as unstandardised assessments embedded in instruction. There is often no time pressure, and a variety of tools that are used in real life for performing similar tasks are permitted. The assessment tasks are often interesting, meaningful, authentic, challenging and engaging, involving investigations of various kinds. Students also sometimes document their reflections in a journal and use portfolios to keep track of their academic/vocational growth. Reporting practices shift from a single score to a profile, i.e. from quantification to a portrayal (Birenbaum, 1996).
New assessment modes such as observations, text- and curriculum-embedded questions, interviews, over-all tests, simulations, performance assessments, writing samples, exhibitions, portfolio assessment, product assessments, and modes of peer- and co-assessment have been investigated increasingly in recent years (Birenbaum & Dochy, 1996; Dochy & Gijbels, 2006; Segers et al., 2003; Topping, 1998) and a set of criteria for new assessment practices has been formulated (Birenbaum, 1996; Feltovich, Spiro, & Coulson, 1993; Glaser, 1990; Shavelson, 1994).
Generally, five characteristics are shared amongst these new assessments. Firstly, good assessment requires that students construct knowledge (rather than reproduce it). The coherence of knowledge, its structure and interrelations are targets for assessment. Secondly, the assessment of the application of knowledge to actual cases is the core goal of these so-called innovative assessment practices. This means assessing the extent to which students are able to apply knowledge to solve real life problems and take appropriate decisions. Thirdly, good assessment instruments ask for multiple perspectives and context sensitivity. Students not only need to know 'what' but also 'when', 'where' and 'why'. This implies that statements as answers are not enough; students need to have insight into underlying causal mechanisms. Fourthly, students are actively involved in the assessment process. They have an active role in discussing the criteria, in administering the assessment or fulfilling the task and sometimes in acting as a rater for peers or self. Finally, assessments are integrated within the learning process and are congruent with the teaching method and learning environment.
Effects of Assessment on Learning: Pre- and Post-assessment Effects
Investigation into the effects of assessment on learning is often summarized as consequential validity (Boud, 1995; Sambell, McDowell, & Brown, 1997). Consequential validity asks what the consequences are of the use of a certain type of assessment for education and for students' learning processes, and whether the consequences that are found are the same as the intended effects. The research that explicitly looks into the effects of assessment is now catching up with the research on traditional testing (Askham, 1997; Birenbaum, 1994; Boud, 1990; Crooks, 1988; Dochy & Moerkerke, 1997; Frederiksen, 1984; Gibbs, 1999; Gielen, Dochy, & Dierick, 2007; Gielen, Dochy, et al., 2007; McDowell, 1995; Sambell et al., 1997; Scouller, 1998; Tan, 1992; Thomas & Bain, 1984; Thomson & Falchikov, 1998; Trigwell & Prosser, 1991a, 1991b).
The influence of formative assessment is mainly due to the fact that, after the assessment, the results are looked back upon, as well as the learning processes upon which the assessment is based. This makes it possible to adjust learning (the post-assessment effect). A special form of feedback is that which students provide for themselves, by using metacognitive skills while answering or solving assessment tasks.
In this case, the student is then capable of drawing conclusions – after or even during assessment – about the quality of his learning behaviour (self-generated or internal feedback). He can then resolve to do something about it. The influence of summative assessment is less obvious, since the post-assessment effects of summative assessment are often minor. Besides, a student often does not know after the assessment what he did wrong and why. A student who passes is usually not interested in how he performed and in what ways he could possibly improve. The influence of summative assessment on learning behaviour can, instead, be called pro-active. These are pre-assessment effects. Teachers are often not aware of these effects. There is evidence that these pre-assessment effects on learning behaviour outweigh the post-assessment effects of feedback (Biggs, 1996). An important difference between the pre- and post-assessment effects is that the latter are intentional, whilst the former are more in the nature of side effects, because summative assessment intends to orientate, to select or to certify. Nevo (1995), however, points out the existence of a third effect of assessment on learning. Students also learn during assessment because, at that moment, they often have to reorganize their acquired knowledge and they have to make links and discover relationships between ideas that they had not previously discovered while studying. When assessment incites students to thought processes of a higher cognitive nature, it is possible that assessment becomes a rich learning experience for them. This goes for both formative and summative assessment. We call this the plain assessment effect. This effect can spur students to learn (with respect to content), but it does not really affect learning behaviour, except for the self-generated feedback that we discussed before. A further question that can be put is which elements of assessment could be responsible for these pre- and post-assessment effects: for example, the content of assessment, the type, the frequency, the standards that are used, the content of feedback, and the way the feedback is provided (Crooks, 1988; Gielen, Dochy, & Dierick, 2003, 2007).
The Pre-assessment Effects of Summative Assessment
This influence of assessment on learning, which is called the pre-assessment effect, is discussed by several authors who use different terminology. Frederiksen and Collins (1989) discuss systemic validity. The backwash effects are discussed by Biggs (1996), and the feed-forward function by Starren (1998). Boud (1990) refers to an effect of the content of assessment on learning. He states that students are used to focusing on those subjects and assimilation levels that form part of assessment (and thus can bring in marks), to the detriment of the others. They are also encouraged to do this by the consequences linked to assessment (Biggs, 1996; Elton & Laurillard, 1979; Marton & Säljö, 1976; Scouller, 1998; Scouller & Prosser, 1994; Thomas & Bain, 1984).
When external tests are used, we see that teachers are highly influenced by them in their instruction – what is called teaching to the test. In theory it would not be problematic that instruction and learning focus on assessment (see above) if the range of those tests were not limited to what can be easily tested (reproduction and skills of a lower order). At this point the problematic character of assessment-driven instruction, or what Birenbaum calls test-driven instruction, becomes clear. Frederiksen (1984) reveals that in many tests the problems to be solved are well-structured, even though the aim of education is (or should be) to prepare pupils and students for real problem solving tasks, which are almost always ill-structured:
. . . Ill-structured problems are not found in standardized achievement tests. If such an item were found, it would immediately be attacked as unfair. However, this would only be unfair if schools do not teach students how to solve ill-structured problems. To make it fair, schools would have to teach the appropriate problem-solving skills. Ability to solve ill-structured problems is just one example of a desirable outcome of education that is not tested and [therefore] not taught. (Frederiksen, 1984, p. 199)
In the literature, pre-assessment effects are also ascribed to the way in which an assessment is carried out. Sometimes, the described negative influence of tests is immediately linked to a certain mode of assessment, namely multiple choice examinations. Nevertheless, it is important to pay attention to the content of assessment, apart from the type, since this test bias can also occur with other types of assessment. The frequency of assessment can also influence learning behaviour. Tan (1992) revealed that the learning environment's influence on learning behaviour can be so strong that it cuts across the students' intentions. Tan's research revealed that the majority of students turned to a superficial learning strategy in a system of continued summative assessment. They did not do this because they preferred it and had the intention to do it, but because the situation gave them incentives to do so. As a result of the incongruity between learning behaviour and intention, it turned out that many students were really dissatisfied with the way in which they learnt. They had the feeling that they did not have time to study thoroughly and to pay attention to their interests. In this system, students were constantly under pressure. As a result they worked harder, but their intrinsic motivation disappeared and they were encouraged to learn by heart. Crooks (1988) also drew attention to the fact that, in relation to the frequency of assessment, it is possible that frequent assessment is not an aid – it could even be a hindrance – when learning results of a higher order are concerned, even when assessment explicitly focuses on them. According to him, a possible explanation could be that students need more breathing space in order to achieve in-depth learning and to reach learning results of a higher order. We should note that Crooks' discussion is of summative, and not formative, assessment.
Summative assessment, however, does not always affect the learning process in a negative way: "Backwash is no bad thing", as Biggs (1996) emphasizes. More than that, assessment can have a positive effect on learning (cf. McDowell, 1995; Dochy, Moerkerke, & Martens, 1996; Dochy & McDowell, 1997). McDowell (1995) points out that it is obvious that students learn and behave differently in courses with assessment which requires higher order learning and an authentic/original task than when they are examined in a traditional way. When assessment is made up of an exam paper, then there is no sense in memorising (also see Vermunt, 1992). Students also point out this difference themselves. Using interviews, Sambell et al. (1997) investigated students' perceptions of the effects of assessment on their learning behaviour. Many students found that traditional assessment methods had a negative effect on their learning process. They thought that the quality of their learning was often "tarnished", because they consciously turned to "inferior" learning behaviour in order to be prepared for the kind of assessment that only led to short-term learning. These experiences contrast with their experiences of assessment that channelled their efforts into the search for comprehension and focused on the processes of critical questioning and analysing. In general, this was perceived as a level of learning that satisfied them more.
It is important to note that assessment often pays more inherent attention to the formative function, which partly comes under the category of post-assessment effects. However, new types of assessment also fail sometimes. McDowell (1995) refers to the example of a group task that forms part of, or is included in, the final assessment. As a result there are conflicting pressures between learning and the production of the product. When a student aims to improve his weaknesses during production and focuses his learning on these points, he risks jeopardizing his final results, as well as those of the group. McDowell concludes that, with formal summative assessment, students tend to play safe and to rely mainly on their strong points, whereas a personal breakthrough regarding learning often contains an element of risk. In order to encourage such breakthroughs, it is therefore also necessary to build in formative assessment. Pre-assessment effects work in two directions. They can influence learning behaviour in either a positive or a negative way. New types of assessment that fit into the assessment culture explicitly try to take into account this aspect of the consequential validity of assessment.
The Post-assessment Effects of Assessment-for-Learning on the Learning Process
In what way does formative assessment play its role with regard to learning behaviour, and are there also differences in the effect depending on the type of formative assessment? Assessment for learning, or formative assessment, does not always have positive effects on learning behaviour. Askham (1997) points out that it is an oversimplification to think that formative assessment always results in in-depth learning and summative assessment always leads to superficial learning:
Some tutors are moving to a more formative approach by setting short answer tests on factual information, undertaken at regular intervals throughout a course of study. This clearly satisfies the need for feedback and does provide the opportunity for improvement, but may encourage only the surface absorption of facts and superficial knowledge. (Askham, 1997, p. 301)
In practice, the content and type of the tasks used, even in formative assessment, can have a negative effect and thus undo the positive effect of feedback. If formative assessment suits the learning goals and the instruction, it can result in the refinement of students' less effective learning strategies and the confirmation of the results of students who are doing well. Based upon their review of 250 research studies, Black and Wiliam (1998) conclude that formative assessment can be effective in improving students' learning. In addition, when the responsibility for assessment is handed over to the student, the formative assessment process seems to be more effective.
The most important component of formative assessment, as far as influence on learning behaviour is concerned, is feedback. Although the ways in which feedback can be given and received are endless, we can differentiate between internal, self-generated and external feedback (Butler & Winne, 1995). The latter form can be further split up into immediate or delayed feedback, global feedback or feedback per criterion, with or without suggestions for improvement. The characteristics of effective feedback, as described by Crooks (1988), have already been discussed. If we want to know why feedback works, we have to look into the functions of this feedback. Martens and Dochy (1997) indicate the two main functions of feedback which can influence students' learning behaviour. The first, the cognitive function, implies that information is provided about the learning process and the thorough command of the learning goals. This feedback has a direct influence on the knowledge and views of students (confirmation of correct views, improvement or restructuring of incorrect ones), but can also influence the cognitive and metacognitive strategies that students make use of during their learning process. Secondly, feedback can be interpreted as a positive or a negative confirmation. The impact of feedback as confirmation of the student's behaviour depends on the student's interpretation (e.g. his attribution style). A third function of formative assessment that has nothing to do with feedback, but that supports the learning process, is the activation of prior knowledge through, for example, prior knowledge tests. This is also a post-assessment effect of assessment.
Martens and Dochy (1997) investigated the effects of prior knowledge tests and progress tests with feedback for students in higher education. Their results revealed that the influence of feedback on learning behaviour is limited. An effect that was apparent was that 24% of the students reported that they studied certain parts of the course again, or over a longer period, as a result of the feedback.
Students who were kept informed of the progress they made seemed to study longer in this self-study context. As far as method of study and motivation are concerned there were, in general, no differences from the control group. They conclude that adult students are less influenced by feedback from one or more assessments. Negative feedback did not seem to have dramatically negative consequences for motivation. There is, however, a tendency towards interaction with perceived study time. Possible explanations for the unconfirmed hypotheses are that the content and timing of the feedback were not optimal and that the learning materials were self-supporting. Here again we note that the content, the type/mode and the frequency of assessment are important factors influencing the impact of formative assessment on learning behaviour. Dochy, Segers, and Sluijsmans (1999) also point out the importance of the student's involvement in assessment.
The most important factor that is unique to formative assessment is the feedback component. This, however, does not always seem to produce the desired effects. The content and timing of feedback, and the extent to which it is attuned to the student's needs, again seem to be important. We refer again to the characteristics of effective feedback from Crooks. Biggs (1998) asks us, however, not to overlook summative assessment in this euphoria about formative assessment. First of all, the effects of summative assessment on learning behaviour are also significant (see above). Secondly, research from Butler (1988) reveals that formative and summative assessment can also interact in their effects on learning behaviour and can only be separated artificially, because students always feel the influence of both and because instruments can also fulfil both functions. Crooks (1988) and Tan (1992), on the other hand, quote various investigations that point out the importance of separating opportunities for feedback from summative assessment. When an assessment also counts summatively, students seem to pay less attention to feedback and learn less from it. On the basis of research by McDowell (1995), we have already mentioned a disadvantage of combining the two. Black (1995) also has reservations about the combination of formative and summative assessment. When summative assessment is done externally, it has to be separated from the formative aspect because the negative backwash effects can be detrimental to learning and can get in the way of good support of the learning process. But even when summative assessment is done internally, completely or partially, by the teacher himself, the relationship between the two functions has to be attended to. The function of the assessment must be settled first; decisions about its form and method can then follow. Struyf, Vandenberghe, and Lens (2001), however, point out that in practice the two will be mixed up anyway and that formative assessment will be the basis for a summative judgement on what someone has achieved. What matters to us in this discussion is that the teacher or lecturer ultimately pays attention to the consequential validity of assessment, whether it is formative, summative, or both.
Searching for New Quality Indicators of Assessment
Edumetric indicators, such as validity and reliability, are traditionally used to evaluate the quality of educational assessment. The validity question refers to the extent to which assessment measures what it purports to measure. Does the content of assessment correspond with the goals of education? Reliability was traditionally defined as the extent to which a test measures consistently. Consistency in test results demonstrated objectivity in scoring: the same results were obtained if the test was judged by another person or by the same person at another time. The meaning of the concept of reliability was determined by the then prevailing opinion that assessment needs to fulfil, above all, a selection function. As mentioned above, fairness in testing was aligned with objectivity. Striving to achieve objectivity in testing and comparing scores resulted in the use of standardised testing forms, such as multiple-choice tests. These kinds of tests, which in practice above all measure the reproduction of knowledge, are now criticised for their negative influence on instruction. Because great weight was attributed to test results at a management level, tutors attuned their education to the content and the level of knowledge that was asked for in the test. As a consequence, lower levels of cognitive knowledge were more likely to be attended to (Dochy & Moerkerke, 1997).
As a reaction to the negative effects of multiple-choice tests on education, new assessment modes were developed. These new assessment modes judge students on their performances when using knowledge creatively to solve domain-specific problems (Birenbaum, 1996). Assessment tasks are real-life problems, or authentic representations of real-life problems. Since an important goal of higher education today is to educate students in using their knowledge to solve real-life problems, new assessment modes seem more valid than standardised tests. Indeed, traditional tests always assume that student answers are an indication of competence in a specific domain. When using authentic assessment, interpretation of answers is not needed because the assessment is itself a direct indication of the required competence. However, precisely because of their authentic, non-standardised character, these assessment modes score unfavourably on a conventional reliability measurement, because the starting-points are not the same. Taking the unique characteristics of new assessment modes into consideration, the traditional method of measuring reliability can be questioned. In the first place, a conventional reliability measurement of new assessment modes that are, in contrast to traditional tests, not standardised, may give a "false" picture of the results. Secondly, using new assessment modes inevitably implies that different assessors judge distinct knowledge and skills in different ways, at different times. Also, one could question whether validity should not receive a higher priority (Dochy, 2001). As Frank (personal communication, July 13, 2001) points out,
In the textile world in which I spent most of my working life, many of the consumer tests have been designed with reliability in mind but not validity. One simple example of this is a crease test which purports to test the tendency of a material to get creased during use. The problem is that the test is carried out under standard conditions at a temperature of twenty degrees and a relative humidity of sixty-five percent. When worn the clothing is usually at a higher temperature and a much higher humidity and textile fabrics are very sensitive to changes in these variables but still they insist on measuring under standard conditions, thus making the results less valid for the user.
It is clear that a new assessment culture cannot be evaluated on the basis of criteria from a previous era. To do justice to the uniqueness of new assessment modes, the traditionally used criteria need to be expanded and other, more suitable, criteria for evaluating the quality of assessment need to be developed.
Revised Edumetric Criteria for Evaluating Assessment: A Review
Various authors have proposed ways to extend the criteria, techniques and methods used in traditional psychometrics (Cronbach, 1989; Frederiksen & Collins, 1989; Haertel, 1991; Kane, 1992; Linn, 1993; Messick, 1989). In this section, an integrated overview is given. Within the literature on quality criteria for evaluating assessment, a distinction can be made between authors who present an expanded vision of validity and reliability (Cronbach, 1989; Kane, 1992; Messick, 1989) and those who propose specific criteria, sensitive to the characteristics of new assessment modes (Baartman, Bastiaens, Kirschner, & Van der Vleuten, 2007; Dierick & Dochy, 2001; Frederiksen & Collins, 1989; Haertel, 1991; Linn, Baker, & Dunbar, 1991).
Construct Validity as a Unitary Concept for Evaluating the Quality of Assessment
Within the classical tradition, validity encompassed three components: content, criterion and construct validity aspects. The content validity aspect investigates how well the range and types of tasks used in assessment are an appropriate reflection of the knowledge domain that is being measured. The term construct is a more abstract concept that refers to the psychological thinking processes underlying domain knowledge, for example, the reasoning processes involved in solving an algebra problem. Measuring construct validity involves placing the construct that is being measured within a conceptual network and estimating relationships between measurements of related constructs through statistical analysis. The extent to which there is a correlation between scores on tests that are measuring the same construct is called criterion validity.
Evidence that a judgement is criterion valid is used as an argument for the construct validity of an assessment. Because new assessment modes use complex multidisciplinary problems, it is not clear how to measure the validity of assessment in a psychometric way. Within the new assessment culture, a more realistic approach must be sought for measuring construct validity. Since the psychometric approach has been renounced, there is a need to redefine the term construct. It can be argued that this term can best be replaced by the term competence. Indeed, goals within higher education today tend to be formulated in terms of attainable competencies. Research has also demonstrated that the use of an assessment form is not only determined by the kind of knowledge and skills measured, but also by broader effects on the nature and content of education (Birenbaum & Dochy, 1996). Some examples are the negative influences that standardised assessment can have on the kind of knowledge that is asked for, on the way education is offered and, as an outcome, on the way students learn the subject matter. New modes of assessment attempt to counteract these negative influences. It can be argued that assessing higher-order skills will lead to learning those kinds of knowledge and skills. It has indeed appeared that exams have the most influence on the learning activities of students. To evaluate the suitability of an assessment form, it is important not only to ask if the assessment is an appropriate measure of the proposed knowledge and skills, but also to ask if use of the assessment has achieved the intended effects (Messick, 1989).
The arguments above have led to critical reflection on the traditional way of validating new assessment modes. This has resulted in the use of the term construct validity as a unified criterion for evaluating the quality of assessment. The authors of the Standards for Educational and Psychological Testing define construct validity as "a unitary concept, requiring multiple lines of evidence, to support the appropriateness, meaningfulness, and usefulness of the specific inferences made from test scores" (American Educational Research Association, American Psychological Association, National Council on Measurement in Education, 1985, p. 9). In this construct validity criterion, the three traditional aspects for evaluating validity are integrated. To describe this construct validating process, a connection is sought with the interpretative research tradition. Authors such as Kane (1992) and Cronbach (1989) use an argument-based approach for validating assessment, whereby arguments are sought to support the plausibility of the corresponding interpretative argument with appropriate evidence (i) for the inferences and assumptions made in the proposed interpretative argument, and (ii) for refuting potential counter arguments. Messick (1994) offers the most elaborated vision of construct validity. He describes this concept as an evaluative summary of both evidence for the actual, as well as potential, consequences of score interpretation and use. Following his argument, the concept of validity encompasses six distinguishable parts: content, substantive, structural, external, generalisability, and consequential aspects of construct validity, which conjointly function as general criteria for all educational assessment.
educational assessment. The content aspect of validity means that the range and type of tasks used in assessment must be an appropriate reflection (content relevance, representativeness) of the construct domain. Increasing achievement levels in assessment tasks should reflect increases in expertise in the construct domain. The substantive aspect emphasizes the consistency between the processes required for solving the tasks in an assessment, and the processes used by domain experts in solving tasks (problems). Furthermore, the internal structure of assessment – reflected in the criteria used in assessment tasks, the interrelations between these criteria, and the relative weight placed on scoring these criteria – should be consistent with the internal structure of the construct domain. If the content aspect (relevance, representativeness of content and performance standards) and the substantive aspect of validity are guaranteed, score interpretation based on one assessment task should be generalisable to other tasks which assess the same construct. The external aspect of validity refers to the extent to which the assessment scores’ relationships with other measures and non-assessment behaviours reflect the expected high, low, and interactive relationships. The consequential aspect of validity includes evidence and rationales for evaluating the intended and unintended consequences of score interpretation and use. The view that the three traditional validity aspects are integrated within one concept for evaluating assessment was also suggested by Messick (1989). The aspect most stressed recently, however, is evaluating the influence of assessment on education. Additionally, the concept of reliability becomes part of the construct validating process. Indeed, for new modes of assessment, it is not important that students’ scores follow a normal curve. The most important question is: how reliable is the judgement that a student is, or is not, competent? The generalisability aspect of validity investigates the extent to which the decision that a student is competent can be generalised to other tasks, and thus is reliable. Measuring reliability then can be interpreted as a question of the accuracy of the generalisation of assessment results to a broader domain of competence. In the following section we discuss whether generalisability theory, rather than classical reliability theory, can be used to express the reliability, understood as the accuracy of generalisation, of assessment.
Measuring Reliability: To What Extent Can Accuracy of Assessment Be Generalised?

In classical test theory, reliability can be interpreted in two different ways. On the one hand, reliability can be understood as the extent to which agreement between raters is achieved. In the case of performance assessment, reliability can be described in terms of agreement between raters on one specified task and on
one specified occasion. The reliability can be raised, according to this definition, by using detailed procedures and standards to structure the judgements obtained. Heller, Sheingold, & Mayford (1998), however, propose that measurements of inter-rater reliability in authentic assessment do not necessarily indicate whether raters are making sound judgements and also do not provide bases for improving the technical quality of a test. Differences between ratings sometimes represent more accurate and meaningful measurement than absolute agreement would. Suen, Logan, Neisworth and Bagnato (1995) argue that the focus on objectivity among raters, as a desirable characteristic of an assessment procedure, leads to a loss of relevant information. In high-stakes decisions, procedures which include ways of weighing high-quality information from multiple perspectives may lead to a better decision than those in which information from a single perspective is taken into account. On the other hand, reliability refers to the consistency of results obtained from a test when students are re-examined with the same test and the same rater on a different occasion. According to this concept, reliability can be improved by using tasks which are similar in format and context. This definition of reliability in terms of consistency over time is, in the opinion of Bennet (1993), problematic: between the test and the retest the world of a student will change, and that will affect the re-examination. Therefore, it is difficult to see how an assessor can assure that the same aspects of the same task can be assessed in the same way on different occasions. Assessors will look primarily for development rather than consistency in scores. Taking the above definitions into account, it looks as though reliability and the intention of assessment are opposed to each other. Achieving high reliability by making concessions in the way students’ knowledge and skills are assessed will lead to a decrease in content validity. In the new test culture, the vision of weighing information from multiple perspectives may result in a better decision than using information taken from a single perspective. In that case, different test modes can be used and the resulting information from these different tests can be used to reach accurate decisions. Classical reliability theory cannot be used in line with this vision of assessment to express the reliability of a test, because assessment is about determining the real competence of students with a set of tests and tasks, not about achieving a normally distributed set of results. Additionally, with classical reliability theory you can only examine the consistency of a test on a certain occasion, or the level of agreement between two raters. Assessment involves measuring the competencies of students on different occasions and in different ways, possibly by different raters. Assessment has to be considered as a whole and not as a set of separate parts. Thus, the reliability of the assessment as a whole is far more interesting than the reliability of the separate parts. It is far more useful to examine in what way and to what extent students’ behaviours can be generalised (or transferred) to, for example, professional reality, and which tasks, occasions and raters are required for that purpose.
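To make the two classical senses concrete, here is a minimal sketch, not drawn from the chapter; the scores and all variable names are invented for illustration. It computes inter-rater reliability as the correlation between two raters judging the same task on the same occasion, and test-retest reliability as the correlation between two occasions for the same rater.

```python
import numpy as np

# Hypothetical scores for eight students on one performance task (invented data).
rater_a_time1 = np.array([12, 15, 9, 18, 14, 11, 16, 13], dtype=float)
rater_b_time1 = np.array([11, 16, 10, 17, 15, 10, 15, 14], dtype=float)
rater_a_time2 = np.array([13, 14, 10, 17, 15, 12, 15, 12], dtype=float)

def pearson(x, y):
    """Pearson correlation, the usual index behind both classical senses of reliability."""
    x, y = x - x.mean(), y - y.mean()
    return float((x * y).sum() / np.sqrt((x ** 2).sum() * (y ** 2).sum()))

# Sense 1: inter-rater reliability (two raters, same task, same occasion).
print("inter-rater:", round(pearson(rater_a_time1, rater_b_time1), 2))

# Sense 2: test-retest reliability (same rater and task, a later occasion).
print("test-retest:", round(pearson(rater_a_time1, rater_a_time2), 2))
```

Either coefficient can be high while saying nothing about which sources of error are at work, or about whether disagreement between raters carries meaningful information rather than noise; that limitation is what generalisability theory, discussed next, is designed to address.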
A broader view of reliability has led to the development of generalisability theory (Cronbach, Gleser, Nanda, & Rajaratnam, 1972). Instead of asking how accurate the observed scores are as a reflection of the true scores, generalisability theory asks how accurately the observed scores allow one to generalise to a student’s behaviour in a well-defined universe. What is possible with generalisability theory that is not possible with reliability theory? Classical reliability theory tries to answer the question of how accurately observed scores are consistent with the true scores. The more the scores from a test correspond with the hypothetical true scores, the smaller the error and the higher the reliability. The observed score is subdivided into a true score part and an error part. The error contains all sorts of possible sources of variation in the score (for example the items, the occasions, the raters). Because of the complexity and the open-ended character of the tasks in assessment, larger error is possible (Cronbach, Linn, Brennan, & Haertel, 1997). The different sources of error cannot be discriminated using classical theory. Generalisability theory, given a well-defined design, can discriminate between different sources of error. It is even possible to identify and estimate the size of (and the interactions between) the different sources of error simultaneously. In this way, the researcher can get an answer to the question regarding the extent to which the observed data can be generalised to a well-defined universe; in other words, the extent to which the measurement corresponds with reality, and which components or interactions between components cause inaccuracy. Various researchers have tried to demonstrate the working and the benefits of generalisability theory. Shavelson, Webb and Rowley (1989) and Brennan and Johnson (1995), for example, show which components can be integrated using generalisability theory, and what high or low variations in these components mean for the measurement (the so-called G-study). With this information, assessment can be optimised. By varying the number of tasks, the raters or the occasions, one can estimate the reliability of different measurement procedures and in this way construct an efficient and, as far as possible, reliable assessment (the so-called D-study). Fan and Chen (2000) show another example of applying generalisability theory. They demonstrate that inter-rater reliability, as computed under classical reliability theory, often overestimates the true reliability. Generalisability theory can compute much more accurate values.
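The following sketch illustrates, under simplifying assumptions, what a G-study and a D-study involve for the simplest fully crossed design (persons × raters, one score per cell); the designs discussed here would add tasks and occasions as further facets. All data and variable names are invented, and the variance-component and coefficient formulas are the standard ones for this one-facet design rather than anything specific to the chapter.

```python
import numpy as np

# Hypothetical scores: 6 students (rows) rated by 3 raters (columns) on one task.
scores = np.array([
    [14, 13, 15],
    [10,  9, 11],
    [17, 16, 18],
    [12, 12, 13],
    [ 9, 10,  9],
    [15, 14, 16],
], dtype=float)

n_p, n_r = scores.shape
grand = scores.mean()
person_means = scores.mean(axis=1)
rater_means = scores.mean(axis=0)

# G-study: ANOVA mean squares for a fully crossed persons x raters design.
ms_p = n_r * ((person_means - grand) ** 2).sum() / (n_p - 1)
ms_r = n_p * ((rater_means - grand) ** 2).sum() / (n_r - 1)
resid = scores - person_means[:, None] - rater_means[None, :] + grand
ms_res = (resid ** 2).sum() / ((n_p - 1) * (n_r - 1))

# Variance components from the expected mean squares.
var_p = max((ms_p - ms_res) / n_r, 0.0)   # universe-score (person) variance
var_r = max((ms_r - ms_res) / n_p, 0.0)   # rater leniency/severity variance
var_res = ms_res                          # person x rater interaction confounded with error

# D-study: how dependable would decisions be with n_prime raters?
for n_prime in (1, 2, 3, 5):
    rel = var_p / (var_p + var_res / n_prime)              # relative (rank-order) decisions
    abs_ = var_p / (var_p + (var_r + var_res) / n_prime)   # absolute (competent or not) decisions
    print(f"{n_prime} rater(s): E(rho^2) = {rel:.2f}, Phi = {abs_:.2f}")
```

The absolute coefficient Phi treats rater severity (the rater main effect) as error, which matters for criterion-referenced decisions such as ‘‘competent or not’’, while the relative coefficient ignores it; varying n_prime is the D-study step of deciding how many raters an acceptably dependable judgement requires.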
Specific Criteria for Evaluating the Quality of New Assessment Modes

Apart from an expansion of the traditional validity and reliability criteria, new criteria can be suggested for evaluating the quality of assessment: transparency, fairness, cognitive complexity and authenticity of tasks, and directness of assessment
(Dierick, Dochy, & Van de Watering, 2001; Dierick, Van de Watering, & Muijtjens, 2002; Frederiksen & Collins, 1989; Gielen et al., 2003; Haertel, 1991; Linn, et al., 1991). These criteria were developed to highlight the unique characteristics of new assessment modes. An important characteristic of new assessment modes is the kind of tasks that are used. Authentic assessment tasks call on the higher-order skills that stimulate a deeper learning approach in students. The first criterion that distinguishes new assessment modes from traditional tests is the extent to which assessment tasks are used to measure problem solving, critical thinking and reasoning. This criterion is called cognitive complexity (Linn et al., 1991). To judge whether assessment tasks meet this criterion, we can analyse whether there is consistency between the processes required for solving the tasks and those used by experts in solving such problems. Next, it is necessary to take into account students’ familiarity with the problems and the ways in which students attempt to solve them (Bateson, 1994). Another criterion for evaluating assessment tasks is authenticity. Shepard (1991) describes authentic tasks as the best indicators of attainment of learning goals. Indeed, traditional tests always assume an interpretation from student answers to competence in a specific domain. When authentic assessment is used, no such interpretation of answers is needed, because the assessment is already a direct indication of competence. The criterion authenticity of tasks is closely related to the directness of assessment. Powers, Fowles, and Willard (1994) argue that the extent to which teachers can judge competence directly is relevant evidence of the directness of assessment. In their research, teachers were asked to give a global judgement about the competence ‘‘general writing skill’’ for writing tasks, without scoring them. Thereafter, trained assessors scored these works, following predetermined standards. Results indicate that there was a clear correlation between the global judgement of competence by the teachers and the marks awarded by the assessors. When scoring assessment, the criterion of fairness (Linn, et al. 1991) plays an important role. The central question is whether students have had a fair chance to demonstrate their real ability. Bias can occur for several reasons: firstly, because tasks are not congruent with the received instruction/education; secondly, because students do not have equal opportunities to demonstrate their real capabilities on the basis of the selected tasks (e.g. because they are not accustomed to the cultural content that is asked for); and, thirdly, because of prejudgement in scoring. Additionally, it is also important that students understand the criteria that are used in assessment. ‘‘Meeting criteria improves learning’’: communicating these criteria to students when the assignments are given improves their performances, as they can develop clear goals to strive for in learning (Dochy, 1999). A final criterion that is important when evaluating assessment is the transparency of the scoring criteria that are used. Following Frederiksen and Collins
(1989), the extent to which students can assess themselves and others as reliably as trained assessors provides a good indicator for this criterion. Though these criteria seem to be good indicators for evaluating the quality of new assessments, they cannot be seen as completely different from those formulated by Messick (1994). The difference is that these criteria are more concrete and more sensitive to the unique characteristics of new assessment modes. They specify how validity can be evaluated in an edumetric, instead of a psychometric, way. The criteria authenticity of tasks and cognitive complexity can be seen as further specifications of Messick’s (1994) content validity aspect. Authenticity of tasks means that the content and the level of tasks need to be an adequate representation of the real problems that occur within the construct/competence domain that is being measured. To investigate the criterion of cognitive complexity, we need to judge whether solving assessment tasks requires the same thinking processes that experts use for solving domain-specific problems. This criterion corresponds to what Messick calls substantive validity. The criteria directness and transparency are relevant in the context of the consequential validity of assessment. The way that competence is assessed, directly or from an interpretation of students’ answers, has an immediate effect on the nature and content of education and the learning processes of students. In addition, the transparency of the assessment methods used influences the learning process of students, as their performance will improve if they know exactly which assessment criteria will be used (see the above-mentioned argument). The criterion fairness forms part of both Messick’s (1994) content validity aspect and his structural aspect. To give students a fair chance to demonstrate what their real capabilities are, the tasks offered need to be varied so that they contain the whole spectrum of knowledge and skills needed for the competence measured. Also important is that the criteria used for assessing a task, and the weights they are given, are an adequate reflection of the criteria used by experts for assessing competence in a specific domain. On the question of whether there is really a gap between the old and the new assessment stream in formulating criteria for evaluating the quality of assessment, it can be argued that there is a clear difference in approach and background: within psychometrics, the criteria of validity and reliability are interpreted differently.
Evaluating New Assessment Modes According to the New Edumetric Approach

If we integrate the most important changes within the assessment field with regard to the criteria for evaluating assessment, conducting a quality assessment inquiry involves a comprehensive strategy that addresses the evaluation of:
1. the validity of assessment tasks;
2. the validity of performance assessment scoring;
3. the generalisability of assessment; and
4. the consequential validity of assessment.
During this inquiry, arguments will be found that support or refute the construct validity of assessment. Messick (1994) suggested that two questions must be asked whenever a decision about the quality of assessment is made: Is the assessment any good as a measure of the characteristics it is intended to assess? and Should the assessment be used for the proposed purpose? In evaluating the first question, evidence of the validity of the assessment tasks, of the assessment performance scoring, and of the generalisability of the assessment must be considered. The second question evaluates the adequacy of the proposed use (intended and unintended effects) against alternative means of serving the same purpose. In the evaluative argument, the evidence obtained during the validity inquiry will be considered, and carefully weighed, to reach a conclusion about the adequacy of assessment use for the specific purpose. In Table 6.1, an overview is given of questions that can be used as guidelines to collect supporting evidence for, and to examine possible threats to, construct validity.

Table 6.1 A framework for collecting supporting evidence for, and examining possible threats to, construct validity

Procedure and purpose:
1. Establish an explicit conceptual framework for the assessment (construct definition: content and cognitive specifications). Purpose: to provide a frame of reference for reviewing assessment tasks or items with regard to the purported construct/competence.
2. Identify rival explanations for the observed performance, and collect multiple types of evidence on these rival explanations of assessment performance. Purpose: to identify possible weaknesses in the interpretation of scoring, and to provide a basis for refuting alternative explanations for performance or for revising the assessment, if necessary.

Review questions regarding:

1. Validity of the Assessment Tasks. Judging how well the assessment matches the content and cognitive specifications of the construct/competence that is measured.
A. Does the assessment consist of a representative set of tasks that cover the spectrum of knowledge and skills needed for the construct/competence being measured?
B. Are the tasks authentic, in that they are representative of the real-life problems that occur within the construct/competence domain that is being measured?
C. Do the tasks assess complex abilities, in that the same thinking processes are required for solving the tasks that experts use for solving domain-specific problems?

2. Validity of Assessment Scoring. Searching for evidence of the appropriateness of the inference from the performance to an observed score.
A. Is the task congruent with the received instruction/education?
B. Do all students have equal opportunities to demonstrate their capabilities on the basis of the selected tasks?
– Can all students demonstrate their performance in the selected mode of assessment?
– Are students accustomed to the cultural content that is asked for?
– Are students sufficiently motivated to perform well?
– Can students use the necessary equipment?
– Do any students have inappropriate advantages?
C. Do the scoring criteria relate directly to the particular construct/competence that the task is intended to measure?
D. Are the scoring criteria clearly defined (transparent)?
E. Do students understand the criteria that are used to evaluate their performance?

3. The Generalisability of the Assessment Score. Searching for evidence of the appropriateness of the inference from the observed score to a conclusion about expected performance in the competence/construct domain. Within the current vision of assessment, the traditional concept of reliability is replaced by the notion of generalisability. Measuring reliability then becomes a question of accuracy of generalisation, or transfer, of assessment. The goal of using generalisability theory in reliability inquiry is to explain the consistency/inconsistency in scoring and to focus on understanding and reducing possible sources of measurement error.
Total variance: σ²(X_ptr) = σ²_p + σ²_t + σ²_r + σ²_pt + σ²_tr + σ²_pr + σ²_ptr,e, where p = person (student), t = task, r = rater.
Generalisability coefficient (reliability coefficient): the ratio of universe-score variance to observed-score variance.

4. Consequences of Test Use. Searching for evidence of what the assessment claims to do, and investigating whether the actual consequences are also the expected consequences.
How do students prepare themselves for education?
What kind of learning strategies do students use?
Which kind of knowledge is measured?
Does assessment stimulate the development of various skills?
Does assessment stimulate students to apply their knowledge in realistic situations?
Are long-term effects perceived?
Is breadth and depth in learning actively rewarded, instead of merely by chance?
Does making expectations and criteria explicit stimulate independence?
Is relevant feedback provided for progress?
Techniques: observation of instructional strategies; comparison of the quality of learning results obtained with former test methods with learning results obtained using this new form of assessment; presenting statements of expected (and unexpected) consequences of assessment to the student population; holding semi-structured key group interviews.
Promises for new forms of assessment: they encourage high quality learning and active participation; encourage instructional strategies and techniques that foster reasoning, problem-solving, and communication; have no detrimental effect on instruction because they evaluate the cognitive skill of interest directly (directness of assessment); give feedback opportunities; formulating clear criteria improves the performance of students (transparency of assessment); encourage a sense of ownership, personal responsibility, independence and motivation; and ameliorate the learning climate (Dochy et al., 1999; Marcoulides & Simkin, 1991; Sambell et al., 1997; Riley, 1995; Topping, 1998; . . .).
What are the effects on the system of using the assessment, other than what the assessment claims?

Overall purpose: to verify that the inferences made from the assessment are sound. Findings should be reported in the form of an evaluative argument that integrates the evidence for the construct validity of the assessment: 1. Is the assessment any good as a measure of the construct it is interpreted to assess? 2. Should the assessment be used for the proposed use?
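As a worked illustration of the generalisability coefficient defined in the table, the short sketch below, whose variance components are purely invented numbers rather than empirical results, plugs hypothetical components from the persons × tasks × raters decomposition into the standard formula for the relative coefficient, in which only the interactions involving persons contribute to error.

```python
# Hypothetical variance components for a persons x tasks x raters design (invented numbers).
var_p, var_pt, var_pr, var_ptr_e = 4.0, 1.5, 0.5, 2.0

def g_coefficient(n_tasks, n_raters):
    """Relative generalisability coefficient: universe-score variance over
    (universe-score variance + relative error variance)."""
    rel_error = var_pt / n_tasks + var_pr / n_raters + var_ptr_e / (n_tasks * n_raters)
    return var_p / (var_p + rel_error)

# A small D-study: more tasks and raters shrink the error term and raise the coefficient.
for n_t, n_r in [(1, 1), (3, 1), (3, 2), (6, 2)]:
    print(f"{n_t} tasks, {n_r} raters: G = {g_coefficient(n_t, n_r):.2f}")
```

Doubling the number of tasks or raters divides the corresponding error components, which is how a D-study turns the variance decomposition into a design decision about how much evidence a dependable judgement requires.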
Validity of Assessment Tasks Used

What are the arguments in support of the construct validity of new assessment modes? Assessment development begins with establishing an explicit conceptual framework that describes the construct domain being assessed: content and cognitive specifications should be identified. During the first stage, validity inquiry judges how well the assessment matches the content and cognitive specifications of the construct being measured. The defined framework can then be used as a guideline for selecting assessment tasks. The following aspects are important to consider. Firstly, the tasks used must be an appropriate reflection of the construct or, in the terms of new assessment modes, the competence that needs to be measured. Next, with regard to the content, the tasks should be authentic in that they are representative of real-life problems that occur in the knowledge domain being measured. Finally, the cognitive level needs to be complex, so that the same thinking processes are required that experts use for solving domain-specific problems. New assessment modes score better on these criteria than standardised tests, precisely because of their authentic and complex problem characteristics.
Validity of Assessment Scoring

The next aspect that needs to be investigated is whether the scoring of the assessment is valid. The fairness criterion plays an important role here. On the one hand, this requires that the assessment criteria fit and are appropriately used, so that they are an appropriate reflection of the criteria used by experts and that appropriate weightings are given for assessing different competences. On the other hand, it requires that students have a fair chance to demonstrate their real abilities. Possible problems that can occur are, firstly, that relevant assessment criteria could be lacking, so that certain competence aspects do not get enough attention. Secondly, irrelevant, personal assessment criteria could be used. Because assessment measures the ability of students at different times, in different ways and by different judges, there is less chance that these problems will occur with new modes of assessment, since any potential bias in a single judgement will tend to be balanced out. As a result, the totality of all the assessments will give a more precise picture of the real competence of a person than standardised assessment, where the decision about whether a student is competent is reduced to one judgement at one time.
Generalisability of Assessment

This step in the validating process investigates to what extent assessment can be generalised to other tasks that measure the same construct. This indicates that score interpretation is reliable and supplies evidence that assessment really measures the intended construct.
Problems that can occur are under-representation of the construct, and variance which is irrelevant to the construct. Construct under-representation means that the assessment is too limited, so that important construct dimensions cannot be measured. In the case of construct-irrelevant variance, the assessment may be too broad, thus containing systematic variance that is irrelevant for measuring the construct (Dochy & Moerkerke, 1997). In this context, the breadth of the construct or the purported competence needs to be defined before a given interpretation can be considered reliable and validity can be discussed. Messick (1994) argues that the validated interpretation gives meaning to the measure in the particular instance, and that evidence of the generality of interpretation over time and across groups and settings shows how stable, and thus reliable, that meaning is likely to be. Frederiksen and Collins (1989), on the other hand, have moved away from the idea that assessment can only be called reliable if the interpretation can be generalised to a broader domain. They use another model in which the fairness of the scoring is crucial for reliability, but the replicability and generalisability of the performance are not. In any case, it can be argued that assessment in which a number of authentic, representative tasks are used to measure a specific competence is less sensitive to the above-mentioned problems. The purported construct is, after all, directly measured. ‘‘Authentic’’ means that the tasks are realistic, for example, working with extensive case study material and not with a short case study that only lists relevant information.
Consequences of Assessment

The last question that needs to be asked is what the consequences are of using a particular assessment form for instruction, and for the learning process of students. Linn et al. (1991) and Messick (1995) stressed the importance not only of the intended consequences, but also of the unintended consequences, positive and negative. Examples of intended educational consequences include the increased involvement and quality of the learning of students (Boud, 1995), improvements in reflection (Brown, 1999), improvements and changes in teaching method (Dancer & Kamvounias, 2005; Sluijsmans, 2002), increased feelings of ownership and higher performances (Dierick et al., 2001; Dierick et al., 2002; Farmer & Eastcott, 1995; Gielen et al., 2003; Orsmond, Merry, & Reiling, 2000), increased motivation and more direction in learning (Frederiksen & Collins, 1989), better reflection skills and more effective use of feedback (Brown, 1999; Gielen et al., 2003) and the development of self-assessment skills and lifelong learning (Baartman, Bastiaens, Kirschner, & Van der Vleuten, 2005; Boud, 1995). Examples of unintended educational consequences are, among other things, surface learning approaches and rote learning (e.g., Nijhuis, Segers, & Gijselaers, 2005; Segers, Dierick, & Dochy, 2001), increased test anxiety
(e.g., Norton et al., 2001) and stress (e.g., Evans, McKenna, & Oliver, 2005; Pope, 2001, 2005), cheating and tactics to impress teachers (e.g., Norton et al., 2001), gender bias (e.g., Langan et al., 2005), prejudice and unfair marking (e.g., Pond, Ui-Haq, & Wade, 1995), lower performances (e.g., Segers & Dochy, 2001), and resistance towards innovations (e.g., McDowell & Sambell, 1999). Consequential validity investigates whether the actual consequences of assessment are also the expected consequences. This is a very important question, as each form of assessment also steers the learning process. This can be made clear by presenting statements of expected (and unexpected) consequences of assessment to the student population, or by holding semi-structured key group interviews. Using this last method, unexpected effects also become clear. When evaluating consequential validity, the following aspects are important to consider: what students understand the requirements for assessment to be; how students prepare themselves for education; what kind of learning strategies are used by students; whether assessment is related to authentic contexts; whether assessment stimulates students to apply their knowledge to realistic situations; whether assessment stimulates the development of various skills; whether long-term effects are perceived; whether effort, rather than mere chance, is actively rewarded; whether breadth and depth in learning are rewarded; whether independence is stimulated by making expectations and criteria explicit; whether relevant feedback is provided for progress; and whether competencies are measured, rather than just the memorising of facts.
The Way Forward: Practical Guidelines

From this chapter, we can derive the following practical guidelines to be kept in mind:
The use of summative tests squeezes out assessment for learning and has a negative impact on motivation for learning.
Some research reveals that the majority of students turn to a superficial learning strategy in a system of continued summative assessment.
Using more new modes of assessment supports the recent views on student learning.
Assessment steers learning. The consequential validity of assessments should be taken into account when designing exams. Consequential validity asks what the consequences are of the use of a certain type of assessment on education and on the students’ learning processes, and whether the consequences that are found are the same as the intended effects. The influence of formative assessment is mainly due to the fact that, after assessment, students look back on the results as well as on the learning processes on which the assessment was based.
In using formative and summative modes of assessment, there are often conflicting pressures between learning and the production of the product.
Formative assessment can be effective in improving students’ learning, and when the responsibility for assessment is handed over to the student, the formative assessment process seems to be even more effective. The most important component of formative assessment is feedback.

One of the merits of implementing peer assessment as a tool for learning is that it can actually increase the consequential validity of the parent assessment method to which it is attached: by making it more feasible to include challenging and authentic tasks in one’s assessment system; by helping to make the assessment demands more clear to the students; by providing a supplement or a substitute for formative staff assessment; and, finally, by supporting the response to teacher feedback.

Our empirical studies showed that qualitative formative peer assessment (referred to as peer feedback), applied as a tool for learning, is not an inferior form of feedback. Peer feedback might be considered a worthy substitute for staff feedback. It might even lead to higher performance than staff feedback when it is extended with measures to enhance the mindful reception of feedback, by means of an a priori question form or an a posteriori reply form administered to the assessee (Gielen et al., 2007).

Moreover, it is important to stimulate or support students in providing more constructive feedback in order to raise the effectiveness of peer feedback. In order to be considered ‘‘constructive’’, feedback should be specific, appropriate to the assessment criteria, contain positive as well as negative comments, and include some justifications, suggestions and thought-provoking questions. Assessees who receive this type of feedback make better revisions, resulting in greater progress between the draft and the final version of the essay. Examples of measures that can be taken to enhance the constructiveness of peer feedback are increasing the social interaction between peers; training peer assessors in providing constructive feedback; training assessees in how to make sure for themselves that they receive the feedback they need; or installing a quality control system in which student-assessors are rewarded for good feedback or punished for clearly poor feedback. Feedback can reach a high level of constructiveness without necessarily being correct or complete.

Peer feedback and staff feedback can play a complementary role. Peer feedback can be more specific, and is better at activating, motivating and coaching students; staff feedback is more trustworthy, and it helps students to understand the assessment requirements and the structure of the course. Finally, practitioners should note that implementing peer assessment as a tool for learning does not necessarily result in a saving of time (Gielen et al., 2007).

Apart from an expansion of the traditional validity and reliability criteria, new criteria can be suggested for evaluating the quality of assessment: transparency, fairness, cognitive complexity and authenticity of tasks, and directness of assessment.
In conducting a quality assessment inquiry, we should evaluate: the validity of assessment tasks; the validity of performance assessment scoring; the generalisability of assessment; and the consequential validity of assessment. Since this may be time-consuming, support for lecturers from a central support department, and co-operation between schools and universities in using new modes of assessment, are interesting tracks to follow.
Conclusion

In this chapter, it is argued that student-centred education implies a new way of assessing students. Assessment no longer means ‘‘testing’’ students. New assessment modes differ in various respects from traditional tests. Students are judged on their actual performance in using knowledge in a creative way to demonstrate competence. The assessment tasks used are authentic representations of real-life problems. Often, students are given responsibility for the assessment process, and both knowledge and skills are measured. Furthermore, in contrast to traditional tests, assessment modes are not standardised. Finally, assessment implies that at different times, different knowledge and skills are measured by different raters. For new assessment modes the most important question, with regard to quality, is: how justified is the decision that a person is competent? Because of these specific characteristics, the application of traditional theoretical quality criteria is no longer self-evident. Therefore, this chapter has paid attention to the consequences of these developments for screening the edumetric quality of assessment. It can be concluded that the traditional theoretical meaning of validity and reliability is no longer tenable but needs, at least, to be expanded and changed. For new assessment modes, it is important to attend to the following quality aspects: cognitive complexity, authenticity of tasks, fairness, transparency of the assessment procedure, and the influence of assessment on education.
References

American Educational Research Association, American Psychological Association, National Council on Measurement in Education (1985). Standards for educational and psychological testing. Washington, DC: Author. Amrein, A. L., & Berliner, D. C. (2002). High-stakes testing, uncertainty, and student learning. Education Policy Analysis Archives, 10(18). Retrieved August 15, 2007, from http://epaa.asu.edu/epaa/v10n18/. Askham, P. (1997). An instrumental response to the instrumental student: assessment for learning. Studies in Educational Evaluation, 23(4), 299–317. Baartman, L. K. J., Bastiaens, T. J., Kirschner, P. A., & Van der Vleuten, C. P. M. (2007). Teachers’ opinions on quality criteria for competency assessment programmes. Teaching and Teacher Education. Retrieved August 15, 2007, from http://www.fss.uu.nl/edsci/images/stories/pdffiles/Baartman/baartman%20et%20al_2006_teachers%20opinions%20on%20quality%20criteria%20for%20caps.pdf
Baartman, L. K. J., Bastiaens, T. J., Kirschner, P. A., & Van der Vleuten, C. P. M. (2005). The wheel of competency assessment. Presenting quality criteria for competency assessment programmes. Paper presented at the 11th biennial Conference for the European Association for Research on Learning and Instruction (EARLI), Nicosia, Cyprus. Bateson, D. (1994). Psychometric and philosophic problems in ‘authentic’ assessment, performance tasks and portfolios. The Alberta Journal of Educational Research, 11(2), 233–245. Bennet, Y. (1993). Validity and reliability of assessments and self-assessments of work-based learning assessment. Assessment & Evaluation in Higher Education, 18(2), 83–94. Biggs, J. (1996). Assessing learning quality: Reconciling institutional, staff and educational demands. Assessment & Evaluation in Higher Education, 21(1), 5–16. Biggs, J. (1998). Assessment and classroom learning: A role for summative assessment? Assessment in Education: Principles, Policy & Practices, 5, 103–110. Birenbaum, M. (1994). Toward adaptive assessment – The student’s angle. Studies in Educational Evaluation, 20, 239–255. Birenbaum, M. (1996). Assessment 2000. In: M. Birenbaum & F. Dochy, (Eds.). Alternatives in assessment of achievement, learning processes and prior knowledge. Boston: Kluwer Academic. Birenbaum, M., & Dochy, F. (Eds.) (1996). Alternatives in assessment of achievement, learning processes and prior knowledge. Boston: Kluwer Academic. Black, P., (1995). Curriculum and assessment in science education: The policy interface. International Journal of Science Education, 17(4), 453–469. Black, P. & Wiliam, D. (1998). Assessment and classroom learning. Assessment in Education: Principles, Policy & Practices, 5(1), 7–74. Boud, D. (1990). Assessment and the promotion of academic values. Studies in Higher Education, 15(1), 101–111. Boud, D. (1995). Assessment and learning: Contradictory or complementary? In P. Knight (Ed.), Assessment for learning in higher education (pp. 35–48). London: Kogan Page. Brennan, R. L., & Johnson, E. G. (1995). Generalisability of performance assessments. Educational Measurement: Issues and Practice, 11(4), 9–12. Brown, S. (1999). Institutional strategies for assessment. In S. Brown, & A. Glasner (Eds.), Assessment matters in higher education: Choosing and using diverse approaches (pp. 3–13). Buckingham: The Society of Research into Higher Education/Open University Press. Butler, D. L. (1988). A critical evaluation of software for experiment development in research and teaching. Behaviour-Research-Methods, Instruments and Computers, 20, 218–220. Butler, D. L., & Winne, P. H. (1995). Feedback and self-regulated learning: A theoretical synthesis. Review of Educational Research, 65(3), 245–281. Cronbach, L. J. (1989). Construct validation after thirty years. In R. L. Linn (Ed.), Intelligence: Measurement, theory and public policy (pp. 147–171). Chicago: University of Illinois Press. Cronbach, L. J., Gleser, G. C., Nanda, H. & Rajaratnam, N. (1972). The dependability of behavioral measurements: Theory of generalizability for scores and profiles. New York: Wiley. Cronbach, L. J., Linn, R. L., Brennan, R. L. & Haertel, E. H. (1997). Generalizability analysis for performance assessments of students’ achievement or school effectiveness. Educational and Psychological Measurement, 57(3), 373–399. Crooks, T. (1988). The impact of classroom evaluation practices on students. Review of Educational Research, 58(4), 438–481. Dancer, D., & Kamvounias, P. (2005). 
Student involvement in assessment: A project designed to assess class participation fairly and reliably. Assessment and Evaluation in Higher Education, 30, 445–454. Dierick, S., & Dochy, F. (2001). New lines in edumetrics: New forms of assessment lead to new assessment criteria. Studies in Educational Evaluation, 27, 307–329.
Dierick, S., Dochy, F., & Van de Watering, G. (2001). Assessment in het hoger onderwijs. Over de implicaties van nieuwe toetsvormen voor de edumetrie [Assessment in higher education. About the implications of new test forms for edumetrics]. Tijdschrift voor Hoger Onderwijs, 19, 2–18. Dierick, S., Van de Watering, G., & Muijtjens, A. (2002). De actuele kwaliteit van assessment: Ontwikkelingen in de edumetrie [The actual quality of assessment: Developments in edumetrics]. In F. Dochy, L. Heylen, & H. Van de Mosselaer (Eds.) Assessment in onderwijs: Nieuwe toetsvormen en examinering in studentgericht onderwijs en competentiegericht onderwijs [Assessment in education: New testing formats and examinations in student-centred education and competence based education] (pp. 91–122). Utrecht: Lemma BV. Dochy, F. (1999). Instructietechnologie en innovatie van probleemoplossen: over constructiegericht academisch onderwijs. Utrecht: Lemma. Dochy, F. (2001). A new assessment era: Different needs, new challenges. Research Dialogue in Learning and Instruction, 2(1), 11–20. Dochy, F. (2005). Learning lasting for life and assessment: How far did we progress? Presidential address at the EARLI conference 2005, Nicosia, Cyprus. Retrieved October 18, 2007 from http://perswww.kuleuven.be/u0015308/Publications/EARLI2005%20presidential% 20address%20FINAL.pdf Dochy, F., & Gijbels, D. (2006). New learning, assessment engineering and edumetrics. In L. Verschaffel, F. Dochy, M. Boekaerts, & S. Vosniadou (Eds.), Instructional psychology: Past, present and future trends. Sixteen essays in honour of Erik De Corte. New York: Elsevier. Dochy, F., & McDowell, L. (1997). Assessment as a tool for learning. Studies in Educational Evaluation, 23, 279–298. Dochy, F. & Moerkerke, G. (1997). The present, the past and the future of achievement testing and performance assessment. International Journal of Educational Research, 27(5), 415–432. Dochy F., Moerkerke G., & Martens R. (1996). Integrating assessment, learning and instruction: Assessment of domain-specific and domain transcending prior knowledge and progress. Studies in Educational Evaluation, 22(4), 309–339. Dochy, F., Segers, M., Gijbels, D., & Struyven, K. (2006). Breaking down barriers between teaching and learning, and assessment: Assessment engineering. In D. Boud & N. Falchikov (Eds.), Rethinking assessment for future learning. London: RoutledgeFalmer. Dochy, F., Segers, M., & Sluijsmans, D. (1999). The use of self-, peer- and co-assessment in higher education: A review. Studies in Higher Education, 24(3), 331–350. Elton L. R. B., & Laurillard D. M. (1979). Trends in research on student learning. Studies in Higher Education, 4(1), 87–102. Evans, A. W., McKenna, C., & Oliver, M. (2005). Trainees’ perspectives on the assessment and self-assessment of surgical skills. Assessment and Evaluation in Higher Education, 30, 163–174. Falchikov, N. (1986). Product comparisons and process benefits of collaborative peer group and self-assessments. Assessment and Evaluation in Higher Education, 11(2), 146–166. Falchikov, N. (1995). Peer feedback marking: Developing peer assessment. Innovations in Education and Training International, 32(2), 395–430. Fan, X., & Chen, M. (2000). Published studies of interrater reliability often overestimate reliability: computing the correct coefficient. Educational and Psychological Measurement, 60(4), 532–542. Farmer, B., & Eastcott, D. (1995). Making assessment a positive experience. In P. 
Knight (Ed.), Assessment for learning in higher education (pp. 87–93). London: Kogan Page. Feltovich, P. J., Spiro, R. J. & Coulson, R. L. (1993). Learning, teaching, and testing for complex conceptual understanding. In N. Frederiksen, R. J. Mislevy, & I. I. Bejar (Eds.), Test theory for a new generation of tests. Hillsdale, NJ: Lawrence Erlbaum. Firestone, W. A., & Mayrowitz, D. (2000). Rethinking ‘‘high stakes’’: Lessons from the United States and England and Wales. Teachers College Record, 102, 724–749.
Frederiksen, J. R., & Collins, A. (1989). A system approach to educational testing. Educational Researcher, 18(9), 27–32. Frederiksen, N. (1984). The real test bias: Influences of testing on teaching and learning. American Psychologist, 39(3), 193–202. Gibbs, G. (1999). Using assessment strategically to change the way students learn, In S. Brown & A. Glasner (Eds.), Assessment matters in higher education: Choosing and using diverse approaches, Buckingham: Open University Press. Gielen, S., Dochy, F., & Dierick, S. (2003). Evaluating the consequential validity of new modes of assessment: The influence of assessment on learning, including pre-, post-, and true assessment effects. In M. S. R. Segers, F. Dochy, & E. Cascallar (Eds.), Optimising new modes of assessment: In search of qualities and standards (pp. 37–54). Dordrecht/Boston: Kluwer Academic Publishers. Gielen, S., Dochy, F., & Dierick, S. (2007). The impact of peer assessment on the consequential validity of assessment. Manuscript submitted for publication. Gielen, S., Tops, L., Dochy, F., Onghena, P., & Smeets, S. (2007). Peer feedback as a substitute for teacher feedback. Manuscript submitted for publication. Gielen, S., Dochy, F., Onghena, P., Janssens, S., Schelfhout, W., & Decuyper, S. (2007). A complementary role for peer feedback and staff feedback in powerful learning environments. Manuscript submitted for publication. Glaser, R. (1990). Testing and assessment; O Tempora! O Mores! Horace Mann Lecture, University of Pittsburgh, LRDC, Pittsburgh, Pennsylvania. Gulliksen H. (1985). Creating Better Classroom Tests. Educational Testing Service. Opinion papers, Reports – evaluative. Haertel, E. H. (1991). New forms of teacher assessment. Review of Research in Education, 17, 3–29. Harlen, W., & Deakin Crick, R. (2003). Testing and motivation for learning. Assessment in Education, 10(2), 169–207. Heller, J. I., Sheingold, K., & Mayford, C. M. (1998). Reasoning about evidence in portfolios: Cognitive foundations for valid and reliable assessment. Educational Assessment, 5(1), 5–40. Kane, M. (1992). An argument-based approach to validity. Psychological Bulletin, 112, 527–535. Langan, A. M., Wheater, C. P., Shaw, E. M., Haines, B. J., Cullen, W. R., Boyle, J. C., et al. (2005). Peer assessment of oral presentations: Effects of student gender, university affiliation and participation in the development of assessment criteria. Assessment and Evaluation in Higher Education, 30, 21–34. Leonard, M., & Davey, C. (2001). Thoughts on the 11 plus. Belfast: Save the Children Fund. Linn, R. L. (1993). Educational assessment: Expanded expectations and challenges. Educational Evaluation and Policy Analysis, 15(1), 1–16. Linn, R. L., Baker, E., & Dunbar, S. (1991). Complex, performance-based assessment: Expectations and validation criteria. Educational Researcher, 20(8), 15–21. Martens, R., & Dochy, F. (1997). Assessment and feedback as student support devices. Studies in Educational Evaluation, 23(3), 257–273. Marton, F., & Säljö, R. (1976). On qualitative differences in learning: Outcome and process. British Journal of Educational Psychology, 46, 4–11, 115–127. McDowell, L. (1995). The impact of innovative assessment on student learning. Innovations in Education and Training International, 32(4), 302–313. McDowell, L., & Sambell, K. (1999). The experience of innovative assessment: Student perspectives. In S. Brown, & A. Glasner (Eds.), Assessment matters in higher education: Choosing and using diverse approaches (pp. 71–82).
Buckingham: The Society of Research into Higher Education/Open University Press. Messick, S. (1989). Meaning and values in test validation: The science and ethics of assessment. Educational Researcher, 18(2), 5–11. Messick, S. (1994). The interplay of evidence and consequences in the validation of performance assessments. Educational Researcher, 23(2), 13–22.
Messick, S. (1995). Validity of psychological assessment. Validation of inferences from persons’ responses and performances as scientific inquiry into score meaning. American Psychologist, 50, 741–749. Nevo, D. (1995). School-based evaluation: A dialogue for school improvement. London: Pergamon. Nijhuis, J., Segers, M. R. S., & Gijselaers, W. (2005). Influence of redesigning a learning environment on student perceptions and learning strategies. Learning Environments Research: An International Journal, 8, 67–93. Norton, L. S., Tilley, A. J., Newstead, S. E., & Franklyn-Stokes, A. (2001). The pressures of assessment in undergraduate courses and their effect on student behaviours. Assessment and Evaluation in Higher Education, 26, 269–284. Orsmond, P., Merry, S., & Reiling, K. (2000). The use of student derived marking criteria in peer and self-assessment. Assessment & Evaluation in Higher Education, 25(1), 23–38. Pond, K., Ui-Haq, R., & Wade, W. (1995). Peer review: A precursor to peer assessment. Innovations in Education and Training International, 32, 314–323. Pope, N. (2001). An examination of the use of peer rating for formative assessment in the context of the theory of consumption values. Assessment and Evaluation in Higher Education, 26, 235–246. Pope, N. (2005). The impact of stress in self- and peer assessment. Assessment and Evaluation in Higher Education, 30, 51–63. Powers, D., Fowles, M., & Willard, A. (1994). Direct assessment, direct validation? An example from the assessment of writing? Educational Assessment, 2(1), 89–100. Rigsby, L. C., & DeMulder, E. K. (2003). Teachers voices interpreting standards: compromising teachers autonomy or raising expectations and performances. Educational Policy Analysis Archives, 11(44), Retrieved August 15, 2007, from http://epaa.asu.edu/epaa/ v11n44/ Sambell, K., McDowell, L., & Brown, S. (1997). But is it fair? An exploratory study of student perceptions of the consequential validity of assessment. Studies in Educational Evaluation, 23(4), 349–371. Scouller, K. (1998). The influence of assessment method on student’s learning approaches: Multiple choice question examination versus assignment essay. Higher Education, 35, 453–472. Scouller, K. M., & Prosser, M. (1994). Students’ experiences in studying for multiple choice question examinations. Studies in Higher Education, 19(3), 267–279. Segers, M., & Dochy, F. (2001). New assessment forms in problem-based learning: The valueadded of the students’ perspective. Studies in Higher Education, 26(3), 327–343. Segers, M. S. R., Dierick, S., & Dochy, F. (2001). Quality standards for new modes of assessment. An exploratory study of the consequential validity of the OverAll Test. European Journal of Psychology of Education, XVI, 569–588. Segers, M., Dochy, F., & Cascallar, E. (2003). Optimizing new modes of assessment: In search for qualities and standards. Boston: Kluwer Academic. Shavelson, R. J. (1994). Guest Editor Preface. International Journal of Educational Research, 21, 235–237. Shavelson, R. J., Webb, N. M. & Rowley, G. L. (1989). Generalizability Theory. American Psychologist, 44(6), 922–932. Shepard, L. (1991). Interview on assessment issues with Lorie Shepard. Educational Researcher, 20(2), 21–23. Sluijsmans, D. M. A. (2002). Student involvement in assessment: The training of peer assessment skills. Unpublished doctoral dissertation, Open University, Heerlen, The Netherlands. Starren H. (1998). De toets als hefboom voor meer en beter leren. Academia, Februari, 1998. 
Struyf, E., Vandenberghe, R., & Lens, W. (2001). The evaluation practice of teachers as a learning opportunity for students. Studies in Educational Evaluation, 27(3), 215–238.
Struyven, K., Dochy, F., & Janssens, S. (2005). Students’ perceptions about evaluation and assessment in higher education: a review. Assessment and Evaluation in Higher Education, 30(4), 331–347. Struyven, K., Gielen, S., & Dochy, F. (2003). Students’ perceptions on new modes of assessment and their influence on student learning: the portfolio case. European Journal of School Psychology, 1(2), 199–226. Suen, H. K., Logan, C. R., Neisworth J. T. & Bagnato, S. (1995). Parent-professional congruence. Is it necessary? Journal of Early Intervention, 19(3), 243–252. Tan, C. M. (1992). An evaluation of continuous assessment in the teaching of physiology. Higher Education, 23(3), 255–272. Thomas, P., & Bain, J. (1984). Contextual dependence of learning approaches: The effects of assessments. Human Learning, 3, 227–240. Thomson, K., & Falchikov, N. (1998). Full on until the sun comes out: The effects of assessment on student approaches to studying. Assessment & Evaluation in Higher Education, 23(4), 379–390. Topping, K. (1998). Peer-assessment between students in colleges and universities. Review of Educational Research, 68(3), 249–276. Trigwell, K., & Prosser, M. (1991a). Improving the quality of student learning: The influence of learning context and student approaches to learning on learning outcomes. Higher Education, 22(3), 251–266. Trigwell, K., & Prosser, M. (1991b). Relating approaches to study and quality of learning outcomes at the course level. British Journal of Educational Psychology, 61(3), 265–275. Vermunt, J. D. H. M. (1992). Qualitative-analysis of the interplay between internal and external regulation of learning in two different learning environments. International Journal of Psychology, 27, 574.
Chapter 7
Plagiarism as a Threat to Learning: An Educational Response Jude Carroll
Introduction

Plagiarism is widely discussed in higher education. Concern about the rising level and severity of cases of student plagiarism continues to grow. Worries about student plagiarism are heard in many countries around the world. This chapter will not rehearse the full range of issues linked to student plagiarism. Guidance on how it might be defined, on how students can be taught the necessary skills, and on how cases can be handled when they occur is easy to find on the web (see, for example, JISC-iPAS, the UK government sponsored Internet Plagiarism Advisory Service). Guidance is equally common in printed format (Carroll, 2007). Instead, the chapter explores the connections between learning and plagiarism and explains why this link should be central to discussions about the issue. The chapter also discusses how the link with learning differentiates the treatment of plagiarism within higher education from the way in which the issue is handled outside the academy. It argues that, in the former, discussions of plagiarism should centre on whether or not the submitted work warrants academic credit and not, as happens outside of higher education, on the integrity of the plagiarist. Actions designed to clarify what is meant by learning and to encourage students to do their own work are also likely to discourage plagiarism, but this becomes a secondary and valuable offshoot of a pedagogic process rather than the goal of a catch-and-punish view of dealing with the issue. The chapter begins by reviewing why it is difficult to keep the focus on learning when so many pressures would distract from this goal. It then considers what theories of learning are especially useful in understanding why higher education teachers and administrators should be concerned about students who plagiarise, and concludes with suggestions about how to encourage students to do their own work rather than to copy others’ efforts or commission others to do the work for them.

J. Carroll
The Oxford Centre for Staff and Learning Development, Oxford Brookes University, Oxford, UK
e-mail:
[email protected]
116
J. Carroll
Media Coverage of Plagiarism

Even a cursory review of media coverage since 2000 would show that plagiarism, both within and beyond higher education, has a prominent position in discussions of contemporary behaviour and misbehaviour. Macfarlane (2007) refers to current interest as "almost prurient curiosity" and as a modern "obsession", then, like many other commentators, lists recent examples ranging from the UK Prime Minister's case for the Iraq war in 2002, which included unauthorised copying, to Dan Brown's high-profile 2006 court case over The Da Vinci Code. Such stories are common and the fallout from high-profile cases is significant: journalists have been sacked, Vice-Chancellors prompted to resign, novelists have had to withdraw books and musicians have gone to court to protect their work. Students are commonly referred to these cases when they are introduced to academic writing requirements, urged not to follow such examples, and told to attend to the consequences. For example, the 2007 WriteNow site in the UK (http://www.writenow.ac.uk/student_authorship.html) primarily encourages students to be authors rather than copiers and explains why this is important, but it also warns them about plagiarism and showcases these instances. Resources from the site designed to encourage authorship include the statement, "Some people in universities hate students who plagiarize!" and then suggest that teachers warn their students:

There are quite a lot of people in universities who are fed up with reading student essays that have been pasted in from web sites. … [S]tudents have to be extremely careful not to get themselves in a position where they are suddenly accused of cheating and risk having disciplinary action taken against them.
Connections between cheating and plagiarism such as these are common, but often they are not warranted.1 Uncoupling plagiarism from integrity is not an argument for overlooking rule-breaking. When students plagiarise, they have not met some important requirements and therefore should not be awarded academic credit for the learning they have bypassed through plagiarism. Moreover, in some instances, additional penalties are warranted because the plagiarism also involved serious cheating, a point discussed in more detail below. By keeping the central focus on learning (rather than on student cheating), it is possible to develop ways other than warnings about the dangers to deter students from adopting unacceptable methods for generating coursework. When cases of student plagiarism do occur, learning-centred procedures for dealing with them make it more likely that decisions and penalties are fair and proportionate. However, only some ways of looking at learning are useful in explaining to students why their plagiarism matters, and only some of the ways in which students themselves conceive of their responsibilities as learners help them make sense of academic regulations about citation and attribution.

1. The authorship presentation also mentions "carelessness" and "hurry" and the more neutral "pressure" to explain why plagiarism might occur.
To that end, the chapter turns next to learning theories labelled as constructivist.
Constructing Meaning, Constructing Understanding

Constructivist learning theories are often described as growing out of the work of the Swiss psychologist Piaget (1928), who described how learners create schemata, formed when students link perceptions they already hold with ideas they encounter and/or experiences and actions they have taken. Piaget saw learners not simply as acquiring new information but as actively combining and linking ideas, thereby making their own meaning(s). Atherton (2005), in his catalogue of learning theories, notes that Piaget's insights spurred others to add their own. These include Dewey (1938), who is usually credited with encouraging the use of real-life problem-solving in education and with valuing students' critical thinking, and Bruner (1960), who focused on ways students might categorise knowledge and build hierarchies of meaning. Vygotsky (1962) added a further dimension, arising from his observation that children learned more effectively when working in collaboration with an adult; however, "it was by no means always the case that the adult was teaching the children how to perform the task, but that the process of engagement with the adult enabled [the children] to refine their thinking or their performance to make it more effective" (Atherton, 2005). Vygotsky's variant, social constructivism, stressed the importance of social interaction in building shared understandings (as opposed to purely personal ones). Such shared understandings are now a central plank of many activities designed to enhance students' understanding of assessment requirements, including those linked to plagiarism (Rust, O'Donovan, & Price, 2005).

All these theories and theoreticians share a belief that learning happens as each student links new ideas to concepts and information they have already encountered. This active process is akin to building objects in the real world, hence the construction metaphor in the label. Students show they have learned (as opposed to just knowing about things) as they transform information, use it in new ways or apply it to specific situations. Constructivists would argue that students do not show very much understanding by finding and reproducing others' ideas or by quoting their words (although, admittedly, selection, ordering and accurate reproduction are themselves a form of use).
Different Ways to Show Learning

In many educational settings, both teachers and students would be unfamiliar with the learning approach outlined in the previous section. In many tertiary classrooms around the world, teachers and learners are more likely to be
occupied in learning tasks that involve matching well-selected questions to authoritative answers. In this sort of learning environment, a teacher's job is to identify questions worth pursuing, then to select and vet resources (often in the form of textbooks) in order to help students answer the questions which teachers deem worthy of students' attention. For some questions, teachers should tell students the most authoritative and reliable answers. In parallel with this style of teaching, the student must know what questions will be asked (usually in an examination) and what answers will be acceptable to assessors. Students must learn to retrieve and use acceptable answers quickly and accurately (often from memory) and be able to differentiate between good and not-so-good answers for particular settings and circumstances.

This kind of learning is often described pejoratively (especially by constructivists) as "mug and jug" learning, where the student is the empty mug and the teacher is the jug of expertise and information. The teacher is described as didactic, transmitting knowledge, and the student is often described as passive, absorbing knowledge and then repeating it in examinations or descriptive papers rather than altering or evaluating it. Indeed, students in this kind of learning system report that they are penalised for changing or revising received information (Handa & Power, 2005; Robinson & Kuin, 1999). However, those who teach in such systems insist that simple reproduction is not what is required. Students are expected to understand what they present, and teachers assume understanding is demonstrated by skilful reproduction. They cite examples where repetitive learning strategies lead to reliable recall and deep understanding (Au & Entwistle, 1999). Students say that they cannot retain information without understanding, that they cannot use it unless they have a sense of its meaning, and that selecting information to answer questions is itself a signal of agreement and judgement.

The purpose of this chapter is not to resolve which of the two approaches, the constructivist or the objectivist/reproductive, has more pedagogic merit. Nor is it possible to characterise any one educational system as requiring only constructivist or only reproductive learning. In any UK A-level classroom, you will find teachers priming students to answer examination questions that test their knowledge and also guiding them on writing coursework which is designed to allow students to demonstrate their own research and use of information. Hayes and Introna (2005) describe seeing Engineering lecturers in an Indian university dictating answers to examination questions (and students copying their words and the advice on correct punctuation for examination answers), then watching the same students engaging in problem-based learning in another classroom. Didactic, reproductive learning is said to be the norm in what are often referred to as non-Western universities, where teachers are likely to ask students "Show me you know about x"-type questions. What is less often discussed is how frequently these types of questions also appear in assessment tasks in the UK, Australia, Canada, or the USA. In all these places, you will hear teachers bemoaning their
students' lack of initiative in reading beyond the set text and their preference for spoon-feeding. Nevertheless, universities in Anglophone countries, and what are often termed Western universities more generally, base their assessment criteria and their judgements about whether or not a student has learned on assumptions that are more akin to constructivist than to reproductive thinking. This is especially true of coursework. Teachers' underpinning constructivist assumptions explain why they are so exercised about student plagiarism.
Plagiarism and the Student's Own Work

In a plagiarised piece of work (or, more usually, in plagiarised elements within a piece of student coursework), the student did not do the work of making meaning and transforming ideas; he or she has therefore offered no evidence of learning and cannot be awarded academic credit. Bypassing learning through plagiarism means the student bypasses the opportunity for their own development as well. A student who copies or pays someone else to produce their coursework remains the same student at the end of the assessment, whereas one who did the work does not, even if the change is too small to be easily described.
Distractions in Maintaining the Link Between Plagiarism and Learning

Learning does not usually get a prominent position in most commentaries about student plagiarism. Many institutions have Academic Integrity policies rather than ones with the word plagiarism in the title, presumably to underscore the place of academic values and beliefs. Certainly it makes logical sense to tell students what they must do and why it is important for them to do their own work before telling them what they must not do (plagiarise) and before laying out the potentially grim consequences if they breach regulations. However, a less welcome effect is to recast student plagiarism as a moral rather than a pedagogic issue.

Serious cheating involving plagiarism does happen. However, most plagiarism is not cheating, and cases involving deliberate attempts to gain unfair advantage, to deceive assessors or to fraudulently obtain academic awards make up a small, or even very small, percentage of overall plagiarism cases. It is sometimes difficult to believe this since newspapers stress the concerns and often fuel the perceived moral panic which surrounds the issue. Statistics, too, look worrying: "80% of students cheat" (Maslen, 2003), although such headlines downplay details about frequency, about what percentage occurs through plagiarism, and about any impact on students' learning.
Other studies have been more specific to plagiarism and they too look worrying. McCabe (2006) found that four in ten students regularly cut-and-paste without citation, but did not investigate the extent of the practice in individual pieces of work. Narrowing the focus to deliberate cheating involving plagiarism, even lower rates emerge. Two per cent of students in one Canadian study admitted to commissioning others to write their coursework – but we do not know how often this happened. Actual experience is lower still, as Lambert, Ellen, and Taylor (2006) found when investigating cases in New Zealand. They estimated that five per cent of students had a case dealt with informally by their teachers in the previous academic year and fewer than one per cent were managed formally, which they presumed constituted the more serious examples. In 2007, the official figures for cases in the UK, even for very large universities, were fewer than 100 for upwards of 20,000 students.

In my own institution, we found deliberate cheating involving plagiarism in 0.015% of coursework submitted in one academic year, 2005/2006. Students in these cases had paid ghost writers, submitted work which had been done by someone at another university, paid professional programmers to create their code, or had copied-and-pasted substantial amounts from others' work into their own without attribution. Serious cases revealed students cheating by manipulating their references; some had altered texts using the "find and replace" button in ways that were intended to create a false assumption in the assessor as to whose work was being judged. They had stolen others' work or coerced others into sharing it. One student had gone to the library, borrowed someone's thesis, ripped out the text, returned the cover, scanned the contents and submitted the work under his own name. (Note: we caught this particular student because he used his real name when taking out the original!) These are all unacceptable acts. Some are incompatible with a qualification and, where this was proven, the students did not graduate. All of them are incompatible with awarding academic credit. However, the number of students involved in this type of deliberate cheating is very small – one case in 1500 submissions. Even if, in all these examples, the percentage of cheating cases were increased substantially to adjust for under-detection (which undoubtedly occurred), this level of deliberate cheating does not threaten the overall value of a degree from our university, nor does it support the view that our graduates' skills and knowledge cannot be trusted. It is not cause for panic.

It is not just the relatively small number of cases that should reassure those who might panic and dissuade those who focus too strongly on cheating. Most of the cases in the previous examples will be one-off instances. Only a small number of students use cheating as a regular or even exclusive means of gaining academic credit; for most, as the studies show, it is a one-off mishandling of assessment requirements. Szabo and Underwood (2004) found that six per cent of students who admitted to deliberate plagiarism said they did so regularly. The two per cent of Canadian students purchasing essays probably did so only once, although sites where students can commission others to do their
coursework usually include a section for users' comments, and in these, satisfied customers sometimes explicitly state that they will use the service again. A few sites actively promote regular use, such as one which offers to produce PhD dissertations chapter by chapter to allow discussion with supervisors. In every institution, there are undoubtedly instances where all (or almost all) of a student's credit has been acquired by fraudulent means. However, in my own university, with a student cohort of 18,000, we deal with only a handful of cases per year involving repeated acts by the same student.
Distractors not Linked to Worries about Cheating

Although an over-emphasis on cheating and the resulting moral panic is probably the most significant distraction from dealing with plagiarism as a teaching and learning issue, another distraction is an over-emphasis on citation practices. Teachers can give the (usually mistaken) impression that adhering to referencing guidelines is more important than writing the text which citations are designed to support. I remember one mature student, looking back at his undergraduate career, who said, "I was tempted to forget the essay and just hand in the reference list since that was all teachers commented on." As the WriteNow quote above reminds students, carelessness leads to risk and disciplinary action.

Students' worries about the risks of imperfect citation are not all misplaced. Students have been accused of plagiarism when using informal attributions along the lines of "As Brown says in his book" rather than a fully fledged Harvard citation. They worry that less-than-perfect citations might trigger disciplinary action. They are concerned about whether their attempts to re-write complex academic prose in their own words, sometimes in an unfamiliar language and nearly always in an unfamiliar disciplinary context, will be sufficiently different to escape accusations of plagiarism. Yet Howard (2000) and Pecorari (2003), both of whom have studied students' often clumsy attempts to paraphrase, refer to the result as patch writing and describe it as an often necessary step along the road to fully competent academic writing. Patch writing is plagiarism but, more importantly, it is a signal of the need for more teaching and more learning.

Over-policing citation regulations is perhaps an understandable response from teachers who feel they must do something about plagiarism and are not sure what that something could be. They are correct that a student who sticks too closely to the original text, or who is less than assiduous in their use of quotation marks, has plagiarised – even if the student's text makes it clear that he or she is using others' ideas – and, unless action is taken, will probably continue to do so. There may be utility in awarding a small penalty to ensure the student attends to the matter and is encouraged in future to adhere more closely to recognised conventions. However, these are issues of learning and apprenticeship, not of lack of integrity.
Plagiarism, as Opposed to Cheating, Is Common

Study after study confirms Park's statement that "plagiarism is doubtless common and getting more so" (Park, 2003, p. 471). Many studies support the rule-of-thumb finding that at least ten per cent of students submit work that is 'not their own' because they have cut-and-pasted others' words, or paraphrased badly, or leaned too heavily on the work of fellow students. This rule-of-thumb ten per cent rises steeply in some disciplines and in some kinds of assessment, such as the traditional academic essay or the lab report. In the examples mentioned in previous paragraphs, there were ten recorded cases of negligent practice or academic malpractice involving students submitting others' work for each recorded case of deliberate misconduct. This larger number of plagiarism cases reflects students' relatively common difficulties with authorship and their varied interpretations of the requirement to "do your own work".

If plagiarism is defined as "submitting someone else's work as your own", then the converse, namely "submitting your own work", must be valuable. Students need to see that, in submitting their own work, they demonstrate to the teacher that they have learned. These two interrelated ideas – submitting your own work and showing your own learning – would seem to be clear-cut aims, expressed in plain English. Indeed, most institutional approaches to deterring and dealing with plagiarism start with definitions in straightforward language, offering examples of actions that would be acceptable and unacceptable. However, quandaries remain which can only be resolved through discussion and example. Here are some examples of potential misunderstanding:
– If students' work is deemed plagiarised only when it is submitted and if students are expected to generate that work over time, then when must a student stop sharing and co-operating with others and, instead, start working independently to ensure he or she is doing their own work?
– Where assessment criteria include marks for grammar and spelling, can students use a proof reader? Why is it acceptable to ask a librarian how to conduct a search when paying someone to create a list of sources is not?
– How much transformation is needed before the student can claim the result as "my own", even if the final draft acknowledges the original source? Can a student leave ten words unchanged? Eight? Five?
– Which is more valuable: a perfect assessment artefact or one "made by me"? Does the teacher want the best that can be made or the best that I can make?
– If common knowledge need not be cited, how do I know what falls under that category? If everyone knows it, which everyone are we talking about?
In quandaries such as these (and many others), social constructivist learning theory is helpful in reminding staff and students that shared understanding grows
through interaction, practice and, above all, feedback. Declarative knowledge such as that provided in a student handbook can supply a definition, and an hour's plagiarism lecture can explain the rules, but neither is much help with quandaries such as those above. Resolving them takes time, interaction and teachers' tolerance of their students' less-than-skilful attempts.

Eventually, many students do master referencing and citation rules, thereby avoiding plagiarism. However, they may still regard doing the referencing as a strange or arbitrary academic obsession. What is the point, many wonder (especially those who are non-native writers), of paraphrasing a piece of text written in flawless English to create an awkwardly expressed, personal version? Students who know that they must change the original to avoid plagiarism may do so via the "find and replace" button in Word in order to create a "paraphrase" (sic). They may generate text by collecting and assembling others' words, then smoothing the joins to make it seem as if the final result is an authored (as opposed to scrap-booked) piece of work. Students who do these things (and many more) do not see their actions as breaching academic regulations because they have a particular view of themselves as learners and because their ideas about learning are often very different from those held by their teachers.
Students' Ideas About Learning

Many studies have attempted to get inside students' perceptions and beliefs about learning and to understand the stages of their cognitive development. One explanation is especially useful in linking ideas about plagiarism to students' understanding of learning. Perry (1970) drew on his work with American undergraduates over several decades to create an account of students' cognitive development in which students start out as dualist thinkers. Students at this stage of cognitive development regard answers as either right or wrong, true or false; they see learning as providing answers which their teachers or textbooks supply. For these dualist students, discussions of academic integrity policies and tutorials on academic writing are likely to make no sense. Citing and gathering others' ideas appear unnecessary because they generally see judgements and evaluations as self-evident. Dualist students often find paraphrasing inexplicable, and regard an assessment task where the answer may not exist and definitely cannot be found, but instead must be created by the student, as a trick the teacher is playing rather than an element of learning. Since such students must nevertheless be taught to adhere to academic regulations, Angelo (2006) suggests adopting a pragmatic approach: focusing on behaviours and acceptable practices, rather like teaching new drivers the rules of the Highway Code, instructing students on the best way to do an assignment, and reminding them of the consequences should they not comply with rules against plagiarism.
Students do usually move from the dualist position to the next level of cognitive development, which Perry terms the 'multiplistic' stage. Studies on how rapidly students move to multiplistic thinking show wide variation. One study by Baxter Magolda (2004), which followed the cognitive development of a group of US undergraduate students over 16 years, found that some required several years to move to the next stage, a few never did so, and most managed to shift their thinking within their undergraduate study. At the multiplistic stage, students recognise multiple perspectives on an issue, situation or problem; however, they do not necessarily develop strategies for evaluating them. Students at this stage say, "Well, that's your opinion and we are all entitled to our own views". A multiplistic thinker is likely to regard an essay that requires him or her to structure an argument as an invitation to document a disagreement. A student operating at this level is likely to find choosing between sources difficult and to view the concept of authority as clashing with their "it is all relative" thinking. Citation becomes a stumbling block to the free-flowing expression of the student's own ideas and opinions. Again, as for dualist students, teaching about academic writing and avoiding plagiarism needs to focus on rules and acceptable behaviours whilst ensuring students know there are value-driven reasons for academic integrity policies.

Only when students reach the stage Perry terms relativistic are requirements around plagiarism likely to be genuinely understood and acted upon. (Note: the previously mentioned longitudinal study found that not all students reach this point, even after several years of university study.) Relativistic students, as the term implies, see knowledge as linked to particular frames of reference and see their relationship to knowledge as detached and evaluative. Once students see learning in this way, they are able to review their own thinking and see alternative ways of making decisions. When they eventually reach a conclusion (and they see this as difficult), relativistic students are willing to accept that the position is personal, chosen and/or even defensible. Perry describes students at this level as recognising that those in authority (i.e. experts, authors of research publications and, indeed, their own teachers) generally arrive at their positions by routes similar to the student's own and are therefore also open to review and questioning. For relativistic students, using others' authority to support an argument, citing information gathered through research and reading, and paraphrasing as a way of showing that points have been understood and personalised seem almost second nature (albeit still tricky to do well, as I myself find after many decades of practice). Work produced by relativistic students will be their own, even if the resulting document resembles many which have been written before.

The challenge comes when it is the teacher's task to ensure that all students are able to comply with academic regulations specifying that students must do their own work and avoid plagiarism. These rules apply to all students, including those with a dualistic view of knowledge, those who have studied previously in
educational environments that valued the ability to memorise and then reproduce accurately, and those who are used to accessing information rather than transforming it and becoming personally involved with its use. "Do your own work" may mean very different things to students who see the task as finding rather than making the answer: it may seem pointless to a student who believes they have found someone else's answer, already perfectly formed, and it makes little sense to a student who seems to believe that the teacher who asks for a 3,000-word essay is more interested in that number of words than in the evidence they contain, and more interested in referencing practices than in assessing the document for evidence that the student has met the learning outcomes of their coursework.
A Reality Check and a Recommendation

It is not possible to move students towards understanding and accepting constructivist, multiplistic perspectives at a rate faster than they are willing or able to go. Some may move very slowly indeed. Most students do not arrive in higher education with a fully developed set of academic skills which will equip them to tackle complex academic writing tasks. Perhaps they never did, but certainly now, in the 21st century, with widening participation and increasing numbers of students travelling around the world to study, many, if not most, students will not arrive at university able to locate sources, evaluate them, read texts for "useful bits", construct an argument, support their views with others' authority, use in-text citation and find their way around a specific referencing system.

Above all, teachers cannot assume that students' motivations and values match those of their teachers. Students may not attend spontaneously to the things teachers see as important, and many say that extrinsic motivations dominate their decisions. Good grades, rather than the learning stated in course outcomes, are what matters for future employment. Our students may be stretched, stressed, distracted and lacking in confidence in themselves as learners. They may be operating with a small vocabulary of English words and finding it difficult or impossible to do themselves justice in an unfamiliar language. They may be confused and alienated by new and unfamiliar academic expectations. The list could go on.

The response, as far as plagiarism is concerned, is neither despair nor panic, nor a return to exams, nor raised entry requirements. Instead, by linking plagiarism and learning, it is possible to adopt measures which shape and direct student efforts towards the learning outcomes so that they do their own work because there is little alternative. The remainder of this chapter outlines what these strategic prompts and requirements might be, focusing primarily on assessment, as this holds a key place in the overall management of student plagiarism (Macdonald & Carroll, 2006).
Creating 'Do Your Own Work' Assignments

When students are presented with an assignment, they report asking themselves a series of questions about the task, such as:
– Can I do this? Can I do it well (or well enough)?
– Has someone else already done it and, if so, can I find the result?
– If I do it myself, what else in my life will suffer? Is it worth putting in the time and effort to do it, or should I spend my time on other assignments?
– If I copy or find someone else's answer, will I be caught? If I'm caught, what will happen?

And so on. By asking themselves a similar series of questions when setting assessment tasks, teachers can create courses and assessment tasks that encourage learning and discourage plagiarism. In practice, reviewing one's own assessments seems more difficult than doing this with and for one's colleagues: often the person who sets the assignment sees it as a valid and plausible task, whereas someone else can identify opportunities for copying and collusion. Equally, interrogating assessment tasks becomes more likely if it is part of the normal programme/course design process rather than the responsibility of individual course designers or teachers. The following section suggests ways to interrogate course designs and assessment tasks, either individually or as a programme team.
Questions that Encourage Learning and Discourage Plagiarism

1. Are you sure the students will have been taught the skills they will need to display in the assessment? Answering this question positively requires you to be able to list the skills students will need, especially those usually classed as study skills or linked to academic writing. Necessary skills extend beyond the use of a specific referencing system to include those for locating and using information, for structuring an argument, and for using others' authority to underpin the student's own ideas and views. As well as generic skills, students will also need explicit teaching of discipline-specific ways to handle an assessment. Some discipline-specific skills will be linked to the previous generic list, which is largely concerned with writing. Writing an essay as a historian, a biologist or a paediatric nurse will have specific requirements which must be learned. Whilst some students are especially good at spotting cues and picking up these matters implicitly, most only become skilful through explicit instruction, practice and acting on feedback. For all students, learning to complete assessments in the discipline is part of learning the discipline itself.
Students who lack the skills to complete assignments often feel they have no alternative but to plagiarise.

2. Have you designed a task that encourages students to invest time and effort? Have you designed ways to dissuade them from postponing assessment tasks to 'the last minute'? One of the conditions named by Gibbs and Simpson (2004–2005) when they looked at ways in which assessment supports learning was time on task. Of course, time on its own will not always lead to learning valued by the teacher. Students could be distracted by unfocused reading, or side-tracked into creating an over-elaborate presentation. They could be staring at notes or spending hours transcribing lectures. However, unless students put in the necessary time, they cannot achieve the understanding and personal development required for tertiary learning. Course design solutions which counter student procrastination and focus time on task include early peer review of drafts, where students read each other's work in progress and comment in writing, with the requirement that the student include the peer feedback in their final submission by either taking the comments into account or explaining why they did not do so. Other ways to capture students' time include asking them to log their efforts in an on-line threaded discussion, or to chunk tasks and monitor their completion. Ensuring activity is occurring is not the same as assessing it. Formative assessment is time-consuming and unrealistic, given teachers' workloads. However, verifying that work has started by asking to see it and then signing and dating the student's hard copy, for later submission alongside the finished article, can be relatively undemanding on teacher time and hugely encouraging to students to 'get going'. Students who delay work until the last minute often see little alternative to plagiarism.

3. Are there opportunities for others to discuss and interact with the student's assessment artefact during its production? This is unlikely to happen unless you specifically design into the course ways in which students can share and comment on each other's work. You might design in peer review and/or feedback as mentioned in the last section; alternatively, you could set aside some face-to-face time to observe and support the assessment work as part of scheduled class meetings. Evidence of activity from students' peers, workplace mentors, or tutors can enrich the student's own understanding as well as authenticate their efforts.

4. Turning to the task itself, can the student find the answer somewhere? Does the answer already exist in some form? Verbs are often crucial here. Asking students to rank, plan, alter, invent or be ready to debate signals that they must do the work, whereas asking the student to show his or her knowledge or to demonstrate understanding of a theory or idea ("What is the role of smoking cessation in the overall goal of improving public health?") usually needs only a few on-line searches, some cut-and-paste and a bit of text-smoothing. Students tell me they can do this "work" in a short time – 30 minutes was the guess from a UK student who reported being given
exactly the title involving smoking. Those with reasonable writing skills engaging in cut-paste-smooth "authorship" (sic) will rarely, if ever, be identified, leaving non-native English speakers and poor writers to trigger the assessor's suspicions. In general, all tasks that ask the student to discuss, describe or explain anything are more appropriately judged through examination, if at all, rather than through coursework.

5. Can the student copy someone else's answer? In general, if students sense that the answer already exists, either because the problem is well known or the issue has been well covered, then they see finding the answer as a sensible way to allocate their time rather than as plagiarism. Students report no difficulty finding answers if they suspect (or know) that the questions have not changed since the last time the course was run, even if the work was not returned to students. They describe easy ways to access others' coursework or to locate informal collections of past student work (and some collections are now located outside particular universities). Copying between students is common if the assessment task or problem has one answer or only a small number of ways in which it can be solved.

6. Is the brief clear about what will be assessed and which aspects of the work must be done by the student? Students often sub-contract parts of the work if they consider it less important or not relevant to the assessment decision. This means that the brief may need to specify whether or not proofreading is acceptable; who may help with locating and selecting sources; how much help from fellow students is encouraged; and, conversely, where such help should stop. If these are not stated, a student must guess or make an effort to ask. Students say they shy away from taking such matters to teachers, assuming they should use tutorial time for "interesting" questions instead. Assessment criteria often state which aspects will be judged for a grade, and students can usefully have these drawn to their attention as a way of clarifying which work will be assessed.

7. Does the question, in the way that it is posed, push students towards asking themselves, "How and where do I find that?" or "How do I make that?" Some of the factors that prompt one answer or the other have already been mentioned, such as the novelty and clarity of the assessment brief. Here, the issue turns more on the nature of the task itself. Assessments that encourage making, and therefore learning, include those where students must include any of the following:
– their own experience or data;
– recent and specific information such as legislation, current events or recently published material;
– application of generic theories in specific, local settings or with particular individuals such as patients the student has cared for or experiments/projects the student has carried out;
– individual or individualised data; or
– reference to specific texts, notes or class activity.

Where assessment briefs are general and dated ("Analyse the Eurovision song contest using anthropological theory"), students can see the assignment as a 'find it' opportunity, involving cut-paste-smooth authorship. (As an aside, when writing this chapter I made up this example question about Eurovision, basing it loosely on an examination question which I came across several years ago. Then, out of curiosity, I did a quick Google search and, based on less than a minute's inspection, found at least ten sources that would be suitable for 'cut-paste-smooth' authorship [sic].) By making the task more specific and recent ("Analyse the 2008 Eurovision song contest …"), you lessen the opportunities for finding ready-made answers, as there has been less time to collect a corpus of data and little chance that it has already been evaluated. However, by re-shaping the question even further, you signal that it is an invitation to construct an answer: for example, "Analyse x number of voting decisions by countries participating in the 2008 Eurovision song contest, using your knowledge of social identity …". Faced with this task, the student must seek out the voting data and then apply social identity theory. Also, by noting which students choose which countries, you can see whether copying between students seems common. Alternatively, to lessen the chances that students copy from each other, you might build a personal or individual dimension into the assessment task, for example, "Draw at least three parallels between decisions made by xx nations in the 2008 Eurovision song contest and decisions you yourself have made between yy and zz date which, in both cases, could be construed as expressions of social identity …".
Will This Prevent Plagiarism?

Pedagogic approaches to dealing with student plagiarism focus on students' learning rather than on making assumptions about their character or worrying too much about catching those who cheat. Approaches which prioritise learning engage students' time and authenticate their effort. Such approaches aim to shift students' understanding towards what is valued in their own learning – that is, making original work and transforming transmitted knowledge into higher cognitive levels of thinking where they create new understandings and analyse and evaluate knowledge. By designing in teaching and apprenticeship-type practice of academic skills, and by designing out easy chances to copy and find answers, teachers encourage learning.

However, these practices can never ensure that all students' behaviour is in line with academic regulations. A few students will deliberately plagiarise and will not be dissuaded from doing so by learning-focused interventions. They may be influenced by fear of being caught and by the consequences of punishment (Norton, Newstead, Franklyn-Stokes, & Tilley, 2001). Some may be stuck in
dualist views of learning, unable to see their responsibilities as being other than providing answers to teachers' questions. However, for most students most of the time, using strategies that support and encourage learning and which discourage or remove opportunities for copying will tip them into doing their own work, whether they want to or not. Rethinking assessment and course design can only be effective if it operates in conjunction with other actions designed to deal with student plagiarism, such as good induction, well-resourced skills teaching, written guidance, and procedures that are used and trusted by teaching staff. Thus, the institution as a whole needs an integrated series of actions to ensure its students are capable of meeting the requirement that they do their own work, because it is only in this way that they do their own learning.
References

Angelo, T. (2006, June). Managing change in institutional plagiarism practice and policy. Keynote address presented at the JISC International Plagiarism Conference, Newcastle, UK.
Atherton, J. (2005). Learning and teaching: Piaget's developmental theory. Retrieved August 11, 2007, from http://www.learningandteaching.info/learning/piaget.htm
Au, C., & Entwistle, N. (1999, August). 'Memorisation with understanding' in approaches to studying: Cultural variant or response to assessment demands? Paper presented at the European Association for Research on Learning and Instruction Conference, Gothenburg, Sweden.
Baxter Magolda, M. (2004). Evolution of a constructivist conceptualization of epistemological reflection. Educational Psychologist, 30, 31–42.
Bruner, J. (1960). The process of education. Cambridge, MA: Harvard University Press.
Carroll, J. (2007). A handbook for deterring plagiarism in higher education. Oxford: Oxford Brookes University.
Dewey, J. (1938). Experience and education. New York: Macmillan.
Gibbs, G., & Simpson, C. (2004–2005). Conditions under which assessment supports students' learning. Learning and Teaching in Higher Education, 1, 3–31.
Handa, N., & Power, C. (2005). Land and discover! A case study investigating the cultural context of plagiarism. Journal of University Teaching and Learning Practice, 2(3b), 64–84.
Hayes, N., & Introna, L. (2005). Cultural values, plagiarism, and fairness: When plagiarism gets in the way of learning. Ethics & Behavior, 15(3), 213–231.
Howard, R. M. (2000). Sexuality, textuality: The cultural work of plagiarism. College English, 62(4), 473–491.
JISC-iPAS Internet Plagiarism Advisory Service. Retrieved October 11, 2007, from www.jiscpas.ac.uk/index
Lambert, K., Ellen, N., & Taylor, L. (2006). Chalkface challenges: A study of academic dishonesty amongst students in New Zealand tertiary institutions. Assessment and Evaluation in Higher Education, 31(5), 485–503.
Macdonald, R., & Carroll, J. (2006). Plagiarism: A complex issue requiring a holistic approach. Assessment and Evaluation in Higher Education, 31(2), 233–245.
Macfarlane, R. (2007, March 16). There's nothing original in a case of purloined letters. Times Higher Education Supplement, p. 17.
Maslen, G. (2003, January 23). 80% admit to cheating. Times Higher Education Supplement.
McCabe, D. (2006, June). Ethics in teaching, learning and assessment. Keynote address at the JISC International Plagiarism Conference, Newcastle, UK.
Norton, L. S., Newstead, S. E., Franklyn-Stokes, A., & Tilley, A. (2001). The pressure of assessment in undergraduate courses and their effect on student behaviours. Assessment and Evaluation in Higher Education, 26(3), 269–284.
Park, C. (2003). In other (people's) words: Plagiarism by university students – literature and lessons. Assessment and Evaluation in Higher Education, 28(5), 471–488.
Pecorari, D. (2003). Good and original: Plagiarism and patch writing in academic second-language writing. Journal of Second Language Writing, 12, 317–345.
Perry, W. (1970). Forms of intellectual and ethical development in the college years: A scheme. New York: Holt.
Piaget, J. (1928). The child's conception of the world. London: Routledge and Kegan Paul.
Robinson, V., & Kuin, L. (1999). The explanation of practice: Why Chinese students copy assignments. Qualitative Studies in Education, 12(2), 193–210.
Rust, C., O'Donovan, B., & Price, M. (2005). A social constructivist assessment process model: How the research literature shows us this could be best practice. Assessment and Evaluation in Higher Education, 30(3), 231–240.
Szabo, A., & Underwood, J. (2004). Cybercheats: Is information and communication technology fuelling academic dishonesty? Active Learning in Higher Education, 5(2), 180–199.
Vygotsky, L. (1962). Thought and language. Cambridge, MA: MIT Press.
Write Now CETL. (2007). Retrieved September 8, 2007, from www.writenow.ac.uk/index.html
Chapter 8
Using Assessment Results to Inform Teaching Practice and Promote Lasting Learning

Linda Suskie
Middle States Commission on Higher Education, Philadelphia, PA, USA
Introduction

While some may view systematic strategies to assess student learning as merely chores to satisfy quality assurance agencies and other external stakeholders, for faculty who want to foster lasting learning, assessment is an indispensable tool that informs teaching practice and thereby promotes lasting learning. Suskie (2004c) frames this relationship by characterizing assessment as part of a continual four-step teaching-learning-assessment cycle.

The first step of this cycle is articulating expected learning outcomes. The teaching-learning process is like taking a trip – one cannot plot out the route (curriculum and pedagogies) without knowing the destination (what students are to learn). Identifying expected student learning outcomes is thus the first step in the process. The more clearly the expected outcomes are articulated (e.g., articulating that one plans to visit San Francisco rather than simply the western United States), the easier it is to assess whether the outcome has been achieved (i.e., whether the destination has been reached).

The second step of the teaching-learning-assessment cycle is providing sufficient learning opportunities, through curricula and pedagogies, for students to achieve expected outcomes. Students will not learn how to make an effective oral presentation, for example, if they are not given sufficient opportunities to learn about the characteristics of effective oral presentations, to practice delivering oral presentations, and to receive constructive feedback on them.

The third step of the teaching-learning-assessment cycle is assessing how well students have achieved expected learning outcomes. If expected learning outcomes are clearly articulated and if students are given sufficient opportunity to achieve those outcomes, this step is often not particularly difficult – students' learning opportunities become assessment opportunities as well. Assignments in which students prepare and deliver oral presentations, for example, are not just opportunities for them to hone their oral presentation skills but also
opportunities for faculty to assess how effectively students have developed those skills.

The final step of the teaching-learning-assessment cycle is using results to inform teaching-learning practice and thereby promote lasting learning. As Dochy has noted in Chapter 6, assessments can be viewed as tools to enhance the instructional process. Indeed, the best assessments have what Dochy, along with Linn and Dunbar (1991) and Messick (1989, 1994), has called consequential validity and what Moran and Malott (2004) have called pedagogical validity – assessment results are used as the basis for appropriate action and, specifically, to help "achieve the instructional objectives" (Moran & Malott, 2004, p. 137).

This chapter explores this conception of assessment as a means of promoting lasting learning through the consideration of three topics. First, teaching practices that have been shown through research to promote deep, lasting learning are reviewed. Next, some key underlying principles for using assessment results to inform teaching practices are discussed. Finally, practical suggestions for using assessment results to inform teaching practice and promote lasting learning are offered and explained through examples drawn from three very different kinds of assessment tools: rubrics (rating scales or scoring guides), multiple-choice tests, and qualitative assessments such as reflective writing.
Teaching Practices that Promote Deep, Lasting Learning

Today's faculty are, in many ways, living in a golden age of education: their teaching practices can be informed by several decades of extensive research and publications documenting teaching practices that promote deep, lasting learning (e.g., Angelo, 1993; Association of American Colleges and Universities, 2002; Astin, 1993; Barr & Tagg, 1995; Chickering & Gamson, 1987, 1991; Ewell & Jones, 1996; Huba & Freed, 2000; Kuh, 2001; Kuh, Schuh, Whitt, & Associates, 1991; Light, 2001; McKeachie, 2002; Mentkowski & Associates, 2000; Palmer, 1998; Pascarella, 2001; Pascarella & Terenzini, 2005; Romer & Education Commission of the States, 1996). Suskie (2004b) has aggregated this work into a list of 13 conditions under which students learn most effectively (p. 311):

1. Students understand course and program goals and the characteristics of excellent work.
2. They are academically challenged and given high but attainable expectations.
3. They spend more time actively involved in learning and less time listening to lectures.
4. They engage in multidimensional real-world tasks in which they explore, analyze, justify, evaluate, use other thinking skills, and arrive at multiple solutions. Such tasks may include realistic class assignments, field experiences, and service-learning opportunities.
5. The diversity of their learning styles is respected; they are given a variety of ways to learn and to demonstrate what they've learned.
6. They have positive interactions with faculty and work collaboratively with fellow students.
7. They spend significant time studying and practicing.
8. They receive prompt, concrete feedback on their work.
9. They have opportunities to revise their work.
10. They participate in co-curricular activities that build on what they are learning in the classroom.
11. They reflect on what and how they have learned and see coherence in their learning.
12. They have a synthesizing experience such as a capstone course, independent study, or research project.
13. Assessments focus on the most important course and program goals and are learning activities in their own right.

These principles elucidate two ways that assessment activities play a central role in promoting deep, lasting learning. First, as Dochy has noted in Chapter 6, providing a broad array of activities in which students learn and are assessed (Principle 5) promotes deep, lasting learning (Biggs, 2001; Entwistle, 2001). This rich array of evidence also provides a more complete and more meaningful picture of student learning, making assessment evidence more usable and useful in understanding and improving student learning. At its best, this broad array of learning/assessment activities is designed to incorporate three other principles for promoting lasting learning:

Principle 5: Students learn more effectively when the diversity of their learning styles is respected and they are given a variety of ways to learn and to demonstrate what they've learned. Suskie (2000) has noted that, because every assessment is inherently imperfect, any decisions related to student learning should be based on multiple sources of evidence. Furthermore, because every assessment favors some learning styles over others, students should have "a variety of ways to demonstrate what they've learned" (p. 8).

Principle 11: Students learn more effectively when they reflect on what and how they have learned and see coherence in their learning. In Chapter 4, Sadler has argued extensively for the educational value of students' practicing and developing skill in critical self-appraisal. Self-reflection fosters metacognition: learning how to learn by understanding how one learns (Suskie, 2004a).

Principle 12: Students learn more effectively when they have a synthesizing experience such as a capstone course, independent study, or research project. Such experiences give students opportunities to engage in synthesis – integrating what they have learned over their academic careers into a new whole (Henscheid, 2000).

The second way that assessment activities play a central role in promoting deep, lasting learning is through the time and energy students spend learning what they will be graded on, as Dochy has discussed extensively in Chapter 6. Indeed, Entwistle (2001) asserts, "It is not the teaching-learning environment
itself that determines approaches to studying, but rather what students believe to be required" (p. 16). Biggs (2001) concurs, noting, "[Students] see the assessment first, then learn accordingly, then at last see the outcomes we are trying to impart" (p. 66). The assessments that are used to grade students can thus be a powerful force influencing what and how students learn. This idea is captured in the final principle: "Assessments focus on the most important course and program goals and are learning activities in their own right." Four of the principles are keys to making this happen:

Principle 1: Students learn more effectively when they understand course and program goals and the characteristics of excellent work. As Sadler has discussed in Chapter 4, students who want to do well learn more effectively when they understand clearly why they have been given a particular assignment, what they are expected to learn from it, and how they will be graded on it. If students know, for example, that their papers will be graded in part on how effective the introduction is, some will try their best to write an effective introduction.

Principle 2: Students learn more effectively when they are academically challenged and given high but attainable expectations. While there is a limit to what students can achieve in a given amount of time (even the best writing faculty, for example, cannot in a few weeks enable first-year students to write at the level expected of doctoral dissertations), many students respond remarkably well to high standards, provided that they are given a clear roadmap on how to achieve them (De Sousa, 2005). First-year students, for example, may be able to write quite competent library research papers if the research and writing process is broken down into relatively small, manageable steps (identifying the research topic, finding appropriate library resources, reading and analyzing them, outlining the paper, etc.) and students are guided through each step.

Principle 7: Students learn more effectively when they spend significant time studying and practicing. While this may seem obvious, there is evidence that students in some parts of the world spend relatively little time on out-of-class studies. Those who attended college a generation or two ago may recall the adage to study two hours outside of class for every hour spent in class. For a full-time student spending 15 hours in class, this would mean spending about 30 hours on out-of-class studies. But according to the National Survey of Student Engagement (2006), about 45% of today's American college freshmen and seniors report spending less than 11 hours a week preparing for class (studying, reading, writing, doing homework or lab work, analyzing data, rehearsing, and other academic activities). Only about ten percent spend more than 25 hours a week preparing for class. While these results reflect part-time as well as full-time students, it is nonetheless clear that, at least in the United States, much more could be done to have students take responsibility for their learning.

Principle 8: Students learn more effectively when they receive prompt, concrete feedback on their work. As Dochy has discussed in Chapter 6, we all benefit from constructive feedback on our work, and students are no different (Ovando, 1992). The key is to get feedback to students quickly enough that they can use it to improve their learning. Faculty know all too well that final
exams graded after classes end are often not retrieved by students and, if they are, checked only for the final grade. The problem is that providing constructive, timely feedback can take a great deal of time. Here are some practical suggestions on ways to minimize that time:
- Return ungraded any assignments that show little or no effort, with a request that the paper be redone and resubmitted within 24 hours. As Walvoord and Anderson (1998) point out, if students make no effort to do an assignment well, why should the professor make any effort to offer feedback?
- When appropriate (see Sadler, Chapter 4 of this book), use rubrics (rating scales or scoring guides) to evaluate student work. They speed up the process because much feedback can be provided simply by checking or circling appropriate boxes rather than writing comments.
- Use Haswell’s (1983) minimal marking method: rather than correct grammatical errors, simply place a check in the margin next to the error, and require the student to identify and correct errors in that line.
- Provide less feedback on minor assignments. Some short homework assignments, for example, can be marked simply with a plus symbol for outstanding work, a checkmark for adequate work, and a minus symbol for minimal effort.
Using Assessment Results to Promote Lasting Learning: Two Underlying Principles

Assessment results can provide a wealth of information to help faculty understand how effective their teaching is and how it might be improved, provided that two principles are followed when the assessments are planned.
Articulate the Decisions that Assessment Will Inform

MacGregor, Tinto, and Lindblad (2001) note, ‘‘If you’re not clear on the goals of an assessment and the audiences to which that assessment will be directed, it’s hard to do the assessment well. So your first task is to ask yourself why, with whom, and for what purpose you are assessing ...’’ (p. 48). Assessment results can inform decisions in at least five areas:

Learning outcomes. Assessment results might help faculty decide, for example, if their statements of expected learning outcomes are sufficiently clear and focused or if they have too many intended learning outcomes to cover in the allotted instructional time.

Curriculum. Assessment results might help faculty decide, for example, if classes or modules taught by several faculty have sufficient uniformity across sections or whether a service-learning component is achieving its goal.
Pedagogy. Assessment results might help faculty decide, for example, whether online instruction is as effective as traditional classroom-based instruction or whether collaborative learning is more effective than traditional lectures.

Assessment. Assessment results can, of course, help faculty decide how useful their assessment strategies have been and what changes are needed to improve their effectiveness.

Resource allocations. Assessment results can provide a powerful argument for resource allocations. Disappointing evidence of student writing skills, for example, can lead to fact-based arguments for, say, more writing tutors or more online writing software. Poor student performance on a technology examination required for licensure in a profession can be a compelling argument for upgrading technologies in laboratories.

Understanding and articulating the decisions that a particular assessment will inform helps to ensure that the assessment results will indeed help enlighten those decisions. Suppose, for example, that a professor is assessing student learning in a biology laboratory. If the results are to inform decisions about the laboratory’s learning outcomes, for example, the assessments will need to be designed to assess each expected outcome. If the results are to inform decisions about curriculum, the assessments will need to be designed to assess each aspect of the curriculum. And if the results are to inform decisions about pedagogy, the assessments may need to be designed to provide comparable information on different pedagogical approaches.
Develop Assessment Strategies that Will Provide Appropriate Frames of Reference to Inform Those Decisions

Suskie (2007) has observed that assessment results considered in isolation are meaningless – they have significance only if they are compared against some kind of benchmark or frame of reference. Suskie has identified nine such frames of reference, four of which are especially relevant to most faculty.

Under the strengths and weaknesses frame of reference, faculty compare the sub-scores of an assessment to identify students’ relative strengths and weaknesses. An assessment of writing skills, for example, might determine that students are relatively good at writing a strong introduction but relatively weak at supporting arguments with appropriate evidence. This frame of reference is often of great interest and great value to faculty, as it tells them what their students are learning well and what areas need more or different attention.

In order to use this frame of reference, the assessment must be designed to generate comparable results on aspects of the trait being assessed, such as effective writing. This is generally accomplished by using a rubric or rating scale that lists the aspects being evaluated. Some published tests and surveys also generate sub-scores that can be compared with one another. In contrast, a holistic assessment, generating only one score per student without subscores, cannot provide this kind of information.
Under the improvement frame of reference, faculty compare student assessment results at the beginning and end of a class, module, course, or program. Faculty might, for example, give students the final examination on the first day of class and compare those scores against those on the same examination given at the end of instruction. Or faculty might give students writing assignments in the first and last weeks that are evaluated using the same criteria.

Such a value-added approach is intrinsically appealing, as it appears to convey how much students’ learning has improved as a result of faculty teaching. But this approach has a number of shortcomings. One is that students must be motivated to do their best on the entry-point assessment. This can be a challenge because grading, generally a strong motivator, is often inappropriate at this point: Is it fair to grade students when they have not yet had an opportunity to learn anything?

Another challenge is that both entry- and exit-point assessment information must be collected for this approach to be meaningful. This is not possible in situations in which sizable numbers of students either transfer into a class, module, course, or program after it has begun or drop out of it before it is completed. In these situations, the students who persist from beginning to end may not be a representative sample of all students who are subject to instruction.

Yet another challenge with this value-added approach is that gain scores, the difference between entry- and exit-point assessment results, are notoriously unreliable. As noted by Banta and Pike (2007) and Pike (2006), the measurement error of gain scores is essentially double that of each assessment alone. This sizable measurement error can mask meaningful gains.
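The doubling that Banta and Pike describe can be sketched under standard classical test theory assumptions (each observed score is a true score plus measurement error, with the errors at the two occasions independent and of equal variance); the notation below is illustrative and does not appear in the original chapter.

\[
X_{\mathrm{pre}} = T_{\mathrm{pre}} + e_{\mathrm{pre}}, \qquad
X_{\mathrm{post}} = T_{\mathrm{post}} + e_{\mathrm{post}}, \qquad
\operatorname{Var}(e_{\mathrm{pre}}) = \operatorname{Var}(e_{\mathrm{post}}) = \sigma_e^2
\]
\[
G = X_{\mathrm{post}} - X_{\mathrm{pre}}
\quad\Longrightarrow\quad
\operatorname{Var}(e_G) = \operatorname{Var}(e_{\mathrm{post}}) + \operatorname{Var}(e_{\mathrm{pre}}) = 2\sigma_e^2,
\qquad
\mathrm{SEM}(G) = \sqrt{2}\,\sigma_e .
\]

The error variance of the gain is therefore the sum of the two error variances, roughly double that of either single assessment, so genuine but modest gains can easily be swamped by noise.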
But perhaps the major concern about the improvement or value-added frame of reference is that it is often confused with the pre-post experimental design (Campbell & Stanley, 1963) used in the social sciences. In pre-post experimental designs, subjects are randomly assigned to control and experimental treatments; this allows the researcher to separate the impact of the treatment from extraneous factors. In higher education, however, faculty cannot randomly assign students to institutions or programs, so if faculty find significant growth they cannot conclude whether it is due to the learning experience or to extraneous factors. If a student’s oral communication skills improve, for example, faculty cannot be certain whether the improvement is due to work in class or to, say, concurrent participation in a club or a part-time job in which the student uses and improves oral communication skills.

Under the historical trends frame of reference, faculty compare student assessment results against those of prior cohorts of students. This frame of reference is of particular interest to faculty who want to know if their efforts to improve their curricula and pedagogies are yielding desired improvements in student learning. It can only be used, of course, if identical or parallel assessments can be utilized with successive cohorts of students. This is not always possible – sometimes curricula must change in order to meet employer and societal demands, so assessments used a few years ago may no longer be appropriate or relevant today. Another challenge with this approach is that, as with the improvement frame of reference discussed above, this is not an experimental design with random assignment to cohorts. As a result, faculty cannot be certain whether changes in student learning are due to changes in curricula or pedagogies or to changes in the students themselves. Faculty teaching at an institution that has increased its admission standards, for example, cannot be certain whether the growth they see in student learning is due to changes in curricula and pedagogies or is simply because today’s students are more talented, motivated, or prepared to learn.

Under the standards frame of reference, faculty compare assessment results against an established standard, set either by the faculty or by a regional or national agency or organization. Faculty might decide, for example, that students must answer at least 70% of test questions correctly in order to pass an examination, or a nursing licensure agency might state that nursing students must earn a particular score on a licensure examination in order to be licensed to practice. This frame of reference is of interest to faculty who want or need to ensure that students are meeting particular standards. Many colleges and universities, for example, want to ensure that all students graduate with a particular level of writing skill.

The challenge with this approach is, not surprisingly, setting an appropriate standard. If the standard has been set by an agency or organization, the work is done, of course, but setting a defensible, valid local standard can be very difficult and time-consuming. While faculty have been doing this for generations (setting a standard, for example, that students must answer at least 65% of questions correctly in order to pass a final examination), in reality these standards are often set arbitrarily and without clear justification. Livingston and Zieky (1982) offer a variety of techniques for establishing defensible standards.
Practical Suggestions for Summarizing, Interpreting, and Using Assessment Results to Promote Deep, Lasting Learning

In order to use assessment results to inform teaching practice and thereby improve student learning, they must be summarized in a way that busy faculty can quickly and easily understand. They must then be interpreted in appropriate ways so they may be used to inform teaching practice and thereby promote deep, lasting learning. Summaries, analyses, and interpretations should aim to answer two fundamental questions: (1) What have we learned about our students’ learning? and (2) What are we going to do about what we
have learned? The key is to ensure that the steps that are taken build upon practices that promote deep, lasting learning. What follow are practical suggestions for summarizing, interpreting, and using assessment results to answer these questions for three different assessment practices: the use of rubrics (rating scales or scoring guides), multiple choice tests, and reflective writing exercises.
Rubrics

As Sadler has discussed in Chapter 4, a rubric is a list of the criteria used to evaluate student work (papers, projects, performances, portfolios, and the like), accompanied by a rating scale. Using rubrics can be a good pedagogical practice for several reasons:
- Creating a rubric before the corresponding assignment is developed, rather than vice versa, helps to ensure that the assignment will address what the professor wants students to learn.
- Giving students the rubric along with the assignment is an excellent way to help them understand the purpose of the assignment and how it will be evaluated.
- Using a rubric to grade student work ensures consistency and fairness.
- Returning the marked rubric to students with their graded assignment gives them valuable feedback on their strengths and weaknesses and helps them understand the basis of their grade.

In order to understand rubric results and use them to inform teaching practice, it is helpful to tally students’ individual ratings in a simple chart. Table 8.1 provides a hypothetical example for the results of a rubric used to evaluate the portfolios of 30 students studying journalism.

It is somewhat difficult to understand what Table 8.1 is saying. While it seems obvious that students performed best on the fourth criterion (‘‘understanding professional ethical principles and working ethically’’), their other relative strengths and weaknesses are not as readily apparent. Table 8.1 would be more useful if the results were somehow sorted. Let us suppose that, in this hypothetical example, the faculty’s goal is that all students earn at least ‘‘very good’’ on all criteria. Table 8.2 sorts the results based on the number of students who score either ‘‘excellent’’ or ‘‘very good’’ on each criterion. Table 8.2 also converts the raw numbers into percentages, because this allows faculty to compare student cohorts of different sizes. The percentages are rounded to the nearest whole percentage, to reduce the volume of information to be digested and to keep the reader from focusing on trivial differences.

Now the results jump out at the reader. It is immediately apparent that students not only did relatively well on the fourth criterion but also on the first, third and fifth. It is equally apparent that students’ weakest area (of those evaluated here) is the tenth criterion (‘‘applying basic numerical and statistical concepts’’).
Table 8.1 An example of tallied results of a rubric used to evaluate the portfolios of 30 students studying journalism
The student:
1. Understands and applies the principles and laws of freedom of speech and press. (Excellent: 15, Very good: 14, Adequate: 1, Inadequate: 0)
2. Understands the history and role of professionals and institutions in shaping communications. (Excellent: 18, Very good: 8, Adequate: 4, Inadequate: 0)
3. Understands the diversity of groups in a global society in relationship to communications. (Excellent: 12, Very good: 17, Adequate: 1, Inadequate: 0)
4. Understands professional ethical principles and works ethically in pursuit of truth, accuracy, fairness, and diversity. (Excellent: 27, Very good: 3, Adequate: 0, Inadequate: 0)
5. Understands concepts and applies theories in the use and presentation of images and information. (Excellent: 12, Very good: 17, Adequate: 1, Inadequate: 0)
6. Thinks critically, creatively, and independently. (Excellent: 2, Very good: 20, Adequate: 5, Inadequate: 3)
7. Conducts research and evaluates information by methods appropriate to the communications profession(s) studied. (Excellent: 6, Very good: 21, Adequate: 3, Inadequate: 0)
8. Writes correctly and clearly in forms and styles appropriate for the communications profession(s) studied and the audiences and purposes they serve. (Excellent: 9, Very good: 14, Adequate: 6, Inadequate: 1)
9. Critically evaluates own work and that of others for accuracy and fairness, clarity, appropriate style and grammatical correctness. (Excellent: 6, Very good: 19, Adequate: 3, Inadequate: 2)
10. Applies basic numerical and statistical concepts. (Excellent: 10, Very good: 8, Adequate: 9, Inadequate: 3)
11. Applies tools and technologies appropriate for the communications profession(s) studied. (Excellent: 8, Very good: 19, Adequate: 3, Inadequate: 0)
Another area of relative weakness is the sixth criterion (‘‘thinking critically, creatively, and independently’’). Table 8.2 thus provides a clear roadmap for faculty reflection on students’ relative strengths and weaknesses.
Table 8.2 An improved version of Table 8.1
The student:
3. Understands the diversity of groups in a global society in relationship to communications. (Excellent + Very good: 97%; Excellent: 40%; Very good: 57%; Adequate: 3%; Inadequate: 0%)
5. Understands concepts and applies theories in the use and presentation of images and information. (Excellent + Very good: 97%; Excellent: 40%; Very good: 57%; Adequate: 3%; Inadequate: 0%)
11. Applies tools and technologies appropriate for the communications profession(s) studied. (Excellent + Very good: 90%; Excellent: 27%; Very good: 63%; Adequate: 10%; Inadequate: 0%)
7. Conducts research and evaluates information by methods appropriate to the communications profession(s) studied. (Excellent + Very good: 90%; Excellent: 20%; Very good: 70%; Adequate: 10%; Inadequate: 0%)
2. Understands the history and role of professionals and institutions in shaping communications. (Excellent + Very good: 87%; Excellent: 60%; Very good: 27%; Adequate: 13%; Inadequate: 0%)
9. Critically evaluates own work and that of others for accuracy and fairness, clarity, appropriate style and grammatical correctness. (Excellent + Very good: 83%; Excellent: 20%; Very good: 63%; Adequate: 10%; Inadequate: 7%)
8. Writes correctly and clearly in forms and styles appropriate for the communications profession(s) studied and the audiences and purposes they serve. (Excellent + Very good: 77%; Excellent: 30%; Very good: 47%; Adequate: 20%; Inadequate: 3%)
6. Thinks critically, creatively, and independently. (Excellent + Very good: 73%; Excellent: 7%; Very good: 67%; Adequate: 17%; Inadequate: 10%)
10. Applies basic numerical and statistical concepts. (Excellent + Very good: 60%; Excellent: 33%; Very good: 27%; Adequate: 30%; Inadequate: 10%)
Professors might begin this reflection by first examining how the curriculum addresses the application of basic numerical and statistical concepts by reviewing syllabi to identify the courses or modules in which this skill is addressed. The faculty might then discuss how well the identified courses or modules follow the thirteen principles for promoting deep, lasting learning articulated in this chapter. They might, for example, ask themselves:
- Are we giving enough time and attention in these classes to applying basic numerical and statistical concepts?
- Are we giving students enough classwork and assignments on this skill?
- Do students spend enough time actively applying numerical and statistical concepts?
- Are the assignments in which they apply numerical and statistical concepts real world problems, the kinds that may have more than one ‘‘correct’’ answer?
- Would students benefit from working with fellow students on these assignments rather than alone?
- Do we give students sufficient feedback on their work in applying numerical and statistical concepts? Do we give them sufficient opportunities to correct or revise their work?
Discussion of these and similar questions will doubtless lead to ideas about ways to strengthen students’ skills in applying numerical and statistical concepts. Faculty might decide, for example, to incorporate the application of numerical and statistical concepts into more courses or modules, to give students more practice through additional homework assignments, and to give students more collaborative projects in which they must interpret real world data with their fellow students.
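For readers who keep rubric tallies in a spreadsheet or script, the transformation from Table 8.1 to Table 8.2 can be automated along the following lines. This is a minimal illustrative sketch, not part of the original chapter: the summarize function is hypothetical, only a few of the hypothetical journalism criteria are included, and the rounding simply follows the whole-percentage convention described above.

# A minimal sketch of turning tallied rubric ratings (as in Table 8.1)
# into sorted, rounded percentages (as in Table 8.2).
RATINGS = ["Excellent", "Very good", "Adequate", "Inadequate"]

# Tallied ratings for a few of the criteria above (30 portfolios each).
tallies = {
    "6. Thinks critically, creatively, and independently.": [2, 20, 5, 3],
    "10. Applies basic numerical and statistical concepts.": [10, 8, 9, 3],
    "11. Applies tools and technologies appropriate for the profession(s) studied.": [8, 19, 3, 0],
}

def summarize(tallies):
    """Convert raw tallies into whole-number percentages, sorted as in Table 8.2."""
    rows = []
    for criterion, counts in tallies.items():
        total = sum(counts)
        percentages = [round(100 * count / total) for count in counts]
        top_two = round(100 * (counts[0] + counts[1]) / total)  # Excellent + Very good
        rows.append((criterion, top_two, percentages))
    rows.sort(key=lambda row: row[1], reverse=True)  # strongest criteria first
    return rows

for criterion, top_two, percentages in summarize(tallies):
    detail = "; ".join(f"{label}: {pct}%" for label, pct in zip(RATINGS, percentages))
    print(f"{criterion}  Excellent + Very good: {top_two}%  ({detail})")

Run on the full data set, this reproduces the ordering and percentages shown in Table 8.2; the point is simply that the sorting and rounding that make the results ‘‘jump out’’ take only a few lines to apply routinely.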
Multiple Choice Tests

The validity and usefulness of multiple choice tests are greatly enhanced if they are developed with the aid of a test blueprint or outline of the knowledge and skills being tested. Table 8.3 is an example of a simple test blueprint. In this example, the six listed objectives represent the professor’s key objectives for this course or module, and the third, fourth, and sixth objectives are considered the most important. This blueprint is thus powerful evidence of the content validity of the examination and a good framework for summarizing the examination results, as shown in Table 8.4. Again, the results in Table 8.4 have been sorted from highest to lowest to help readers grasp the results more quickly and easily.

Table 8.3 Example of a test blueprint for a statistics examination
1 item: Determine the value of t needed to find a confidence interval of a given size.
1 item: Understand the effect of p on the standard error of a proportion.
6 items: Choose the appropriate statistical analysis for a given research problem.
4 items: Decide on the appropriate null and alternative hypotheses for a given research problem and state them correctly.
2 items: Identify the critical value(s) for a given statistical test.
4 items: Choose the appropriate standard error formula for a given research problem.
Table 8.4 Results of a statistics examination, matched to the test blueprint (percentage of students answering correctly, by learning objective)
95%: Determine the value of t needed to find a confidence interval of a given size.
88%: Understand the effect of p on the standard error of a proportion.
85%: Decide on the appropriate null and alternative hypotheses for a given research problem and state them correctly.
79%: Identify the critical value(s) for a given statistical test.
62%: Choose the appropriate standard error formula for a given research problem.
55%: Choose the appropriate statistical analysis for a given research problem.
Table 8.4 makes clear that students overall did quite well in determining the value of t needed to find a confidence interval, but a relatively high proportion were unable to choose the appropriate statistical analysis for a given research problem. Table 8.4 provides another clear roadmap for faculty reflection on students’ relative strengths and weaknesses. The professor might consider how to address the weakest area – choosing appropriate statistical analyses – by again reflecting on the practices that promote deep, lasting learning discussed earlier in this chapter. The professor might, for example, decide to address this skill by revising or expanding lectures on the topic, giving students additional homework on the topic, and having students work collaboratively on problems in this area. Another useful way to review the results of multiple choice tests is to calculate what testing professionals (e.g., Gronlund, 2005; Haladyna, 2004; Kubiszyn & Borich, 2002) call the discrimination of each item (Suskie, 2004d). This metric, a measure of the internal reliability or internal consistency of the test, is predicated on the assumption that students who do relatively well on a test overall will be more likely to get a particular item correct than those who do relatively poorly. Table 8.5 provides a hypothetical example of discrimination results for a 5-item quiz. In this example, responses of the ten students with the highest overall scores on this quiz are compared against those of the ten students with the lowest overall quiz scores.
Table 8.5 Discrimination results for a short quiz taken by 30 students
Item 1: 10 of the ‘‘top 10’’ students and 0 of the ‘‘bottom 10’’ students answered correctly; discrimination = 10.
Item 2: 8 of the ‘‘top 10’’ and 6 of the ‘‘bottom 10’’ answered correctly; discrimination = 2.
Item 3: 5 of the ‘‘top 10’’ and 5 of the ‘‘bottom 10’’ answered correctly; discrimination = 0.
Item 4: 10 of the ‘‘top 10’’ and 10 of the ‘‘bottom 10’’ answered correctly; discrimination = 0.
Item 5: 4 of the ‘‘top 10’’ and 8 of the ‘‘bottom 10’’ answered correctly; discrimination = -4.
These five items have widely varying levels of discrimination:
- Item 1 has the best possible discrimination – all the top students answered it correctly, while none of the bottom students did. This is truly an item that ‘‘separates the wheat from the chaff,’’ discriminating students who have truly mastered class objectives from those who have not.
- Item 2 is an example of an item with good discrimination, though not as strong as Item 1, simply because it is easier. Items that fifty per cent of students get wrong have the greatest potential for discrimination; easier items have lower potential for discrimination.
- Item 3 has no discrimination – equal numbers of top and bottom students answered it correctly. While an item with no discrimination is not inherently a poor item, it would be worth a closer look to see why a number of top students struggled with it while some bottom students did not. Asking the top students why they got this question wrong would probably give the professor ideas on ways to revise the question for future administrations.
- Item 4 also has no discrimination, but this is simply because everyone answered it correctly. As already noted, easy items cannot discriminate well between top and bottom students, and items that are so easy that everyone answers them correctly will, of course, not discriminate at all.
- Item 5 discriminates negatively – students in the top group were more likely to answer it incorrectly than students in the bottom group. It is very likely that students in the top group misinterpreted either the question or one or more of its options, probably reading more into the item than the professor intended. This is an item that performed so poorly that it should be removed from the scores of these students and revised before it is used again. As with Item 3, the top students who got this question wrong would doubtless be able to give the professor suggestions on how to revise the item to minimize future misinterpretations.
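The arithmetic behind Table 8.5 is simple enough to script. The following sketch is an illustration rather than part of the original chapter; the two lists simply reproduce the hypothetical counts of correct answers in the top and bottom groups from the table.

# Discrimination as used in Table 8.5: for each item, the number of
# "top 10" students answering correctly minus the number of "bottom 10"
# students answering correctly.
top_correct = [10, 8, 5, 10, 4]
bottom_correct = [0, 6, 5, 10, 8]

for item, (top, bottom) in enumerate(zip(top_correct, bottom_correct), start=1):
    discrimination = top - bottom
    # Dividing by the group size (10 here) would give an index on the
    # familiar -1 to +1 scale; the sign and ordering are the same either way.
    if discrimination < 0:
        note = "negative discrimination: review and revise this item"
    elif discrimination == 0:
        note = "no discrimination"
    else:
        note = "positive discrimination"
    print(f"Item {item}: top {top}, bottom {bottom}, discrimination {discrimination} ({note})")

Running this prints the five rows of Table 8.5, with Item 5 flagged for review exactly as in the discussion above.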
Reflective Writing

Reflective writing is a learning strategy in which students reflect and write on what and how they have learned. Students engaged in reflective writing typically reflect and write on ‘‘the larger context, the meaning, and the implications of an experience or action’’ (Branch & Paranjape, 2002, p. 1185) and ‘‘pull together a broad range of previous thinking or knowledge in order to make greater sense of it for another purpose that may transcend the previous bounds of personal knowledge or thought’’ (Moon, 2001, p. 5). Reflective writing thus helps students develop a number of skills (Costa & Kallick, 2000), including skill in synthesis—pulling together what they have learned in order to see the big picture – and metacognition – understanding how one learns. Reflective writing can also be a valuable assessment strategy. Costa and Kallick (2000) note that reflective writing provides an opportunity for
‘‘documenting learning and providing a rich base of shared knowledge’’ (p. 60), while the Conference on College Composition and Communication notes that ‘‘reflection by the writer on her or his own writing processes and performances holds particular promise as a way of generating knowledge about writing’’ (2006). Reflective writing may be especially valuable for assessing ineffable outcomes such as attitudes, values, and habits of mind. An intended student learning outcome to ‘‘be open to diverse viewpoints’’ would be difficult to assess through a traditional multiple choice test or essay assignment, because students would be tempted to provide what they perceive to be the ‘‘correct’’ answer rather than accurate information on their true beliefs and views.

Because reflective writing seeks to elicit honest answers rather than ‘‘best’’ responses, reflective writing assignments may be assessed and the results used differently than other assessment strategies. While the structure of a student’s reflective writing response can be evaluated using a rubric, the thoughts and ideas expressed may be so wide-ranging that qualitative rather than quantitative assessment strategies may be more appropriate. Qualitative assessment techniques are drawn from qualitative research approaches, which Marshall and Rossman (2006) describe as ‘‘naturalistic,’’ ‘‘fundamentally interpretive,’’ relying on ‘‘complex reasoning that moves dialectically between deduction and induction,’’ and drawing on ‘‘multiple methods of inquiry’’ (p. 2). Qualitative assessment results may thus be summarized differently than quantitative results such as those from rubrics and multiple choice tests, which typically yield ratings or scores that can be summarized using descriptive and inferential statistics. Qualitative research techniques aim for naturalistic interpretations rather than, say, an average score. Qualitative assessment techniques typically include sorting the results into categories (e.g., Patton, 2002).

Tables 8.6 and 8.7 provide an example of a summary of qualitative assessment results from a day-long workshop on the assessment of student learning, conducted by the author. The workshop addressed four topics: principles of good practice for assessment, promoting an institutional culture of assessment, the articulation of learning outcomes, and assessment strategies including rubrics. At the end of the day, participants were asked two questions, adapted from the minute paper suggested by Angelo and Cross (1993): ‘‘What was the most useful or meaningful thing you learned today?’’ and ‘‘What one question is uppermost on your mind as we end this workshop?’’

Table 8.6 Responses to ‘‘What was the most useful or meaningful thing you learned today?’’ by participants at a one-day workshop on assessing student learning (percent of respondents)
Assessment strategies (e.g., rubrics): 40%
Culture of assessment: 20%
Principles of good practice: 16%
Articulating learning outcomes: 10%
Miscellaneous: 13%

Table 8.7 Responses to ‘‘What question remains uppermost on your mind as we end this workshop?’’ by participants at a one-day workshop on assessing student learning (percent of respondents)
Culture of assessment: 27%
Organizing assessment across an institution: 13%
Unique questions on other topics: 43%
No response: 16%
For the first question, ‘‘What was the most useful or meaningful thing you learned today?’’ (Table 8.6), the author sorted comments into five fairly obvious categories: the four topics of the workshop plus a ‘‘miscellaneous’’ category. For the second question, ‘‘What one question is uppermost on your mind as we end this workshop?’’ (Table 8.7), potential categories were not as readily evident. After reviewing the responses, the author settled on the categories shown in Table 8.7, then sorted the comments into the identified categories. Qualitative analysis software is available to assist with this sorting – such programs search responses for particular keywords provided by the professor.

The analysis of qualitative assessment results – identifying potential categories for results and then deciding the category into which a particular response is placed – is, of course, inherently subjective. In the workshop example described here, the question, ‘‘Would it be helpful to establish an assessment steering committee composed of faculty?’’ might be placed into the ‘‘culture of assessment’’ category by one person and into the ‘‘organizing assessment’’ category by another. But while qualitative assessment is a subjective process, open to inconsistencies in categorizations, it is important to note that any kind of assessment of student learning has an element of subjectivity, as the questions that faculty choose to ask of students and the criteria used to evaluate student work are a matter of professional judgment that is inherently subjective, however well-informed. Inconsistencies in categorizing results can be minimized by having two readers perform independent categorizations, then reviewing and reconciling differences, perhaps with the introduction of a third reader for areas of disagreement.

Qualitative assessment results can be extraordinarily valuable in helping faculty understand and improve their teaching practices and thereby improve student learning. The ‘‘Minute Paper’’ responses to this workshop (Tables 8.6 and 8.7) provided a number of useful insights to the author:
- The portion of the workshop addressing assessment strategies was clearly very successful in conveying useful, meaningful ideas; the author could take satisfaction in this and leave it as is in future workshops.
- The portion of the workshop addressing learning outcomes was not especially successful; while there were few if any questions about this topic, few participants cited it as especially useful or meaningful. (A background knowledge probe (Angelo & Cross, 1993) would have revealed that most participants arrived at the workshop with a good working knowledge of this topic.) The author used this information to modify her workshop curriculum to limit coverage of this topic to a shorter review.
- Roughly one in eight participants had questions about organizing assessment activities across their institution, a topic not addressed in the workshop. The author used this information to modify her workshop curriculum to incorporate this topic. (Reducing time spent on learning outcomes allowed her to do this.)
- The portion of the workshop addressing promoting a culture of assessment was clearly the most problematic. While a fifth of all respondents found it useful or meaningful, more than a quarter had questions about this topic when the workshop concluded. Upon reflection, the author realized that she had placed this topic at the end of the workshop curriculum, addressing it at the end of the day when she was rushed and participants were tired. She modified her curriculum to move this topic to the beginning of the workshop and spend more time on it.

As a result of this reflective writing assignment and the subsequent changes made in curriculum and pedagogy, participant learning increased significantly in subsequent workshops, as evidenced by the increased proportion of comments citing organizing assessment activities and promoting a culture of assessment as the most useful or meaningful things learned and the smaller proportion of participants with questions about these two areas.
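The keyword searching that qualitative analysis software performs, mentioned earlier in this section, can be sketched in a few lines. This is an illustrative sketch only, not a description of any particular package; the categories, keywords and sample responses are hypothetical and echo the workshop example above.

# Minimal keyword-based sorting of minute-paper responses.
# Keywords are hypothetical; unmatched responses are set aside for a human
# reader, and categorizations should still be checked by a second reader.
categories = {
    "Culture of assessment": ["culture", "buy-in", "steering committee"],
    "Organizing assessment across an institution": ["organize", "organizing", "coordinate", "committee"],
    "Assessment strategies": ["rubric", "test blueprint", "portfolio"],
}

def categorize(response):
    text = response.lower()
    matched = [name for name, keywords in categories.items()
               if any(keyword in text for keyword in keywords)]
    return matched or ["Uncategorized: needs a human reader"]

responses = [
    "Would it be helpful to establish an assessment steering committee composed of faculty?",
    "How do I build a rubric for a group project?",
]
for response in responses:
    print(categorize(response), "<-", response)

Note that the first sample response matches the keywords of two categories, which is exactly the kind of ambiguity, noted above, that independent second readers help to resolve.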
Conclusion

Why do faculty assess student learning? One longstanding reason, of course, is to form a basis for assigning grades to students. Another recently emerging reason is to demonstrate to various constituents – government agencies, quality assurance agencies, taxpayers, employers, and students and their families – that colleges and universities are indeed providing students with the quality education they promise. But the most compelling reason for many faculty to engage in assessing student learning is for the opportunity it provides to improve teaching practices and thereby foster deep, lasting learning. This chapter has described a number of ways that assessment activities can accomplish these ends:
- Provide a broad array of learning and assessment activities.
- Design assessment activities (e.g., assignments, tests) so that they address key learning outcomes.
- Help students understand course and program expectations and the characteristics of excellent work.
- Challenge students by giving them high but attainable expectations.
- Require students to spend significant time studying and practicing.
- Give students prompt, concrete feedback on their work.
- Articulate the decisions that assessment results are to inform.
- Design assessments so that they will provide appropriate frames of reference to inform those decisions.
- Use test blueprints to plan multiple choice tests and rubrics to plan other assignments and tests.
- Summarize assessment results into simple tables, perhaps with results sorted so that the best and most disappointing results can be quickly identified.
- Use the results of rubrics and multiple choice tests to identify students’ relative strengths and weaknesses and ways that the assessments themselves might be improved.
- Use the results of qualitative assessments to identify areas in which students are confused, dissatisfied with their learning, or fail to demonstrate attainment of key learning outcomes.
- Use recent research on strategies that promote deep, lasting learning, along with feedback from students, to plan how to address assessment results that are disappointing and thereby improve teaching practice and promote lasting learning.
Faculty who have a passion for teaching are always looking for ways to improve their practice and foster lasting student learning. Once they understand the nature and use of assessment, they quickly come to realize that assessment is one of the best tools in their teaching toolbox for achieving these ends.
References Angelo, T. A. (1993, April). A ‘‘teacher’s dozen’’: Fourteen general, research-based principles for improving higher learning in our classrooms. AAHE Bulletin, 45(8), 3–7, 13. Angelo, T. A., & Cross, K. P. (1993). Classroom assessment techniques: A handbook for college teachers (2nd ed.). San Francisco: Jossey-Bass. Association of American Colleges and Universities. (2002). Greater expectations: A New vision for learning as a nation goes to college. Washington, DC: Author. Astin, A. W. (1993). What matters in college: Four critical years revisited. San Francisco: Jossey-Bass. Banta, T. W., & Pike, G. R. (2007). Revisiting the blind alley of value added. Assessment Update, 19(1), 1–2, 14–15. Barr, R. B., & Tagg, J. (1995). From teaching to learning: A new paradigm for undergraduate education. Change, 27(6), 12–25. Biggs, J. (2001). Assessing for quality in learning. In L. Suskie (Ed.), Assessment to promote deep learning: Insight from AAHE’s 2000 and 1999 assessment conferences (pp. 65–68). Branch, W. T., & Paranjape, A. (2002). Feedback and reflection: Teaching methods for clinical settings. Academic Medicine, 77, 1185–1188. Campbell, D. T., & Stanley, J. C. (1963). Experimental and quasi-experimental designs for research. Chicago: Rand McNally. Chickering, A. W., & Gamson, Z. (1987). Seven principles for good practice in undergraduate education. AAHE Bulletin, 39(7), 5–10.
Chickering, A. W., & Gamson, Z. (1991). Applying the seven principles for good practice in undergraduate education. New directions for teaching and learning, No. 47. San Francisco: Jossey-Bass. Conference on College Composition and Communication. (2006). Writing assessment: A position statement. Urbana, IL: National Council of Teachers of English. Retrieved September 4, 2007, from http://www.ncte.org/cccc/resources/123784.htm Costa, A., & Kallick B. (2000). Getting into the habit of reflection. Educational Leadership, 57(7), 60–62. De Sousa, D. J. (2005). Promoting student success: What advisors can do (Occasional Paper No. 11). Bloomington, Indiana: Indiana University Center for Postsecondary Research. Entwistle, N. (2001). Promoting deep learning through teaching and assessment. In L. Suskie (Ed.), Assessment to promote deep learning: Insight from AAHE’s 2000 and 1999 assessment conferences (pp. 9–20). Ewell, P. T., & Jones, D. P. (1996). Indicators of ‘‘good practice’’ in undergraduate education: A handbook for development and implementation. Boulder, CO: National Center for Higher Education Management Systems. Gronlund, N. E. (2005). Assessment of student achievement (8th ed.) Boston: Allyn & Bacon. Haladyna, T. M. (2004). Developing and validating multiple choice items. Boston, MA: Allyn & Bacon. Haswell, R. (1983). Minimal marking. College English, 45(6), 600–604. Henscheid, J. M. (2000). Professing the disciplines: An analysis of senior seminars and capstone courses (Monograph No. 30). Columbia, South Carolina: National Resource Center for the First-Year Experience and Students in Transition. Huba, M. E., & Freed, J. E. (2000). Learner-centered assessment on college campuses: Shifting the focus from teaching to learning (pp. 32–64). Needham Heights, MA: Allyn & Bacon. Kubiszyn, T., & Borich, G. D. (2002). Educational testing and measurement: Classroom application and management (7th ed.) San Francisco: Jossey-Bass. Kuh, G. (2001). Assessing what really matters to student learning: Inside the National Survey of Student Engagement. Change, 33(3), 10–17, 66. Kuh, G. D., Schuh, J. H., Whitt, E. J., & Associates. (1991). Involving colleges: Successful approaches to fostering student learning and development outside the classroom. San Francisco: Jossey-Bass. Light, R. (2001). Making the most of college: Students speak their minds. Cambridge, MA: Harvard University Press. Linn, R., & Dunbar, S. B. (1991). Complex, performance-based assessment: Expectations and validation criteria. Los Angeles: University of California Center for Research on Evaluation, Standards, and Student Testing. Livingston, S. A., & Zieky, M. J. (1982). Passing scores: A manual for setting standards on performance on educational and occupational tests. Princeton: Educational Testing Service. MacGregor, J., Tinto, V., & Lindblad, J. H. (2001). Assessment of innovative efforts: Lessons from the learning community movement. In L. Suskie (Ed.), Assessment to promote deep learning: Insight from AAHE’s 2000 and 1999 assessment conferences (pp. 41–48). McKeachie, W. J. (2002). Teaching tips: Strategies, research, and theory for colleges and university teachers (11th ed.). Boston: Houghton Mifflin. Marshall, C., & Rossman, G. B. (2006). Designing qualitative research (4th ed.). Thousand Oaks, CA: Sage. Mentkowski, M., & Associates. (2000). Learning that lasts: Integrating learning, development, and performance in college and beyond. San Francisco, CA: Jossey-Bass. Messick, S. (1989). Validity. In R. L. 
Linn (Ed.), Educational measurement. New York: Macmillan. Messick, S. (1994). The interplay of evidence and consequences in the validation of performance assessments. Educational Researcher, 23(2), 13–23.
Moon, J. (2001). Reflection in learning and professional development: Theory and practice. London: Routledge. Moran, D. J., & Malott, R. W. (Eds.). (2004). Evidence-based educational methods. San Diego: Elsevier Academic Press. National Survey of Student Engagement. (2006). Engaged learning: Fostering success for all students: Annual report 2006. Bloomington, IN: Author. Ovando, M. N. (1992). Constructive feedback: A key to successful teaching and learning. Austin, TX: University of Texas at Austin, College of Education, Department of Education Administration. ERIC Document Reproductive Services No. ED 404 291. Palmer, P. J. (1998). The courage to teach: Exploring the inner landscape of a teacher’s life. San Francisco: Jossey-Bass. Pascarella, E. T. (2001). Identifying excellence in undergraduate education: Are we even close? Change, 33(3), 19–23. Pascarella, E. T., & Terenzini, P.T. (2005). How college affects students: A third decade of research. San Francisco: Jossey-Bass. Patton, M. Q. (2002). Qualitative research & evaluation methods (3rd ed.). Thousand Oaks, CA: Sage. Pike, G. R. (2006). Assessment measures: Value-added models and the Collegiate Learning Assessment. Assessment Update, 18(4), 5–7. Romer, R., & Education Commission of the States. (1996, April). What research says about improving undergraduate education. AAHE Bulletin, 48(8), 5–8. Suskie, L. (2000, May). Fair assessment practices: Giving students equitable opportunities to demonstrate learning. AAHE Bulletin, 52(9), 7–9. Suskie, L. (2004a). Encouraging student reflection. In Assessing student learning: A common sense guide (pp. 168–184). San Francisco: Jossey-Bass Anker Series. Suskie, L. (2004b). Using assessment findings effectively and appropriately. In Assessing student learning: A common sense guide (pp. 300–317). San Francisco: Jossey-Bass Anker Series. Suskie, L. (2004c). What is assessment? Why assess? In Assessing student learning: A common sense guide (pp. 3–17). San Francisco: Jossey-Bass Anker Series. Suskie, L. (2004d). Writing a traditional objective test. In Assessing student learning: A common sense guide (pp. 200–221). San Francisco: Jossey-Bass Anker Series. Suskie, L. (2007). Answering the complex question of ‘‘How good is good enough?’’ Assessment Update, 19(4), 1–2, 14–15. Walvoord, B., & Anderson, V. J. (1998). Effective grading: A tool for learning and assessment. San Francisco: Jossey-Bass.
Chapter 9
Instrumental or Sustainable Learning? The Impact of Learning Cultures on Formative Assessment in Vocational Education
Kathryn Ecclestone
Westminster Institute of Education, Oxford Brookes University, Oxford, UK
e-mail: [email protected]
Introduction

There is growing interest amongst researchers, policy makers and teachers at all levels of the British education system in assessment that encourages engagement with learning, develops autonomy and motivation and raises levels of formal achievement. These goals have been influenced by developments in outcome-based and portfolio-based qualifications in post-school education, including higher education. There is parallel interest in developing learning to learn skills and encouraging a positive attitude to learning after formal education through assessment that serves immediate goals for achievement whilst establishing a basis for learners to undertake their own assessment activities in future (see Boud & Falchikov, 2007). More specifically, research in the school sector offers insights about how to promote a sophisticated understanding of formative assessment that changes how students and teachers regard the purposes of assessment and their respective roles in it in order to enhance learning (see Assessment Reform Group, 2002; Black & Wiliam, 1998; Gardner, 2006).

Yet, despite such compelling goals and the apparently unproblematic nature of principles and methods to encourage them, theoretical and empirical research about the assessment experiences of post-compulsory students in the UK shows that promoting sustainable learning through assessment is not straightforward. Recent studies show that students and their teachers have different expectations about the type of learners suitable for vocational and academic courses, the purposes of assessment as either to foster subject knowledge or personal development, and about ‘‘appropriate’’ forms of assessment. Students progressing to university have therefore experienced very different approaches to formative assessment, leading to different expectations about what they can or should expect in terms of feedback and help in improving their work. Taken together, these studies suggest that
staff in universities need to understand more about the ways in which learning cultures that students experience before university are a powerful influence on expectations, attitudes and practices as they progress into higher education (see Davies & Ecclestone, 2007; Ecclestone, 2002; Torrance et al., 2005). As a contribution to debate about how staff in universities might encourage formative assessment for sustainable learning, the chapter draws on two studies: one explored the factors that help and hinder teachers in changing their formative assessment practices, the other explored the effects of formative and summative assessment on students’ and teachers’ ideas about learning, assessment and achievement (Davies & Ecclestone, 2007; Ecclestone et al., in progress; Torrance et al., 2005). First, the chapter summarises some barriers to better understanding of formative assessment in assessment systems in the UK. Second, it summarises the concept of learning cultures as an aid to understanding the effects of formative assessment on attitudes to learning. Third, it applies this concept to case study data from in-depth interviews and observations of assessment activities in an Advanced Vocational Business Studies qualification in a further education (tertiary) college (two tutors, eight students) and an Advanced Vocational Science Qualification (one teacher, two students) in a school. This section explores the different factors in the learning cultures that affected students’ and teachers’ attitudes to learning and the role of assessment in their learning. Finally, it evaluates implications of this discussion for assessment practices in universities for young people progressing to higher education from different learning cultures in schools and colleges.
Barriers to Understanding Formative Assessment in Post-compulsory Education

A Divided System

Most young people in the UK progress to university from general advanced level academic qualifications or general advanced level vocational qualifications. Some go to higher education from a work-based qualification. After compulsory schooling, students can take these qualifications in schools or further education colleges. Despite government attempts since 2000 to encourage more mixing of vocational and academic options, the UK’s qualification tracks remain very segregated. This leads students to have strong images of what is appropriate for them in terms of content and assessment approaches, and to choose university options on the basis of particular images about the sort of learners they are (see Ball, Maguire, & Macrae, 2000; Ball, David, & Reay, 2005; Torrance et al., 2005).
The past 30 years has seen a huge shift in British post-school qualifications from methods and processes based on competitive, norm-referenced examinations for selection, towards assessment that encourages more achievement. There is now a strong emphasis on raising standards of achievement for everyone through better feedback, sharing the outcomes and criteria with students and recording achievement and progress in portfolios. One effect of this shift is a harmonisation of assessment methods and processes between long-running, high status academic qualifications in subjects such as history, sociology and psychology and newer, lower status general vocational qualifications such as leisure and tourism, health and social care. General academic and general vocational qualifications now combine external examinations set by an awarding body, assignments or course work assessed by an awarding body and course work or assignments assessed by teachers. Some general academic qualifications are still assessed entirely by external examinations. A series of ad hoc policy-based and professional initiatives has encouraged alternative approaches to both formative and summative assessment, such as outcome and competence-based assessment, teacher and work-place assessment, and portfolios of achievement. These have blurred the distinction between summative and formative assessment and emphasised outcomebased assessment as the main way to raise levels of participation, achievement, confidence and motivation amongst young people and adults who have not succeeded in school assessment (see, for example, Jessup, 1991; McNair, 1995; Unit for Development of Adult and Continuing Education, 1989). One effect has been to institutionalise processes for diagnostic assessment, the setting and reviewing of targets, processes to engage students with assessment specifications and criteria, methods of support and feedback to raise grade attainment or improve competence, and ways of recording achievement. In vocational and adult education, these processes tend to be seen as a generic tool for motivation and achievement and for widening participation. In contrast, similar practices in schools and higher education are more strongly located in the demands of subject disciplines as the basis for higher achievement and better student engagement.
Confusion about Formative Assessment

There is currently no watertight definition of formative assessment. Black and Wiliam define formative assessment as ‘‘encompassing all those activities undertaken by teachers and/or by their students which provide information to be used as feedback to modify the teaching and learning activities in which they are engaged’’ (Black & Wiliam, 1998, p. 7). Formative assessment is sometimes described as assessment for learning as distinct from assessment of learning:

Assessment for learning is any assessment for which the first priority in its design and practice is to serve the purpose of promoting students’ learning. It thus differs from
assessment designed primarily to serve the purposes of accountability, or of ranking, or of certifying competence. An assessment activity can help learning if it provides information to be used as feedback, by teachers, and by their students, in assessing themselves and each other, to modify the teaching and learning activities in which they are engaged. Such assessment becomes ‘formative assessment’ when the evidence is actually used to adapt the teaching work to meet learning needs. (Black & Wiliam, 1998, p. 2)
The Assessment Reform Group has developed principles of formative assessment that encourage teachers to develop the links between information about students’ progress towards learning goals, adaptations to planning and teaching based on feedback and dialogue, and attention to the ways in which students learn. This requires encouragement of autonomy, some choice about activities and students’ understanding of goals, criteria and the purpose of feedback (Assessment Reform Group, 2002). Royce Sadler (1989) argues that formative assessment has to close the gap between current states of understanding and final goals as part of demystifying and communicating the guild knowledge of subject disciplines. More recently, Black regards all feedback as synonymous with formative assessment or assessment for learning, and notes that feedback can take many forms. He argues that especially valuable feedback can be seen in peer- and self-assessment, in new approaches to discussion work and to teachers’ written feedback, and in more sensitive and open-ended questioning in class (Black, 2007).

Despite agreement at the level of research, ideas about formative assessment in everyday practice reveal images of learning as attaining objectives, where knowledge is fixed and externally-defined (see Hargreaves, 2005). The principles outlined above can also mask practices that equate formative assessment with continuous or modular feedback and monitoring for summative tasks spread through a course. In contrast, the same rhetoric can convey images of learning as the construction of knowledge, where knowledge needs reworking by students so that it makes sense to them (Hargreaves, 2005). Conflicting images of learning and teaching therefore affect the underlying assumptions and apparently unproblematic practices of sharing goals and criteria, giving feedback, closing the gap and adapting teaching to suit learning needs.

Activities based on transmission of the teacher’s expertise, knowledge and advice (or the knowledge of those designing the assessment specifications) have a very different ethos and outcome than formative assessment based on transaction between teachers and students about processes, the content of an activity or task or about its goals. In turn, formative assessment that aims to transform students’ and teachers’ understanding of concepts and processes associated with learning a subject offers a higher degree of challenge. For example, Hargreaves shows how the notion of closing the gap is often rooted in teacher-led images of performance, delivery, adapting teaching in the light of assessment information, or as a gift from teacher to pupil (Hargreaves, 2005). It is therefore important to pay attention to the language that teachers, qualification designers and students use, as well as to their practices.
Attempts to define formative assessment do not, therefore, offset widespread misunderstanding amongst practitioners and institution managers that some activities are formative and others summative. In post-compulsory education, there is a tendency to see formative assessment as a series of teacher-led techniques for feedback, diagnosis and review where, despite an accompanying rhetoric of ‘‘engaging students with learning’’, the techniques and associated formal paperwork are often solely to ‘‘track’’ students towards their summative targets (see Ecclestone, 2002; Torrance et al., 2005). Formative assessment is also widely, and mistakenly, seen as synonymous with continuous or modular assessment where summative tasks are broken up into interim ones. Yet, more holistic definitions lead to further difficulty because teachers, reasonably, associate such activities as questioning, written feedback and practice examination questions with teaching.
Differentiating Between the Spirit and the Letter of Formative Assessment

The need to be clearer and more precise about the purposes of formative assessment is confirmed by research which shows that the same assessment activities or methods can lead to very different kinds of learning in different contexts. In the Learning How to Learn Project in the Economic and Social Science Research Council’s Teaching and Learning Research Programme (TLRP), Marshall and Drummond use the evocative terms spirit and letter of formative assessment as assessment for learning (AfL) to capture how it was practised in the classroom:

The ‘spirit’ of AfL ... we have characterized as ‘high organisation based on ideas’, where the underpinning principle is promoting pupil autonomy ... This contrasts with those lessons where only the procedures, or ‘letter’ of AfL seem in place. We use these headings – the ‘spirit’ and ‘letter’ – to describe the types of lessons we watched, because they have a colloquial resonance which captures the essence of the differences we observed. In common usage adhering to the spirit implies an underlying principle which does not allow a simple application of rigid technique. In contrast, sticking to the letter of a particular rule is likely to lose the underlying spirit it was intended to embody. (Marshall & Drummond, 2006, p. 137)
They found that teachers working in the spirit of AfL encouraged students to become more independent and critical learners, in contrast to those working in the letter of AfL, where formative assessment activities were teacher-centred and teacher-led in order to transmit knowledge and skills. Researchers and teachers in the Improving Formative Assessment project have found the terms letter and spirit helpful in characterising the formative assessment practices emerging from our research, and we connect them in this chapter to instrumental and sustainable formative assessment. This useful distinction of spirit and letter illuminates the ways in which formative assessment might enable students to go beyond extrinsic success in
meeting targets and, instead, to combine better performance with engagement and good learning habits in order to develop learning autonomy. The distinction enables a contrast to be drawn with techniques based on a teacher-centred, transmission view of knowledge and learning and which encourage compliant, narrow responses. However, the spirit and letter are not neatly separated: teachers in this project often had a particular goal and focus of attention in mind, but shifted between these and others during a lesson (Marshall & Drummond, 2006). The same phenomenon is also apparent amongst vocational and adult education teachers (see Derrick & Gawn, 2007).
Motivation
Consideration of the spirit and letter of formative assessment suggests that a more nuanced understanding of the forms of motivation and autonomy that assessment promotes is also important if we are to understand the effects of formative assessment on learning. In order to go beyond an old and somewhat unrealistic dichotomy between intrinsic and extrinsic motivation, German researchers have undertaken longitudinal studies of the ways in which students combine strategic approaches to learning based on external motivation with self-determination and personal agency. This work uses well-known psychological constructs of motivation, such as students' and teachers' attribution of achievement to effort, luck, ability, the difficulty of a particular task or to other external factors, and the extent to which students have a sense of agency or locus of control. The resulting typology offers "a systematically ordered spectrum of constructs" that illuminates individual behaviours and activities in different contexts whilst recognising that motivation is affected strongly by social factors in a learning group, and from family, peers and work colleagues (Prenzel, Kramer, & Dreschel, 2001). The typology proved useful and illuminating in a study of vocational education students' experiences of, and responses to, assessment (Ecclestone, 2002). Another study is currently evaluating whether or not the typology might help teachers understand their students' motivation better (Ecclestone et al., in progress). My descriptions here draw on Prenzel's original categories and insights gained from these two studies.
Amotivated
Amotivated learners lack any direction for learning, and are, variously, indifferent or apathetic. Sometimes this state is an almost permanent response to formal education or assessment and therefore hard to shift, or it appears at points during a course. There is a sense that amotivated learners are drifting or hanging on until something better appears. However, it is important to recognise the obvious point that for all of us, at different times, our deepest, most intrinsic motivation can revert to states where we are barely motivated or not
motivated at all! In this context, surviving the pressure of targets or trying to achieve something difficult requires the reward or punishment of external motivation.
External
Learning takes place largely in association with reinforcement and reward, or to avoid threat or punishment, including: short-term targets, prescriptive outcomes and criteria, frequent feedback (this might be about the task, the person's ego or feelings, or the overall goal) and reviews of progress, deadlines, sanctions and grades. In post-compulsory education, external motivation sometimes takes the form of financial incentives (payment to attend classes or rewards for getting a particular grade) or sanctions (money deducted for non-attendance on courses). External motives are sometimes essential at the beginning of a course, or at low points during it. They are not, therefore, negative and can be used strategically as a springboard for other forms of motivation or to get people through difficult times in their learning. Yet, if left unchecked, external motives can dominate learning all the way through a course and lead to instrumental compliance rather than deep engagement.
Introjected/internalised
Introjected is a therapeutic term describing someone who has internalised an external supportive structure and can articulate it as her or his own: in a qualification, this might comprise the vocabulary and procedures of criteria, targets and overall assessment requirements. Good specifications of grade criteria and learning outcomes, and having processes or tasks broken into small steps, enable learners to use the official specifications independently of teachers. Nevertheless, although introjected motivation enables students to articulate the official requirements and criteria almost by rote, it is not self-determined. For learners disaffected by assessment in the past, introjected motivation is powerful and, initially, empowering. However, like external motivation, it can become a straitjacket by restricting learners and teachers to prioritising the formal requirements, especially in contexts where contact time and resources are restricted.
Identified
Learning occurs when students accept content or activities that may hold no incentive in terms of processes or content (they might even see them as a burden) but which are necessary for attaining a pre-defined goal such as a qualification and short-term targets. It links closely to introjected motivation, and the goals that students identify with can be course-related, personal and social, or all three, as a means to a desirable end.
Intrinsic
Learners perceive any incentives as intrinsic to the content or processes of a formal learning or assessment activity, such as enjoyment of learning something for its own sake, helping someone else towards mastery or being committed to others outside or inside the learning group. It is often more prevalent amongst learners than teachers might assume and takes idiosyncratic, deeply personal and sometimes fleeting forms. Intrinsic motivation is context-specific: someone can therefore show high levels of intrinsic motivation in one task or context and not in another. Learning is highly self-determined and independent of external contingencies.
Interested
Interested motivation is characterised by learners recognising the intrinsic value of particular activities and goals and then assigning their own subjective criteria for what makes something important: these include introjected and identified motives. Like intrinsic motivation, it relies on students assigning deeply personal meanings of relevance to content, activities and contexts. It is accompanied by feelings of curiosity and perhaps risk or challenge, and encouraged by a sense of flow, connection or continuity between different elements of a task or situation. High levels of self-determination, a positive identity or sense of self and the ability to attribute achievement to factors within one's own control are integrated in a self-image associated with being a successful learner. Interested motivation relates closely to Maslow's well-known notion of self-actualisation, where identity, learning activities, feelings of social and civic responsibility and personal development are fused together. It is therefore often correlated with good peer and social dynamics in a learning group. Interested motivation characterised ideas about learning and personal identity amongst some young people in Ball, Maguire and Macrae's study of transitions into post-compulsory education, training or work. The processes and experiences of becoming somebody and of having an imagined future were not only rooted strongly in formal education but also in their sense of having positive opportunities in the local labour market and in their social lives. Formal education therefore played an important and positive, although not dominant, part in their evolving personal identity (Ball, Maguire, & Macrae, 2000).
Despite the descriptive and analytical appeal of these types, it is, of course, crucial to bear in mind that they are not stable or neatly separated categories. Nor can motivation be isolated from structural factors such as class, gender and race, opportunities for work and education, and students' and teachers' perceptions of these factors. In formal education, students might combine aspects of more than one type, they might change from day to day, and they might show
interested motivation strongly in one context (such as a hobby) but not at all in the activities required of them at school, college or university. The factors that make someone an interested or barely motivated learner are therefore idiosyncratic, very personal and changeable: the most motivated, enthusiastic young person can show high levels of interested motivation in year one of a two-year vocational course but be barely hanging on through external and introjected motivation by the end of year two, simply because they are tired of the course and want to move on. Some need the incentives of external rewards and sanctions, and to internalise the official demands and support structures, as a springboard to develop deeper forms of motivation (see Davies & Ecclestone, 2007; Ecclestone, 2002). Nevertheless, these caveats do not detract from a strong empirical connection, shown in Prenzel's studies and my own, between intrinsic and interested motivation, based on high levels of self-determination, and positive evidence of the conditions listed below and, conversely, between amotivation or external motivation and poor evidence of these conditions:
support for students’ autonomy, e.g., choices for self-determined discovery, planning and acting;
support for competence, e.g., effective feedback about knowledge and skills in particular tasks and how to improve them;
social relations, e.g., cooperative working, a relaxed and friendly working atmosphere;
relevance of content, e.g., applicability of content, proximity to reality, connections to other subjects (here it is important to note that "relevance" is not limited to personal relevance or application to everyday life);
quality of teaching and assessment, e.g., situated in authentic, meaningful problem contexts, adapted to students' starting points; and
teachers' interest, e.g., expression of commitment to students.
A Cultural Understanding of Assessment
The Concept of Learning Culture
It is not sufficient simply to promote agreed meanings and principles of formative assessment or motivation and to improve teachers' formative techniques. Instead, we need to know more about the ways in which the idiosyncratic features of local "learning cultures", and the political and cultural conditions surrounding them, affect whether formative assessment influences motivation positively or negatively. The concept of "learning culture" enables us to analyse how formative assessment practices help to foster students' deep engagement with learning in some contexts and their instrumental compliance with assessment targets in others. It was developed in the Transforming Learning Cultures in Further Education (TLC) project, which drew on the well-known work of Pierre Bourdieu to define it as:
a particular way to understand a learning site1 as a practice constituted by the actions, dispositions and interpretations of the participants. This is not a one way process. Cultures are (re)produced by individuals, just as much as individuals are (re)produced by cultures, though individuals are differently positioned with regard to shaping and changing a culture – in other words, differences in power are always at issue too. Cultures, then, are both structured and structuring, and individuals’ actions are neither totally determined by the confines of a learning culture, nor are they totally free. (James & Biesta, 2007, p. 18)
A learning culture is therefore not the same as a list of features that comprise a course or programme, nor is it a list of factors that affect what goes on in a learning programme; rather, it is a particular way of understanding the effects of any course or programme by emphasising the significance of the interactions and practices that take place within and through it. These interactions and practices are part of a dynamic, iterative process in which participants (and environments) shape cultures at the same time as cultures shape participants. Learning cultures are therefore relational and their participants go beyond the students and teachers to include parents, college managers at various levels, policy makers and national awarding bodies. Learning culture is not, then, synonymous with learning environment since the environment is only part of the learning culture: a learning culture should not be understood as the context or environment within which learning takes place. Rather, ‘learning culture’ stands for the social practices through which people learn. A cultural understanding of learning implies, in other words, that learning is not simply occurring in a cultural context, but is itself to be understood as a cultural practice. (James & Biesta, 2007, p. 18, original emphases)
Instead, the TLC project shows that learning cultures are characterised by the interactions between a number of dimensions:
the positions, dispositions and actions of the students;
the positions, dispositions and actions of the tutors;
the location and resources of the learning site, which are not neutral, but enable some approaches and attitudes, and constrain or prevent others;
the syllabus or course specification, the assessment and qualification specifications;
the time tutors and students spend together, their interrelationships, and the range of other learning sites students are engaged with;
issues of college management and procedures, together with funding and inspection body procedures and regulations, and government policy;
wider vocational and academic cultures, of which any learning site is part; and
1 In the TLC project, the term 'learning site' was used, rather than 'course', to denote more than classroom learning. In the IFA project we use the more usual terms 'course' and 'programme'.
wider social and cultural values and practices, for example around issues of social class, gender and ethnicity, the nature of employment opportunities, social and family life, and the perceived status of Further Education as a sector.
Learning culture is intrinsically linked to a cultural theory of learning "[that] aims to understand how people learn through their participation in learning cultures, [and] we see learning cultures themselves as the practices through which people learn" (James & Biesta, 2007, p. 26). A cultural understanding illuminates the subtle ways in which students and teachers act upon the learning and assessment opportunities they encounter and the assessment systems they participate in. Teachers, students, institutional managers, inspectors and awarding bodies all have implicit and explicit values and beliefs about the purposes of a course or qualification, together with certain expectations of students' abilities and motivation. Such expectations, whether explicit or implicit, can be realistic or inaccurate. Other influential factors in learning cultures are the nature of students' relationships with other students and teachers, their lives outside college and the resources available to them during the course (such as class contact time): all these make students active agents in shaping expectations and practices in relation to the formal demands of a qualification (see, for example, Ecclestone, 2002; Torrance et al., 2005).
The Learning Culture of the Advanced Vocational Certificate of Education (AVCE) Science
Data and analysis in this section are taken from a paper by Davies and Ecclestone (2007). The student group at Moorview College, a school in a rural area of south west England, comprised 16 Year 13 students aged 17–18, with roughly equal numbers of boys and girls, and three teachers (teaching the physics, chemistry and biology elements of the course respectively). The learning culture was marked by a high level of synergy and expansiveness, of which teachers' formative assessment practices were a part. Teachers regarded formative assessment as integral to good learning, "part of what we do" rather than separate practices, where formative assessment was a subtle combination of helping students to gain realistic grades, alongside developing their enthusiasm for and knowledge of scientific principles and issues, and their skills of self-assessment. It encouraged them to become more independent and self-critical learners. Strong cohesion between teachers' expectations, attitudes to their subject and aspirations for their students was highly significant in the way they practised formative assessment. Synergy stemmed from close convergence between teachers and students regarding both expectations and dispositions to learning. The AVCE teachers expected students to achieve, while accepting that they did not usually arrive on the course with such high GCSE grades as those in A-level science subjects. There was also a general consensus between teachers and students that science
was intrinsically interesting as well as practically relevant. The teachers were confident and enthusiastic about their subject areas, teaching subjects that were part of the accepted academic canon translated into a vocational syllabus. Most students saw the course as a positive choice, although for others it was a second choice when they failed to achieve high enough grades to take a single-subject science A-level. However, once on the course, there was a high level of enthusiasm and commitment and a desire to study science in its own right rather than for any vocational relevance. Most did not initially aim for higher education but soon became motivated to apply to study in a vocational branch of science (such as forensic science), or a different field (such as architectural technology) where the vocational A-level grades would help them achieve the necessary points score. Their relationship with the course enabled some to broaden their horizons for action considerably during Year 13, such horizons being the arena within which actions can be taken and decisions made. There was, therefore, an influential ethos amongst students and teachers of progression in a clear route to something desirable and interesting. This reinforced the strong subject culture, applied approvingly by teachers and students to real vocational and life contexts. Although grades were crucial, formative assessment was far from being predominantly grade-focused, despite the strong institutional ethos.
A powerful factor in the learning culture was the beliefs and commitment of the main teacher, whose assessment practices we focused on in a project designed to improve formative assessment practices in vocational education and adult literacy and numeracy programmes (Ecclestone et al., in progress). Derek Armstrong insisted that he would not simply teach to the test; instead, he constantly emphasised the importance of developing students' understanding of the value of scientific knowledge and their ability to become more independent learners as an essential preparation for university. He believed strongly that an advantage of the vocational over the academic A-level was that it taught students to be "in charge of their own learning" (1st interview). This involved his belief that students should acquire self-knowledge in order to be able to learn effectively, including knowing when to ask for help: "I think students must know how good they are, and know what their limitations are" (1st interview). His teaching and assessment practice encouraged dispositions that could lead to deeper learning, rather than merely success in meeting targets:
I'm a lot more comfortable with saying, "You're actually getting a grade that is much more appropriate to what you've done, rather than one which we could have forced you to get, by making you do exactly what we know needs to be done", which obviously we know happens more and more in education because it's all results driven. (Derek, 2nd interview)
Indeed, he was not prepared to compromise to meet a target-driven educational culture: There’s no point in jumping through hoops for the sake of jumping through hoops and there’s no point in getting grades for the sake of getting grades. I know that’s not the
answer, because the answer is – no, we should be getting them to get grades. But that’s never as I’ve seen it and it never will be. (Derek, 3rd interview)
Derek espoused a theory of learning that encouraged him and students to construct their knowledge and understanding together, by working actively to understand mistakes, learn from them and build new insights together. Classroom observations and student interviews revealed that this espoused theory was also his theory-in-use (Argyris & Schon, 1971). He routinely asked students to explain a point to the rest of the group rather than always doing this himself. Although students did not conceptualise their learning in exactly the same way, and placed a higher premium on their grades, they also showed appreciation of the way their understanding and appreciation of science was developing: Some of the teachers teach you the subject and some of the teachers just help you learn it. Mr Armstrong will help you learn it and understand it. (Nick, student, 1st interview)
Despite its collaborative nature, this learning culture was rooted in a strong belief on both sides that the teacher is the most crucial factor in learning: ‘‘I believe they all know they can’t do it without me’’ (Derek, 3rd interview). When asked what motivated his students on the AVCE, Derek’s answer was unequivocal: ‘‘I’m going to put me, me, me, me’’ (1st interview).
Motivation in AVCE Science
Drawing on Prenzel's typology of motivation, our study showed that expectations of positive achievement for all students interacted with, and were also shaped by, expectations of students' motivation. Teachers showed high levels of intrinsic motivation, where engagement with topics and ideas was rooted in their intrinsic value rather than for external reward (such as grades), and also interested motivation, where a sense of personal and learning identity is bound up with the subject, its activities and possibilities. They expected students to develop intrinsic and interested motivation too. Students wanted a qualification in order to achieve their individual goals (external motivation) but the goals stemmed from interest in the course/science and their sense of becoming somebody in a subject with progression and future possibilities (intrinsic and interested motivation). Students' motivation appeared to stem from a symbiotic relationship between their teachers' expertise and enthusiasm, the supportive group dynamics, the focus on collaborative learning, and their own vocational goals. This did, however, fluctuate with individuals and over time. While the learning culture of AVCE Science was characterised by a high level of synergy, it was also reasonably expansive. Despite the constraints of the syllabus and the assessment criteria, teachers took opportunities to promote an interest in scientific issues and topics, and Derek encouraged students to develop individual approaches to meet the criteria. Moreover, the vocational relevance of the AVCE contributed towards the expansive nature of the
learning culture. Although the course was not highly practical and did not include work placements, it did include relevant trips and experimental work. It was vocational above all, though, in the way the teachers related the knowledge they taught to real-life experience. As Derek summed up: I think the real life concepts that we try and pull out in everything works very well. I think we’re incredibly fortunate to have time to teach learning, as opposed to time to teach content. (1st interview, original emphases)
Students generally had begun the course expecting the work to be reasonably easy and therefore restrictive and ‘‘safe’’, rather than challenging. In fact, the teachers’ pedagogy and formative assessment enabled students to accept challenge and risk, not in what they were learning, but in how they were learning. Our observations and interviews showed that they found themselves explaining work to their fellow students, joining in Derek’s explanations, negotiating how they might go about tasks and losing any initial inhibitions about asking questions of Derek and of one another.
The Relationship Between the Learning Culture and Formative Assessment
In theory, there was potential for negative tension between Moorview's target-driven, achievement-orientated ethos and the AVCE Science teachers' commitment to their subjects, but in practice this did not materialise. Instead, these teachers saw formative assessment as being about how students learned and as part of a continuum of teaching and assessment techniques deeply embedded in their day-to-day practice. Derek used formative assessment to help students construct their knowledge, rather than solely to achieve targets. As he put it, "I can teach them to enjoy the science – but I wouldn't call that formative assessment" (2nd workshop). His declaration, "My primary concern has never been their final grade" (3rd interview), should not, though, be taken to imply a cavalier attitude towards helping students to gain reasonable grades. All his students achieved high enough grades in the AVCE to gain their choice of university place. Students' willingness to admit to misunderstandings or half-understandings, and for teachers to diagnose these, is crucial to effective formative assessment (Black, 2007). The learning culture encouraged students to become involved in peer- and self-assessment in different ways and to view a problem as an interesting issue to be explored with one another and with Derek, rather than as an indicator of their lack of ability. High levels of synergy and expansiveness were reflected in Derek's formative assessment practices, which he saw simply as part of teaching and learning. As he put it, "I know no jargon" (1st interview). Critical and positive feedback was also integral to his teaching:
I don't think there is any point in scribbling on a piece of paper, "This isn't done right. This is how it should be done". I think you've actually got to go through and do it with them. They've got to know where the issues and the problems are for themselves. (1st interview)
His approach to formative assessment might, in other learning cultures, have been a technique used in the letter of formative assessment. However, he refused to give students a ‘‘check list’’ to use with the criteria ‘‘because that’s not preparing, that’s not what the course is about, and they should be working more independently’’ (2nd interview). Instead, he used the technique in the spirit of formative assessment, to develop students’ deeper understanding both of the coursework subject matter and of their own ability to be self-critical of their assignment. He wanted to encourage them, for them to say, ‘‘well, it’s not actually as difficult as you think it is’’ (2nd interview).
The Learning Culture of Advanced Vocational Business
Data and analysis here are drawn from a different project that explored students' and teachers' attitudes to assessment and observed formative assessment practices (Torrance et al., 2005). I focus here on students in a further education college, Western Counties, where students progressed from their local school to the college. The group comprised eleven students, three boys and eight girls, all of whom had done Intermediate Level Business at school and who were therefore continuing a vocational track. Students in this group saw clear differences between academic and vocational qualifications: academic ones were "higher status, more well-known". Yet, despite the lower status of the vocational course, students valued its relevance. Learning and achievement were synonymous, and assessment became the delivery of achievement, mirroring the language of inspection reports and policy texts. Vocational tutors reconciled official targets for delivery with educational goals and concerns about students' prospects. They saw achievement as largely about growing confidence and the ability to overcome previous fears and failures. Personal development was therefore more important than the acquisition of skills or subject knowledge; this comment was typical:
[students] develop such a lot in the two years that we have them... the course gives them an overall understanding of business studies really, it develops their understanding and develops them as people, and hopefully sets them up for employment. It doesn't train them for a job; it's much more to develop the student than the content... that's my personal view, anyway.... (Vocational course leader, quoted in Torrance et al., 2005, p. 43)
Some, but not all, students had a strong sense of identity as second chance learners, an image that their tutors also empathised with from their own educational experience and often used as a label. This led to particular ideas about what students liked and wanted. One tutor expressed the widely held view that ‘‘good assessment’’ comprised practical activities, work-experience and field trips: ‘‘all the things these kids love... to move away from all this written assessment’’. For her, assessment should reflect ‘‘the way that [vocational] students prefer to learn. . .they are often less secure and enjoy being part of
one group with a small team of staff...[assessment is] more supported, it’s to do with comfort zones – being in a more protected environment’’. Beliefs about ‘‘comfort zones’’ and ‘‘protecting’’ students meant that teachers aimed to minimize assessment stress or pressure. Teachers and students liked working in a lively, relaxed atmosphere that combined group work, teacher input, time to work on assignments individually or in small friendship-based groups, and feedback to the whole group about completed assignments.
Motivation
There was a high level of synergy between students' goals and the official policy goals their teachers had to aim for, namely to raise attainment of grades, maintain retention on courses and encourage progression to formal education at the next level. Three students were high achievers, gaining peer status from consistently good work, conscientious studying and high grades. They referred to themselves in the same language as teachers, as "strong A-grade students", approaching all their assignments with confidence and certainty and unable to imagine getting low grades. They combined identified and interested motivation from the typology discussed above. Crucial to their positive identity as successful students was that less confident or lower achieving students saw them as a source of help and expertise. In an earlier study of a similar advanced level business group in a further education college, successful students drew upon this contrast to create a new, successful learning identity (see Ecclestone, 2004). The majority of students combined the introjected motivation of being familiar with the detail of the assessment specifications and the identified motivation of working towards external targets, namely achieving the qualification. They worked strategically in a comfort zone, adopting a different grade identity from the high achievers. Most did not aim for A-grades but for Cs or perhaps Bs. They were unconcerned about outright failure, since, as teachers and students knew, not submitting work was the only cause of failure. Goals for retention and achievement, together with teachers' goals for personal development and students' desire to work in a conducive atmosphere without too much pressure, encouraged external, introjected and identified motivation and much-valued procedural autonomy. This enabled students to use the specifications to aim for acceptable levels of achievement, with freedom to hunt and gather information to meet the criteria, to escape from "boring" classrooms and to work without supervision in friendship groups (see also Bates, 1998). Synergy between teachers and students in relation to these features of the learning culture encouraged comfortable, safe goals below students' potential capacity, together with instrumental compliance. The official specifications enabled students to put pressure on teachers to "cover" only relevant knowledge to pass the assignment, or to pressurise teachers in difficult subjects to "make it easy".
The Relationship Between the Learning Culture and Formative Assessment
Meeting the summative requirements dominated formative activities completely. Teachers saw their main assessment role as one of translating the official criteria, breaking up the strongly framed assignment briefs into sequential tasks to meet each criterion. Students prepared their assignments, working to copies of the official criteria specified for grades in each unit. Posters of the criteria and grades gained by each student for each assignment were displayed on classroom walls. Formative assessment for both students and teachers focused on raising grade achievement. Students could submit a completed draft for feedback: this had to reflect a best attempt and they could not submit a half-hearted version in the hope of feedback to make it pass. There was wide variation in arrangements for this: in some courses, drafting was done numerous times while in others, only one opportunity was offered. Lesson time was almost entirely dominated by assessment and was used to introduce each assignment but also to talk through the outcomes of draft assignments, outlined here by one of the college tutors:
I talk through the assessment criteria grid with them and the assignment brief, pinpointing the relationships between P, M and D [pass, merit and distinction] and that it does evolve through to D. The students like to go for the best grade possible and discuss how they could go about getting an M. There again, some students just aim for basic pass and those are the ones who leave everything to the last minute. Then I see a draft work, read through it, make notes, talk to each one, show the good areas in relation to the criteria and explain why and how if they have met them, saying things like 'you've missed out M2'.... some will action it, some won't. It's generally giving them ideas and giving them a platform to achieve the outstanding M or D criteria. (Torrance et al., 2005, p. 46)
Tutors spent a great deal of time marking draft and final work, starting with grade criteria from the Es through the Cs to the As (the grade descriptors were changed from ‘P’, ‘M’, ‘D’ to ‘A’, ‘B’, ‘C’ in 2000). Where students had not done enough to get an E, teachers offered advice about how to plug and cover gaps, cross-referenced to the assessment specifications. Students had strong expectations that teachers would offer advice and guidance to improve their work. There was a strong sense that assessment feedback and allowing time to do work on assessed assignments in lessons had replaced older notions of lesson planning and preparation. Assignments were meticulously organised and staff and students in both programmes had internalised the language of ‘‘evidencing the criteria’’, ‘‘signposting’’ and ‘‘cross-referencing’’ and generating ‘‘moderatable evidence’’. Demands from the awarding body that teachers must generate ‘‘evidence’’ that enabled moderation processes to ensure parity and consistency between different centres offering the qualification around the country, led to student assignments for assessment that had similar formats. Teachers in the Advanced Business course felt that this made idiosyncratic interpretation or genuine local design of content impossible. Yet, as we saw in the case of AVCE Science above,
this sterile approach is not inevitable: the learning culture of this college, with its strong conformity to official targets, combined with teachers’ beliefs about students and what the qualification was primarily for, were powerful factors shaping formative assessment.
The Implications of Different Learning Cultures for Attitudes to Learning and Assessment
The learning cultures of the two courses influenced formative assessment practices, just as those formative assessment practices simultaneously emerged from and reinforced the levels of synergy and expansiveness within the learning cultures. There were major differences between teachers' overall approaches to formative assessment. A socio-cultural analysis of the effects of assessment on motivation, autonomy and achievement shows that students and teachers bring particular dispositions to their teaching and assessment practices. The formal demands of an assessment system interact with these dispositions and other features of a learning culture whilst also creating and enabling new dispositions and practices. The processes involved in navigating a tightly specified, regulated assessment system socialise students into particular expectations about what counts as appropriate feedback, help and support and about how easy or difficult it is to act on feedback in order to get a good grade. A cultural understanding also illuminates how feedback, internalising the criteria, self-assessment and using detailed grade descriptors and exemplars can, in some learning cultures, encourage superficial compliance with atomised tasks derived from the assessment criteria, bureaucratic forms of self-assessment and high expectations of coaching and support.
In the learning culture of Advanced Business, teachers used these processes to build confidence for learners they saw as second chance or fragile. A significant number of students developed procedural autonomy and introjected and identified motivation as part of a new, successful learning identity. A minority used these as springboards to deeper engagement and enthusiasm, and a sense of real achievement. Business students accepted the requirements without complaint or dissent and learned to navigate the various demands and processes. They liked the assessment system and were far from passive in it: indeed, their ideas about enjoyable learning, beliefs about their abilities and acceptable assessment methods, together with their strategic approach, were as influential as the official specifications and targets and teachers' ideas about students' needs. Collusion was therefore integral to commitment, compliance and comfort zones. In the learning culture of Advanced Science, images of second chance learners were much less prominent: despite a sense that the students did not have good enough grades to do the academic alternative to the vocational course, teachers had high expectations of intrinsic motivation for an interesting
subject. In contrast, there was a compelling sense in the Advanced Business learning culture that students were following pre-determined tracks that conformed to and confirmed an existing identity as a type of learner. There were also strong stereotypes about what vocational students expect, need or want, and can deal with. In many courses, narrow instrumentalism has become central to those expectations, making assessment in post-compulsory education not merely for learning or of learning: instead, it was learning: The clearer the task of how to achieve a grade or award becomes, and the more detailed the assistance given by tutors, supervisors and assessors, the more likely the candidates are to succeed; but succeed at what? Transparency of objectives, coupled with extensive use of coaching and practice to help learners meet them, is in danger of removing the challenge of learning and reducing the quality and validity of outcomes achieved. . ..assessment procedures and practices come completely to dominate the learning experience, and ‘criteria compliance’ comes to replace ‘learning’. (Torrance et al., 2005, p. 46)
In contrast, the learning culture of AVCE Science was shaped by the qualification design, the subject enthusiasm of the teachers and a clear sense of vocational knowledge, together with a system of selection as students progress from compulsory schooling that guaranteed a certain level of achievement and motivation. Selection was not available for the Advanced Business course: in order to meet recruitment targets, the college took almost every student who applied. These features, and the practices and expectations of teachers and students, combined to produce a much more expansive learning culture, including the way formative assessment was conceptualised and practised. Formative assessment practices as part of the learning culture of a vocational course are, to some extent, linked to the ways in which managers, practitioners, parents and students perceive such courses. Differences in the learning cultures of these two courses have also raised the question of what students and educators expect from a vocational course. Images and expectations of what counted as vocational and the kind of status attached to vocational was a key factor in shaping the learning culture. Its meaning in Advanced Science stemmed strongly from the way teachers linked scientific knowledge to real life situations. In Advanced Business, it seemed to be synonymous with the greater ratio of coursework to exams and activities that enable students to gain the qualification without too much difficulty. Vocational courses were generally accepted by students, their parents and certain teachers as being of lower status than the academic single subject courses at GCSE or A-level. These two case studies therefore illuminate the subtle ways in which some learning cultures foster a predominantly instrumental approach to learning and formative assessment, while others encourage sustainable forms of formative assessment. It seems that the high level of synergy and the expansive nature of the learning culture of AVCE Science both encouraged and encompassed practices in the spirit of formative assessment. In contrast, the more restrictive learning culture of Advanced Business encouraged and perpetuated practices that are essentially in the letter of it. From these differences, formative
assessment in the latter was a straitjacket on the potential for sustainable learning whereas formative assessment in the former can be seen as a springboard for it. There is potential for certain practices to become springboards in Advanced Business, but our analysis of the learning culture suggests that this will not be easy. Some learning cultures therefore offer more potential for teachers to use formative assessment as a springboard for sustainable learning than others.
Implications for Formative Assessment in Higher Education
Learners progressing to higher education in Britain will experience more attempts to demystify assessment, engage students with the criteria and provide as much transparency as possible: paper after paper at a European conference on assessment in higher education in 2006 presented these processes unproblematically and uncritically as innovative and progressive2. Unwittingly, this offers an image of learners who cannot or will not cope without these aids. Rather than engaging with these processes in the spirit of formative assessment, many students adapt accordingly:
I get an A4 piece of paper for each of 5 assessment criteria... examine the texts and lift out from the texts and put it in whichever section I feel relevant, that's how I do it: you're getting your five pieces of work, separate pieces of the question and you just string them all together and structure it. (Male, high achieving BA Sociology of Sport student quoted by Bloxham & West, 2006, p. 7)
As analysis shows here, students progressing from learning cultures that have already socialised them into instrumental attitudes to formative assessment are likely to expect similar approaches and to resist those that are more challenging. Yet, instrumentalism is not entirely negative. Instead, it has contradictory effects: indeed, instrumentalism to achieve meaningful, challenging tasks cannot be said to be uneducational per se. In a socio-cultural context of serious concern about disengagement and the professed need to keep young people in formal education as long as possible, instrumentalism enables some students in the British education system to achieve when they would not otherwise have done so and to go beyond their previous levels of skill and insight. It also encourages others to work in comfortable confines of self-defined expectations that are below their potential. These conclusions raise questions about the purpose and effects of formative assessment used primarily to increase motivation and participation rather than to develop subject-based knowledge and skills. Assessment systems can privilege broad or narrow learning outcomes, external, introjected, identified, intrinsic or interested motivation. They can also reinforce old images and stereotypes 2
‘Assessment for Excellence’: The Third Biennial Joint Northumbria/EARLI SIG Assessment Conference, Darlington, England, August 30 – September 1, 2006.
of learning and assessment or encourage new ones, and therefore offer comfortable, familiar approaches or risky, challenging ones. However, in the British education system, socio-political concerns about disengagement from formal education amongst particular groups have institutionalised formative assessment practices designed to raise formal levels of achievement rather than to develop deep engagement with subject knowledge and skills. Vocational students are undoubtedly "achieving" in some learning cultures but we have to question what they are really "learning". A tendency to reinforce comfortable, instrumental motivation and procedural autonomy, in a segregated, pre-determined track, raises serious questions about the quality of education offered to young people still widely seen as second best, second chance learners. Nevertheless, as the learning culture of Advanced Vocational Science shows, these features are not inevitable.
References
Assessment Reform Group. (2002). 10 principles of assessment for learning. Cambridge: University of Cambridge.
Ball, S. J., David, M., & Reay, D. (2005). Degrees of difference. London: RoutledgeFalmer.
Ball, S. J., Maguire, M., & Macrae, S. (2000). Choices, pathways and transitions post-16: New youth, new economies in the global city. London: RoutledgeFalmer.
Bates, I. (1998). Resisting empowerment and realising power: An exploration of aspects of General National Vocational Qualifications. Journal of Education and Work, 11(2), 109–127.
Black, P. (2007, February). The role of feedback in learning. Keynote presentation to the Improving Formative Assessment in Post-Compulsory Education Conference, University of Nottingham, UK.
Black, P., & Wiliam, D. (1998). Assessment and classroom learning. Assessment in Education, 5(1), 7–74.
Bloxham, S., & West, A. (2006, August–September). Tell me so that I can understand. Paper presented at the European Association for Learning and Instruction Assessment Special Interest Group Bi-annual Conference, Darlington, England.
Boud, D., & Falchikov, N. (Eds.). (2007). Re-thinking assessment in higher education: Learning for the long term. London: RoutledgeFalmer.
Davies, J., & Ecclestone, K. (2007, September). Springboard or strait-jacket?: Formative assessment in vocational education learning cultures. Paper presented at the British Educational Research Association Annual Conference, Institute of Education, London.
Derrick, J., & Gawn, J. (2007, September). The 'spirit' and 'letter' of formative assessment in adult literacy and numeracy programmes. Paper presented at the British Educational Research Association Annual Conference, Institute of Education, London.
Ecclestone, K. (2002). Learning autonomy in post-compulsory education: The politics and practice of formative assessment. London: RoutledgeFalmer.
Ecclestone, K. (2004). Learning in a comfort zone: Cultural and social capital in outcome-based assessment regimes. Assessment in Education, 11(1), 30–47.
Ecclestone, K., Davies, J., Derrick, J., Gawn, J., Lopez, D., Koboutskou, M., & Collins, C. (in progress). Improving formative assessment in vocational education and adult literacy and numeracy programmes. Project funded by the Nuffield Foundation/National Research Centre for Adult Literacy and Numeracy/Quality Improvement Agency. Nottingham: University of Nottingham. www.brookes.ac.uk/education/research
European Association for Learning and Instruction. (2006, August–September). Assessment Special Interest Group Bi-annual Conference, Darlington, England.
Gardner, J. (Ed.). (2006). Assessment and learning. London: Sage.
Hargreaves, E. (2005). Assessment for learning: Thinking outside the black box. Cambridge Journal of Education, 35(2), 213–224.
James, D., & Biesta, G. (2007). Improving learning cultures in further education. London: Routledge.
Jessup, G. (1991). Outcomes: NVQs and the emerging model of education and training. London: Falmer Press.
Marshall, B., & Drummond, M. J. (2006). How teachers engage with assessment for learning: Lessons from the classroom. Research Papers in Education, 21(2), 133–149.
McNair, S. (1995). Outcomes and autonomy. In J. Burke (Ed.), Outcomes, learning and the curriculum: Implications for NVQs, GNVQs and other qualifications. London: Falmer Press.
Prenzel, M., Kramer, K., & Dreschel, B. (2001). Self-interested and interested learning in vocational education. In K. Beck (Ed.), Teaching-learning processes in business education. Boston: Kluwer.
Sadler, R. (1989). Formative assessment and the design of instructional systems. Instructional Science, 18, 119–144.
Torrance, H., Colley, H., Garratt, D., Jarvis, J., Piper, H., Ecclestone, K., & James, D. (2005). The impact of different modes of assessment on achievement and progress in the learning and skills sector. London: Learning and Skills Development Agency.
Unit for Development of Adult and Continuing Education. (1989). Understanding learning outcomes. Leicester, UK: Unit for Development of Adult and Continuing Education.
Chapter 10
Collaborative and Systemic Assessment of Student Learning: From Principles to Practice
Tim Riordan and Georgine Loacker
Introduction
What can be learned from an institution with more than thirty years' experience in the teaching and assessment of student learning outcomes? Much of that depends on what they themselves have learned and what they can tell us. This chapter aims to find out in one particular case. In doing so, we focus on the process of going from principles to practice, rather than merely on a description of our practice. Much has been written about the ability-based curriculum and assessment at Alverno College (Arenson, 2000; Bollag, 2006; Levine, 2006; Mentkowski & Associates, 2000; Twohey, 2006; Who needs Harvard?, 2006; see also www.alverno.edu), but this particular chapter emphasizes how certain shared principles have informed the practice of the Alverno faculty and how the practice of the faculty has led to emerging principles. Our reasoning is that the principles are more apt to be something that Alverno faculty can hold in common with faculty from other institutions, and the process can suggest how an institution can come to its unique practice from a set of shared principles. The instances of Alverno practice that we do include are examples of the kind of practice that the faculty at one institution have constructed as expressions of their shared principles. Since 1973, the faculty of Alverno College have implemented and refined a curriculum that has at its core the teaching and assessment of explicitly articulated learning outcomes that are grounded in eight core abilities integrated with disciplinary concepts and methods. In this chapter we explore the shared principles and common practices that are integral to teaching, learning, and assessment at Alverno. Indeed, one of the fundamental principles at the college is that the primary focus of the faculty is on how to teach and assess in ways that most effectively enhance student learning. This principle has permeated and shaped the culture of the college and informed the scholarly work of the faculty, and it serves as the impetus for our ongoing reflection and discussion with colleagues throughout higher education.
We also consider in this chapter what Alverno faculty have learned as they have worked on their campus and with other higher education institutions to make assessment a meaningful and vital way to enhance student learning. We draw from the experience of our faculty, from their work with many institutions interested in becoming more steadily focused on student learning, and from research grounded in practice across higher education. We hope that what we have to say here will be a stimulus for further reflection on issues we all face as educators in the service of student learning. Although, as our title implies, we emphasize in this chapter the collaborative and systemic dimensions of assessment, some important shifts in the thinking of Alverno faculty happened more in fits and starts than in an immediate, shared sense of direction; so we start at this point in this introduction with a couple of ideas that emerged in both individual reflection by Alverno faculty and in larger discussions. A key insight many of us gradually came to was that we had been teaching as scholars rather than as scholar teachers: we had the conviction that, if we did well in conveying our scholarship, learning was the student’s job. We did care whether the students were learning, though, and many of us were beginning to raise questions because of that. Then we began to learn that our caring had to take on more responsibility. Gradually we came to learn what that responsibility is. We knew we needed to integrate our scholarship and our teaching, and consequently our assessment, at the level of the individual student as well as of the program and institution. Our ideas about assessment did not come to us fully-formed; we simply kept asking ourselves questions about what would be necessary to put student learning at the center of our scholarly work. We have since come to learn much about what that entails. For a start, we decided we were responsible for deciding and expressing what we thought our students should learn. We made that a foundational assumption which turned into an abiding first principle.
It Is Important to Clearly Articulate What We Require Our Students to Learn
The need to clearly and publicly articulate learning outcomes was and remains a critical first principle for all of us at Alverno. Doing so provides a common framework and language for faculty in developing learning and assessment experiences. It helps students internalize what they are learning because they are more conscious of the learning that is expected. The process of articulating learning outcomes can begin in many ways. For us, it grew out of faculty discussions in response to questions posed by the president of the college. More than thirty years ago, we engaged in the process of articulating student learning outcomes when the president challenged us to consider what "students could not afford to miss" in our respective fields of
study and to make public our answers to one another. Although we entered those discussions confident that we knew what we wanted our students to learn, we sometimes struggled to be as clear as we hoped. That is when we began to realize that, if we found it hard to articulate what students should learn, it must be far harder for students to work it out for themselves. We also discovered in those discussions that, although each discipline emphasizes its unique concepts and methods, there are some learning outcomes that seem to be common across all fields and that faculty want students to learn no matter what their major area of study is. The faculty agreed that all students would be required to demonstrate eight abilities that emerged from the learning outcomes discussions. Since that time, we have taught and assessed for the following abilities in the contexts of study in the disciplines or in interdisciplinary areas, both in the general education curriculum and the majors:
Communication
Analysis
Problem Solving
Valuing in Decision-Making
Social Interaction
Developing a Global Perspective
Effective Citizenship
Aesthetic Engagement.
The faculty have revised and refined the meaning of the abilities many times over the years, but our commitment to the teaching and assessment of the abilities in the context of disciplinary study has remained steadfast. In order to graduate from the college, all students must demonstrate the abilities as part of their curriculum, so faculty design of teaching and assessment processes is informed by that requirement. We have in the course of our practice learned more about the nature of what is involved in this kind of learning. For instance, we were initially thinking of these abilities in terms of what we would give the students; however, the more we thought about the abilities we had identified and about our disciplines and the more we attended to the research on the meaning of understanding, the more we realized what a student would have to do to develop the abilities inherent in our disciplines. We moved from questions like, ‘‘What is important for students to remember or know?’’ to ‘‘Are they more likely to remember or know something if they do something with it?’’ This led us to another key principle at the foundation of our practice.
Education Goes Beyond Knowing to Being Able to Use What One Knows
We, along with other educational researchers, have seen that, for students, using what they know moves their knowledge to understanding and that observing how students use what they know or understand is usually the most effective way to
assess what they have learned. We are also committed to the notion that students who graduate from our college should be able to apply their learning in the variety of contexts they will face in life, whether it be to solve a life problem or to appreciate a work of art. In all of these senses, we believe that education involves not only knowing but also being able to use or do what one knows. This principle has been confirmed not only by our own practice at Alverno but also by scholars like Gardner (1999) and Wiggins & McTighe (2005) who have demonstrated through their research that understanding is further developed and usually best determined in performance. They argue that students grow in their understanding as they use what they have studied, and that students’ understanding is most evident when they are required to use or apply a theory or concept, preferably in a context that most closely resembles situations they are likely to face in the future. For example, having engineering students solve actual engineering problems using the theories they have studied is usually a more effective way of assessing and improving their learning than merely asking them to take a test on those theories. This principle has been important to us from the start, and its implications for our practice continue to emerge. The nature of the eight abilities themselves reflects an emphasis on how students are able to think and what they are able to do, so design of teaching and assessment in the curriculum has increasingly been more about engaging students in the practice of the disciplines they are studying, not just learning about the subject by listening to the teacher. In most class sessions Alverno students are, for example, engaging in communicating, interacting with one another, doing active analysis based on assignments, arguing for a position, exploring different perspectives on issues, or are engaged in other processes that entail practice of the abilities in the context of the disciplines. In fact, we often make the point that design of teaching has much more to do with what the students will do than what the teacher will say or do. Even our physical facilities have reflected our growing commitment to this kind of learning. Our classrooms are furnished not with rows of desks, but tables with chairs, to create a more interactive learning environment, one in which students are engaged in the analytic processes they are studying as opposed to one in which they are watching the teacher do the analysis for them. This is not to suggest that there is no room for input and guidance from the teacher, but, in the end, we have found that it is the student’s ability to practice the discipline that matters. In the same spirit, our assessment design is guided by the principle that students not only improve and demonstrate their understanding most reliably when they are using it, but also that what they do with their understanding will make them most effective in the variety of contexts they face in their futures. There has always been an emphasis on practice of a field in the professional areas like nursing and education. Students must demonstrate their understanding and ability by what they can do in clinicals or in student teaching. We take the same approach to assessment in disciplines across the curriculum. Students in philosophy are required to give presentations and engage in
group interactions in which they use theories they have studied to develop a philosophical argument on a significant issue; English students draw from their work in literary criticism to write an editorial in which they address a simulated controversy over censorship of a novel in a school district; chemistry students develop their scientific understanding by presenting a panel on dangers to the environment. In all of these instances faculty have designed assessments that require students to demonstrate understanding and capability in performance that involves using what they have learned. But what makes for effective assessment design in addition to the emphasis on engaging students in using what they have studied? Essential to effective assessment of student learning, we have found, is the use of performance criteria to provide feedback to students – another critical principle guiding our educational practice.
Feedback Based on Criteria Enhances Student Learning
As the authors throughout this book insist, assessment should not only evaluate student learning but enhance it as well. Our own practice has taught us that effective and timely feedback to students on their performance is critical to making assessment an opportunity for improvement in their learning. In addition, we have learned how important specific criteria are as a basis for feedback – criteria made known to the student when an assignment or assessment is given. A question many ask of Alverno faculty is, ''What has been the greatest change for you as a result of your practice, as teachers, in your ability-based curriculum?'' Most of us point to the amount of time and attention we give to feedback. We now realize that merely evaluating the work of students by documenting it with a letter or number does little to inform students of what they have learned and what they need to do to improve. How many of us as students ourselves received grades on an assignment or test with very little clue as to what we did well or poorly? We see giving timely and meaningful feedback as essential to teaching and assessment so that our students do not have a similar clueless experience. In fact, our students have come to expect such feedback from us and are quick to call us to task if we do not provide it. The challenge, we have learned, is to develop the most meaningful and productive ways of giving feedback. Sometimes a few words to a student in a conversation can be enough; other times extensive written feedback is necessary. There are times when giving feedback to an entire class of students is important in order to point out patterns for improvement. Increasingly the question of how to communicate with students about their performance from a distance is on our minds. Perhaps most important in giving feedback, however, is being able to use criteria that give students a picture of what their learning should look like in performance. This is a principle we have come to value even more over time.
Although we could say with confidence from the start that we wanted our students to be able to ‘‘communicate effectively,’’ to ‘‘analyze effectively,’’ to ‘‘problem-solve effectively,’’ we also knew that such broad statements would not be of much help to students in their learning. Therefore we have given more attention as faculty to developing and using criteria that make our expectations more explicit. It is helpful to students, for example, when they understand that effective analysis requires them to ‘‘use evidence to support their conclusions,’’ and effective problem-solving includes the ability to ‘‘use data to test a hypothesis.’’ The process of breaking open general learning outcomes into specific performance components provides a language faculty can use to give feedback to students about their progress in relation to the broader learning outcomes. As we have emphasized, students learn the abilities in our curriculum in the context of study in the disciplines, and this has important implications for developing criteria as well. Criteria must not only be more specific indicators of learning, but must also reflect an integration of abilities with the disciplines. In other words, we have found that another significant dimension of developing criteria for assessment and feedback is integrating the abilities of our curriculum with the disciplinary contexts in which students are learning them. This disciplinary dimension involves, in this sense, a different form of specificity. In a literature course the generic ‘‘using evidence to support conclusions’’ becomes ‘‘using examples from the text to support interpretations’’; while in psychology the use of evidence might be ‘‘uses behavioral observations to diagnose a disorder’’. It has been our experience that giving feedback that integrates more generic language with the language of our disciplines assists the students to internalize the learning outcomes of the curriculum in general and the uniqueness and similarities across disciplinary boundaries. The following example illustrates how our faculty create criteria informed by the disciplinary context and how this enhances their ability to provide helpful feedback. Two of the criteria for an assigned analysis of Macbeth’s motivation in Shakespeare’s play might be ‘‘Identifies potentially credible motives for Macbeth’’ and ‘‘Provides evidence that makes the identified motives credible.’’ These criteria can enable an instructor, when giving feedback, to affirm a good performance by a student in an assessment and still be precise about limitations that a student should be aware of in order to improve future performance: ‘‘You did an excellent job of providing a variety of possible credible motives for Macbeth with evidence that makes them credible. The only one I’d seriously question is ‘‘remorse’’. Do you think that Macbeth’s guilt and fear moved into remorse? Where, in your evidence of his seeing Banquo’s ghost and finally confronting death, do you see him being sorry for what he did?’’ In such a case, the instructor lets the student know that the evidence she provided for some of the motives made them credible, and she realizes that she has demonstrated effective analysis. The question about ‘‘remorse’’ gives her an opportunity to examine what she had in mind in that case and either discover where her thinking went askew or find what she considers sufficient evidence to open a conversation about it with the instructor.
We have tried to emphasize here how vital using criteria to give instructor feedback is to student learning as part of the assessment process, but just as critical is the students’ ability to assess their own performance. How can they best learn to self assess? First, we have found, by practice with criteria. Then, by our modeling in our own feedback to them, what it means to judge on the basis of criteria and to plan for continued improvement of learning. After all, the teacher will not be there to give feedback forever; students must be able to determine for themselves what they do well and what they need to improve. Criteria are important, then, not only for effective instructor feedback but also as the basis for student self assessment.
Self Assessment Is Integral to Learning
In our work with students and one another we have been continually reminded of the paradox in education that the most effective teaching eventually makes the teacher unnecessary. Put another way, students will succeed to the extent that they become independent life-long learners who have learned from us but no longer depend on us to learn. We have found that a key element in helping students develop as independent learners is to actively engage them in self assessment throughout their studies. Self assessment is such a critical aspect of our educational approach that some have asked why we have not made it the ninth ability in our curriculum. The fact that we have not included self assessment as a ninth ability required for graduation reflects its uniqueness because it underlies the learning and assessment of all of the abilities. Our belief in its importance has prompted us to make self assessment an essential part of the assessment process to be done regularly and systematically. We include self assessment in the design of our courses and work with one another to develop new and more effective ways of engaging students in self assessment processes. We also record key performances of self assessment in a diagnostic digital portfolio where students can trace their own development of it along with their development of the eight abilities. We have learned from our students that their practice of self assessment teaches them its importance. Our research shows that beginning students ''make judgments on their own behavior when someone else points out concrete evidence to them'' and that they ''expect the teacher to take the initiative in recognizing their problems'' (Alverno College Assessment Committee/Office of Educational Research and Evaluation, 1982). Then, gradually, in the midst of constant practice, they have an aha experience. Of what sort? Of having judged accurately, of realizing that they understand the criteria and can use them meaningfully, of trusting their own judgment enough to release them from complete dependence on the teacher. The words of one student in a panel responding to the questions of a group of educators provide an example of such an experience of self assessment. The
student said, ‘‘I think that’s one of the big things that self assessment gives you, a lot of confidence. When you know that you have the evidence for what you’ve done, it makes you feel confident about what you can and can’t do, not just looking at what somebody else says about what you did, but looking at that and seeing why or why not and back up what you said about what you did. It’s really confidence-building.’’ Implicit in all that we have said here about elements of the assessment process, including the importance of feedback and self assessment, is the vision of learning as a process of development. We emphasize with students the idea that they are developing as learners, and that this process never ends; but what else have we done to build on this principle of development in our practice?
Learning Is a Developmental Process
Faculty recognize that, although students may not progress in a strictly linear fashion as learners, design of teaching and assessment must take into account the importance of creating the conditions that assist students to develop in each stage of their learning. This has led us to approach both curriculum and teaching design with a decidedly developmental character. After committing to the eight abilities as learning outcomes in the curriculum, we asked ourselves what each of these abilities would look like at different stages of a student's program of study. What would be the difference, for example, between the kind of analysis we would expect of a student in her first year as opposed to what we would expect when she is about to graduate? As a result of these discussions, the faculty articulated six generic developmental levels for each of the eight abilities. For instance, the levels for the ability of Analysis are:
Level 1 Observes accurately
Level 2 Draws reasonable inferences from observations
Level 3 Perceives and makes relationships
Level 4 Analyzes structures and relationships
Level 5 Refines understanding of disciplinary frameworks
Level 6 Applies disciplinary frameworks independently.
We would not argue that the levels we have articulated are the final word on the meaning and stages of analysis or any of the abilities; indeed, we continually refine and revise their meaning. On the other hand, we recognize the value of providing a framework and language that faculty and students share in the learning process. All students are required to demonstrate all eight abilities through level four in their general education curriculum and to specialize in some of them at levels five and six, depending on which abilities faculty decide are inherent in, and thus most important for, their respective majors. For example, nursing majors and philosophy majors, like students in every other major, are required to demonstrate all eight abilities through level four. At levels five and six, however,
nursing students would specialize in problem solving, social interaction, and valuing in decision-making while philosophy students would specialize in analysis, valuing in decision-making, and aesthetic engagement. We quickly learned in implementing such a developmental approach to the curriculum that we would need to rely on each other to ensure that students were, indeed, having the kinds of experiences that fostered the development we envisioned. From articulating the stages of the curriculum to designing our courses, we needed to be on the same page and in constant collaboration to make it work for students. How has this emphasis on development affected the ways in which we design courses? It has meant that we always ask ourselves as part of the design process at what stage in our curriculum students will be taking a course. If students are taking a course in the first year of their program, the learning outcomes in the course usually reflect levels one and two of the generic institutional abilities in disciplinary or interdisciplinary contexts. With respect to the ability of analysis, for example, faculty design courses in the first year that focus on developing the ability to ‘‘Observe accurately’’ and ‘‘Draw reasonable inferences from observations’’, no matter what the disciplinary or interdisciplinary context is. Whether students are studying psychology or sociology, biology or chemistry, philosophy or history the courses they take in the first year are designed with the beginning levels of analysis in mind; and courses later in the curriculum are designed on the basis of more sophisticated analytic abilities, as reflected in the developmental levels indicated above. This is true for all eight of the abilities: faculty integrate the teaching and assessment of abilities they are responsible for into course design according to the developmental level of the course in the curriculum, whether in general education or the major. But, we are often asked, what have you done to ensure that faculty sustain and improve a developmental and coherent curriculum? What have you found helpful in fostering an ongoing spirit of inquiry consistent with your principles regarding learning as a developmental process? Attention to structures and processes aligned with those principles has been critical for us.
It Is Important to Create Structures and Processes That Foster Collaborative Responsibility for Student Learning
While it is true that individual faculty are responsible for the particular group of students they are teaching, the Alverno faculty have practiced the principle that they also share responsibility for student learning as a whole. What one faculty member does in a particular class can affect how students learn in other classes. Consequently, faculty collaboration with one another about issues of teaching and assessment is necessary to ensure the developmental and coherent nature of the curriculum. This principle has made a difference in the structures and processes we have developed over time.
The faculty at Alverno take seriously the idea that teaching and assessment are not individual enterprises, but collaborative ones. In addition to the common commitment to the teaching and assessment of shared learning outcomes, we believe that we are responsible to assist one another to improve the quality of student learning across the college. We believe that requires organizing our structures to allow for the collaboration necessary to achieve this. There is no question that this does challenge the image of the individual scholar. We do recognize that keeping current in one’s own field is part of our professional responsibility, but we have also noted that a consequence of specialization and a limited view of research in higher education is that faculty are often more connected with colleagues of a disciplinary specialization than with the faculty members on the same campus, even in the same department. As recent discussions in higher education about the nature of academic freedom have suggested, the autonomy of individual faculty members to pursue scholarship and to have their teaching unfettered by forces that restrict free and open inquiry presumes a responsibility to foster learning of all kinds. We believe that if the learning of our students is a primary responsibility, then our professional lives should reflect that, including how and with whom we spend our time. There are several ways in which we have formalized this as a priority in institutional structures and processes. One important structural change we decided to make, given the significance of the eight abilities at the heart of the curriculum, was that the Alverno faculty decided to create a department for each of the abilities. In practice this means that faculty at the college serve in both a discipline department and an ability department. Depending on their interest and expertise, faculty join ability departments, the role of which is to do ongoing research on each respective ability, refine and revise the meaning of that ability, provide workshops for faculty, and regularly publish materials about the teaching and assessment of that ability. These departments provide a powerful context for cross-disciplinary discourse and they are just as significant as the discipline departments in taking responsibility for the quality and coherence of the curriculum. But how is this collaboration supported? In order to facilitate it, the academic schedule has been organized to make time for this kind of work. No classes are scheduled on Friday afternoons, and faculty use that time to meet in discipline or ability departments or to provide workshops for one another on important developments in curriculum, teaching, and assessment. In addition, the faculty hold three Institutes a year – at the beginning of each semester and at the end of the academic year – and those are also devoted to collaborative inquiry into issues of teaching, learning, and assessment. As a faculty we expect one another to participate in the Friday afternoon work and the College Institutes as part of our responsibility to develop as educators because of what we can learn from one another and can create together in the service of student learning. The value placed on this work together is strongly reinforced by criteria for the academic ranks, which include very close attention to the quality and
significance of contributions faculty make to this kind of ongoing discourse on teaching, learning, and assessment. In fact, it was during one of our College Institutes focused on effective teaching that we noted, somewhat sheepishly, that the criteria we had for faculty performance with respect to teaching at that point said simply, ''Teaches effectively''. We have since redefined that general statement into increasingly complex levels to include the different principles we have explored in this chapter. These principles have shaped our view not only of teaching but also of scholarship grounded in our commitment to student learning. In this spirit, faculty are evaluated not only on the quality of their own teaching but also on their contributions to the quality of teaching and learning across the institution and, indeed, to higher education in general. As faculty proceed through the academic ranks, the criteria for this dimension of scholarship reflect more intensive and extensive contributions.
Making Teaching a Scholarly Enterprise Should Be a Priority
The tradition of scholarship in higher education has emphasized both rigorous study of the subject under investigation and making the study and its results public to a community of scholars for review and critique. Our own work on teaching and assessment at Alverno has certainly been informed by that tradition, but our focus on teaching and assessment has led us to rethink the scope of what is studied, the ways in which we might define rigor, and the form of and audience for making our ideas public. In his landmark publication, Scholarship Reconsidered, Boyer (1997) challenged the academic community to broaden and deepen its notion of scholarship, and one very significant aspect of his analysis called for a focus on what he referred to as the scholarship of teaching. Since that time, serious work has been done in higher education circles on what this might mean and how we might make it a more substantive and recognized dimension of scholarly work. Particularly noteworthy has been the work of the Carnegie Foundation for the Advancement of Teaching. Under the leadership of Lee Shulman and Pat Hutchings, Carnegie (www.carnegie.org) has developed multiple initiatives and fostered ongoing discourse on the scholarship of teaching. At Alverno we have become increasingly committed to scholarly inquiry into teaching and learning and have made considerable inroads in establishing expectations regarding scholarship that contributes to the improvement of student learning. We know, for example, that we are more likely to help students learn if we understand how they learn, and this implies that our scholarly inquiry includes study beyond our own disciplinary expertise. What do we know about the different learning styles or ways of knowing that students bring to their learning (Kolb, 1976)? What are cognitive psychologists and other learning theorists saying about the most effective ways of engaging students in their learning
(Bransford et al., 2000)? What can developmental theories tell us about the complexities of learning (Perry, 1970)? We regularly share with one another the literature on these kinds of questions and explore implications for our teaching practice. The research of others on learning has been a great help to us, but we have also come to realize that each student and group of students is unique, and this has led us to see that assessment is an opportunity to understand more about our students as learners and to modify our pedagogical strategies accordingly. In this sense, our faculty design and implement assignments and assessments with an eye to how each one will assist them to know who their students are. We are consistently asking ourselves how we have assisted students to make connections between their experience and the disciplines they are studying, how we have drawn on the previous background and learning of students, and how we can challenge their strengths and address areas for improvement. This focus on scholarly inquiry into the learning of our students is a supplement to, not a substitute for, the ongoing inquiry of faculty in their disciplines, but our thinking about what this disciplinary study involves and what it means to keep current in our fields has evolved. Increasingly we have approached our disciplines not only as objects of our own study but also as frameworks for student learning. What has this meant for us in practice? Perhaps most importantly we have tried to move beyond the debate in higher education about the extent to which research into one’s field informs one’s teaching or even whether research or teaching should take precedence in a hierarchy of professional responsibilities. At Alverno we have taken a position and explored its implications in order to act on them. We have asked ourselves how our role as educators affects the kind of inquiry we pursue in our disciplines and how we engage students in the practice of those disciplines in ways that are meaningful and appropriate to the level at which they are practicing them. From this perspective there is less emphasis on sharing what we know about our disciplines and more on figuring out how to assist students to use the discipline themselves. This is not to suggest that the content of disciplinary study is unimportant for the faculty member or the student, but it does mean that what faculty study and what students study should be considered in light of how we want our students to be able to think and act as a result of their study. In this view faculty scholarship emerges from questions about student learning, and disciplines are used as frameworks for student learning. Clearly essential to any scholarly work is making one’s ideas public to colleagues for critical review and dialogue. This is an expectation we have for our faculty as well, but we have embraced a view of ‘‘making one’s ideas public’’ that is not restricted to traditional forms of publication. As we have suggested throughout this chapter, the first audience for our ideas on teaching, learning, and assessment is usually our colleagues here at the college. Because our first responsibility is to the learning of our students, our Friday afternoon sessions and College Institutes provide opportunities for us to share and critically consider the insights and issues we are encountering in our teaching and in
the research on teaching we have studied. Our faculty also regularly create written publications on various aspects of our scholarly work on teaching, learning, and assessment. For example, Learning that Lasts (Mentkowski and Associates, 2000) is a comprehensive look at the learning culture of Alverno College based on two decades of longitudinal studies and on leading educational theories. Assessment at Alverno College: Student, Program, and Institutional (Loacker & Rogers, 2005) explores the conceptual framework for assessment at Alverno with specific examples from across disciplines and programs, while Self Assessment at Alverno College (Alverno College Faculty, 2000) provides an analysis of the theory and practice of Alverno faculty in assisting students to develop self assessment as a powerful means of learning. Ability-based Learning Outcomes: Teaching and Assessment at Alverno College (Alverno College Faculty, 2005) articulates the meaning of each of the eight abilities in the Alverno curriculum and gives examples of teaching and assessment processes for each. And in Disciplines as Frameworks for Student Learning: Teaching the Practice of the Disciplines (Riordan & Roth, 2005), Alverno faculty from a variety of disciplines consider how approaching disciplines as frameworks for learning transforms the way they think about their respective fields. In most instances these are collaborative publications based on the work of different groups of faculty, like members of ability departments or representatives from the assessment council or faculty who have pursued a topic of interest together and have insights to share with the higher education community. The collective wisdom of the faculty on educational issues also serves as the basis for regular workshops we offer for faculty and staff from around the world both in hosting them on campus and in consulting opportunities outside the college. Our remarks about scholarly inquiry are not intended to diminish the value of significant scholarly inquiry that advances understanding of disciplinary fields and the connections among them. On the other hand, we have come to value the fertile ground for inquiry that lies at the intersection of student learning and the disciplines. We believe that it deserves the same critical and substantive analysis because it has the potential to not only enhance student learning but also to help us re-imagine the meaning and practice of the disciplines themselves.
Reviewing What We Have Learned
In one sense, assessment of student learning is not a new thing. We teachers throughout the world at all levels have always been involved in determining what and how well our students have learned. What has challenged us, particularly in higher education, is the call to raise serious questions about the kind of learning that is most significant, the most effective ways of determining whether students are learning, how assessment processes not only determine student learning but improve it, and how assessment can be used to improve
design of curriculum and instruction in the service of student learning. Many of us across the higher education community have perhaps raised these questions for ourselves individually, but can individual spurts of a focus on student learning enhance that learning as much as a focus committed to by an entire faculty? What might be possible if a faculty together organized to find better ways of improving student learning through assessment? Now institutions are being required, by stakeholders and accrediting bodies, to not only raise the questions but explain how they are answering them. We think it is important to brainstorm about what individual faculty might do to bring their institutions to a willingness to grapple with these questions and come to answers that fit the institution and its students. In this chapter we have explored our experience in addressing these questions as an institution over an extended period of time. We would like to further exchange what we have learned in the process, based on our own practice as well as on what our colleagues have told us and what you can tell us in return. We have learned from colleagues that our commitment to collaboration on issues of teaching, learning, and assessment is perhaps the most important dimension of creating a culture of learning. We recognize that our particular structures might not be appropriate or most effective for other institutions, but our experience and that of others across higher education reinforces the value of taking shared responsibility for student learning. However this might be translated into practice, we have found that it makes for a more coherent, developmental, and even satisfying learning experience for students, because the faculty are working together toward common learning outcomes and are developing as well as sharing with one another the most effective ways of teaching and assessing those outcomes. We have learned in our work with faculty around the world on assessment of learning outcomes that the more faculty take personal ownership of learning outcomes across a program and/or institution the more success they will have in doing meaningful assessment. Frequently we have heard stories from our colleagues at other institutions that faculty are sometimes suspicious, even resentful, of assessment because they perceive it as handed down from someone or some group. If, on the other hand, they see assessment of learning outcomes as helping them address questions they themselves have about the quality of their teaching and student learning, they are more likely to embrace it. At Alverno we, as the faculty, see ourselves as responsible for articulating the eight abilities at the heart of the curriculum. Initially we saw them and still see them as related to the significant processes of thinking in our own fields of study. How then, we asked, shall we evaluate each student’s performance of them? The commitment of the faculty to consistent, shared, and systemic assessment of student learning emerged, therefore, not in response to external stakeholders or accountability concerns, but in recognition of the need to design forms of assessment that reflect the kind of curriculum we had articulated. While many institutions now may be responding to external pressures as the basis for their work on assessment,
ultimately the success of that work will depend on whether faculty embrace it because they see it as enhancing how students learn what is important. We have also learned that the effectiveness of learning outcomes in promoting student learning depends heavily on the extent to which the outcomes are actually required. From the start of our ability-based curriculum, student success throughout the curriculum has depended on demonstration of the eight abilities in the context of disciplinary study. Our students know that graduation from the college, based on performance in their courses in general education and their majors, is directly related to their achievement of learning outcomes. They know this because the faculty have agreed to hold them to those standards. We realize that learning outcomes that are seen by faculty as aspirations at best and distractions at worst will not be taken seriously by students and, thus, will not become vital dimensions of the curriculum, no matter what assessment initiatives emerge. We have learned as well how important it is to make a focus on student learning a significant priority in our scholarship. When most of us in higher education complete our graduate work we have as our primary focus scholarly inquiry into the discipline we have chosen. Indeed, for many the focus is a very specialized aspect of a discipline. Another way of saying this is to say that our primary identity is based on our responsibility to the discipline. But what happens when we make student learning the kind of priority we have suggested throughout this chapter, indeed throughout this book? The reflections here attempt to indicate how the lens of student learning has affected both our work as a community of educators as well as our individual professional identities. Finally, we have learned that, although our collaboration as Alverno colleagues has been essential to a focus on student learning, connections with colleagues beyond the institution are critical to our ongoing inquiry. These colleagues include the more than four-hundred volunteer assessors from the business and professional community who contribute their time and insight to our students’ learning. They also include the colleagues we connect with in workshops, consultations, grant projects, conferences, and other venues; or those who are contributing to the growing body of literature on teaching, learning, and assessment we have found so valuable. We owe them a great debt and are certain our students have been the beneficiaries as well. It is in this spirit that this chapter aims to reflect and foster the kind of continuing conversation we need in order to make assessment of student learning the powerful and dynamic process it can be.
Looking Forward
What does the future hold for us, both at Alverno College and in the higher education community at large? How can we continue to improve in our efforts to optimize student learning? Drawing on our own principles and practices and on our work with hundreds of institutions around the world, we propose a few final thoughts.
Make assessment of student learning central to teaching, not an addition to it. One of the most frequently asked questions at our workshops on the teaching and assessing of student learning is how we can devote so much time and energy to the practices we have described in this chapter. This question usually comes out of the assumption that assessment of student learning outcomes is something we are adding to the already long list of faculty responsibilities in higher education. At Alverno, however, assessing student learning has become an organic part of the teaching and learning process. The question is not, ''How will we find the time and resources to do add-on assessment of student learning outcomes?'' Rather, the question is, ''How will we determine whether our students have learned the kinds of thinking and doing we consider essential to the degree we are conferring?'' This is not a question peripheral to teaching; it is at the heart of teaching. Whether we call it ''assessment'' or not, we strongly believe that designing processes to determine what students are learning and to help them improve is critical to the teaching enterprise. As institutions of higher education seek to become more focused on student learning, this integral connection between teaching and assessment will be important.
Think carefully about priorities, including asking hard questions about what is not as important as student learning. Are we in higher education spending too much time, energy, and money on interests and activities that are not contributing to student learning in effective ways? This is another way to think about the resource questions that always seem to emerge in discussions about assessment. Take a hard look at the connection, or lack of it, between allocation of resources and student learning. One practical step we have taken at Alverno to ensure productive use of resources in the service of student learning is to make as our primary scholarly responsibility collaborative inquiry among our faculty about teaching, learning, and assessment within and across our disciplines. This is reflected in how we focus our intellectual energy, how we spend our time, and how we evaluate faculty performance. We understand that different institutions will have different scholarly requirements depending on their missions, but asking difficult questions about the benefits to our students is important.
Find or create the language or pathways that will most actively engage faculty in productive work on teaching, learning, and assessment. It sometimes seems that using the word ''assessment'' is counterproductive in encouraging people to take it seriously in relation to teaching and learning. It has so many connotations associated not only with accountability, but even with backdoor attempts to negatively evaluate faculty, that many hope it will just go away. On the other hand, committed faculty surely take seriously the responsibility to help their students learn and to create ways of making sure either that students have learned or that something needs to be done to improve their learning. Accountability is here to stay and will provide the heat, but doesn't the light need to come from faculty commitment to student learning? The more we can do to tap into that important resource of faculty commitment with the language we use and the initiatives we develop, the more likely we are to engage and sustain interest in assessment of student learning outcomes.
Clearly, as we at Alverno study our curriculum and our students' performance, we realize that we do not experience learning outcomes and student assessment as add-ons to our learning environment. Rather, we understand them as recently discovered elements integral to the teaching/learning process that help to define it more meaningfully. A student might be overheard, for instance, describing a debate for which she is gathering evidence on an incident or character in Hamlet – not as another type of speech to be learned but as a way of understanding Shakespeare's characters and style more deeply. In a self assessment she might describe her problem solving process in creating a new art work – not as a generic strategy to be learned but as a new understanding of the role of her intellect in her production of art. Through our refining and redefining of the curriculum, we recognize that our practice now results from carefully thought-through revisions of our disciplinary and educational philosophies – revisions that, had we foreseen them when we began, might have kept us from proceeding. Because we thought we were already doing a good job of teaching, we didn't realize that students could learn better if we changed our methods. We didn't foresee that focusing on student learning might threaten the ongoing existence of our favorite lecture or might require us to listen more than we speak.
As we look ahead, we are determined to commit ourselves to keep re-examining our practice as we learn from our own experience as well as from other emerging theories. We also need to re-examine it as new seismic shifts like globalization and technology defy our strategic planning and demand new kinds of transformation. Where will our re-examining take us? We're not sure, but if we can keep observing and tracking growth in our students' understanding and performance, we will take that growth as a prod to keep learning rather than as a testimony to success achieved. Signs within the educational community across the globe suggest that incorporation of student learning outcomes into teaching, learning, and assessment is not just the unique focus of our institution; nor is it just a possibly reversible trend. (Note: the First International Conference on Enhancing Teaching and Learning Through Assessment, Hong Kong, 2005; the Bologna Declaration, Europe, 1999; the Sharing Responsibility for Essential Learning Outcomes Conference, Association of American Colleges and Universities, 2007). Instead, these signs seem to us to be evidence of growth in the kind of inquiry that can keep education alive with questioning that is intellectually and morally life-sustaining.
References
Alverno College Assessment Committee/Office of Research and Evaluation. (1982). Alverno students' developing perspectives on self assessment. Milwaukee, WI: Alverno College Institute.
Alverno College Faculty. (2005). Ability-based learning outcomes: Teaching and assessment at Alverno College. Milwaukee, WI: Alverno College Institute.
Alverno College Faculty. (Georgine Loacker, Ed.). (2000). Self assessment at Alverno College. Milwaukee, WI: Alverno College Institute.
Arenson, K. W. (2000, January 1). Going higher tech degree by degree. The New York Times, p. E29.
Bollag, B. (2006, October 27). Making an art form of assessment. The Chronicle of Higher Education, pp. A1–A4.
Boyer, E. (1997). Scholarship reconsidered: Priorities of the professoriate. New York: John Wiley & Sons.
Bransford, J. D., Brown, A. L., & Cocking, R. R. (2000). How people learn: Brain, mind, experience, and school (Expanded ed.). Washington, DC: National Academy Press.
Gardner, H. (1999). The disciplined mind: What all students should understand. New York: Simon & Schuster.
Kolb, D. (1976). The learning style inventory: Technical manual. Boston: McBer.
Levine, A. (2006). Educating school teachers. The Education Schools Project Report #2. Washington, DC: The Education Schools Project.
Loacker, G., & Rogers, G. (2005). Assessment at Alverno College: Student, program, and institutional. Milwaukee, WI: Alverno College Institute.
Mentkowski, M., & Associates. (2000). Learning that lasts: Integrating learning, performance, and development. San Francisco: Jossey-Bass.
Perry, W. G., Jr. (1970). Forms of intellectual and ethical development in the college years: A scheme. Austin, TX: Holt, Rinehart & Winston.
Riordan, T., & Roth, J. (Eds.). (2005). Disciplines as frameworks for student learning: Teaching the practice of the disciplines. Sterling, VA: Stylus Publications.
Twohey, M. (2006, August 6). Tiny college earns big reputation. The Milwaukee Journal Sentinel, pp. A1, A19.
Who needs Harvard? Forget the ivy league. (2006, August 21). Time, pp. 37–44.
Wiggins, G., & McTighe, J. (2005). Understanding by design (Expanded 2nd ed.). Alexandria, VA: Association for Supervision and Curriculum Development (ASCD).
Chapter 11
Changing Assessment in Higher Education: A Model in Support of Institution-Wide Improvement
Ranald Macdonald and Gordon Joughin
R. Macdonald, Learning and Teaching Institute, Sheffield Hallam University, Sheffield, UK; e-mail: [email protected]
Introduction
Assessment of student learning is a complex matter being dealt with in large, complex institutions, involving diverse collections of staff and students engaged in intensive activities that lie at the heart of higher education. Not surprisingly, assessment has become a focal point of research and innovation in higher education. Consequently, the literature on assessment is now considerable and growing rapidly, while national projects to generate ways of improving assessment have become commonplace in many countries. At the same time, however, attempts to improve assessment often fall short of their promise. Well-founded statements of assessment principles fail to work their way down from the committees which draft them; individual lecturers' innovative assessment practices take hold within their immediate sphere of influence without infiltrating a whole programme; limited progress is made in encouraging departments to take seriously the assessment of generic qualities or attributes considered as essential learning outcomes for all graduates. What then does it take to improve assessment in a university?
This chapter presents a response to this situation focused on an emerging understanding of the nature of higher education institutions in the context of change. The purpose of this chapter therefore is to present a model of higher education institutions that is relevant to this context, highlighting those aspects of universities and similar organisations which require attention if assessment practices are to be improved. We believe that the challenges facing educational organisations, their aspirations and those of their staff, are best construed and most successfully pursued by clarifying their nature in the light of recent organisational thinking, and that failure to do this will inevitably lead to unrealistic goals, misdirection and consequent dissipation of energy, and, ultimately, disappointment, while at the same time more effective ways of achieving improvement in assessment will not be pursued. The chapter therefore begins with a model of higher education institutions which seeks to clarify our understanding of the nature of universities, followed by an exploration of the implications of the model for changing assessment practices.
A Model of Institutional Impacts on Assessment, Learning and Teaching
Because of the complex nature of both assessment and universities, a model is considered to be a necessary component in addressing the issue of improving assessment. Keeves (1994, p. 3865) notes that ''research in education is concerned with the action of many factors, simultaneously or in a causal sequence, in a problematic situation. Thus it is inevitable that research in the field of education should make use of models in the course of its inquiries to portray the interrelations between the factors involved.'' Models are useful for more than research, however, since they are representations designed to help us see more clearly what we already know, portray things in ways that have not occurred to us before, or provide us with insights into areas that have previously confused us. In the case of an organisation such as a university, a model needs to balance both formal structures and the roles and behaviours of people who may act outside the remit of their formal responsibilities. Models by their nature and purpose simplify things – by naming the elements of a system and describing relationships between those elements, they seek to provide a usable tool for analysis, discovery and action, and in so doing risk omitting elements or relationships which some may see as important (Beer, Strachey, Stone, & Torrance, 1999). This may be what Trowler and his colleagues had in mind when claiming that ''all models are wrong; some are useful'' (Trowler, Saunders, & Knight, 2003, p. 36, citing Nash, Plugge & Eurlings, 2000). We acknowledge that, like all models, our model is open to critique and revision.
Our model is presented below in Fig. 11.1, followed by an explanation of its components. Many of the elements of our model can be located at particular levels of the institution, or, in the case of 'context', beyond the institution. Other elements, namely those associated with quality, leadership, management and administration, cut across these levels.
The foundational level of the model is the unit of study where learning and assessment happens – variously termed the module, subject, unit or course. The second level of the model represents the collection of those units that constitute the student's overall learning experience – typically the degree programme or a major within a degree. At the third level, academic staff are normally organised into what are variously termed departments, schools or faculties (where the latter term refers to a structural unit within the organisation).
Fig. 11.1 A model of institutional impacts on assessment, learning and teaching
The fourth level represents institutional-level entities that support learning, teaching and assessment. Finally, the model acknowledges the important influence of the wider, external context in which universities operate. The number of levels and their names may vary from institution to institution and between countries, but essentially they reflect a growing distance from where learning, teaching and assessment happens, though they may be no less influential in terms of the impact they may have.
The Module Level
Each level within the model has a number of elements. At the level of the module, where learning and its assessment occurs, we might see module design, teachers' experience of teaching and assessment, and students' previous and current experiences of being assessed. This is where the individual academic or the teaching team designs assessment tasks and where students are typically most actively engaged in their studies. This is also the level at which most research into and writing about assessment, at least until recently, has been directed, ranging from the classic studies into how assessment directs students' learning (e.g., Miller & Parlett, 1974; Snyder, 1971) to more recent studies into student responses to feedback, and including comprehensive textbooks on assessment in higher education (e.g., Boud & Falchikov, 2007; Heywood, 2000; Miller, Imrie & Cox, 1998), not to mention a plethora of how-to books and collections of illuminating case studies (e.g., Brown & Glasner, 1999; Bryan & Clegg, 2006; Knight, 1995; Schwarz & Gibbs, 2002).
The Course or Programme Level
At the course or programme level, the elements might include programme and overall curriculum design, procedures for appointing and supporting course teams, and how resources are allocated for course delivery. At this level we would also see aspects such as innovation in learning and teaching, and it is here that students' overall experience of learning and assessment occurs. Both the importance of this level and its relative neglect are highlighted by recent arguments for focusing attention at the programme level (e.g., by Knight, 2000; Knight & Trowler, 2000; Knight & Yorke, 2003). This level involves more than the sum of individual modules; it is where disciplinary and professional identity is to the fore and where innovation in teaching and assessment impacts on the student experience.
The Department or Faculty Level

It is at the departmental or faculty level1 that cultural issues become a major determinant of staff behaviour. It is here that local policies and procedures, including the allocation of staff to particular responsibilities, can support or impede innovations in teaching and assessment. It is where the take-up of staff development opportunities develops greater capacity for innovation and academic leadership. Messages about the relative importance of research, teaching and management are enacted at this level, and it is here that issues of identity and morale come to the fore.
The Institutional Level

The institutional-level elements may include the overall framework for setting principles, policies and regulations around learning, teaching and assessment; the allocation of resources to learning and teaching as against other activities; and rewards and recognition for learning and teaching in comparison with rewards and recognition for research, consultancy or management. At this level, overall resourcing decisions are made regarding teaching and learning, probation and promotion policies are determined, assessment principles are formulated and institutional responses to external demands are devised.
1 Where 'faculty' is used to refer to an organisational unit rather than, as is common in the US and some other countries, to individual academic staff.
The External Level

Finally, the external level will have a large number of elements, including the regulatory requirements of governments, the role of government funding bodies in supporting quality enhancement and innovation in teaching and learning, national and international developments within the higher education sector, the requirements of professional accrediting bodies, media perception and portrayal of higher education, and the external research environment. With respect to learning and assessment, this level may well shape institutional responses to curriculum design and content, as well as overall principles such as the requirement for general graduate attributes to be reflected in assessed student learning outcomes. In addition, students' expectations can be strongly influenced by external forces, including their experience of other levels of education, their engagement in work, and the operation of market forces in relation to their chosen field of study.
Quality, Leadership, Management and Administration

Several functional elements cut across these five levels and may impact on any or all of their elements. In our model we have identified two main functional themes: firstly, quality, including quality assurance (doing things better) and quality enhancement (doing better things); and secondly, roles. The functional roles in the model include leadership – conceived in terms of visioning, new ideas and strategy; management – concerned with putting fully operating systems into place and maintaining them; and administration – the day-to-day operation of systems (Yorke, 2001). These functional aspects may relate to quality through an emphasis on quality assurance, quality enhancement or both.
Relationships

If the elements of the model indicate what aspects of an institution require attention in attempts to improve assessment, it is how these elements relate to each other that gives the model its dynamic. The possible connections between elements are not infinite, but they are certainly numerous and complex. We indicate this, albeit in a limited way, through the two-way arrows in Fig. 11.1.
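For readers who find it helpful to work with the model analytically, a minimal sketch (in Python, and purely illustrative) shows one way the elements and their relationships might be written down as a simple graph and interrogated. The level names follow the model described above; the particular elements, the connections between them, and the small helper functions are assumptions introduced only for illustration, not a transcription of Fig. 11.1, and they will in any case differ between institutions.

```python
from collections import defaultdict

# Illustrative only: levels follow the model described above; the example elements
# and connections are assumptions, not a transcription of Fig. 11.1.
elements_by_level = {
    "module": ["assessment task design", "feedback practice"],
    "programme": ["curriculum design", "course team support"],
    "department": ["local assessment procedures", "departmental culture"],
    "institution": ["assessment policy", "rewards and recognition"],
    "external": ["QAA code of practice", "professional body requirements"],
}
level_of = {e: lvl for lvl, elems in elements_by_level.items() for e in elems}

# two-way relationships (the arrows of the model), again assumed for illustration
relations = defaultdict(set)
def connect(a, b):
    relations[a].add(b)
    relations[b].add(a)

connect("QAA code of practice", "assessment policy")
connect("assessment policy", "local assessment procedures")
connect("local assessment procedures", "assessment task design")
connect("assessment task design", "feedback practice")
connect("departmental culture", "local assessment procedures")

def chains(start, end, path=()):
    """Trace every chain of connected elements from one element to another -
    a crude way of asking which links a change effort would depend on."""
    path = path + (start,)
    if start == end:
        return [path]
    found = []
    for nxt in relations[start]:
        if nxt not in path:
            found.extend(chains(nxt, end, path))
    return found

for chain in chains("QAA code of practice", "feedback practice"):
    print(" -> ".join(f"{e} ({level_of[e]})" for e in chain))
```

Running the sketch simply traces the chain linking an external code of practice, through institutional policy and departmental procedures, to feedback practice at the module level: exactly the kind of chain whose weak links the scenario considered below exposes.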
Applying the Model

How useful is the model? One test is to consider the following fictional, but nevertheless realistic, scenario that will resonate with many academics.
For some time students have been complaining that they are not all being treated the same when it comes to their assessment. There are different arrangements for where and when to hand in written assignments; the time taken to provide feedback varies enormously, despite university guidelines; different practices exist for late work and student illness; some staff only seem to mark between 35 and 75 per cent whereas others seem prepared to give anything between zero and 100; in some courses different tutors put varying degrees of effort into identifying plagiarism; procedures for handing back work vary, with the result that some students do not even bother to collect it. All this in spite of the fact that this (UK) university has policies and guidelines in place which fully meet the Quality Assurance Agency's Code of Practice on Assessment (The Quality Assurance Agency, 2006a).
Which aspects of the university, as identified by the elements of the model, would need to be addressed to rectify or ameliorate the range of issues raised here? Which connections between elements need strengthening? In the scenario, the principles, policies and regulations at the institutional level have not connected with the students' experience of assessment at the module level. Is this due to the nature of the policies themselves, to the departmental cultures which subvert the policies, and/or to the teachers' limited exposure to assessment theory? We do not propose to resolve these matters here, but simply to indicate that we think the model is useful in considering such situations. It provides a framework for identifying what might need to be looked into to resolve such problems and a starting point for defining problems, analysing their bases, and directing activities to address them. It suggests that, to implement a planned innovation or to address a given problem, a number of elements will need to be worked with and several critical relationships will be involved.

How many elements, or how much of a system, needs to be brought into play for change to be effective? At least three possibilities are apparent. The first, which we can quickly dismiss, is the suggestion that, if we can select the "right" element, intervention at that single point might do the trick. The second is to consider whether there might be a small number of elements which, in combination, might lead to large-scale change, in line with Malcolm Gladwell's popularised notion of the tipping point, according to which significant change can occur when a small number of factors, none of which seems of major significance in itself, coalesce, leading to change that is significant or even dramatic (Gladwell, 2000). Both of these possibilities pander to an almost certainly mistaken belief in leverage points within organisations. Most of us have a strong tendency to believe that there are such points and an equally strong desire to know where they are, since this would give us considerable power in working towards our goals (Meadows, 1999). This search is illusory and unlikely to succeed,
not least because of the loosely coupled nature of universities as organisations (Knight & Trowler, 2000) – change in one element may or may not lead to significant change in adjoining elements. The third possibility, and the one that seems most plausible to us, is that all or most elements need to be taken into account. The need to address some elements may be more immediately apparent, while the need to address others may be less so. Moreover, as we intervene at one level of the institution, and in relation to one element at that level, other elements and other levels are likely to be affected, though the nature and degree of this impact may be difficult to predict.

Each of these possibilities is based on an understanding of the university as a more-or-less linear organisation in which action in one part of the organisation has more-or-less predictable consequences in other parts through a chain of cause-and-effect leading to a desired outcome. That universities do not operate quite, or even at all, like this is soon recognised by anyone charged with bringing about widespread change in teaching, learning, or assessment within their university. A more sophisticated approach is needed.
Systems Approaches

So far we have not used the term 'system', but our model clearly represents a systems approach to understanding universities and change. An exploration of systems thinking is therefore needed if we are to understand higher education institutions adequately and to use that understanding to adopt change strategies with a reasonable chance of success. In this section we consider some fundamental concepts in systems theory.

The essential feature of a systems approach is to view the university as a whole, rather than focusing on one or even several of its parts. A systems approach therefore complements the expanding body of work more narrowly focused on specific aspects of the university, such as assessment strategies, students' experience of assessment in their courses, plagiarism, or departmental responses to assessment issues. It is also a reaction to the reductionism implicit in much thinking about learning, teaching and assessment, according to which change depends on improvements in one or two key areas of academic practice.
Hard and Soft Systems

Within systems thinking, a useful distinction is often made between hard and soft systems. The shift in thinking from hard to soft systems has been well documented by Checkland (1993, 1999). Hard systems are associated with organisations in which clearly defined, uncontested goals can be achieved and problems corrected through the systematic application of agreed means. The system is seen as somewhat self-evident and its existence is independent of those
working towards its goals. Importantly, the system is open to direct manipulation by human intervention – things can be made to happen. In contrast to this, soft systems methodologies come into play where objectives are less clear, where assumptions about elements of the system cannot be taken for granted, and where there is a challenging issue needing to be addressed – in short, in situations where organisational, management and leadership issues are involved.

Checkland's acronym, CATWOE, indicates the flavour of soft systems thinking (Checkland, 1993). It refers to the elements involved in defining a situation, where there are customers (the beneficiaries of the system); actors (the agents who undertake the various activities within the system); a transformation process (which results, in the case of higher education institutions, in graduates and research outputs); a Weltanschauung, or worldview; ownership (those who fund the system); and environmental or contextual constraints on the system. Checkland characterises the move towards soft systems as a move away from working with obvious problems towards working with situations which some people may regard as problematical, which are not easily defined, and which involve approaches characterised by inquiry and learning (Checkland, 1999).

When the university is construed as a hard system, it is considered to have agreed goals of good practice, clearly defined structures designed to achieve these goals, and processes that are controlled and somewhat predictable. This is a view of universities which may appeal to some, particularly senior managers keen to introduce change and move their institution in particular directions. However, hard systems thinking is inappropriate where ends are contested, where stakeholders have differing (and often competing) perspectives and needs, and where understanding of the system is necessarily partial. Inappropriate as it may be, the temptation to see the university as a hard system remains, especially in the minds of some administrators or leaders seeking to achieve specific ends through imposing non-consensual means. When this occurs, change becomes problematic, since change in a soft system involves not merely working on the right elements of the system, but understanding transformational processes in particular ways, including understanding one's own role as an actor within the system.

The difficulties in effecting change within universities as systems are often attributed to the 'loose-coupling' between units. Loose-coupling is well described by Eckel, Green, Hill and Mallon as characteristic of an organisation in which activity in one unit may be only loosely connected with what happens in another unit. This compares with more hierarchical or 'tightly-coupled' organizations where units impact more directly on each other. As a result,

change within a university is likely to be small, improvisational, accommodating, and local, rather than large, planned, and organization-wide. Change in higher education is generally incremental and uneven because a change in one area may not affect a second area, and it may affect a third area only after significant time has passed. If institutional leaders want to achieve comprehensive, widespread change, they must create strategies to compensate for this decentralization. (Eckel, Green, Hill & Mallon, 1999, p. 4)
Loose coupling can apply to both hard and soft systems thinking. Both assume a degree of causal relationship between elements or units, and in both cases loose coupling highlights that these connections are not necessarily direct and immediate. In our earlier scenario, we might consider the loose couplings involved as we follow the impact of the university's assessment policies as they move from the university council down to faculties and departments and into the design of course assessment and its implementation by individual academic staff and students, noting how original intentions become diluted, distorted or completely lost at various stages.

Loose coupling is an evocative term and, until recently, few would have contested its use to describe the university. Loose coupling, however, assumes a degree of connectivity or linearity between university units. This view has been challenged by a quite different understanding which sees the university as a nonlinear system, where interactions between elements are far from direct and where notions of causality in change need to be radically rethought; in short, the university is seen as a complex adaptive system.
Complex Adaptive Systems

The notion of universities as complex adaptive systems promises interesting insights. Thinking about complex adaptive systems has been developed in a number of fields, including genetics, economics, and mathematics, and has some of its foundations in chaos theory. It is applied to management and organisation theory, and to the analysis of higher education institutions, by way of analogy, and we should therefore proceed cautiously in our use of it here. As noted earlier regarding models in general, the test of complexity theory's applicability to universities will lie in its usefulness. With this caveat in mind, three constructs from the writing on complex adaptive systems seem particularly suggestive, namely agents, self-organisation, and emergence.
Agents

Stacey describes a complex adaptive system as consisting of "a large number of agents, each of which behaves according to some set of rules. These rules require the agents to adjust their behaviors to that of other agents. In other words, agents interact with, and adapt to, each other" (Stacey, 2003, p. 237). Each unit in a university can be thought of as an agent, as can each individual within each unit, with each agent operating according to its own set of rules while adjusting to the behaviour of other individuals and units.

The concept of agents introduces considerable complexity in two ways. Firstly, it draws our attention to the large number of actors involved in any given university activity. At the level of the individual course or module, we have the body of enrolled students, tutorial groups, and individual students, as
well as a course coordinator, individual tutors, and the course team as a whole, including administrative staff. To these we could add the department, which interacts in various ways with each of the agents mentioned, and multiple agents outside the department, including various agents of the university as well as agents external to the university.

Secondly, each agent operates according to its own rules. This understanding is in contrast to the view that the various players in a university are all operating within an agreed order, with certain values and parameters of behaviour being determined at a high level of the organisation. Instead of seeking to understand an organisation in terms of an overall blueprint, we need to look at how each agent or unit behaves and the principles associated with that behaviour. What appear to be cohesive patterns of behaviour associated with the university as a whole are the result of the local activity of this array of agents, each acting according to its own rules while also adapting to the actions of other agents around it. Many writers have noted the many sets of rules operating in universities, often pointing out that the differing sets of values which drive action can be complex and are typically contrasting. While some values, such as the importance of research, may be shared across an institution, variation is inevitable, with disciplinary cultures, administrators, academics and students all having distinctive beliefs (Kezar, 2001). Complexity theory draws our attention to these variations, while suggesting that further variation is likely to be present across each of the many agents involved in the university system.

One implication for our model is that it may need to be seen in the light of the different agents within a university, with the concept of agency brought to bear on each of the model's elements. Aligned with this, we would interpret the model as incorporating the actions within each agent, interactions between agents, and a complex of intra- and inter-agent feedback loops (Eoyang, Yellowthunder, & Ward, 1998). The model thus becomes considerably more complex, and understanding and appreciating the rules according to which the various agents operate, and how they might adapt to the actions of other agents, becomes critical to understanding the possibilities and limitations of change.
Self-organisation

With a multiplicity of agents each functioning according to its own rules, a complex adaptive system would seem likely to behave chaotically. The principle of self-organisation seeks to explain why this is not the case, and why organisations, including universities, exhibit organised patterns of behaviour (Matthews, White, & Ling, 1999). Systems as a whole and agents within a system self-organise, change and evolve, often in response to external challenges. Within a complex system of agents, however, each agent acts according
to its own rules and according to what it believes is in its best interests. Tosey (2002) illustrates this well with reference to students adjusting their behaviour in meeting, or not meeting, assignment deadlines – their patterns changed in direct response to their tutors’ lack of consistency in applying rules. Multiply this kind of scenario countless times and we have a picture of the university as a collection of units and individuals constantly changing or re-organising, in unpredictable ways, in response to change. The result is not chaos, but neither is it predictable or easily controlled, since multifarious internal and external factors are involved.
Emergence

The characteristics of agents and self-organisation give rise to the third core construct of complex adaptive systems. Many of us are accustomed to thinking in terms of actions and results, of implementing processes that will lead, more or less, to intended outcomes. The notion of emergence in complex adaptive systems is a significant variation on this. Emergence has been defined as "the process by which patterns or global-level structures arise from interactive local-level processes. This 'structure' or 'pattern' cannot be understood or predicted from the behaviour or properties of the components alone" (Mihata, 1997, p. 31, quoted by Seel, 2005). In other words, how universities function, what they produce, and certainly the results of any kind of change process, emerge from the activities of a multiplicity of agents and the interactions between them.

In such a context, causality becomes a problematic notion (Stacey, 2003); certainly the idea of one unit, or one individual at one level of the organisation, initiating a chain of events leading to a predicted outcome, is seen as misleading. As Eckel et al. (1999, p. 4) note, "(e)ffects are difficult to attribute to causes", thus making it hard for leaders to know where to focus attention. On the other hand, and somewhat paradoxically, small actions in one part of a system may lead to large and unforeseeable consequences. Emergence also suggests the need for what Seel (2005) refers to as watchful anticipation on the part of institutional managers, as they wait while change works its way through different parts of a system.

Emergence implies unpredictability and, for those of us in higher education, it reflects the notion of teaching at the edge of chaos (Tosey, 2002). Along with the concept of agency, the complexity which emergence entails "challenges managers to act in the knowledge that they have no control, only influence" (Tosey, 2002, p. 10). Seel (2005) suggests a set of preconditions for emergence, or ways of influencing emergence, including connectivity – emergence is unlikely in a fragmented organisation; diversity – an essential requirement if new patterns are to emerge; an appropriate rate of information flow; the absence of inhibiting factors such as power differences or threats to identity; some constraints to action or effective boundaries; and a clear sense of purpose.
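The flavour of these three constructs can be conveyed by a deliberately artificial sketch, offered purely as an illustration rather than as anything proposed in the complexity literature cited above. Borrowing Tosey's deadline example, it gives each student agent two local rules: reduce lateness after being penalised, and drift towards the behaviour of a few peers. The tutor's consistency in applying the late-work rule is the only 'policy' lever, and every numerical parameter in the sketch is an arbitrary assumption. No agent aims at, or even knows about, the class-wide pattern; the settled rate of late submission emerges from local interaction.

```python
import random

def simulate(num_students=100, rounds=30, tutor_consistency=0.4, seed=1):
    """A toy agent-based sketch (illustrative assumptions only): each student agent
    adjusts its probability of submitting late using two purely local rules."""
    rng = random.Random(seed)
    p_late = [rng.uniform(0.1, 0.6) for _ in range(num_students)]  # initial tendencies
    history = []
    for _ in range(rounds):
        late = [rng.random() < p for p in p_late]
        penalised = [was_late and rng.random() < tutor_consistency for was_late in late]
        for i in range(num_students):
            peers = rng.sample(range(num_students), 3)       # a few local contacts
            peer_rate = sum(late[j] for j in peers) / 3
            if penalised[i]:
                p_late[i] = max(0.0, p_late[i] - 0.2)        # rule 1: react to the sanction
            p_late[i] = 0.8 * p_late[i] + 0.2 * peer_rate    # rule 2: adapt to peers
        history.append(sum(late) / num_students)             # the emergent class-wide pattern
    return history

if __name__ == "__main__":
    for consistency in (0.1, 0.5, 0.9):
        rate = simulate(tutor_consistency=consistency)[-1]
        print(f"tutor consistency {consistency:.1f}: late-submission rate settles near {rate:.2f}")
```

The point of the toy is not the numbers but the shape of the explanation: the class-wide pattern is a property of the interacting agents rather than of any single rule or actor, which is precisely why watchful anticipation, rather than confident prediction, is the appropriate stance for those hoping to steer it.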
Let's return to our earlier scenario. Does viewing the university as a complex adaptive system help us to see this situation differently, and in a more useful way? For a start, it leads us to focus more strongly on the agents involved – the university's policy unit; whatever committee and/or senior staff are responsible for quality assurance; the department and its staff; and the students as individuals and as a body. What drives these agents, and what motivates them to comply with or resist university policy? How do they self-organise when faced with new policies and procedures? What efforts have been made across the university to communicate the Quality Assurance Agency's codes of practice, and what, apart from dissatisfied students, has emerged from these efforts? Perhaps most importantly, how might Seel's preconditions for emergence influence attempts to improve the situation?
Complex Adaptive Systems and Change

It is, at least in part, the failure of such policies and the frustrations experienced in seeking system-wide improvements in teaching, learning and assessment that have led to an interest in harnessing the notion of the university as a complex adaptive system in the interests of change. While our model suggests that change can be promoted by identifying critical elements of the university system and key relationships between them, perhaps the facilitation of change in a complex adaptive system requires something more.

Two management theorists have proposed principles of organisational change in the context of complex adaptive systems. The first of these, Dooley (1997), suggests seven principles for designing complex adaptive organisations. We would argue that universities already are complex adaptive organisations – they do not need to be designed as such. We therefore interpret Dooley's suggestions as ways of living with this complexity and taking the best advantage of the university's resulting organisational qualities. Dooley's principles are as follows:

(a) create a shared purpose;
(b) cultivate inquiry, learning, experimentation, and divergent thinking;
(c) enhance external and internal interconnections via communication and technology;
(d) instill rapid feedback loops for self-reference and self-control;
(e) cultivate diversity, specialisation, differentiation, and integration;
(f) create shared values and principles of action; and
(g) make explicit a few but essential structural and behavioral boundaries. (Dooley, 1997, pp. 92–93)

Dooley's list is prescriptive: it suggests how complexity theory could be applied to an organisation. The second theorist, Stacey, in contrast, eschews the notion of application, implying as it does the intentional control of a system by a manager who stands outside it. Rather, "managers are understood to be
participants in complex responsive processes, engaged in emergent enquiry into what they are doing and what steps they should take next" (Stacey, 2003, p. 414). Consequently, instead of offering applications or prescriptions, Stacey notes how the theory of complex adaptive systems shifts our focus of attention. In the context of our model, the elements remain, but the focus becomes directed in particular ways, namely:

- on the quality of participation, that is, on how self-organising units are responding to initiatives, and on recognising the responses and decisions that emerge from them;
- on the quality of conversational life, noting the themes that organise colleagues' conversational relating (along with themes that block free-flowing conversation) and bringing themes from outside the institution (or one's part of it) into conversations;
- on the quality of anxiety and how it is lived with. Anxiety is an inevitable consequence of free-flowing conversation based on a search for new meanings. Public conversations have private implications, such as a threat to one's professional identity; principles of self-organisation apply at the individual as well as the unit level;
- on the quality of diversity. Diversity is essential if new ways of doing things are to emerge, and deviance, eccentricity and subversion play important roles in emergence; and
- on unpredictability and paradox. Understanding universities as complex adaptive systems leads to the recognition of the limits to predictability, that we must often act not knowing what the consequences will be, and that surprise becomes part of the dynamic of the process of change. Actions should be judged on whether they create possibilities of further action rather than on whether they produce predicted desired outcomes.

So what do we do now to bring about change in our scenario? As well as looking at the various units, how they operate, and what has resulted, we are likely to look closely at whatever happens as change is initiated. We will keep in touch with colleagues, teaching teams, departments and other units to see how they are responding. We will look out for, and encourage the expression of, disparate views about what should happen. We will cease looking only for outcomes that meet our objectives, and be alert to the unexpected. We will work in ways that energise our colleagues. We will seek to create contexts for conversations and learning.
Some Practical Implications: Change as Conversation

By starting with a model and exploring how aspects of the change literature help us to put some flesh on it for a complex institution such as a university, we may lose sight of the main purpose of our enquiry in this chapter – to understand
how to bring about improvements in assessment most effectively in order to improve the practices and experiences of all those engaged in the process. Clearly it is not just the elements of our model which need exploring but the relationships between them – the arrows in Fig. 11.1 – and how patterns of interrelationships overlay the model. Exploring the relationship between systems thinking and complexity, Stacey (2007) draws on a responsive processes perspective whereby individuals are seen as interdependent persons whose social interactions produce patterns of relationships which develop over time. Often characterised as conversations, these interactions allow more novel strategies to emerge, depending on how valued such conversations are within the organisation. The case study presented towards the end of this chapter provides a genuine and current example of a change process which, as well as using institutional structures, arose from coffee bar conversations. Importantly, it gives credence to those conversations that Shaw (2002) and Owen (1997) identify as critical change mechanisms in complex organisations.

The educational change literature, examining large change initiatives in compulsory and post-compulsory education in recent years, alludes to many of the complexity constructs without naming them as such. Fullan, a well-known writer on educational change within schools, is a prime example. He notes that significant educational change "consists of changes in beliefs, teaching styles, and materials, which can come about only through a process of personal development in a social context" (Fullan, 2001, p. 124, his emphasis). He goes on to stress the need for teachers to be able to "converse about the meaning of change" (ibid.). Fullan recognises that change involves individuals in a social context and that there may be resistance to change or tokenistic compliance, possibly based on past histories of change or of engagement with management. Dialogue therefore has to be the starting point for change: unless individuals are engaged in ways that lead them to "alter their ways of thinking and doing", they are unlikely to move through anxiety and feelings of uncertainty (what Fullan calls the "implementation dip") to successful implementation.

Hopkins' work is also illuminating. In seeking to make connections between research and experience in schools and higher education, Hopkins (2002) stresses the role of networks in supporting innovation and educational change by bringing together those with like-minded interests. This is similar to Lave and Wenger's (1991) concept of communities of practice, which are "... groups of people who share a concern, a set of problems, or a passion about a topic, and who deepen their knowledge and expertise in this area by interacting on an ongoing basis" (Wenger, McDermott & Snyder, 2002, p. 4). Communities of practice are distinct from, though they may sit within, the institution; they do not come about as the result of organisational design. Though Wenger does not use the terms, communities of practice may be thought of as emergent and self-organising, arising from social interaction between agents within the organisation.
Recognising that consensus is difficult in higher education because of the individual autonomy which academics have, Hopkins (2002) outlines a set of principles upon which successful improvement programmes in higher education should draw. Such programmes should be:
- achievement focused;
- empowering in aspiration;
- research based and theory rich;
- context specific;
- capacity building in nature;
- inquiry driven;
- implementation oriented;
- interventionist and strategic;
- externally supported; and
- systemic.
Hopkins acknowledges that change initiatives will not necessarily draw on all of these principles and that, because of the differing contexts, cultures and nature of institutions, they may need to adopt differentiated approaches, choosing or adapting appropriate strategies. This highlights the fact that 'one size fits all' approaches to change, perhaps using the latest fad in organisational development or strategic management, are unlikely to have the desired effects.

The practical application of any model or set of conceptual constructs, such as those expounded in the complexity literature, will depend on how well we understand the way our organisation works and how the important networks within it operate. One approach to developing this understanding is through appreciative inquiry. Appreciative inquiry was developed by David Cooperrider and colleagues at Case Western Reserve's School of Organization Behaviour in the 1980s (Cooperrider, Whitney, & Stavros, 2005). The approach is based on the premise that organisations change in the direction in which they inquire: an organisation which inquires into problems will keep finding problems, but an organisation which attempts to appreciate what is best in itself will discover more and more that is good. It can then use these discoveries to build a new future in which the best becomes more common. Appreciative inquiry can be applied to many kinds of organisational development, including the change management processes involved in quality enhancement initiatives.

The first step in appreciative inquiry is to select the affirmative topic – that is, the topic that will become the focus of questioning and future interventions. In our case, this would be assessment. An appreciative inquiry into assessment would then explore two questions:
- What factors give life to assessment in this course/programme/university when it is and has been most successful and effective? This question seeks to discover what the organisation has done well in the past and is doing well in the present in relation to assessment.
- What possibilities, expressed and latent, provide opportunities for more vital, successful, and effective assessment practices? This question asks the participants to envisage and design a better future. (Cooperrider, Whitney, & Stavros, 2005, p. 32)

Participants then engage in appreciative interviews: one-on-one dialogues using questions related to high-point experiences, valuing what gives life to the organisation, project or initiative at its best.

Whether one uses appreciative inquiry or some other approach to reflecting on the organisation, a key focus is on asking questions, engaging in conversations and, drawing on complexity theory, enabling change to emerge. This may prove problematic in situations where quality assurance processes have created a compliance culture rather than one characterised as unpredictable, creative, challenging and paradoxical. Hopkins (2002) observes that changes in schools have been characterised by increases in accountability and adherence to national educational reform programmes. One aspect of similar reforms in higher education in the UK and elsewhere has been the introduction of self-evaluation – currently called institutional audit in the UK (Quality Assurance Agency, 2006b) – which encourages a more open, reflective evaluation of the institution, including through the evaluation of subjects and courses. Since this process includes external examiners, there is a triangulation of review and evaluation which is increasingly focused on quality enhancement and on evaluation for development and learning, not just for accountability purposes (Chelimsky, 1997).

The following example illustrates how a change programme that is still in progress developed, not out of a top-down diktat, but out of a number of occurrences at different levels of a university. The reader is asked to reflect on how the matters we have considered in this chapter – systems approaches (particularly complex adaptive systems), change as conversation, and approaches to educational change – can aid us in understanding the nature of successful change within higher education institutions. It would require a more in-depth presentation to highlight how agents, self-organisation and emergence all played their parts, but their roles are all present in the case study which follows.
A Case Study in Institutional Change: Sheffield Hallam University

Sheffield Hallam University, no doubt like many others, has made changes to assessment regulations and frameworks, processes and systems, and academic practices over the years without necessarily ensuring that the various aspects are well integrated or that the whole is viewed holistically. In 2005 the Assessment Working Group, which had been addressing assessment regulations and procedures for a number of years, acknowledged that something was not right. At the same time, feedback from students was suggesting
that, whilst the quality of their learning experience overall was very good, there were some aspects of assessment and feedback which needed addressing – a fact reinforced in some subject areas by the results of the 2006 National Student Survey.

It was in this context that the first author and a senior academic in one of the university's faculties produced a paper on Profile Assessment that addressed a number of issues concerning assessment, feedback and re-assessment. This was the outcome of the first of many coffee bar discussions! A further outcome was a spin-off from one of the University's Centres for Excellence in Teaching and Learning (CETLs): a proposal by the first author to establish TALI – The Assessment for Learning Initiative.

At the same time, with the growing use of Blackboard as the university's virtual learning environment and the need for it to communicate effectively with the University's central data system, along with increasing complexity in the assessment regulations as changes were made from year to year, there was a growing misalignment between the practices of academics, the regulations and procedures, and the systems used to manage and report on the outcomes of the assessment procedures. There was also a perception that the assessment regulations and procedures drove academic practices, rather than the other way around, producing what was commonly referred to as 'the burden of assessment'.

The opportunity was therefore taken to gather together a group of senior staff from key departments and academics from the faculties at a local conference centre, Ranmoor Hall, to discuss the issues widely and produce suggestions; this became known as the Ranmoor Group. The Assessment Project was established to focus on the deliberate actions required to support the development of:
- assessment practices that are learner focused and promote student engagement and attainment;
- regulations that are clear, consistent and student centred; and
- assessment processes that are efficient and effective and which enable the delivery of a high quality experience for staff and students.

A lead was taken in each of these areas by an appropriate senior member of the relevant department under the aegis of the Assessment Development Group. This group comprised three separate working groups and interacted with other key parts of the university. It reported to an Assessment Project Board, chaired by the Pro Vice-Chancellor, Academic Development, to ensure the appropriate governance of the project and the availability of people and resources to make it work.

One aspect, The Assessment for Learning Initiative (TALI), illustrates how change has been occurring. TALI's aim has been to achieve large-scale cultural change with regard to assessment practice across the university, with a focus on:
- research-informed change, supported by a research post in the institution's academic development unit (the Learning and Teaching Institute), with a strong focus on the student and staff experience of assessment;
- the development of resources and case studies sharing good practice and focusing on key assessment themes such as feedback and academic integrity, supported by the appointment of secondees to key roles in each faculty; and
- the innovative use of technology to create learning experiences and improve the efficiency and effectiveness of assessment by, for example, promoting the use of audio feedback and the development of feedback tools.

TALI has engaged with large numbers of staff through faculty and subject group meetings, through more coffee bar encounters, and by generally raising the profile of assessment across the university. Most importantly, it has engaged with staff at all levels and with students, including through the Students' Union. The faculty appointments ensure the local ownership of the initiative and remove much of the sense of a top-down imposition. The initiative has been received enthusiastically across the university, as people feel they genuinely have something to contribute and that the changes are designed to help them focus assessment more on learning and less on fitting into institutional constraints.

As one would expect in a large institution (28,000+ students) with a diverse student and academic population, change has not always gone smoothly. However, the enthusiasm and commitment of the whole team to the success of the initiative, including the Pro Vice-Chancellor, Academic Development, is resulting in significant progress. The PVC recently described this as the most exciting change initiative he had been involved with at the University, as it was proving so successful and leading to genuine change.
Conclusion

Early in this chapter we posed the question, "What does it take to improve assessment in a university?" and suggested that part of the response lay in a model of the university that would help us to see more clearly those aspects of the university, and the relationships between them, that would require attention if initiatives for change were to be effective. It is possible, in light of the various matters canvassed in this chapter, that the complexities of universities may defy the kind of simple depiction we have presented in Fig. 11.1. However, we do not wish to propose a new model here. We believe that a model which is true to the complexity of higher education institutions still needs to identify the key elements of those institutions and the connections between them, as we have done in our model. On the other hand, a useful model must emphasise the role of agents in relation to the elements, including both individuals and units involved in or likely to be affected by change.
Moreover, such a model needs to recognise the complexity of interactions between agents, along with the agents' essentially self-organising nature. Finally, the model needs to be considered in relation to an 'overlay' of change management, or conditions for improvement such as those proposed by Dooley (1997) and Stacey (2003, 2007). In short, while a simple model of higher education systems can be useful in drawing attention to certain aspects of the system, the model as represented in Fig. 11.1 is only a partial representation which needs to be understood in terms of the complexity and change factors considered in this chapter.

We noted at the beginning of the chapter that models seek to simplify what they are describing in order to assist thinking and action. Now we see that a model of higher education institutions which might help us consider processes for improving teaching, learning and assessment needs to be more complex than we initially envisaged. Is a simple model of a complex system possible? The case study suggests a model cannot be prescriptive, as institutions differ – one size does not fit all. We offer these thoughts in the hope that they will prompt further thinking and discussion, and in the belief that conversations towards developing a more comprehensive model will be as valuable as what may emerge.
An Endnote

This chapter reflects a journey by the authors. It began when Ranald participated in a workshop run by Gordon at a conference in Coventry, UK, in July 2003. This was followed by lengthy email exchanges which resulted in an article for the Institute for Learning and Teaching (now the UK's Higher Education Academy) website (Joughin & Macdonald, 2004). The article introduced the model and explored the various elements and relationships, though in little depth and without locating it in any theoretical or conceptual literatures. Two subsequent visits by Ranald to the Hong Kong Institute of Education, where Gordon was then working, allowed us to develop our ideas further, and we began to draw on a variety of literatures, not least in the areas of systems and complexity. The ideas for this chapter have developed whilst we have been writing and are likely to continue to change from the point at which we are writing in October 2007.

Our discussions and writing in themselves reflect aspects of complexity, with ourselves as the agents, self-organising our thoughts and generating emergent ideas for influencing policy and practice in our roles as academic developers. Further, we can identify with other areas of the literature, such as communities of practice, appreciative inquiry and the literature on educational change. Perhaps the concern for us at the moment is to develop a highly pragmatic approach to facilitating change, informed by the latest organisational development or strategic management thinking, but not dominated by it. We note with interest that the subtitle of Stacey, Griffin and Shaw's book applying complexity theory
to management (Stacey, Griffin & Shaw, 2000) is "Fad or Radical Challenge to Systems Thinking?" Perhaps it does not matter which it is, so long as it aids understanding and leads to effective change leadership and management around important educational issues such as learning and assessment.
References

Beer, S., Strachey, C., Stone, R., & Torrance, J. (1999). Model. In A. Bullock & S. Trombley (Eds.), The new Fontana dictionary of modern thought (3rd ed.), (pp. 536–537). London: Harper Collins.
Boud, D., & Falchikov, N. (Eds.). (2007). Rethinking assessment in higher education. Abingdon, UK: Routledge.
Brown, S., & Glasner, A. (Eds.). (1999). Assessment matters in higher education. Buckingham, UK: SRHE/Open University Press.
Bryan, C., & Clegg, K. (2006). Innovative assessment in higher education. Abingdon, UK: Routledge.
Checkland, P. (1993). Systems thinking, systems practice. Chichester: John Wiley and Sons.
Checkland, P. (1999). Soft systems methodology: A 30-year retrospective. Chichester: John Wiley and Sons.
Chelimsky, E. (1997). Thoughts for a new evaluation society. Evaluation, 3(1), 97–118.
Cooperrider, D. L., Whitney, D., & Stavros, J. M. (2005). Appreciative inquiry handbook. Brunswick, Ohio: Crown Custom Publishing.
Dooley, K. (1997). A complex adaptive systems model of organization change. Nonlinear Dynamics, Psychology, and Life Sciences, 1(1), 69–97.
Eckel, P., Green, M., Hill, B., & Mallon, W. (1999). On change III: Taking charge of change: A primer for colleges and universities. Washington, DC: American Council on Education. Retrieved October 12, 2007, from http://www.acenet.edu/bookstore/pdf/on-change/onchangeIII.pdf
Eoyang, G., Yellowthunder, L., & Ward, V. (1998). A complex adaptive systems (CAS) approach to public policy making. Society for Chaos Theory in the Life Sciences. Retrieved October 12, 2007, from http://www.chaos-limited.com/gstuff/SCTPLSPolicy.pdf
Fullan, M. (2001). The new meaning of educational change (3rd ed.). London: RoutledgeFalmer.
Gladwell, M. (2000). The tipping point. London: Abacus.
Heywood, J. (2000). Assessment in higher education: Student learning, teaching, programmes and institutions. London: Jessica Kingsley.
Hopkins, D. (2002). The evolution of strategies for educational change – implications for higher education. Retrieved October 10, 2007, from the archive at The Higher Education Academy website: http://www.heacademy.ac.uk
Joughin, G., & Macdonald, R. (2004). A model of assessment in higher education institutions. The Higher Education Academy. Retrieved September 11, 2007, from http://www.heacademy.ac.uk/resources/detail/id588_model_of_assessment_in_heis
Keeves, J. P. (1994). Longitudinal research methods. In T. Husén & N. T. Postlethwaite (Eds.), The international encyclopedia of education (2nd ed.), (Vol. 6, pp. 3512–3524). Oxford: Pergamon Press.
Kezar, A. (2001). Understanding and facilitating organizational change in the 21st century: Recent research and conceptualizations. San Francisco: Jossey-Bass.
Knight, P. (Ed.). (1995). Assessment for learning in higher education. London: Kogan Page.
Knight, P. (2000). The value of a programme-wide approach to assessment. Assessment and Evaluation in Higher Education, 25(3), 237–251.
Knight, P., & Trowler, P. (2000). Department-level cultures and the improvement of learning and teaching. Studies in Higher Education, 25(1), 69–83.
Knight, P., & Yorke, M. (2003). Assessment, learning and employability. Buckingham, UK: SRHE/Open University Press.
Lave, J., & Wenger, E. (1991). Situated learning: Legitimate peripheral participation. Cambridge, UK: Cambridge University Press.
Matthews, K. M., White, M. C., & Ling, R. G. (1999). Why study the complexity sciences in the social sciences? Human Relations, 52(4), 439–462.
Meadows, D. (1999). Leverage points: Places to intervene in a system. Hartland: The Sustainability Institute. Retrieved 27 September, 2007, from http://www.sustainabilityinstitute.org/pubs/Leverage_Points.pdf
Miller, A. H., Imrie, B. W., & Cox, K. (1998). Student assessment in higher education. London: Kogan Page.
Miller, C. M. L., & Parlett, M. (1974). Up to the mark: A study of the examination game. London: SRHE.
Nash, J., Plugge, L., & Eurlings, A. (2000). Defining and evaluating CSCL projects. Unpublished paper, Stanford, CA: Stanford University.
Owen, H. (1997). Open space technology: A user's guide. San Francisco: Berrett-Koehler Publishers.
Quality Assurance Agency. (2006a). Code of practice for the assurance of academic quality and standards in higher education, Section 6: Assessment of students. Retrieved October 12, 2007, from http://www.qaa.ac.uk/academicinfrastructure/codeOfPractice/section6/default.asp
Quality Assurance Agency for Higher Education. (2006b). Handbook for institutional audit: England and Northern Ireland. Mansfield: The Quality Assurance Agency for Higher Education.
Schwarz, P., & Gibbs, G. (Eds.). (2002). Assessment: Case studies, experience and practice from higher education. London: Kogan Page.
Seel, R. (2005). Creativity in organisations: An emergent perspective. Retrieved 3 October, 2007, from http://www.new-paradigm.co.uk/creativity-emergent.htm
Shaw, P. (2002). Changing conversations in organizations: A complexity approach to change. Abingdon, UK: Routledge.
Snyder, B. R. (1971). The hidden curriculum. New York: Knopf.
Stacey, R. (2003). Strategic management and organizational dynamics: The challenge of complexity (4th ed.). Harlow, England: Prentice-Hall.
Stacey, R. D. (2007). Strategic management and organizational dynamics (5th ed.). Harlow: Pearson Educational Limited.
Stacey, R. D., Griffin, D., & Shaw, P. (2000). Complexity and management: Fad or radical challenge to systems thinking? London: Routledge.
Tosey, P. (2002). Teaching at the edge of chaos. York: LTSN Generic Centre. Retrieved October 4, 2007, from http://www.heacademy.ac.uk/resources.asp?process=full_record&section=generic&id=111
Trowler, P., Saunders, M., & Knight, P. (2003). Changing thinking, changing practices. York: Higher Education Academy.
Wenger, E., McDermott, R., & Snyder, W. M. (2002). Cultivating communities of practice. Boston, Mass: Harvard Business School Press.
Yorke, M. (2001). Assessment: A guide for senior managers. York: Higher Education Academy.
Chapter 12
Assessment, Learning and Judgement: Emerging Directions

Gordon Joughin
Introduction

This book began by noting the complexity of assessment of student learning as a field of scholarship and practice and proposing a re-consideration of a range of issues concerning the very meaning of assessment, the nature and process of making professional judgements about the quality of students' work, the various relationships between assessment and the process of student learning, and the intricacies of changing how assessment is thought of and practised across an institution. The complexities of assessment are indicated by the range of matters addressed in this single volume: foundational empirical research; the role of the work context in determining approaches to assessment; how judgements are made; the limitations of grades as reporting instruments; quality measures for new modes of assessment; students' experience of assessment cultures; assessment as a source of insight for teaching; the role of plagiarism in subverting learning; and the principles and processes involved in institution-wide enhancement of assessment. Perhaps it is appropriate that the penultimate chapter drew on complexity theory as an aid to understanding higher education institutions in search of improved assessment.

Each author has argued for changes in thinking and/or practice within the particular focus of their chapter. However, the chapters' various foci are not isolated from each other; they are inter-related in building a picture of a coherent culture of assessment appropriate for the first decades of the 21st century. Thus the definition of assessment proposed in the introductory chapter provides a basis for understanding the approaches to assessment discussed in each of the chapters, while its emphasis on assessment as judgement is particularly reinforced by Boud and Sadler. Dochy's argument for edumetric rather than psychometric standards for judging the quality of assessment provides
support for the forms of assessment that encourage ability-based learning (Riordan and Loacker) and require students to create rather than simply find answers to assessment tasks (Carroll). And the capacity for self-assessment which underpins Riordan and Loacker's work at Alverno College is the very same capacity which forms the basis for Yorke's approach to students' claims making in relation to their own learning at the completion of their degree. The notion of assessment cultures introduced by Ecclestone highlights the importance of understanding assessment in terms of students' and teachers' experience of assessment, an experience which permeates their careers (whether as students or teachers) and which must be accommodated as students encounter the new approaches to assessment argued for in this book. Finally, using assessment results to improve teaching, and devising specific ways of doing this in relation to particular forms of assessment (Suskie), is a principle that can be applied to each of the approaches to assessment argued for elsewhere in the book, not least in relation to those approaches that place the development of students' capacity to evaluate their own work at the heart of assessment in support of learning.

Thus, while the chapters have been conceived independently (though admittedly within the context of the learning-oriented assessment approach noted in the Preface), there is a high degree of coherence across them, so that the book as a whole constitutes a consistent argument for an integrated approach to assessment based on a set of progressions:
- from conceptualizing assessment as a process of quasi-measurement, to conceptualizing assessment as a process of informing, and making, judgements;
- from judgements based on criteria abstracted from an informed understanding of quality, to judgements based on an holistic appreciation of quality;
- from assessments located within the frame of reference of the course, to assessments located in a frame of reference beyond the course in the world of practice;
- from simple grades as expressions of achievement, to more complex and comprehensive representations of knowledge and abilities;
- from assessment as the endpoint of learning, to assessment as a starting point for teaching;
- from assessment as discipline- and course-focused, to assessment focused on generic abilities aligned with discipline knowledge;
- from standardised testing, to qualitative judgements of competence through practice-based or practice-like tasks;
- from students as objects of assessment, to students as active subjects participating in assessing their own work; and
- from a conception of the university as an organisation susceptible to systemic change through managerial interventions, to a conception of the university as a complex organism where change depends on the values, intentions and actions of interdependent actors.
These movements raise a number of challenges, each far from trivial, which could well set the agenda for theorising, researching, and acting to improve assessment for the next decade and beyond.
Conceptual Challenges

The conceptual challenges posed here call for the ongoing exploration of central tenets of assessment, subjecting them to critical scrutiny, and extending our understanding of the emerging challenges to orthodoxy noted in this book. Core concepts and issues that demand further scrutiny include the following:

The purposes of assessment. While the multiple functions of assessment have long been recognised, confusion continues regarding how these functions can be held in creative tension. Terminology does not help – the very term formative assessment, for example, suggests to many assessment of a certain type, rather than assessment performing a certain function. The term learning-oriented assessment is a step in the right direction, but only if it is seen as a way of looking at assessment, rather than as a type of assessment. A single piece of assessment may be, to varying extents, oriented towards learning, oriented towards informing judgements about achievement, and oriented towards maintaining the standards of a discipline or profession. Of course, according to the definition of assessment proffered in Chapter 2, assessment will always entail judgements about the quality of students' work, irrespective of the purpose to which these judgements are put. The challenge is to define assessment in its own right, and to use this singular definition in understanding and defining each of assessment's multiple purposes.

The location of assessment. Boud has argued for locating assessment in the context of the world of work which students will inhabit on graduation. This context not only helps to define what kinds of assessment tasks will be appropriate, but also emphasises the need for students to develop the capacity to assess their own work, since this becomes a routine function in the workplace. Once this perspective is adopted, other constructs fall into place, including the use of complex, authentic tasks, the inclusion of generic attributes as integral objects of assessment, and tasks that deter plagiarism and engage students in learning through embedding assessment in current but constantly changing real world contexts. The challenge is to unite the often necessarily abstracted nature of learning in universities with its ultimate purpose – students need not only to learn, but to learn in order to be able to act, and to be able to act in practice-like contexts where they can experience themselves as responsible agents and be aware of the consequences of their actions.

Assessment and judgement. Assessment as measurement represents a paradigm that has been strongly articulated over decades. Assessment as judgement, though underpinning what Dochy terms the new modes of assessment which
have been the focus of attention in the assessment literature for at least the past twenty years, has received far less attention. Eisner's work on connoisseurship (Eisner, 1985) and Sadler and Boud's work described in this book are notable exceptions. The challenge is to continue the work of conceptualizing and articulating assessment as judgement, drawing on both education and cognate disciplines to illuminate our understanding of the nature of professional judgement and how judgements are made in practice.

The use of criteria has come to be seen as central to judgement, to the extent that criterion-referenced assessment has become an established orthodoxy in the assessment literature and the policies, if not the practices, of many universities. Indeed, both Suskie and Riordan and Loacker see criteria as central to learning. Sadler has challenged us to reconsider the unthinking application of criteria as a basis of judgement, to review our understanding of quality and its representation, and to legitimate the role of holistic judgements.
Research Challenges

Given the centrality of assessment to learning, possibly every aspect of assessment noted in this book could constitute a worthy object of research. Three pivotal aspects of assessment and learning, widely acknowledged as such throughout the higher education literature, certainly demand further scrutiny.

Firstly, the role of assessment as a driver of student learning. We cannot assume, from the empirical studies of the 1960s and 1970s, that assessment dominates students' academic lives in the ways often supposed in much contemporary writing about assessment. There are two reasons for this. The first, noted in Chapter 3, is that the earlier research itself was equivocal: while assessment loomed large in the considerations of many students, this was far from a universal experience and contextual factors needed to be taken into account. The second is the truism that times have changed, and the nature of students, universities, and teaching practices has presumably changed too. There is a need to consider anew the role that assessment plays in students' academic lives, including, perhaps, replicating (with appropriate variations) those seminal studies reviewed in Chapter 3.

Secondly, the role of assessment in influencing students' approaches to learning. Despite the 30 years that have elapsed since the pioneering work of Marton and Säljö (see Marton & Säljö, 1997), there is no clear indication that forms of assessment per se can induce a deep approach to learning amongst students, nor do we have detailed studies on the ways in which forms of assessment interact with what Ramsden (2003) describes as a student's overall orientation to study or tendency to adopt a deep or surface approach to learning irrespective of context. If assessment plays a major role in learning, and if deep approaches to learning are essential for high quality learning outcomes, more focused research into the relationship between assessment formats and approaches to learning is needed.
Thirdly, the role of feedback in students’ experience of learning. If feedback is indeed essential to learning, how do we respond to the growing number of studies indicating inadequacies in the quantity, quality and timeliness of feedback and the difficulties, especially in semesterised courses, of incorporating assessment processes that ensure that feedback is actually used by students to improve their work and learning? Has the role of feedback in learning been exaggerated? Or is feedback less dependent on the overt actions of teachers than we have thought? Qualitative studies of students’ experience of feedback in relation to their learning may provide essential insights into this key aspect of learning through assessment.
Practice Challenges

While the emphasis of this book has been on emerging understandings of assessment, the implications of these understandings for the practice of assessment are considerable. While some of these implications have been noted by individual authors, many of them cluster around three sets of challenges: the redesign of assessment tasks; realigning the role of students in assessment; and the development of academics' professional expertise as assessors of their students' learning.
Re-designing Assessment Tasks

Notwithstanding the considerable developments that have occurred in moving towards forms of assessment that support learning in all of the ways considered in this book, much assessment remains dominated by a measurement paradigm principally designed to determine a student's achievement. Essays, unseen examinations, and multiple-choice tests, for example, continue to be the staple forms of assessment in many disciplines and in many universities, and they continue to be the forms of assessment with which many academics are most familiar. Where this is the case, the arguments presented in this book call for the re-design of assessment, with assessment tasks that incorporate certain requisite qualities and perform certain critical functions in relation to learning. The following are some of the more important of these consequences:
Assessment tasks should incorporate the characteristics of practice, including the contextual and embodied nature of practice, requiring the engagement of the student as a whole person.
Assessment tasks need to both develop and reflect students' generic abilities – for example, to communicate effectively within their discipline, to work collaboratively, and to act in ways that are socially responsible – as well as developing discipline specific knowledge and skills.
Assessment tasks should require responses that students need to create for themselves, and should be designed to avoid responses that can be simply found, whether on the Internet, in the work of past students, or in prescribed texts or readings.
Assessment tasks should become the basis of learning rather than its result, not only in the sense of students' responses informing ongoing teaching as proposed by Suskie in Chapter 8, but perhaps more importantly in Sadler's sense of assessment as a process whereby students produce and appraise rather than study and learn (Sadler, Chapter 4).
Reassigning Assessment Roles: Placing Students at the Centre of Assessment

Central to the argument of this book is the role of students as active agents in the acts of judgement that are at the heart of assessment. This requires recognising that assessment is appropriately a matter that engages students, not just teachers, in acts of appraisal or judgement, and therefore working with students, whose conceptions of assessment tend to place all authority in the hands of teachers, to reshape those conceptions. In short, this entails making assessment an object of learning: devoting time to helping students understand the nature of assessment and their role in it, especially in terms of self-monitoring, and learn about assessment in ways that parallel how they go about learning the substantive content of their discipline.
Professional Development

These challenges to practice clearly call for more than a simple change in assessment methods. They require a highly professional approach to assessment in a context where, as Ramsden has argued, "university teachers frequently assess as amateurs" (Ramsden, 2003, p. 177). Developing such an approach places considerable demands on the ongoing formation of university teachers, though fortunately at a time when the professional development of academics as teachers is being given increasing attention in many countries. This formation is undoubtedly a complex process, but it would seem to entail at least the following: providing access to existing expertise in assessment; providing time and appropriate contexts for professional development activities; motivating staff to engage in professional development through recognising and rewarding innovations in assessment; developing banks of exemplary practices; incorporating the expertise of practitioners outside the academy; and, perhaps critically, making assessment a focus of scholarly activity for academics grappling with its challenges.
The Challenge of Change

In Chapter 11 we posed the question, "What does it take to improve assessment across an institution?" Clearly, while more professional approaches to assessment by academics may be a sine qua non of such improvement, this is but one factor amongst many, as the arguments presented in that chapter make clear. Reconceptualizing assessment as judgement and reconfiguring the relationship between assessment and learning occur in the context of universities as complex adaptive systems comprising multifarious agents operating within and across different organisational levels, and with identities constituted both within and outside the university. Making assessment a focus of conversation across the institution, and locating this discussion in relation to the concerns of the various agents within the university (including course development committees, policy developers, examination committees, deans and heads of departments, and staff and student associations) and of those agents outside the university with a vested interest in its programs (including parents, professional organisations, and politicians), requires an exceptional level of informed and skilled leadership.
References

Eisner, E. (1985). The art of educational evaluation: A personal view. London: Falmer Press.
Marton, F., & Säljö, R. (1997). Approaches to learning. In F. Marton, D. Hounsell, & N. Entwistle (Eds.), The experience of learning (2nd ed., pp. 39–58). Edinburgh: Scottish Academic Press.
Ramsden, P. (2003). Learning to teach in higher education (2nd ed.). London: RoutledgeFalmer.
Author Index
A Adelman, C., 73 Amrein, A. L., 86 Anderson, L. W., 70, 77 Anderson, V. J., 46, 137 Angelo, T., 123 Angelo, T. A., 134, 148, 149 Arenson, K. W., 175 Askham, P., 88, 91, 92 Astin, A. W., 134 Atherton, J., 117 Au, C., 118
B Baartman, L. K. J., 95, 106 Bagnato, S., 98 Bain, J., 88, 90 Bain, J. D., 21, 22 Baker, E., 95 Ball, S. J., 154, 160 Banta, T. W., 139 Barr, R. B., 134 Bastiaens, T. J., 95, 106 Bates, I., 168 Bateson, D., 100 Baume, D., 69 Baxter Magolda, M., 124 Becker, H. S., 17, 18 Beer, S., 194 Bekhradnia, B., 81 Bennet, Y., 98 Berliner, D. C., 86 Biesta, G., 162, 163 Biggs, J. B., 17 Biggs, J., 89, 91, 93, 135, 136 Birenbaum, M., 87, 88, 90, 94, 96 Biswas, R., 71 Black, P., 23, 35, 86, 92, 93, 153, 155, 156, 166
Bloom, B. S., 77 Bloxham, S., 46, 172 Bollag, B., 175 Borich, G. D., 145 Boud, D., 2, 3, 6, 7, 9, 13, 14, 15, 20, 29, 30, 35, 36, 88, 89, 106, 153, 195, 215, 217, 218 Boyer, E., 185 Braddock, R., 46 Branch, W. T., 146 Brandon, J., 70 Bransford, J. D., 186 Brennan, R. L., 99 Bridges, P., 69, 72 Brown, A. L., 121 Brown, E., 24 Brown, G., 1 Brown, S., 88, 106, 195 Brumfield, C., 73 Bruner, J., 117 Bryan, C., 17, 195 Bull, J., 1 Burke, E., 6, 46 Butler, D. L., 92, 93 Butler, J., 15, 35
C Campbell, D. T., 139 Carless, D., 2, 13 Carroll, J., 7, 8, 115, 125, 216 Cascallar, E., 86 Chanock, K., 24, 71 Checkland, P., 199, 200 Chelimsky, E., 208 Chen, M., 99 Chi, M. T. H., 46 Chickering, A. W., 134 Clark, R., 24 Clegg, K., 17, 195
Coffey, M., 69 Collins, A., 89, 95, 100, 106 Cooperrider, D. L., 207, 208 Costa, A., 146 Coulson, R. L., 88 Cox, K., 195 Cronbach, L. J., 95, 96, 99 Crooks, T., 88, 89, 90, 92, 93 Cross, K. P., 148, 149
D Dalziel, J., 72 Dancer, D., 106 Davey, C., 87 David, M., 154 Davies, J., 154, 161, 163 Davies, M., 70 De Sousa, D. J., 136 Deakin Crick, R., 87 DeMulder, E. K., 86, 87 Derrick, J., 158 Dewey, J., 57, 117 Dierick, S., 88, 89, 95, 100, 106 Dochy, F., 7, 8, 10, 21, 85, 86, 87, 88, 89, 91, 92, 93, 95, 96, 100, 104, 106, 134, 135, 136, 215, 217 Dooley, K., 204 Dreschel, B., 158 Drummond, M. J., 157, 158 Dunbar, S., 95 Dunbar, S. B., 134 Dunbar-Goddett, H., 19 Dunn, L., 46
E Eastcott, D., 106 Ecclestone, K., 9, 153, 154, 156, 157, 158, 161, 163, 164, 166, 168, 216 Echauz, J. R., 71 Eckel, P., 200, 203 Eisner, E., 77, 218 Eison, J., 65 Ekstrom, R. B., 69 Ellen, N., 120 Elmholdt, C., 34, 37 Elton, L., 19, 22 Elton L. R. B., 90 Entwistle, N., 118, 135 Eoyang, G., 202 Eraut, M., 75, 76 Ericsson, K. A., 46
Eurlings, A., 194 Evans, A. W., 107 Ewell, P. T., 134
F Falchikov, N., 30, 36, 85, 88, 153, 195 Fan, X., 99 Farmer, B., 106 Farr, M. J., 46 Felton, J., 66 Feltovich, P. J., 88 Firestone, W. A., 87 Fowles, M., 100 Franklyn-Stokes, A., 129 Frederiksen, J. R., 89, 95, 100, 106 Frederiksen, N., 88, 90 Freed, J. E., 46, 134, 142 Freeman, R., 46 Fullan, M., 206
G Gagne, R. M., 23 Gamson, Z., 134 Gardner, H., 178 Gardner, J., 153 Gawn, J., 158 Geer, B., 17, 18 Gibbons, M., 76 Gibbs, G., 2, 17, 18, 19, 23, 35, 88, 129, 195 Gielen, S., 21, 87, 88, 89, 100, 106, 108 Gijbels, D., 87, 88 Gijselaers, W., 106 Gladwell, M., 198 Glaser, R., 46, 62 Glasner, A., 195 Gleser, G. C., 99 Glover, C., 24 Green, M., 200 Griffin, D., 211, 212 Gronlund, N. E., 145 Gulliksen H., 86
H Haertel, E. H., 95, 99, 100 Hager, P., 15, 35 Haggis, T., 22 Haladyna, T. M., 145 Handa, N., 118 Hargreaves, E., 156 Harlen, W., 87
Hartley, P., 24 Haswell, R., 137 Haug, G., 80 Hawe, E., 69, 70 Hayes, N., 118 Heller, J. I., 98 Henscheid, J. M., 135 Heywood, J., 195 Higgins, R., 24 Hill, B., 200 Hopkins, D., 206, 207, 208 Hornby, W., 69 Hounsell, D., 23 Hounsell, J., 23 Howard, R. M., 121 Huba, M. E., 46, 134 Hughes, E. C., 17, 18 Hyland, P., 24
I Imrie, B. W., 195 Introna, L., 118 Ivanic, R., 24
J James, D., 162, 163 Janssens, S., 21, 87 Jenkins, A., 70 Jessup, G., 66, 155 Johnson, E. G., 99 Jones, D. P., 134 Joughin, G., 1, 2, 5, 9, 10, 13, 19, 22, 193, 211, 215
K Kallick, B., 146, 151 Kamvounias, P., 106 Kane, M., 95, 96 Karran, T., 80 Keeves, J. P., 194 Kezar, A., 202 Kirschner, P. A., 95, 106 Knight, P., 3, 7, 15, 39, 72, 75, 76, 77, 78, 79, 80, 194, 195, 196, 199 Kolb, D., 185 Koper, P. T., 66 Kramer, K., 158 Krathwohl, D. R., 77 Kubiszyn, T., 145 Kuh, G., 134
Kuh, G. D., 134 Kuin, L., 118 Kvale, S., 34
L Lambert, K., 120 Langan, A. M., 107 Laurillard, D., 19, 20, 22, 23, 90 Lave, J., 34, 206 Law, S., 19 Lens, W., 93 Leonard, M., 87 Levi, A. J., 46 Levine, A., 175 Lewis, R., 46 Light, R., 134 Lindblad, J. H., 137 Ling, R. G., 202 Linn, R., 95, 99, 100, 106, 134 Litjens, J., 23 Liu, N-F., 2, 13 Livingston, S. A., 140 Lloyd-Jones, R., 46 Loacker, G., 9, 175, 187, 216, 218 Logan, C. R., 98
M Macdonald, R., 9, 10, 125, 193, 211 Macfarlane, R., 116 MacFarlane-Dick, D., 23, 35 MacGregor, J., 137 Macrae, S., 154, 160 Mager, R. F., 69 Maguire, M., 154, 160 Mallon, W., 200 Malott, R. W., 134 Marshall, B., 157, 158 Marshall, C., 147 Martens, R., 91, 92 Marton, F., 19, 21, 90, 218 Maslen, G., 119 Matthews, K. M., 202 Mayford, C. M., 98 Mayrowitz, D., 87 McCabe, D., 120 McCune, V., 23 McDermott, R., 206 McDowell, L., 22, 88, 91, 93, 107 McKeachie, W. J., 134 McKenna, C., 107 McNair, S., 155
McTighe, J., 178 Meadows, D., 198 Meehl, P. E., 46 Mentkowski, M., 134, 175, 187 Merry, S., 46, 106 Messick, S., 8, 95, 96, 97, 101, 102, 106, 134 Meyer, G., 20 Miller, A. H., 195 Miller, C. M. L., 17, 18, 195 Milton, O., 65 Moerkerke G., 88, 91, 94, 106 Moon, J., 146 Moran, D. J., 134 Morgan, C., 46 Muijtjens, A., 100
N Nanda, H., 99 Nash, J., 194 Neisworth J. T., 98 Nevo, D., 89 Newstead, S. E., 129 Nicol, D. J., 23, 35 Nightingale, P., 19 Nijhuis, J., 106 Norton, L. S., 106, 129
O O’Donovan, B., 117 O’Donovan, R., 46 O’Neil, M., 19 O’Reilly, M., 46 Oliver, M., 107 Orsmond, P., 46, 106 Ovando, M. N., 136 Owen, H., 206
P Palmer, P. J., 134 Paranjape, A., 146 Park, C., 122 Parlett, M., 17, 18, 195 Parry, S., 46 Pascarella, E. T., 134 Patton, M. Q., 147 Pecorari, D., 121 Pendlebury, M., 1 Pepper, D., 70 Perry, W., 8, 123, 124, 186 Piaget, J., 117
Pike, G. R., 139 Plugge, L., 194 Polanyi, M., 53, 56 Pollio, H. R., 65 Pond, K., 107 Pope, N., 107 Power, C., 118 Powers, D., 100 Prenzel, M., 158, 161, 165 Price, M., 46, 117 Prosser, M., 72, 88, 90
R Rajaratnam, N., 99 Ramsden, P., 17, 22, 23, 218, 220 Reay, D., 154 Reiling, K., 46, 106 Rigsby, L. C., 86, 87 Rimmershaw, R., 24 Riordan, T., 9, 175, 187, 216, 218 Robinson, V., 118 Rogers, G., 187 Romer, R., 134 Rossman, G. B., 147 Roth, J., 187 Rothblatt, S., 73 Rowley, G. L., 99 Rowntree, D., 14, 17 Royce Sadler, D., 45, 156 Rust, C., 19, 46, 117
S Säljö, R., 19, 21, 90, 218 Sadler, D. R., 2, 3, 6, 7, 9, 16, 23, 35, 47, 48, 49, 50, 52, 53, 59, 69, 72, 80, 81, 135, 141, 156 Sambell, K., 22, 88, 91, 104, 107 Saunders, M., 194 Schatzki, T. R., 31 Schoer, L., 46 Schuh, J. H., 134 Schwandt, T., 31, 32 Schwarz, P., 195 Scouller, K., 21, 88, 90 Scouller, K. M., 90 Seel, R., 203, 204 Segers, R., 86, 87, 88, 93, 106, 107 Shavelson, R. J., 88, 99 Shaw, P., 206, 211 Sheingold, K., 98 Shepard, L., 86, 100
Silvey, G., 21 Simpson, C., 2, 19, 23, 127 Skelton, A., 27 Sluijsmans, D., 93, 106 Smith, J., 46 Snyder, B. R., 17, 18, 195 Snyder, W. M., 206 Spiro, R. J., 88 Stacey, R., 201, 203, 204, 206, 211, 212 Stanley, J. C., 139 Starren, H., 89 Stavros, J. M., 207, 208 Stevens, D. D., 46 Stone, R., 194 Strachey, C., 194 Struyf, E., 93 Struyven, K., 21, 87 Suen, H. K., 98 Suskie, L., 7, 8, 9, 46, 133, 134, 135, 138, 145, 216, 218, 220 Szabo, A., 120
T Tagg, J., 134 Tan, C. M., 88, 90, 93 Tan, K. H. K., 72 Tang, K. C. C., 21 Tanggaard, L., 34, 37 Taylor, L., 120 Terenzini, P. T., 134 Terry, P. W., 20 Thomas, P., 21, 22, 88, 90 Thomson, K., 88 Tilley, A., 129 Tinto, V., 137 Topping, K., 88, 104 Torrance, H., 154, 157, 163, 167, 169, 171 Torrance, J., 194 Tosey, P., 203 Trigwell, K., 88 Trowler, P., 194, 196, 199 Twohey, M., 175
U Ui-Haq, R., 107 Underwood, J., 120
V Vachtsevanos, G. J., 71 Van de Watering, G., 100 Van der Vleuten, C. P. M., 95, 106 Vandenberghe, R., 93 Vermunt, J. D. H. M., 91 Villegas, A. M., 69 Vygotsky, L., 117
W Wade, W., 107 Walvoord, B., 46, 70, 137 Ward, V., 202 Webb, N. M., 99 Webster, F., 70, 71, 72 Wenger, E., 34, 206 West, A., 46, 172 White, M. C., 202 Whitney, D., 207, 208 Whitt, E. J., 134 Wiggins, G., 178 Wiliam, D., 23, 35, 86, 92, 153, 155, 156 Willard, A., 100 Winne, P. H., 92 Woolf, H., 46, 69, 72
Y Yellowthunder, L., 202 Yorke, M., 7, 9, 65, 68, 74, 75, 80, 196, 197, 216
Z Zieky, M. J., 140
Subject Index
A Abilities, 7, 47, 105, 163, 175, 177, 180, 183, 216, 219 Ability-based curriculum, 9, 175, 179, 189 Ability-based Learning Outcomes: Teaching and Assessment at Alverno College, 187 Aesthetic engagement, 177, 183 Agents, 36, 41, 201–202, 210 Alverno College, 4, 9, 175, 181, 187, 189, 216 Amotivated learners, 158 Analysis, 9, 71, 73, 79, 95, 144, 145, 163, 167, 172, 177 Analytic, 6, 7, 45, 46, 48, 50, 51, 52, 53, 54, 55, 56, 57, 62, 69, 160, 178, 183 Analytic rating scales, 51 Analytic rubrics, 51, 62 Appraising, 16, 45, 48, 54, 58 Appreciative inquiry, 207, 208, 211 Apprenticeship, 37–38 Approaches to learning, 13, 16, 19–22, 24, 158, 218 Assessment at Alverno College, 175, 187 Assessment-driven instruction, 90 Assessment Experience Questionnaire, 19 Assessment for learning, 34–37, 85, 91–93, 107, 155, 157, 209 Assessment of learning, 85, 155, 188 Assessment for practice, 38–40 Assessment Reform Group, 153, 156 Authentic assessment, 6, 8, 21, 33, 34, 40, 47, 54, 58, 61, 87, 91, 94, 98, 99, 100, 101, 105, 106, 107, 108, 109, 127, 129, 161, 217 Authenticity, 8, 40, 100, 101, 108, 109 Authentic practice, 33 Autonomy, 153, 156, 158, 161, 168, 170, 173, 184, 207
B Backwash, 89, 91, 93 Bias, 90, 100, 105, 107 Bologna Declaration, 191
C Carnegie Foundation for the Advancement of Teaching and Learning, 185 CATWOE, 200 Change, 1, 4, 74, 204–210, 221 Cheating, 107, 121, 122–123 Citation practices, 121 Claims-making, 80–82 Classification, 67, 78 Classroom assessment, 35, 86 Clinical decision making, 46 Co-assessment, 85, 88 Co-construction of knowledge, 32 Code of Practice on Assessment, 198 Cognitive complexity, 8, 99, 100, 101, 108, 109 Cognitive development, 123, 124 Committee on the Foundations of Assessment, 16 Communication, 76, 104, 139, 142–143, 177, 204 Community of judgement, 30, 39 Community of practice, 30, 34, 38 Competency grades, 37, 102 Complex adaptive systems, 4, 10, 201, 204–205, 208, 221 Complexity theory, 201, 202, 204, 208, 211, 215 Conditions of learning, 23 Conference on College Composition and Communication, 147 Connoisseurship, 57, 77, 218 Consequential validity, 88, 91, 93, 101, 102, 107, 108, 109, 134
Constructivist learning theories, 117 Construct validity, 95–97, 102, 104, 105 Content validity, 95, 98, 101, 144 Conversation, 10, 23, 179, 180, 189, 205–211, 221 Conversational framework, 23 Co-production of practice, 32 Course work, 155 Criteria, 3, 5, 6, 7, 8, 10, 14, 23, 40, 41, 45, 46, 47, 49–62, 69, 70, 71, 72, 76, 77, 85–88, 95, 96, 97, 99–101, 103, 104, 105, 107, 108, 109, 119, 122, 128, 139, 141, 148, 155, 156, 159, 165, 167, 168, 169, 171, 172, 179–181, 184, 185, 216, 218 Criterion-based assessment, 46 Critical reasoning, 47 Critical thinking, 100, 117 Cue-conscious, 17 Cue-deaf, 17 Cue- seeking, 17 Curriculum, 7, 9, 17, 18, 66, 68, 72, 74, 85, 87, 133, 137, 138, 142, 149, 175, 177, 178, 179, 180, 181, 182, 183, 184, 187, 188, 189, 191, 196, 197
Experiences of assessment, 4, 14, 23, 91 External examinations, 34, 155 External motives, 159
D Dearing Report, 74, 80 Deep approach, 16, 19, 21, 22, 218 Definition of assessment, 6, 13–16, 215, 217 Diagnostic assessment, 155 Directness, 8, 99, 100, 101, 104, 108 Disciplines as Frameworks for Student Learning: Teaching the Practice of the Disciplines, 187 Discrimination, 145, 146
G Generalisability, 77, 96, 97, 99, 102, 103, 105–106, 109 Generic abilities, 7, 216, 219 Ghost writers, 120 Global grading, 46 Global perspective, 177 Grade inflation, 69, 73–74 Grade point perspective, 18 Grades, 3, 65, 72–73, 74, 77, 78, 82, 125, 149, 159 Grading, 6, 7, 14, 35, 39, 42, 45–62, 65–82, 138, 139 Graduateness, 75, 81
E Edumetric, 85–109 Effective citizenship, 177 Emergence, 201, 203–204 Emotion, 13, 14, 24, 33, 39, 76 Enhancing Student Employability Co-ordination Team, 75 Evaluative expertise, 49, 58, 60 Examination, 2, 14, 17, 20, 29, 30, 34, 68, 72, 81, 90, 98, 118, 128, 129, 138, 139, 140, 144, 145, 155, 157, 219, 221 Expectations, 9, 29, 48, 69, 70, 71, 81, 104, 107, 125, 134, 136, 149, 150, 153, 154, 163, 165, 169, 170, 171, 172, 180, 185, 186, 197
F Fairness, 4, 5, 8, 94, 99, 100, 101, 105, 106, 108, 109, 141, 142, 143 Feedback, 2, 6, 10, 13, 14, 22–24, 34, 35, 36, 38, 45, 48, 49, 51, 52, 58, 59, 60, 61, 63, 71, 85, 86, 89, 92, 93, 104, 106, 107, 108, 123, 127, 133, 136, 137, 141, 144, 155, 156, 157, 159, 161, 166, 169, 170, 179–181, 182, 195, 198, 202, 204, 208, 209, 210, 219 Feedback loop, 23, 202, 204 Feed-forward, 23, 24, 89 First International Conference on Enhancing Teaching and Learning Through Assessment, 191 Formative assessment, 6, 9, 25, 35, 88, 90, 91, 92, 93, 107, 108, 127, 153–173, 217 Fuzzy set theory, 71
H Hard systems, 199–200 The Hidden Curriculum, 17, 18 Higher Education Academy, 75, 211 Higher Education Funding Council for England, 75 Higher Education Quality Council, 71, 75, 81 High stakes assessment, 4, 8, 86, 87 Holistic assessment, 3, 5, 6, 7, 8, 31, 40, 45–62, 69, 138, 157, 216, 218
I Improving Formative Assessment, 157 Inference, 16, 56, 102, 103 Institutional impacts, 194–195 Integrity, 50, 62, 115, 116, 119, 121, 123, 124, 210 Internet Plagiarism Advisory Service, 115 Inter-rater reliability, 98, 99 Intrinsic motivation, 90, 158, 160, 165, 170 Introjected motivation, 159, 161, 168
J Judgement/judgment, 1, 3–4, 5, 6, 7, 8, 10, 13–25, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 76, 77, 78, 79, 81, 82, 93, 96, 97, 98, 100, 105, 119, 215–221
L Latent criteria, 58 Learning culture, 9, 161–173, 187 Learning to learn, 6, 153 Learning-oriented assessment, 216, 217 Learning outcomes, 3, 7, 9, 29, 30, 35, 37, 66, 76, 125, 133, 137, 138, 147, 149, 150, 159, 172, 175, 176, 177, 180, 182, 183, 184, 187, 188, 189, 190, 191, 193, 197, 218 Learning styles, 135, 185 Learning that Lasts, 187 Loose-coupling, 200
M Making the Grade, 17, 18 Manifest criteria, 58 Marking, 14, 34, 46, 48, 68, 69, 70, 77, 107, 137, 169 Massachusetts Institute of Technology, 18 Measurement, 3, 4, 5, 7, 15, 29, 35, 36, 48, 75, 77, 86, 94, 95, 96, 98, 99, 103, 139, 216, 217, 219 Measurement model, 15, 35 Memorising, 91, 107 Menu marking, 69 Metacognition, 49, 135, 146 Minimal marking method, 137 Misconduct, 122 Models, 15, 31, 32, 34, 35, 46, 194, 201, 211 Monitor, 45, 48, 49, 56, 127 Motivation, 8, 9, 49, 74, 86, 87, 90, 93, 104, 106, 107, 125, 153, 155, 158–161, 163, 165–166, 168–170, 171, 172, 173, 180 Multi-criterion judgements, 57
Multidisciplinary, 32, 96 Multiple choice, 2, 20, 21, 22, 90, 94, 134, 141, 144–146, 147, 150, 219 Multiple choice tests/Multiple-choice tests, 94, 134, 141, 144, 145, 147, 150, 219
N National Committee of Inquiry into Higher Education, 74, 80 New modes of assessment, 5, 7, 8, 85–109, 215, 217 Norm-referenced (assessment), 7, 34, 41, 155
O Objectives, 3, 49, 69, 70, 134, 144, 146, 156, 200, 205 Objective test, 22 Open book examination, 22 Oral assessment, 22 Oral presentation, 133 Outcomes, 3, 4, 8, 14, 21, 37, 39, 40, 41, 42, 53, 60, 66, 68, 70, 78, 90, 96, 133, 136, 138, 153, 155, 156, 159, 169, 171, 188, 189, 190, 199, 203, 205, 209
P Patch writing, 121 Pedagogical validity, 134 Pedagogy, 45, 61, 68, 138, 149, 166 Peer appraisal, 50, 60 Peer-assessment, 46, 61 Peer review, 127 Perceptions of assessment, 14 Performance, 2, 3, 7, 9, 10, 14, 23, 32, 34, 35, 39, 41, 47, 49, 51, 61, 66, 67, 68, 69, 70, 71, 72, 73, 75, 76, 77, 79, 85, 87, 94, 97, 100, 101, 102, 103, 104, 106, 107, 108, 109, 117, 138, 141, 147, 156, 158, 178, 179, 180, 181, 185, 188, 189, 190, 191 Performance assessment, 10, 97, 102, 109 Personal development planning, 80 Plagiarism, 5, 7, 8, 115–130, 198, 199, 215, 217 Portfolio, 21, 80, 81, 85, 88, 153, 181 Post-assessment effects, 89–91 Post-compulsory education, 154–160, 171, 206 Practice, 1, 2, 3, 4, 5, 6, 8, 9, 10, 13, 15, 22–24, 29–42, 46, 50, 51, 52, 56, 58, 59, 60, 65, 66, 68, 73, 76, 79, 86, 87, 88, 92, 120, 122, 123, 126, 129, 133–150, 155, 156, 161, 163, 166, 170, 175–191, 193, 198, 199, 200, 206, 208, 215, 216, 218, 219–220
Pre-assessment effects, 89–91 Primary trait analysis, 46 Problem-solving, 76, 104, 180 Professional judgement, 1, 76, 77, 218 Professional practice, 32, 40 Purpose (of assessment), 3, 6, 14, 16, 137, 141
Q Quality assurance, 65, 133, 149, 197, 198, 204, 208 Quality Assurance Agency, 198, 208 Quasi-measurement, 1, 77, 216
R Referencing, 54, 59, 68–69, 80, 121, 123, 125, 126, 169 Reflection, 46, 80, 87, 95, 96, 97, 99, 101, 105, 106, 135, 142, 145, 147, 149, 175, 176, 189 Reflective writing, 134, 141, 146–149 Reflexivity, 36–37 Regulation, 23, 36, 37, 82 Relativistic students, 124 Reliability, 4, 5, 8, 34, 46, 65, 86, 94, 95, 97–99, 101, 103, 106, 108, 109, 145 Reproduction, 90, 94, 117, 118 Research, 1, 4, 6, 9, 10, 17, 18, 19, 20, 22, 23, 24, 29, 35, 46, 52, 53, 71, 86, 87, 88, 90, 92, 93, 96, 100, 107, 118, 124, 134, 135, 136, 139, 142, 145, 147, 153, 156, 157, 158, 176, 178, 181, 184, 186, 187, 193, 194, 195, 196, 197, 200, 202, 206, 207, 210, 215, 218–219 Rubrics, 46, 51, 52, 54, 60, 62, 134, 137, 141–144, 147, 150 Rules, 45, 51, 53, 54, 57, 62, 72, 73, 76, 116, 122, 123, 124, 157, 201, 202, 203
S Scholarship, 15, 176, 184, 185, 186, 189, 215 Scholarship Reconsidered, 185 Scholarship of teaching, 185 Selection, 29, 34, 49, 54, 79, 94, 117, 155, 171 Self-assessment, 10, 15, 23, 106, 156, 163, 166, 170, 216 Self Assessment at Alverno College: Student, Program, and Institutional, 187 Self-monitoring, 48, 49, 60, 61, 62, 80, 220
Self-regulation, 23, 36, 82 Situated action, 32 Social constructivism, 117 Social interaction, 108, 117, 177, 183, 206 Soft systems, 199–201 Standardised testing, 94, 216 Standards, 1, 3, 4, 5, 6, 9, 34, 35, 39, 41, 51, 52, 55, 68, 69, 71, 72, 74, 86, 87, 89, 90, 94, 95, 96, 97, 98, 100, 105, 109, 136, 140, 144, 145, 155, 189, 215, 216, 217 Standards-based (assessment), 34 Standards for Educational and Psychological Testing, 4, 96 Student Assessment and Classification Working Group, 67 Substantial validity, 101 Summative assessment, 34, 36, 77, 78, 79, 80, 81, 89–91, 92, 93, 107, 154, 155 Summative testing, 86, 107 Surface approach, 19, 21, 22, 218 Sustainable learning, 9, 153–173 Systemic validity, 89 Systems approaches, 199, 208
T Teaching-learning-assessment cycle, 133, 134 Temporal reliability, 46 Test, 2, 20, 21, 22, 37, 38, 90, 94, 95, 96, 97, 98, 99, 103, 106, 118, 140, 144, 145, 147, 150, 164, 178, 179, 180, 197, 201 Test bias, 90 Test blueprint, 144, 145 Test-driven instruction, 90 Transformation, 22, 122, 191, 200 Transmission, 156, 158 Transparency, 8, 51, 99, 100, 101, 104, 108, 109, 171, 172
U Understanding, 1, 4, 5, 6, 9, 10, 14, 19, 23, 39, 48, 71, 80, 103, 117, 118, 122, 123, 125, 129, 135, 138, 141, 146, 153, 154–163, 165, 166, 167, 170, 177, 178, 179, 182, 187, 191, 193, 194, 199, 200, 201, 202, 205, 207, 208, 215, 216, 217, 218, 219 Unseen exam, 22, 219 Up to the Mark, 17
V Validity, 4, 5, 8, 45, 51, 53, 54, 55, 65, 66, 72, 73, 78, 86, 88, 89, 91, 93, 94, 95–97, 98, 99, 101, 102, 104, 105, 106, 107, 108, 109, 134, 144, 171 Valuing in decision-making, 177, 183 Vocational, 153–173 Vocational Qualifications, 66, 154, 155, 167
W Warranting, 77, 78, 79 Watchful anticipation, 203 Wellesley College, 18 Who Needs Harvard?, 175 Wicked competencies, 3, 7 Work-based assessment, 74, 75, 76, 154 Write Now, 116, 121 Written assignment, 22, 24, 47, 198