CTL Number 7

September 1990

Writing and Grading Essay Questions

You’ve got to give the students room to analyze, to synthesize, to show both sides of an issue, or to develop an idea, and the essay question is the only way that I know to do that. English professor

Many teachers consider essay questions the ideal form of testing since essays seem to require more effort from the student than other types of questions. Students cannot answer an essay question correctly by simply recognizing the correct answer, nor can they study for an essay exam by memorizing factual material. Essay questions can test complex thought processes, critical thinking, and problem solving, and essays require students to use the English language to communicate in sentences and paragraphs — a skill that undergraduates need to exercise more frequently. In the field of testing and measurement, essay questions are categorized as "supply" items (questions for which students must develop the answers themselves) to distinguish them from "select" items (in which students choose a response from a menu). The cognitive capabilities required to answer supply items are different from those required by select items, irrespective of content. Since short-answer and identification questions are also supply items, they can be serviceable alternatives to multiple-choice questions — they also measure very specific elements of learning without taking much time to score. Indeed, a set of short essay questions may be more appropriate for some testing situations than the traditional lengthy essay.

There are some potential drawbacks to using essay tests. For example, research studies have shown that scoring of essays is often unreliable (results cannot be duplicated in separate trials); scores not only vary across different graders, they vary with the individual grader at different times. Scorers can be influenced by extraneous factors such as handwriting, color of ink, and word spacing. If the scorer knows the identity of the student (a poor grading practice), his/her overall impressions of that student’s work will unavoidably influence the scoring of the test. Canny students sometimes play on these weaknesses by learning to disguise ignorance with a cloak of flashy verbiage. Finally, essay exams place limitations on the amount of material that can be sampled in the test, a fact that may cause a student to complain (sometimes legitimately) that “I knew a lot more about the subject than the test measured,” or “Your test didn’t reflect the material we covered.” Following the simple guidelines suggested below, one can avoid many of the drawbacks associated with essay tests.

Validity

The most important characteristic of any test is its content validity: how well it samples the range of knowledge, skills, and abilities that students were supposed to acquire in the period covered by the exam. Single-item essay tests rarely meet this criterion unless they are broken down into a number of subcomponents, in effect becoming a set of short essays. In many cases it is preferable to use a number of short essay questions to ensure that the material has been sampled adequately.

The principle of content validity also includes the element of suitability: how well a test measures what it is supposed to measure. Essay questions are best suited for testing the upper levels of cognition (analysis, synthesis, evaluation), but these traits are unstable and often difficult to define. For example, is “critical thinking” the ability to construct a reasoned argument from evidence, to select the best course of action in a novel situation, to analyze weaknesses in competing arguments, or some combination of all of these things? If you wish to evaluate whether students have developed critical thinking skills in a course, the meaning of that phrase must be clearly defined, and your course objectives and essay test items should reflect the definition you have chosen. Problem-solving skills can also be tested through essay items, but the format and method for solving problems must be specified by the teacher and clearly communicated to the student.

Essay questions are often used in courses in which the development of writing skills is an important objective. But, again, one should stipulate the kinds of writing skills that students must demonstrate and provide some test time for thinking and for organizing the answer (otherwise, the combined effects of time pressure and test anxiety will usually result in poor writing). Of course, students should have ample opportunities to practice these skills before they have to demonstrate them on an exam.

It is helpful to distinguish between essay questions that require objectively verifiable answers (that is, those that can be agreed upon by independent evaluators) and those that ask students to express their attitudes, opinions, or creativity. The latter are much more difficult to construct and evaluate than the former, since grading criteria are harder to specify, and they therefore tend to be less valid measures of learning. Most authorities advise against using the latter type as test questions and suggest instead that testing for creativity is more appropriately accomplished through out-of-class writing assignments that can be graded holistically. (Holistic scoring is a system in which the grader evaluates the entire essay as a unit of expression rather than as a set of isolated skills.) One exception to this caveat is the literature class, in which an instructor may wish to test students’ interpretive abilities as well as objectively verifiable information. In this case, students should be reminded that they will be judged on how well they support their creative ideas with evidence from the texts in question.

Another threat to validity is the practice of allowing students to choose which essay questions they wish to answer (e.g., “choose two out of five”). It is virtually impossible to compose five equivalent essay questions, and students will usually choose the weaker questions, thereby reducing the validity of the exam. Some teachers follow this practice because students have complained that their exams are too difficult. The element of choice does serve as a safety valve to divert student anger, but if their complaints are well-founded, the teacher would be wise to seek help in composing better questions rather than risk creating invalid exams.

Reliability

Test reliability is the degree to which a test discriminates between students of differing performance levels and the consistency with which the tests are graded. Essay tests often have relatively low reliability because grading criteria can be difficult to write, and many teachers do not realize that it is necessary not only to compose a model answer but also to provide students with instructions that will elicit the desired answer. The way questions are written often invites a wide variety of responses, only a few of which may reflect the criteria that the teacher intended. When teachers maintain that they must read all the essays through before they can decide on the “best” answer, it is a sure sign that their tests lack specific grading criteria and clear instructions.

To take an extreme example, the teacher who asks the question “Describe the origins of World War I” might expect an essay that reviews the roles of the great European powers in the political events from 1870 to 1914 and how each event contributed to the situation that led to war. However, given the sparse instructions and ambiguity of wording, one student might well respond with a survey of geopolitical movements — nationalism, imperialism, communism — while another student might consider only the diplomatic crises in the period 1905 to 1914, and yet another might focus solely on the events of August 1914. All of these answers could be correct, and, if written well, all could receive top marks. To improve the reliability and validity of this question, the teacher would at least need to specify the period of time (1870-1914), the area of analysis (politics, social movements, or economics), and the countries of interest (Great Britain, Germany, Austria, France, Russia). Structural advice, such as “Your essay should have five parts ...”, will help students focus their answers and make scoring easier. In addition, the teacher should specify the amount of time the student should spend on the question (or its parts) and the number of points assigned to the question (or its parts).

Here is an example from a mid-term in Anthropology:

Lectures covering Piltdown Man, Gradualism, Punctuated Equilibrium, and Catastrophism were given sequentially to illustrate the interplay of theory and fact in the formulation of an Anthropological account of the evolution of Humankind. Write a three-part essay addressing the following questions:

I. Name the major proponents of the above underlined concepts and briefly describe the significance of these people for the history of a science of evolution. (10 minutes, 10 points)

II. Select any two of the four concepts above and explain how they illustrate the relationship between fact and theory. (10 minutes, 10 points)

III. In your opinion, are new discoveries or theories really new, or are they just repetitions of past ideas that have fallen out of favor? Your answer to part III must draw upon the four concepts underlined above and be consistent with what you have already written in parts I and II. (20 minutes, 20 points)

This question not only exemplifies the guidelines for increasing the reliability of essay questions, it also illustrates three levels of cognitive complexity. Part I is primarily a recall/comprehension question, Part II is application/analysis, and Part III is synthesis/evaluation.
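Consistency between graders can also be checked directly once two people have independently scored the same set of essays. The sketch below is an editorial illustration and not part of the original newsletter; it assumes the two graders’ scores are simply held in Python lists and uses the Pearson correlation (in the standard library from Python 3.10) together with the average point gap as rough indices of agreement.

```python
# Illustrative sketch (not from the newsletter): a rough check of
# inter-rater agreement between two graders who scored the same essays.
from statistics import correlation, mean  # correlation() requires Python 3.10+

# Hypothetical scores for the same ten essays, one list per grader.
grader_a = [18, 15, 20, 12, 17, 9, 14, 19, 16, 11]
grader_b = [17, 14, 19, 13, 18, 10, 12, 20, 15, 12]

r = correlation(grader_a, grader_b)  # Pearson r: 1.0 would be perfect agreement
gap = mean(abs(a - b) for a, b in zip(grader_a, grader_b))  # average point gap

print(f"Pearson r between graders: {r:.2f}")
print(f"Average score difference:  {gap:.1f} points")
```

A low correlation or a large average gap would suggest that the grading criteria, as written, are not specific enough to be applied consistently.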

I make a key for each question that lists the main points that should be in the answer. I read through several tests to check the key, perhaps to add some points mentioned by students, or drop one if no one included a certain point. Education professor

Grading

Good grading practices can also increase the reliability of essay tests. In the first place, all tests should be graded anonymously to counteract the “halo effect” of a student’s prior performance. Some teachers require students to write their social security numbers (or some other code) on test papers rather than signing their names, to eliminate accidental identifications during the grading process.

Blind-grading exams lets the students know that I am interested only in the quality of their work. History professor

It is also a good idea to grade each essay question separately rather than grading a student’s entire test at once. A brilliant performance on the first question may overshadow weaker answers later on (or vice versa), and it is easier for the grader to keep in mind one answer key at a time. Shuffling the papers after grading each question will help compensate for the tendency to give later papers lower scores as the grader grows tired and increasingly bored. Unless elements of grammar, syntax, spelling, and punctuation are being evaluated as part of the examination, the grader should try to overlook flaws in these elements of composition; in that case, accuracy and completeness should be the only criteria against which the answers are judged. As a matter of practice, quickly skimming several essays before beginning the formal process of grading will help determine whether or not the model answer needs to be modified. If, through some quirk in wording, students misinterpret your intent, or if your standards are unrealistically high (or low), you should alter the model answer in light of this information. This procedure is preferable to altering the grading scheme ex post facto, since grades tend to lose their meaning if the system is altered to compensate for poor testing practices.
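As a concrete illustration of this workflow (not part of the original article; the data structures and the grade_answer placeholder are assumptions made for the example), the sketch below grades one question at a time across papers identified only by anonymous codes, reshuffling the stack between questions.

```python
import random

# Hypothetical exam data: each paper is identified only by an anonymous code,
# with one written answer per question (grading stays blind to student names).
papers = {
    "code-101": {"Q1": "answer text ...", "Q2": "answer text ..."},
    "code-102": {"Q1": "answer text ...", "Q2": "answer text ..."},
    "code-103": {"Q1": "answer text ...", "Q2": "answer text ..."},
}

def grade_answer(question, answer):
    """Placeholder for the grader's judgment against the model answer."""
    return 0  # the human grader supplies the actual score

scores = {code: {} for code in papers}
for question in ["Q1", "Q2"]:
    order = list(papers)
    random.shuffle(order)      # reshuffle the stack before each question
    for code in order:         # grade the same question on every paper in turn
        scores[code][question] = grade_answer(question, papers[code][question])
```

The point of the sketch is the order of operations: one answer key in mind at a time, a fresh random order for each question, and no student names anywhere in the loop.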

It is important to write comments on the test papers as you grade them, but comments do not have to be extensive in order to be effective (especially if you provide a model answer). The grader should point out specific elements of the answer that were omitted or incorrect, and the number of points lost as a result. Penalties can be assessed for incorrect statements, the omission of relevant material, the inclusion of irrelevant material, or errors in logic that lead to unsound conclusions. Students have a right to know the reasons for the grades they receive.

Grading with TAs in Large Classes

In large sections, the course professor must often share the grading responsibility with one or more TAs or grading assistants. In this situation, it is critical that the professor follow the guidelines for test construction and grading described above because problems with essay tests are magnified when more than one grader is involved. On the other hand, multiple graders can increase the validity and reliability of essays if they share in the development of the questions and follow appropriate grading procedures. The course professor should meet with the TAs to discuss the intent of each essay question, where it fits in the course, how well it samples the material, and the criteria for grading it. TAs can compose model answers that can also be discussed and refined before the exam. It is not a good idea for TAs to grade the papers of their own discussion sections, at least not exclusively, because the temptation to reward (or punish) their own students is very great. Quality control can also be increased by requiring each TA to provide a sample of an “A” essay and an “F” essay (or its equivalent) for the professor to re-check. Having two TAs grade each exam and negotiate differences in their assessments is an even better practice, since the more experienced TAs will teach the less experienced ones, but the time required for this exercise may make it impractical in most contexts.

Another approach to the problem is to require the TAs to grade papers together, in the same room, and compare their grades for “A” essays and “F” essays so they can come to a consensus on the criteria. This method may accomplish the same objective as the double grading method without the same time expenditure. It is advisable for the course professor to start the grading sessions and be present for a time to provide clarification of the grading criteria.
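Purely as an illustration of how such disagreements might be surfaced for discussion (the codes, scores, and threshold below are invented for the example and do not come from the article), a few lines of Python can flag the papers on which two graders differ by more than a chosen margin.

```python
# Illustrative sketch: flag papers whose two independent scores (out of 20)
# differ by more than a chosen threshold, so the graders reconcile those first.
DISAGREEMENT_THRESHOLD = 3  # points; an arbitrary choice for this example

ta_one = {"code-101": 17, "code-102": 9, "code-103": 14}
ta_two = {"code-101": 16, "code-102": 15, "code-103": 13}

to_reconcile = [
    code
    for code in ta_one
    if abs(ta_one[code] - ta_two[code]) > DISAGREEMENT_THRESHOLD
]
print("Papers needing a negotiated grade:", to_reconcile)  # -> ['code-102']
```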

The TAs and I create the questions together. That way we all understand the questions as they are written and we talk about what answers we are expecting, what issues we expect students will raise, and what constitutes a good answer. History professor

Using Tests in Instruction

Always provide a model answer when returning essays and, when possible, provide time to discuss the questions in class. Students are usually anxious to find out how well they performed and their motivation and attention levels are quite high, so the instructor can use this opportunity to correct errors in their learning and to reinforce important points. Some teachers use essay questions as teaching tools throughout the course by making them the focus of class discussions. Students are given the questions prior to the day of the discussion so they can prepare answers. The class discussion is an exercise in exploring the ways the questions can be answered. Students thereby have an opportunity to practice their thinking skills and also become familiar with the type of questions favored by the teacher. Teachers who use this method report that it not only improves student performance on essay exams, but it also raises the quality of class discussions.

Checklist for Writing and Grading Essay Exams

• Are essays the appropriate means to test the material you have covered?
• Have you been using essay-type questions throughout the semester as a means of generating discussion in class?
• If there is a choice of questions, are they truly equivalent? Would it be better to have several short essays?
• What are your specific grading criteria? Have you made these criteria clear in the instructions?
• Are students expected to show a mastery of critical thinking? If so, how do you define that term? Have you made this clear to your students?
• Have you provided for anonymous grading?
• Do you have a model answer against which you can judge student responses?
• If there are several TAs, are the grading criteria clear to all involved? Are TAs grading students not in their own sections?
• Do you intend to discuss the exam when you return it?


Center for Teaching and Learning
CB# 3470, 316 Wilson Library
Chapel Hill, NC 27599-3470

919-966-1289
