Norm and criterion-referenced tests
Norm-referenced and criterion-referenced tests as indicators of success in the classroom

Norm-referenced and criterion-referenced tests serve a variety of purposes because of the wide range of educational situations in today’s schools. Testing can rank students against one another or against some broader norm group, or it can be based on performance criteria that assess a particular set of understandings or skills. Ideally, the two types of testing are combined in a way that is valid, reliable, and fair. Because many classrooms contain students from different socioeconomic and cultural backgrounds, however, achieving this ideal is a challenge. To ensure that all students receive the most appropriate feedback, teachers need a variety of testing techniques so that they can make the decisions and take the actions that best suit each learner.

Virtually all students have taken some kind of standardized test by the time they enter high school or college. Moreover, many standardized tests are high-stakes tests whose results serve as a condition of graduation, admission, or financial aid. Because these tests are used to rank or compare students, they are often referred to as norm-referenced tests (NRTs) (Kubiszyn and Borich, 2007). NRTs are commonly used when stakeholders are interested in the central tendency of a group’s results, as when descriptive statistics such as the mean, median, and mode are computed for a particular data set. When tests are used to diagnose a student or to estimate a student’s aptitude, inferences are based on how that student compares with other students or with a norm sample. Because the results are considered “objective” (test items are usually scored simply as right or wrong) and because many students can be tested at once, NRTs are typically more appropriate for non-instructional decisions.
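The central-tendency and ranking ideas above can be made concrete with a short Python sketch. It computes the mean, median, and mode for a set of hypothetical classroom scores and then converts a raw score into a percentile rank, one common way norm-referenced results are reported. The scores are invented for illustration, and the percentile-rank convention used here (the percentage of the group scoring strictly below a given score) is an assumption; published NRTs rely on their own norm samples and scoring tables.

```python
from statistics import mean, median, mode

# Hypothetical raw scores for a class of ten students (illustrative only).
scores = [72, 85, 85, 90, 64, 78, 85, 93, 70, 88]

def percentile_rank(score, group):
    """Percentage of the group scoring strictly below `score`.

    This is one simple convention (assumed here); some norm tables also
    credit half of the tied scores, which gives slightly different values.
    """
    below = sum(1 for s in group if s < score)
    return 100 * below / len(group)

# Descriptive statistics summarizing the group's central tendency.
print("mean:", mean(scores))
print("median:", median(scores))
print("mode:", mode(scores))

# Norm-referenced interpretation: where one score falls relative to the group.
print("percentile rank of 85:", percentile_rank(85, scores))
```

With these invented scores, a raw score of 85 is also the most common score, yet its percentile rank is 40 because four of the ten students scored below it.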
In addition to being used externally to rank students (e.g., the SAT and ACT), NRTs are often used by teachers in the classroom. Multiple-choice, true-false, matching, and essay questions are common item types on classroom tests that are used in this norm-referenced way. Teachers gather, average, and rank the results to make their best inference about how well each student has understood the material, acquired the necessary skills, or developed the intended dispositions described in the goals and objectives of the classroom. Teachers then base instructional decisions on these results, either reviewing material that students continue to struggle with or moving on to new material in the curriculum.

Framing NRTs first as external instruments, such as the ACT, and then as internal instruments used by teachers in their classrooms reveals a noticeable difference in why they are used in each circumstance. The former supports decisions about achievement, whereas the latter supports decisions about instruction. This distinction matters when considering a second type of test, one that is based on criteria. Rather than ranking students against a norm, this second method judges each student’s performance against specified criteria. Kubiszyn and Borich (2007) describe a criterion-referenced test (CRT) as one that “tells us about a student’s level of proficiency in or mastery of some skill or set of skills” (p. 66). Wiggins and McTighe (2005) likewise advocate promoting the six facets of understanding (i.e., explanation, interpretation, application, perspective, empathy, and self-knowledge) when assessing what students know and the dispositions they possess. By assessing student performances against explicit criteria, CRTs can give teachers greater insight into how to adjust their instruction. Rubrics are often used to assess such performances and products qualitatively.
Arter and McTighe (2001) distinguish between holistic and analytical trait rubrics: “A holistic rubric gives a single score or rating for an entire product or performance based on an overall impression of a student’s work,” whereas “an analytical trait rubric divides a product or performance into essential traits or dimensions so that they can be judged separately” (p. 18). Communicating these essential traits to students establishes what constitutes a “good” or “bad” performance or product and is essential in setting shared expectations between teacher and student. CRTs are therefore well suited to assessing understandings, knowledge, skills, and dispositions, and the inferences drawn from them can guide adjustments both to instruction and to students’ learning tactics.

Regardless of the type of test being administered, reliability, validity, and “absence-of-bias” (Popham, 2008, p. 73) determine how dependably an instrument supports sound inferences about a student’s achievement. Reliability is a major concern for NRTs because the different versions of the ACT, for example, are expected to contain items that measure the same content. Similarly, the same ACT should yield similar results (i.e., the two sets of scores should have a high correlation coefficient) if students retake the exam without any learning intervention in the interim. The validity of a test rests on three Cs: “content, criterion, and construct” (Popham, 2008, p. 53). Content validity addresses how well test items represent the concepts covered in the curriculum. For NRTs, criterion validity concerns how accurately test scores predict future performance (e.g., how well ACT and SAT scores predict subsequent academic success or failure). For CRTs, it concerns how well the rubric traits predict a student’s future performance. The final C, construct validity, concerns whether a test actually measures the underlying construct it is intended to measure; one source of evidence is whether students’ performance changes over time, as expected, after instruction aligned to the curriculum.
Finally, absence-of-bias concerns whether test items are fair, that is, whether they avoid favoring or penalizing any group of students on the basis of socioeconomic status, race, ethnic background, gender, or sexual orientation.
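The test-retest check mentioned earlier in this section, in which students retake the same exam without an intervening learning intervention, is usually summarized as a correlation coefficient between the two sets of scores. The Python sketch below computes a Pearson correlation for hypothetical first- and second-attempt scores. The data are invented, and `pearson_r` is a hypothetical helper written out so the arithmetic is visible (Python 3.10 and later also provide `statistics.correlation`).

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    # Sum of cross-products of each score's deviation from its mean.
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    # Square roots of the summed squared deviations for each administration.
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical scores for the same six students on two sittings of one test,
# with no instruction in between (illustrative data only).
first = [20, 24, 27, 30, 33, 35]
second = [21, 23, 28, 29, 34, 35]

r = pearson_r(first, second)
print(f"test-retest r = {r:.2f}")
```

A value close to 1 means students kept roughly the same relative standing across both sittings, which is what a reliable norm-referenced test should show.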
NRTs and CRTs should not be viewed as a dichotomy; they are two complementary approaches to assessing students. Ranking and comparing students serves a purpose when the goal is to measure achievement and to predict future academic success. Testing understandings, knowledge, skills, and dispositions against performance and product criteria, by contrast, plays a vital role in the inferences that shape instructional decisions and adjustments to students’ learning tactics. For tests to be valid, reliable, and free of bias, test designers should conduct a variety of reviews to ensure that tests measure curricular aims, yield consistent results within and across versions of an exam, and do not discriminate against students on the basis of age, race, gender, socioeconomic status, or sexual orientation. Tests link the written and the taught curriculum, connecting the ideal and the reality of what schools are for all their stakeholders. To keep developing tests and improving the feedback they provide to all stakeholders, a collaborative effort is needed to build a community of practice that addresses these important aspects of testing and assessment.
References

Arter, J. and McTighe, J. (2001). Scoring rubrics in the classroom: Using performance criteria for assessing and improving student performance. Thousand Oaks, CA: Corwin Press.

Kubiszyn, T. and Borich, G. (2007). Educational testing and measurement: Classroom application and practice. Hoboken, NJ: Wiley and Jossey-Bass Education.

Popham, W. (2008). Classroom assessment: What teachers need to know. New York: Pearson.

Wiggins, G. and McTighe, J. (2005). Understanding by design. Alexandria, VA: ASCD.