EVALUATION Basic Concepts Sylvester Saimon Simin Keningau Teachers Training College
You should be able to….

6. Evaluation

Knowledge
• 6.1 Test Blue Print and Question Construction (10 hours)
• 6.2 Centralised examinations (PMR and SPM): format of PMR and SPM examination papers

Skills
• 6.1: Prepare a test blueprint based on the KBSM Science Syllabus; construct objective, structured and essay questions based on the test blueprint; prepare marking schemes for these questions.
• 6.2: Study and analyze the format of the examination papers, in particular the distribution of multiple-choice, structured and essay questions; analyze each MCQ and classify it according to Bloom’s Taxonomy.

Values / Remarks
• T & L Resources: Kementerian Pendidikan Malaysia (1995d); past-year PMR and SPM examination papers
• Values: To be aware of the importance of planning and preparation before administering a test. To be aware of the accountability of the test.
I Have a Dream About Assessment
Roger Farr
• I have a dream that assessment...
– ...will be accepted as a means to help teachers plan instruction rather than as a contrivance to force teachers to jump through hoops;
– ...will be based on trust in a teacher’s judgment as much as numbers on a page are trusted;
• I have a dream that assessment...
– ...will become a helpful means to guide children to identify their own literacy strengths rather than a means to conveniently label them;
– ...will support each child in becoming the best he or she can be rather than a means to sort children into groups of the best and the worst;
• And I have a dream that assessment...
– ...will be put to use to honor what children can do rather than destroying them for what they can’t do.
• If we all work together we can make such dreams become a reality as we work to help each child grow.
Purpose of Evaluation
• to determine the students’ achievement of the knowledge and skills specified by the syllabus of the subject,
• to measure students’ progress over time,
• to rank students in terms of their achievement,
• to diagnose the main difficulties faced by the students in the areas of study,
• to determine how effective the teacher’s instructional strategies are,
• to determine the effectiveness of the curriculum, its strengths and weaknesses,
• to encourage good study habits,
• to motivate students.
Assessment Terms
• Performance Assessments – Assessments requiring students to demonstrate their achievement of understandings and skills by actually performing a task or set of tasks (e.g. writing a story, giving a speech, conducting an experiment, operating a machine).
• Alternative Assessment – A title for performance assessments that emphasizes that these assessment methods provide an alternative to traditional paper-and-pencil testing.
• Authentic Assessment – A title for performance assessments that stresses the importance of focusing on the application of understandings and skills to real problems in “real-world” contextual settings.
TERM
WORKING DEFINITION
accreditation
The official endorsement of the procedures and/or standards of an institution by an authority. For example, an examination board may accredit a center for the assessment of course work.
aim (educational aim)
A long-term goal which may or may not be achievable within the teaching program.
appeal
A challenge by a candidate or a school to the results awarded by an examining authority.
assessment
General term used for the 'measurement' of a behavior or characteristic
assessment component
One part of an assessment package - e.g., a written paper, a practical test, an oral exam, a piece of coursework.
assessment objective
A statement of an expected learning outcome which will be assessed.
assessment package
The total assessment scheme which may be composed of one or more components
aural examination
Listening test (not to be confused with an 'oral test' i.e., a test of speaking.)
backwash effect (occasionally 'washback effect')
The effect (positive or negative) of the scheme of assessment on the teaching/ learning program which precedes it.
bias
Tendency of a test, or an item, to place one group at an advantage over another on the basis of a factor (e.g., gender, ethnicity, language) other than that which the test purports to assess.
camera-ready copy (CRC)
Final proof of an examination paper as it will appear, after printing, on the candidate's desk.
centralized marking
Administrative arrangement where all answer scripts are brought to a central location for marking. Where markers remain at the center throughout the marking period, this may be referred to as 'residential marking'.
certification
Use of examination results to provide individuals with documentary evidence of achievement (i.e., a certificate).
classical item statistics
Statistics describing the behavior of a test item (typically its level of difficulty and its discriminatory power) by analyzing the responses of a particular group of test-takers. Note that such statistics are dependent on the group taking the test. (See also IRT).
coaching
Special preparation of candidates for an examination typically by practicing the techniques of test taking, rote learning of past questions and answers, 'question spotting' etc.
code of practice
Set of guidelines and/or regulations controlling the procedures of assessment authorities in the conduct of public examinations. Where examination bodies have constitutional autonomy, this may have to be a voluntary code of practice.
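The "classical item statistics" entry above names two quantities, level of difficulty and discriminatory power. A minimal sketch of how they might be computed for one item is given here. The upper-lower index, the 27% group size, the function names and the data are all illustrative assumptions, not part of the glossary.

```python
def item_difficulty(item_responses):
    """Facility index: proportion of test-takers who answered the item correctly."""
    return sum(item_responses) / len(item_responses)


def item_discrimination(item_responses, total_scores, fraction=0.27):
    """Upper-lower discrimination index.

    Compares the proportion correct in the top-scoring group with the
    proportion correct in the bottom-scoring group. The 27% group size
    is an assumed convention, not something the glossary specifies.
    """
    n = len(total_scores)
    k = max(1, round(n * fraction))
    # Rank test-takers from lowest to highest total test score.
    ranked = sorted(range(n), key=lambda i: total_scores[i])
    lower, upper = ranked[:k], ranked[-k:]
    p_upper = sum(item_responses[i] for i in upper) / k
    p_lower = sum(item_responses[i] for i in lower) / k
    return p_upper - p_lower


# Hypothetical data: 1 = correct, 0 = incorrect on one item, for ten students.
item = [1, 1, 1, 1, 0, 1, 0, 0, 1, 0]
totals = [38, 35, 33, 30, 28, 25, 22, 20, 18, 15]

difficulty = item_difficulty(item)
discrimination = item_discrimination(item, totals)
print(f"difficulty = {difficulty:.2f}, discrimination = {discrimination:.2f}")
```

Because both values are calculated from one particular group of students, a different class could yield different figures for the same item, which is the group dependence the definition warns about.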
curriculum
All educational aspects of an institution and its teaching programs including non-examined subjects
cut-off point
Test score at which students are deemed successful (and below which they are deemed unsuccessful). See also grade threshold.
double marking
Procedure in which answer scripts are independently scored by two raters. Where there is a discrepancy between scores, set procedures apply for reaching the final score. Typically these include averaging small differences and using an 'expert marker' as an arbiter where differences are large.
end-users
Individuals or institutions who use examination results for their own purposes e.g., universities, schools, employers.
equity
An equitable examination ensures that all students who possess the same degree of ability receive the same result. Where there are inequities, an individual or group gains an unfair advantage over others. It follows that inequity places some individuals and/or groups at a disadvantage due to factors other than the ability that the examination purports to assess.
evaluation
Assessment for the purpose of making a value judgment, e.g., to judge the effectiveness of a teaching program
examination center
Place officially recognized for the conduct of examinations. Typically centers are state schools, private schools, university halls or private buildings hired for examination purposes.
feedback
The systematic flow of information gained from an assessment to educationists, policy makers, and others e.g., examiner reports for teachers.
formative assessment
Assessment which takes place as an integral part of the teaching-learning program (see also summative assessment).
grade threshold
Test score between two reporting grades. For example, if the A-grade threshold is 81%, students scoring 80% will be awarded grade B and those scoring 81%, grade A.
group certificate
Examination system which requires candidates to take a prescribed number and combination of subjects. The award of the certificate is dependent on the candidate meeting pre-determined criteria for success.
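The "grade threshold" and "double marking" entries above both describe simple decision rules, so a short sketch may help. Only the A-grade threshold of 81 comes from the glossary example; the B and C thresholds, the tolerance of 2 marks and the function names are hypothetical.

```python
def award_grade(score, thresholds):
    """Return the highest grade whose threshold the score reaches.

    thresholds is a list of (grade, minimum_score) pairs.
    """
    for grade, minimum in sorted(thresholds, key=lambda t: t[1], reverse=True):
        if score >= minimum:
            return grade
    return None  # score is below the lowest threshold


def reconcile_double_marks(first, second, tolerance, expert_mark=None):
    """Average small differences; defer to an expert marker for large ones."""
    if abs(first - second) <= tolerance:
        return (first + second) / 2
    if expert_mark is None:
        raise ValueError("discrepancy exceeds tolerance; an expert mark is required")
    return expert_mark


# A = 81 is taken from the glossary example; B and C are assumed values.
thresholds = [("A", 81), ("B", 65), ("C", 50)]

print(award_grade(80, thresholds))  # one mark below the A threshold
print(award_grade(81, thresholds))  # exactly on the A threshold
print(reconcile_double_marks(14, 16, tolerance=2))
print(reconcile_double_marks(10, 18, tolerance=2, expert_mark=15))
```

In practice the tolerance and the arbitration procedure would be set by the examining authority.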
high-stakes examination
An examination where students, parents and teachers invest a great deal of effort, and perhaps money, in preparing because success can potentially bring great rewards whilst failure may damage the candidate's life-chances.
impersonation
Form of malpractice where someone takes an examination in place of the registered candidate.
invigilator
Person who supervises and is responsible for the conduct of an examination in a particular examination room/hall.
IRT
Item Response Theory (sometimes IRM - Item Response Modeling). Psychometric tool which, in its simplest form, uses a mathematical model to link a student's chance of being successful on an item with the student's ability and the item's difficulty. This allows items to be calibrated on an absolute measurement scale.
item bank
A collection of items categorized according to their characteristics e.g., type of item, topic, skill being assessed, level of difficulty, etc. Items are then drawn from the bank to build a test according to predetermined test specifications.
league table
A table which ranks schools on the basis of examination results and other indicators (see also 'value added').
leakage
Unauthorized release of examination materials and/or information prior to the official release date.
localization
Where an independent country takes responsibility for the maintenance and further development of an examination system introduced by a former colonial authority.
malpractice
Any deliberate act of wrongdoing, contrary to the rules of the examination, designed to give a candidate an unfair advantage or, albeit less frequently, to place a candidate at a disadvantage.
marker
One who marks/scores candidate responses (also rater).
marking scheme
Instructions as to how marks are to be allocated to student responses (answers). These may be detailed for objective and semi-objective tasks. For open-ended and subjective tasks, they may take the form of general descriptions ('band descriptors').
measurement
An assessment made using the concept of a well-defined ability scale to quantify a behavior or characteristic e.g. mathematical ability.
moderation
General term used by examining authorities for the process of checking quality. Question paper moderation typically involves the review of draft question papers by an expert panel. Moderation of school-based assessment may involve a Board representative visiting the school to look at work and interview teachers and students. Alternatively, samples of student work may be sent for review by a Board moderator.
National Assessment
Assessment designed to determine national standards usually conducted using a representative sample of students.
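The IRT entry above mentions a mathematical model linking a student's chance of success to the student's ability and the item's difficulty. One common simplest form is the one-parameter logistic (Rasch) model, sketched here as an illustration. The glossary does not name a specific model, so this choice, the function name and the sample values are assumptions.

```python
import math


def rasch_probability(ability, difficulty):
    """Probability of a correct response under the one-parameter logistic model.

    Ability and difficulty are expressed on the same (logit) scale.
    """
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


# Hypothetical abilities tried against an item of difficulty 0.0.
for ability in (-1.0, 0.0, 1.0):
    p = rasch_probability(ability, 0.0)
    print(f"ability {ability:+.1f}: P(correct) = {p:.3f}")
```

When ability equals difficulty the probability is exactly 0.5, and it rises as ability exceeds difficulty. This is what allows items and students to be placed on one common scale.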
objective item
Item that can be scored without the marker making a personal judgment as to the quality of the response e.g., multiple-choice.
OMR
Optical Mark Reader - scanning device for reading marks from special forms thereby allowing the automatic input of student responses to, for example, multiple-choice question papers.
parastatal
Term applied, especially in Africa, to an organization established by a government but which, through its constitution and budgetary arrangements, enjoys a great degree of operational freedom and insulation from direct political interference.
pedagogy
The science of teaching including both theory and practice.
private candidate
Candidate who enters, and pays for, his/her own entry to a public examination as compared with a candidate who is entered by the institution (school) in which he/she is studying and which is recognized by the examining authority as an authorized center.
psychometry (psychometrics)
Field concerned with the measurement, and hence quantification, of human behaviors and characteristics. Psychometric strategies are built on statistical models of measurement and human behavior.
public examination
An examination offered by a national or provincial (state) authority, or on behalf of such an authority, to students at a particular level of an education system. The primary purpose is to certify the level of achievement of individual students and/or to select students for the next level of the education system.
quota system
Form of selection system where the share of available opportunities to be awarded to a particular group is pre-determined. For example, in order to ensure gender balance in a selective secondary school system, 50% of places may be awarded to boys and 50% to girls. As a consequence, some boys may be selected with lower examination scores than those achieved by girls who are rejected (or vice versa).
rater
One who marks/scores candidate responses - a marker.
registration
Key process whereby the details of individuals (students) are entered into the administrative database as candidates for forthcoming examinations.
regular candidate
Term used, particularly in the Asian sub-continent, for candidates registering through recognized centers for a series of examinations for the first time. Private candidates and those re-sitting examinations are considered irregular.
reliability
A measure of the stability of the results produced by an examination. This includes the stability of scores on re-testing, the stability of scores with remarking, and the correlation of scores for sub-sections within the test (homogeneity).
school-based assessment
Any assessment of student performance which takes place in a school and is incorporated into the public examination result. Note that the degree of freedom allowed to the school will depend on the regulations and moderation procedures of the examining authority.
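The "reliability" entry above refers to the stability of scores on re-testing. One way this might be quantified is the Pearson correlation between scores from two sittings of the same test, sketched below. The choice of statistic, the function name and the scores are illustrative assumptions.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x)
    spread_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / math.sqrt(spread_x * spread_y)


# Hypothetical scores for five students on a test and on a re-test.
first_sitting = [55, 62, 70, 48, 81]
second_sitting = [58, 60, 73, 50, 79]

retest_reliability = pearson_r(first_sitting, second_sitting)
print(f"test-retest correlation = {retest_reliability:.2f}")
```

A value close to 1 suggests that students keep roughly the same relative standing from one sitting to the next. The same function could be applied to two markers' scores to look at stability with remarking.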
script
General term for an answer booklet or sheets produced by a candidate in response to an assessment task.
selection
Use of examination results to select individuals for educational or employment opportunities where the number of such opportunities is limited. In many developing countries, examination results are used to select students for the next phase of education e.g. primary-secondary, lower secondary-higher secondary, secondary-tertiary.
specification grid
A plan or 'blueprint' giving the format of a question paper or other assessment component.
stakes (of an examination)
The importance of an examination as judged by what may be gained through success - and what may be lost through failure. Therefore, a 'high-stakes' examination will typically be highly competitive because the successful will enjoy greatly enhanced opportunities.
structured question
Task composed of a number of sub-questions (items) linked by a common context or piece of stimulus material. The sub-questions may be independent of each other or may be sequenced to lead candidates through a more complex task (progressive).
subjective item
Item that requires the marker (rater) to make a personal judgment as to the quality of the response e.g. the literary merits of an essay or the artistic merits of a painting. Note that in order to minimize variation, rater judgments may be guided and constrained by marking schemes and descriptors of performance.
summative assessment
Assessment which takes place at the end of the teaching-learning program to record 'final achievement' (see also formative assessment).
supplementary examination
A follow-up examination allowing students to retake subjects in which they have not reached the required level. This issue is of particular importance in systems awarding group certificates.
syllabus (examination syllabus)
A document formally specifying what will be assessed by the examination and how the assessment will be carried out.
tamper-evident packaging
Plastic envelopes for examination materials which cannot be resealed without showing obvious signs of being opened.
teaching objective (curriculum objective)
A specific short-term goal of the teaching program.
teaching program
The program of instruction.
teaching/learning program
The instruction delivered by a teacher coupled with the learning that takes place during the program.
transparency
Extent to which the processes involved in the examination system are visible to the public - especially schools, teachers and students.
validity
A measure of the extent to which an examination measures what it purports to measure.
Achievement
A demonstration of learning at a particular moment in time
Alternative assessment
Any and all assessments that differ from the multiple-choice, one-word-answer, timed items that characterize standard tests
Assessment
The gathering of data about students or program, often used as a formative process to guide instruction
Criterion
The standard against which performance is measured
Criterion-referenced
Judgement of performance against a previously agreed standard
Diagnostic assessment
Determines the level of achievement/performance prior to entering a
Evaluation
The application of judgement to the data in the form of a grade or comment, placing a value on that work
Formative assessment
Ongoing feedback on a student's performance throughout the learning process
Grading
Assigning a letter, percentage or score
Ipsative assessment
The measure of student growth
Learning outcome
A general statement which describes an observable result by which a student demonstrates knowledge, skill or attitude
Norm-referenced
Judgement of performance against the norm for the group
Objective
A specific statement of intent
Peer assessment
Reflective practice in which students make observations about the performance of their peers
Performance assessment
Usually an alternate or authentic assessment, where a student completes a relevant task which demonstrates learning by using or applying knowledge
Portfolio assessment
The assessment of a representative collection of a student's work over time
Process assessment
Focuses on the variety of strategies, thinking skills and processes that a student uses to complete a task
Product assessment
Focuses on the end product of a learning process
Reporting
Communicating progress or achievement to the student or his/her parents or guardian
Rubric
A set of quality criteria
Self-assessment
Reflective practice in which students make observations about their own performance
Self-referenced
Reflective practice in which students make observations about their own performance
Standard
A point of reference against which judgements can be made
Summative Evaluation
A report on the final achievement, given at the end of a unit of work, semester or year
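The glossary above distinguishes criterion-referenced judgement (against a previously agreed standard) from norm-referenced judgement (against the norm for the group). A minimal sketch of the contrast follows. The standard of 65, the class scores and the "percentage scoring below" definition of percentile rank are all assumptions made for illustration.

```python
def meets_criterion(score, standard):
    """Criterion-referenced: compare the score with an agreed standard."""
    return score >= standard


def percentile_rank(score, group_scores):
    """Norm-referenced: percentage of the group scoring below this score."""
    below = sum(1 for s in group_scores if s < score)
    return 100.0 * below / len(group_scores)


# Hypothetical class scores and agreed standard.
class_scores = [40, 50, 60, 70, 80]
standard = 65

student_score = 60
print(meets_criterion(student_score, standard))
print(percentile_rank(student_score, class_scores))
```

The same score of 60 fails the fixed standard whatever the rest of the class does, while its percentile rank would change if the class scores changed.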
What is performance assessment?
• A performance assessment is an assessment activity that requires students to construct a response, create a product or demonstrate a skill they have acquired. Rubrics, based on the selected criteria, are given to students to ensure that they know what they need to do to meet or exceed the learner outcomes.
• Well-constructed performance assessments:
– are the most authentic types of assessment since they replicate out-of-school experiences, encourage self-evaluation and demonstrate what students know and can do;
– put students in a role (e.g. scientist, newspaper editor) and provide an audience for their task;
– provide degrees of proficiency based on criteria and make public the criteria.
A few things to know ……
• Bloom’s taxonomy
• Difference between:
– testing, measurement and evaluation
– objective & subjective items
– formative & summative evaluation
– criterion-referenced test & norm-referenced test
• Validity & Reliability
The Assessment Process
1. Preparation (including Test / Task Blueprint) –
Determine the kind of information needed and decide how and when to obtain it.
2. Information gathering –
Obtain a variety of information as accurately as possible.
3. Forming judgements –
Judgements are made by comparing the information to selected criteria.
4. Decision making and reporting –
Record significant findings and determine appropriate courses of action.
INFORMATION GATHERING
• Information gathering techniques
– Procedures for obtaining information
– Inquiry (asking), observation (senses), analysis (performance, product), testing (a common situation to which all students respond, a common set of instructions governing response, a set of rules for scoring responses and a description of performance, i.e. a score)
• Information gathering instruments
– Tools used to gather information
– 3 basic types: tests, rubrics and questionnaires
• Teacher-made tests / classroom tests vs standardized tests
• Rubric: a set of rules for scoring student products or performance. Typically takes the form of a checklist or a rating scale
• Questionnaires: useful for getting opinions, feelings and interests
Information Gathering Techniques

Inquiry
• Kind of information obtainable: opinions; self-perceptions; subjective judgements; affective (especially attitudes); social perceptions
• Objectivity: least objective; highly subject to bias and error
• Cost: inexpensive but can be time-consuming

Observation
• Kind of information obtainable: performance or end products of some performance; affective (especially emotional reactions); social interaction; psychomotor skills; typical behavior
• Objectivity: subjective, but can be objective if care is taken in the construction and use of the instruments
• Cost: inexpensive but time-consuming

Analysis
• Kind of information obtainable: learning outcomes during the learning process (intermediate goals); cognitive and psychomotor skills; some affective outcomes
• Objectivity: objective but not stable over time
• Cost: fairly inexpensive; preparation time is somewhat lengthy but crucial

Testing
• Kind of information obtainable: attitude and achievement; terminal goals; cognitive outcomes; maximum performance
• Objectivity: most objective and reliable
• Cost: most expensive, but most information gained per unit of time
Information Gathering Instrument

Standardized test
• Used: when accurate information is needed
• Advantage: usually well developed and reliable. Includes norms for comparing the performance of a class or an individual.
• Disadvantage: often does not measure exactly what has been taught. Expensive. Limited in what is measured.

Teacher-made test
• Used: routinely, as a way to obtain achievement information
• Advantage: usually measures exactly what has been taught. Inexpensive. Can be constructed as the need arises.
• Disadvantage: no norms beyond the class are available. Often unreliable. Requires quite a bit of time to construct.

Rubrics (to assess the quality of student performance)

Checklists
• Used: to determine the presence or absence of specific characteristics of performance
• Advantage: helpful in keeping observations focused on key points or critical behaviors.
• Disadvantage: measure only the presence or absence of a trait or behavior.

Rating scales
• Used: to judge quality of performance
• Advantage: allow observational data to be used in making qualitative as well as quantitative judgements.
• Disadvantage: take time and effort to construct. Can be clumsy to use if too complex.

Questionnaires
• Used: to inquire about feelings, opinions and interests
• Advantage: keep inquiry focused and help the teacher obtain the same information from each student.
• Disadvantage: take time and effort to construct. Difficult to score. No right or wrong answers. Data difficult to summarize.
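The difference between the two rubric forms described above, a checklist (presence or absence only) and a rating scale (quality of performance), can be sketched in a few lines. The criteria, the 1 to 4 scale and the function names are hypothetical and serve only as an illustration.

```python
def score_checklist(criteria, observed):
    """Checklist: record only whether each characteristic is present."""
    return {criterion: criterion in observed for criterion in criteria}


def score_rating_scale(ratings, max_level):
    """Rating scale: sum the quality levels and express them as a percentage."""
    total = sum(ratings.values())
    maximum = max_level * len(ratings)
    return total, 100.0 * total / maximum


# Hypothetical criteria for a student conducting a science experiment.
criteria = ["states hypothesis", "controls variables", "records data"]

# Checklist: the teacher ticks only what was seen.
observed = {"states hypothesis", "records data"}
checklist_result = score_checklist(criteria, observed)

# Rating scale: the teacher judges each criterion on an assumed 1 to 4 scale.
ratings = {"states hypothesis": 3, "controls variables": 2, "records data": 4}
total, percentage = score_rating_scale(ratings, max_level=4)

print(checklist_result)
print(total, percentage)
```

The checklist output says nothing about how well each step was carried out, whereas the rating scale supports both a qualitative reading (the level per criterion) and a quantitative one (the total).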