1. Introduction
Teachers all over the world develop test items and administer them to their learners. Seema Varna explains test item analysis as a method used to review the items in a test, both qualitatively and quantitatively. In this report a quantitative (statistical) approach is used to determine the reliability and validity of each test item. The responses of each student to each test item are shown in figures and tables, which are then interpreted to check whether each item meets the minimum quality control criteria. According to Seema Varna, most of these test items are diagnostic tools and are not meant to measure growth. The diagnostic nature of most teacher-developed test items tends to make them of inferior quality and makes it difficult to establish their reliability and validity. The test items administered in the classroom enable teachers to identify learners with problems, and they also inform class instruction, curriculum and teacher development. For test items to serve these purposes, the teacher should prepare them thoroughly before administering them to the learners. Part of this thorough planning is to check each test item for quality. This enables the teacher to obtain test items of the highest quality and ensures that reliable test results are obtained.
2. Purpose of the report
The purpose of this report is to disseminate information pertaining to the descriptive statistics calculated for 20 multiple choice questions which were administered to 25 students.
3. Test analysis
3.1 Descriptive statistics
Descriptive statistics refers to the use of statistics to describe the central tendency of a set of scores and how the scores are dispersed around it. In short, it refers to the mean, the mode, the median and the standard deviation. In this report, a set of test scores was used to calculate these four measures, and the results are shown in Table 1. The mean is the average of a set of scores. The median is the midpoint or middle value of a distribution. The mode is the score that occurs most frequently in a set of scores.
Refer to Table 1 to see calculated descriptive statistics.
Table 1: Descriptive statistics
Mean    Mode    Median   (STDEV)2   STDEV
65.79   65.00   65.00    479.57     21.90
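As a minimal sketch of how the measures in Table 1 can be computed, the Python snippet below uses the standard `statistics` module. The `scores` list is a hypothetical set of percentage scores chosen for illustration, not the 25 scores analysed in this report. Using the sample standard deviation (`stdev`) rather than the population form (`pstdev`) is also an assumption, since the report does not say which was used.

```python
import statistics

# Hypothetical percentage scores, for illustration only (not the report's data).
scores = [15, 45, 65, 65, 85, 100]

mean = statistics.mean(scores)          # average of the scores
median = statistics.median(scores)      # middle value of the sorted scores
mode = statistics.mode(scores)          # most frequently occurring score
stdev = statistics.stdev(scores)        # sample standard deviation (assumed form)
variance = statistics.variance(scores)  # sample variance, i.e. stdev squared

print(mean, median, mode, round(stdev, 2), round(variance, 2))
```

The variance is the square of the standard deviation, which is the relationship between the (STDEV)2 and STDEV columns of Table 1.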
The scores can be interpreted as approximately normally distributed, because the mean, the mode and the median are almost the same. In a bell-shaped curve or normal distribution, about 68% of the test scores fall within one standard deviation of the mean, about 95% fall within two standard deviations of the mean and about 99.7% fall within three standard deviations of the mean. Refer to Figure 1 to see the normal distribution curve.
Figure 1: Normal distribution curve
3.2 Frequency graphs
3.2.1 Grouped frequency table
The calculations in the grouped frequency table are used to draw the graphs. The highest score obtained in the test is 100 and the lowest score is 15. The range, obtained by subtracting the lowest score from the highest score, is 85. The number of intervals can be decided upon by the individual teacher or researcher; here 10 intervals were used. The size of the interval is obtained by dividing the range by the number of intervals, which gives 8.5. Refer to Table 2 to see the grouped frequency table.
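The arithmetic above can be sketched in a few lines of Python. The function name `interval_size` is hypothetical and introduced only for illustration; the inputs are the highest score, the lowest score and the number of intervals reported in this section.

```python
def interval_size(highest, lowest, n_intervals):
    """Return the range of the scores and the size of each interval."""
    score_range = highest - lowest
    return score_range, score_range / n_intervals

# Values reported in this section: H = 100, L = 15, 10 intervals.
score_range, size = interval_size(highest=100, lowest=15, n_intervals=10)
print(score_range, size)  # 85 8.5
```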
Table 2: Grouped frequency table
H     L    Range   Number of intervals   Size of intervals
100   15   85      10                    8.5
3.2.2 Cumulative frequency distribution
To calculate the cumulative frequency distribution, start with the frequency of the first interval and add the frequency of each following interval to the running total. Continue until the last interval, whose cumulative frequency equals the total number of students (25). Refer to Table 3 to see the cumulative frequency distribution.
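A minimal Python sketch of the running total, using the interval frequencies listed in Table 3 and `itertools.accumulate` from the standard library:

```python
from itertools import accumulate

# Frequencies of the nine intervals, taken from Table 3.
frequencies = [1, 2, 0, 4, 3, 6, 1, 6, 2]

# Each cumulative value is the running total of the frequencies so far.
cumulative = list(accumulate(frequencies))
print(cumulative)  # [1, 3, 3, 7, 10, 16, 17, 23, 25]
```

The last value, 25, equals the number of students who wrote the test.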
Table 3: Cumulative frequency distribution
Lower limit   Upper limit   Interval   Middle value   Frequency   Cumulative frequency
15.00         24            15 - 24    19.5           1           1
25.00         34            25 - 34    29.5           2           3
35.00         44            35 - 44    39.5           0           3
45.00         54            45 - 54    49.5           4           7
55.00         64            55 - 64    59.5           3           10
65.00         74            65 - 74    69.5           6           16
75.00         84            75 - 84    79.5           1           17
85.00         94            85 - 94    89.5           6           23
95.00         104           95 - 104   99.5           2           25
3.2.3 Frequency histogram
A histogram is a way of summarising data measured on an interval scale, whether discrete or continuous. It is mainly used to illustrate the major features of the distribution of the data in a convenient way. A histogram divides the range of possible values in a set of data into groups or classes. For every class or group, a rectangle is drawn with a base length equal to the range of values in that group and an area proportional to the number of observations falling in that group. As a result, the rectangles may be of different heights. Refer to Figure 2 to see the frequency histogram.
Figure 2: Frequency histogram (x-axis: Interval, 15-24 to 95-104; y-axis: Frequency, 0 to 7)
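As a rough illustration of the histogram idea, the sketch below prints one text bar per interval, using the intervals and frequencies from Table 3. The function `text_histogram` is a hypothetical helper written for this example. Because all intervals here have the same width, a bar whose length equals the frequency also has an area proportional to the frequency.

```python
# Intervals and frequencies taken from Table 3.
intervals = ["15-24", "25-34", "35-44", "45-54", "55-64",
             "65-74", "75-84", "85-94", "95-104"]
frequencies = [1, 2, 0, 4, 3, 6, 1, 6, 2]

def text_histogram(labels, counts):
    """Build one line per interval with a bar of '#' as long as its frequency."""
    return [f"{label:>6} | {'#' * count} ({count})"
            for label, count in zip(labels, counts)]

lines = text_histogram(intervals, frequencies)
print("\n".join(lines))
```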
3.2.4 Frequency polygon
The middle values were plotted against the frequencies, and the plotted points were joined with straight lines. The points obtained were (19.5, 1), (29.5, 2), (39.5, 0), (49.5, 4), (59.5, 3), (69.5, 6), (79.5, 1), (89.5, 6) and (99.5, 2). Refer to Figure 3 to see the frequency polygon.
Figure 3: Frequency polygon (x-axis: Middle values, 9.5 to 99.5; y-axis: Frequency, 0 to 7)
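The points of the frequency polygon can be derived from Table 3: each middle value is the average of the lower and upper limit of its interval, paired with the interval's frequency. A minimal Python sketch:

```python
# Interval limits and frequencies taken from Table 3.
lower_limits = [15, 25, 35, 45, 55, 65, 75, 85, 95]
upper_limits = [24, 34, 44, 54, 64, 74, 84, 94, 104]
frequencies = [1, 2, 0, 4, 3, 6, 1, 6, 2]

# Middle value of each interval: the average of its two limits.
middle_values = [(lo + hi) / 2 for lo, hi in zip(lower_limits, upper_limits)]

# The polygon joins these (middle value, frequency) points with straight lines.
points = list(zip(middle_values, frequencies))
print(points)
```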
3.2.5 Cumulative frequency graph (An ogive)
An ogive is a cumulative frequency polygon, and is sometimes presented in percentage form. It is plotted on the X and Y axes. Its major use is to estimate percentiles. The most important percentiles read from an ogive are the median (the 50th percentile), the lower quartile (the 25th percentile) and the upper quartile (the 75th percentile). Refer to Figure 4 to see the cumulative frequency graph.
Figure 4: Cumulative frequency graph (An ogive) (x-axis: Upper values, 14 to 104; y-axis: Cumulative frequency, 0 to 30)
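Reading a percentile off an ogive amounts to linear interpolation between the plotted points. The sketch below does this with the upper limits and cumulative frequencies from Table 3. The starting point (14, 0) is an assumption based on the first tick of the x-axis in Figure 4, and `percentile_from_ogive` is a hypothetical helper name.

```python
def percentile_from_ogive(x_values, cum_freq, pct):
    """Estimate the score at percentile `pct` by linear interpolation on the ogive."""
    target = pct / 100 * cum_freq[-1]  # cumulative count to look up
    for i in range(1, len(cum_freq)):
        c0, c1 = cum_freq[i - 1], cum_freq[i]
        # Skip flat segments and find the segment that contains the target.
        if c1 > c0 and c0 <= target <= c1:
            x0, x1 = x_values[i - 1], x_values[i]
            return x0 + (target - c0) / (c1 - c0) * (x1 - x0)
    raise ValueError("pct must be between 0 and 100")

# Upper limits and cumulative frequencies from Table 3, with an assumed start at (14, 0).
x_values = [14, 24, 34, 44, 54, 64, 74, 84, 94, 104]
cum_freq = [0, 1, 3, 3, 7, 10, 16, 17, 23, 25]

lower_quartile = percentile_from_ogive(x_values, cum_freq, 25)
median = percentile_from_ogive(x_values, cum_freq, 50)
upper_quartile = percentile_from_ogive(x_values, cum_freq, 75)
print(round(lower_quartile, 2), round(median, 2), round(upper_quartile, 2))
```

Because the ogive is built from grouped data, the interpolated median is an estimate and need not equal the median of 65.00 in Table 1.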
3.3 Reliability coefficients
Reliability means the extent to which the test will consistently yield the same test scores, that is, the extent to which the test scores are free from random errors of measurement. A reliability coefficient of 1.00 corresponds to a standard error of measurement of zero, which means that the test is perfectly reliable. Refer to Table 4 to see the reliability coefficients.
Table 4: Reliability coefficients
K    K-1   Total pq   STDEV   (STDEV)2   KR20
20   19    3.83       21.90   479.57     1.04
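The KR-20 reliability coefficient calculated for the 20 multiple choice test items is 1.04. A reliability coefficient cannot exceed 1.00, so this figure should be read with caution: the variance in Table 4 (479.57) appears to be based on percentage scores, whereas the total pq (3.83) is on the raw-score scale, and with both on the same scale the coefficient would be approximately 0.84. Either way, the result suggests that the test is highly reliable, meaning that it is likely to yield similar results if it is administered to other, comparable students.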
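The KR-20 value in Table 4 follows from the usual Kuder-Richardson formula, KR20 = (K / (K - 1)) x (1 - total pq / variance). The sketch below evaluates it with the values from Table 4. The second call is an assumption and not a result from this report: it supposes that the scores behind the variance are percentages of a 20-item test, so that dividing the variance by 25 (5 squared) expresses it on the raw-score scale of 0 to 20 on which the total pq is measured.

```python
def kr20(k, sum_pq, variance):
    """Kuder-Richardson formula 20 for a test of k dichotomously scored items."""
    return (k / (k - 1)) * (1 - sum_pq / variance)

# Values exactly as listed in Table 4.
as_reported = kr20(k=20, sum_pq=3.83, variance=479.57)

# Assumption: the variance is of percentage scores, so it is rescaled to raw scores.
raw_scale = kr20(k=20, sum_pq=3.83, variance=479.57 / 25)

print(round(as_reported, 2), round(raw_scale, 2))  # 1.04 0.84
```

A value above 1.00 cannot occur when the variance and the total pq are on the same scale, which is why the rescaled figure is shown alongside the reported one.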
4. Item analysis
Item analysis means testing the quality of test items by examining the students' responses to each test item. The process mostly uses the difficulty and discrimination indices.
4.1 Difficulty index
The difficulty index (p) is the proportion of students who answered a test item correctly, that is, the number of correct answers divided by the number of students who answered the item. It is a way of indicating the difficulty of each test item and thus its quality. If all students answered a test item correctly, it could indicate that the test item was too easy, and if a test item was not answered correctly by any of the students, the test item could have been too difficult. Refer to Table 5 to see the difficulty index.
Table 5: Difficulty index (p)
# Questions    # Correct   # Answered   p
Question 1     21          25           0.84
Question 2     22          25           0.88
Question 3     17          25           0.68
Question 4     12          25           0.48
Question 5     21          25           0.84
Question 6     17          25           0.68
Question 7     11          25           0.44
Question 8     12          23           0.52
Question 9     13          25           0.52
Question 10    8           24           0.33
Question 11    23          25           0.92
Question 12    19          25           0.76
Question 13    15          25           0.60
Question 14    21          25           0.84
Question 15    20          25           0.80
Question 16    22          24           0.92
Question 17    15          24           0.63
Question 18    8           24           0.33
Question 19    13          25           0.52
Question 20    16          25           0.64
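The p values in Table 5 are consistent with dividing the number of correct answers by the number of students who answered the item. A minimal Python sketch, using three rows from Table 5 (the function name `difficulty_index` is chosen for this example):

```python
def difficulty_index(n_correct, n_answered):
    """Proportion of the students who answered an item that got it right."""
    return n_correct / n_answered

# (# Correct, # Answered) for three items, taken from Table 5.
items = {"Question 1": (21, 25), "Question 8": (12, 23), "Question 10": (8, 24)}

p_values = {name: difficulty_index(c, a) for name, (c, a) in items.items()}
for name, p in p_values.items():
    print(f"{name}: p = {p:.2f}")
```

Note that unanswered items are left out of the denominator, which is why Question 8 is divided by 23 and Question 10 by 24 rather than by 25.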
4.2 Interpretation of the difficulty level of questions
The test items are now analysed individually to see the difficulty level of each. Of the 20 questions, questions 1, 2, 5, 11, 12, 14 and 16 were unacceptable, as they were too easy. Questions 3, 4, 6, 7, 8, 9, 10, 13, 15, 17, 18, 19 and 20 were acceptable, meaning that they were neither too easy nor too difficult. In percentage form, 35% of the 20 questions were unacceptable and 65% were acceptable. Refer to Table 6 to see the interpretation of the difficulty level of questions.
Table 6: Interpretation of the difficulty level of questions
# Questions    Proportion   Interpretation   Reason
Question 1     0.84         Unacceptable     Too easy
Question 2     0.88         Unacceptable     Too easy
Question 3     0.68         Acceptable       Fine
Question 4     0.48         Acceptable       Fine
Question 5     0.84         Unacceptable     Too easy
Question 6     0.68         Acceptable       Fine
Question 7     0.44         Acceptable       Fine
Question 8     0.52         Acceptable       Fine
Question 9     0.52         Acceptable       Fine
Question 10    0.33         Acceptable       Fine
Question 11    0.92         Unacceptable     Too easy
Question 12    0.76         Unacceptable     Too easy
Question 13    0.60         Acceptable       Fine
Question 14    0.84         Unacceptable     Too easy
Question 15    0.80         Acceptable       Fine
Question 16    0.92         Unacceptable     Too easy
Question 17    0.63         Acceptable       Fine
Question 18    0.33         Acceptable       Fine
Question 19    0.52         Acceptable       Fine
Question 20    0.64         Acceptable       Fine
4.3 Discrimination index (D)
This is the extent to which a test item differentiates between high scoring students and low scoring students. The discrimination index usually ranges from -1.00 to +1.00. Items with a negative discrimination index are the ones that usually need to be rewritten: the teacher has to redevelop them in such a way that the reliability and validity of the test items can be ensured. Refer to Table 7 to see the discrimination index.
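Table 7: Discrimination index (D)
# Questions    #U   #L   D
Question 1     15   6    0.60
Question 2     15   7    0.53
Question 3     14   3    0.79
Question 4     8    4    0.50
Question 5     15   6    0.60
Question 6     12   5    0.58
Question 7     9    2    0.78
Question 8     10   2    0.80
Question 9     10   3    0.70
Question 10    8    0    1.00
Question 11    14   9    0.36
Question 12    14   5    0.64
Question 13    12   3    0.75
Question 14    15   6    0.60
Question 15    14   6    0.57
Question 16    15   7    0.53
Question 17    12   3    0.75
Question 18    5    3    0.40
Question 19    12   1    0.92
Question 20    11   5    0.55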
The discrimination indices of all the questions were positive. It means that the high scoring students were better able than the low scoring students to choose the key rather than the distractors in the multiple choice test items.
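The D values in Table 7 are consistent with dividing the difference between the upper-group and lower-group counts by the upper-group count, D = (#U - #L) / #U. That reading is inferred from the table and is not stated in the report; textbooks more commonly divide by the number of students in each group. A minimal Python sketch under that assumption, using four rows from Table 7:

```python
def discrimination_index(n_upper, n_lower):
    """D as inferred from Table 7: (#U - #L) / #U. This form is an assumption."""
    return (n_upper - n_lower) / n_upper

# (question number, #U, #L) for four items, taken from Table 7.
rows = [(1, 15, 6), (10, 8, 0), (11, 14, 9), (18, 5, 3)]

d_values = {q: discrimination_index(u, l) for q, u, l in rows}
for q, d in d_values.items():
    print(f"Question {q}: D = {d:.2f}")
```

A negative D would result whenever more lower-group than upper-group students chose the key, which is the case the section above singles out for rewriting.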
4.4 Number of students in upper and lower group
The upper and lower groups are used to measure each test item's ability to differentiate between the students who are more likely to answer the item correctly and those who are likely to answer it incorrectly. Refer to Table 8 to see the number of students in the upper and lower group.
Table 8: Number of students in upper and lower group
Upper   Lower
15      10
The number of students in the upper group is 15 and the number in the lower group is 10. It means that 60% of the students were able to discriminate the correct answer (the key) from the distractors and 40% were not able to do so.
5. Conclusion
The importance of the reliability and validity of test items cannot be stressed enough. A teacher has to plan each test item meticulously to ensure that a minimum standard of quality is maintained. Every test item should be of high quality; even if only one test item out of twenty falls short, that item should be developed further until the required quality is achieved. Test items should always test what they purport to test, and they should yield the same results consistently. Teachers should combine standardised testing, in this case multiple choice test items, with testing techniques that incorporate higher-order cognitive skills, such as performance-based and criterion-based assessments. These can take the form of essays, open-ended problems, interviews and oral presentations. For the test items to be adequately assessed, predetermined evaluation criteria should be used by the teacher to ensure that the highest level of reliability is maintained.
6. References
1. Glenwood high school [Image]. Retrieved October 10, 2007 from http://www.glenwoodhighschool.co.za/images/photos/exams2003.jpg
2. Glossary of Measurement Term. Retrieved October 05, 2007 from http://harcourtassessment.com
3. Kubiszyn, T. & Borich, G. (2007). Educational Testing and Measurement. Classroom Application and Measurement. 8th Edition. United States of America: John Wiley & Sons, Inc.
4. Easton, V.J. & McColl, J.H. (n.d.). Statistical Glossary. Retrieved October 8, 2007 from http://www.stats.gla.ac.uk/steps/glossary/index.html
5. Varna, S. (2007). Retrieved October 10, 2007 from http://www.descriptive.statistics.gla.ac.uk/homenet.html
6. Hunt, N. (2002). Ogive. Retrieved October 10, 2007 from http://home.ched.coventry.ac.uk/Volume/vol0/ogive.htm
7. Appendices
7.1 Appendix A
# Question     # Correct   # Incorrect   Prop Correct (p)   Prop Incorrect (q)   pq
Question 1     21          4             0.84               0.16                 0.13
Question 2     22          3             0.88               0.12                 0.12
Question 3     17          8             0.68               0.32                 0.22
Question 4     12          13            0.48               0.52                 0.25
Question 5     21          4             0.84               0.16                 0.13
Question 6     17          8             0.68               0.32                 0.22
Question 7     11          14            0.44               0.56                 0.25
Question 8     12          11            0.52               0.48                 0.25
Question 9     13          12            0.52               0.48                 0.25
Question 10    8           16            0.33               0.67                 0.22
Question 11    23          2             0.92               0.08                 0.07
Question 12    19          6             0.76               0.24                 0.18
Question 13    15          10            0.60               0.40                 0.24
Question 14    21          4             0.84               0.16                 0.13
Question 15    20          5             0.80               0.20                 0.16
Question 16    22          2             0.92               0.08                 0.08
Question 17    15          9             0.63               0.38                 0.23
Question 18    8           16            0.33               0.67                 0.22
Question 19    13          12            0.52               0.48                 0.25
Question 20    16          9             0.64               0.36                 0.23
Total                                                                            3.83
7.2 Appendix B: Spreadsheet of the test items answered by 25 students
Responses are listed in student-number order within each group. Where a group shows fewer responses than students, a response is missing (item left unanswered).
Item   Key   St No 1-13                  St No 14-16   St No 17-23     St No 24-25
Q1     C     C C C C C C B C C C C C C   C C C         B C D C C B C   C C
Q2     B     B B B B B A B B B B B B B   B B B         B B C B A B B   B B
Q3     D     B D D D D D A D D B D D D   D D D         C B A D D A D   B D
Q4     D     A D D B C D B B A A D D A   A D D         C D D D D B B   A D
Q5     B     C B B B B C B B B B B B B   B B B         B B B B C B B   C B
Q6     C     D D C C C C C C C C C C C   C B C         A A A C C C C   D D
Q7     D     A A D B B A B B D D D D D   D A D         D D B D A B B   A A
Q8     A     A A A A D B D D C A D A     A A A         D D A A D B D   A
Q9     C     D C C C C C D B B D C D C   C B C         C D D C C D B   D C
Q10    B     D B B B D D D C D C B A B   B D B         D C A D D C     D B
Q11    A     A A A A A A A A A A A A A   A A A         A A C C A A A   A A
Q12    C     D C C C C C C C C B C C C   C C C         D C D D C C C   D C
Q13    B     A B B A B A B B B A B A B   B D B         B A A B A B B   A B
Q14    D     A D D D D D D D D D D D D   D A D         D D A D D D D   A D
Q15    A     A A A C A A C A A D A A A   A A A         A A D A A C A   A A
Q16    A     A A A A A A A A A A A A A   A C A         A B A A A A     A A
Q17    C     C C C C A A A C C C C C A   A B C         C B C A A C     C C
Q18    D     B D B B B B D A B D D B B   B D           C B B D B D A   B D
Q19    B     D B D C B D D B D B B B B   B D B         A B B D D B     D B
Q20    C     B C C C C C C A A C C D C   C D C         D C B C C C A   B C