1. Introduction
This report presents a test and item analysis for a given set of scores, using descriptive statistics (measures of central tendency and variability). Twenty-five students wrote a multiple-choice test containing twenty questions with four answer options each (see appendix A).
2. Purpose of report
The purpose of this report is to present the results of the test and item analysis for the given set of scores.
3. Test analysis
Test analysis examines how the items perform as a set. According to Kubiszyn and Borich (2007), “no test you construct will be perfect”, meaning it includes invalid or deficient items. This necessitates analysis.
3.1 Descriptive statistics
The test data are given in appendix B. The mode is the score that occurs most frequently, the median is the score that splits the distribution in half, the mean is the average of the scores, and the standard deviation is the estimate of variability given by the square root of the sum of (x-Mean)² over the number of students. The mode, median, mean and standard deviation are given in table 1. Because the mode, median and mean are almost the same, the table suggests an approximately normal distribution.
Table 1: Mode, median, mean and standard deviation
Mode    Median    Mean     Standard deviation
65      65        65.79    21.90
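The values in table 1 can be reproduced with a minimal sketch such as the following. It assumes the scores are the percentage scores listed in appendix B and uses the population form of the standard deviation (dividing by the number of students), which is how the report defines it. The variable names are illustrative only.

```python
import statistics

# Percentage scores of the 25 students, copied from appendix B.
scores = [100.00, 100.00, 90.00, 90.00, 90.00, 89.47, 85.00, 85.00, 75.00,
          70.00, 70.00, 65.00, 65.00, 65.00, 65.00, 60.00, 55.00, 55.00,
          50.00, 50.00, 47.06, 45.00, 31.58, 31.58, 15.00]

mode = statistics.mode(scores)        # most frequent score
median = statistics.median(scores)    # middle score of the sorted list
mean = statistics.mean(scores)        # arithmetic average
# Population standard deviation: square root of sum((x - mean)^2) / N.
std_dev = statistics.pstdev(scores)

print(f"mode={mode:.2f} median={median:.2f} mean={mean:.2f} sd={std_dev:.2f}")
```

Using `statistics.stdev` instead would divide by N - 1 and give a slightly larger value than the one reported in table 1.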
3.2 Frequency graphs
The frequency graphs are based on a grouped frequency table. The values used to set up the intervals are given in table 2.
Table 2: Grouped frequency table
H      L     Range    Number of intervals    Size of interval
100    15    85       10                     8.5
The cumulative frequency graph is plotted with the upper limits on the x-axis and the cumulative frequency on the y-axis. The cumulative frequency table is shown in table 3.
Table 3: Cumulative frequency table
Lower limit             15     25     35     45     55     65     75     85     95
Upper limit             24     34     44     54     64     74     84     94     104
Middle value            19.5   29.5   39.5   49.5   59.5   69.5   79.5   89.5   99.5
Frequency               1      2      0      4      3      6      1      6      2
Cumulative frequency    1      3      3      7      10     16     17     23     25
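As an illustration, the rows of table 3 can be rebuilt from the appendix B scores with the short sketch below. It assumes a class width of 10 starting at 15, as in table 3, and treats each interval as running from its lower limit up to (but not including) the next lower limit, so that non-integer scores such as 89.47 fall into exactly one interval. That boundary rule is an assumption, since the report does not state one.

```python
# Percentage scores of the 25 students, copied from appendix B.
scores = [100.00, 100.00, 90.00, 90.00, 90.00, 89.47, 85.00, 85.00, 75.00,
          70.00, 70.00, 65.00, 65.00, 65.00, 65.00, 60.00, 55.00, 55.00,
          50.00, 50.00, 47.06, 45.00, 31.58, 31.58, 15.00]

width = 10                          # class width used in table 3
lower_limits = range(15, 96, width) # 15, 25, ..., 95

rows = []
cumulative = 0
for lower in lower_limits:
    upper = lower + width - 1       # upper limit as printed in table 3
    # Assumed boundary rule: lower <= score < next lower limit.
    frequency = sum(lower <= s < lower + width for s in scores)
    cumulative += frequency
    middle = (lower + upper) / 2
    rows.append((lower, upper, middle, frequency, cumulative))

for row in rows:
    print(*row)
```

The upper limits and cumulative frequencies give the points of the cumulative frequency graph, while the middle values and frequencies give the points of the frequency polygon.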
The cumulative frequency graph is given in figure 1. An ‘ogive’ shape is formed.
Figure 1: Cumulative frequency graph (x-axis: upper values, 24 to 104; y-axis: cumulative frequency, 0 to 30)
The frequency histogram uses the intervals (lower values) as the x-axis and the frequency as the y-axis. The frequency histogram is given in figure 2.
Figure 2: Frequency histogram (x-axis: intervals, 15-24 to 95-104; y-axis: frequency, 0 to 7)
The frequency polygon uses the middle values as the x-axis and the frequency as the y-axis. The frequency polygon is given in figure 3.
Figure 3: Frequency polygon (x-axis: middle values, 19.5 to 99.5; y-axis: frequency, 0 to 7)
3.3 Test reliability
The reliability coefficient (KR20) is the appropriate index of test reliability for multiple-choice tests. It is determined by a formula that includes the number of test items (k), student performance on every item (the sum of pq; see appendix C for the pq values) and the standard deviation squared (stddev²) of the set of student test scores: KR20 = (k / (k-1)) × (1 - sum of pq / stddev²). The index ranges from 0.00 to 1.00. The larger the number, the more reliable the student scores are. The KR20 is determined from the values given in table 4.
Table 4: Determining reliability coefficient (KR20)
k     k-1    Total pq    stddev    stddev²    KR20
20    19     3.83        21.90     479.57     1.04
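Reliability coefficient (KR20) = 1.04. This is a large value, so the student scores are taken to be reliable. The value lies slightly above the maximum of 1.00 because the variance in table 4 is computed from percentage scores, while the pq values are proportions per item, so the two quantities are not on the same scale.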
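The calculation behind table 4 can be sketched as follows, using the standard KR20 formula with the values from the table. The second calculation is not part of the report: it rests on the assumption that the percentage-score variance can be converted to a 20-item raw-score scale by multiplying by (20/100)². This is only approximate, because some students left items unanswered and their percentages are based on attempted items. The function name `kr20` is illustrative.

```python
def kr20(k, sum_pq, variance):
    """Kuder-Richardson 20: (k / (k - 1)) * (1 - sum_pq / variance)."""
    return (k / (k - 1)) * (1 - sum_pq / variance)

k = 20                     # number of test items (table 4)
sum_pq = 3.83              # total pq (table 4, from appendix C)
variance_percent = 479.57  # stddev squared of the percentage scores (table 4)

# Value as reported in table 4.
reported = kr20(k, sum_pq, variance_percent)

# Assumption, not from the report: express the variance on the raw-score
# scale (0 to 20) so that it matches the scale of the pq values.
variance_raw = variance_percent * (k / 100) ** 2
rescaled = kr20(k, sum_pq, variance_raw)

print(round(reported, 2), round(rescaled, 2))  # prints 1.04 0.84
```

Under that assumption the coefficient is still high, which is consistent with the conclusion that the scores are reliable.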
4. Item analysis
Item analysis can be used to identify items that are deficient in some way so as to improve or even eliminate them. Matlock-Hetzel (2007) states that item analysis “investigates the performance of items considered individually in relation to the remaining items in the test”.
4.1 Difficulty index
The difficulty index indicates the proportion of students who answered the item correctly. The proportion (p) equals the number of students with the correct answer divided by the number of students who attempted the item. If p<0.25 the item is too difficult, and if p>0.75 the item is too easy. In both cases the item is unacceptable. The calculation and interpretation of the difficulty index for each question are given in table 5.
Table 5: Calculation of difficulty index
Questions    #Correct    #Answered    p       Interpretation    Reason
q1           21          25           0.84    Unacceptable      Too easy
q2           22          25           0.88    Unacceptable      Too easy
q3           17          25           0.68    Acceptable        Fine
q4           12          25           0.48    Acceptable        Fine
q5           21          25           0.84    Unacceptable      Too easy
q6           17          25           0.68    Acceptable        Fine
q7           11          25           0.44    Acceptable        Fine
q8           12          23           0.52    Acceptable        Fine
q9           13          25           0.52    Acceptable        Fine
q10          8           24           0.33    Acceptable        Fine
q11          23          25           0.92    Unacceptable      Too easy
q12          19          25           0.76    Unacceptable      Too easy
q13          15          25           0.60    Acceptable        Fine
q14          21          25           0.84    Unacceptable      Too easy
q15          20          25           0.80    Unacceptable      Too easy
q16          22          24           0.92    Unacceptable      Too easy
q17          15          24           0.63    Acceptable        Fine
q18          8           24           0.33    Acceptable        Fine
q19          13          25           0.52    Acceptable        Fine
q20          16          25           0.64    Acceptable        Fine
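The difficulty index and its interpretation can be sketched in a few lines. The counts are copied from appendix C, and the cut-offs of 0.25 and 0.75 are the ones stated in this section. The function name `interpret` and the data layout are illustrative choices, not part of the report.

```python
# (#correct, #answered) for q1..q20, copied from appendix C.
items = [(21, 25), (22, 25), (17, 25), (12, 25), (21, 25),
         (17, 25), (11, 25), (12, 23), (13, 25), (8, 24),
         (23, 25), (19, 25), (15, 25), (21, 25), (20, 25),
         (22, 24), (15, 24), (8, 24), (13, 25), (16, 25)]

def interpret(p):
    """Apply the report's cut-offs: p < 0.25 too difficult, p > 0.75 too easy."""
    if p < 0.25:
        return "Unacceptable", "Too difficult"
    if p > 0.75:
        return "Unacceptable", "Too easy"
    return "Acceptable", "Fine"

results = []
for number, (correct, answered) in enumerate(items, start=1):
    p = correct / answered  # proportion of attempting students who were correct
    results.append((f"q{number}", p, *interpret(p)))

acceptable = sum(1 for r in results if r[2] == "Acceptable")
print(f"{acceptable} of {len(results)} items acceptable")
```

When printing p to two decimals, note that Python rounds exact halves to the nearest even digit, so q17 (15/24 = 0.625) is shown as 0.62 rather than the 0.63 given in the table.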
4.2 Discrimination index
According to Special Connections (2007), the discrimination index (D) is a “basic measure of item’s ability to discriminate between those who scored high (#u) on the total test and those who scored low (#L)”. If the D value is positive (closer to 1.00), there is a strong relationship between performance on that item and overall test performance. This means the discrimination is fine. If the D value is negative, this suggests poor validity for the item, and its distracters must be looked into. The calculation and interpretation of the discrimination index for each question are given in table 6. In this instance all items show a positive discrimination.
Table 6: Calculation of discrimination index
Questions    #U    #L    D       Interpretation
q1           15    6     0.60    Fine
q2           15    7     0.53    Fine
q3           14    3     0.73    Fine
q4           8     4     0.27    Fine
q5           15    6     0.60    Fine
q6           12    5     0.47    Fine
q7           9     2     0.47    Fine
q8           10    2     0.53    Fine
q9           10    3     0.47    Fine
q10          8     0     0.53    Fine
q11          14    9     0.33    Fine
q12          14    5     0.60    Fine
q13          12    3     0.60    Fine
q14          15    6     0.60    Fine
q15          14    6     0.53    Fine
q16          15    7     0.53    Fine
q17          12    3     0.60    Fine
q18          5     3     0.13    Fine
q19          12    1     0.73    Fine
q20          11    5     0.40    Fine
The discrimination index measures the ability of an item to discriminate between students who have a high score on the test and those with a low score. It is based on the difference between the number of correct responses in the upper group and the number of correct responses in the lower group. The D values in table 6 correspond to this difference divided by the size of the upper group. The number of students in the upper and lower groups is given in table 7.
Table 7: Number of students in upper and lower group
#Upper    #Lower
15        10
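The D values in table 6 can be reproduced with the sketch below. The divisor of 15 (the size of the upper group in table 7) is inferred from the tabulated values rather than stated in the report, so it should be read as an assumption. The list names are illustrative.

```python
# Correct responses per item in the upper (#U) and lower (#L) groups,
# copied from table 6 in the order q1..q20.
upper_correct = [15, 15, 14, 8, 15, 12, 9, 10, 10, 8,
                 14, 14, 12, 15, 14, 15, 12, 5, 12, 11]
lower_correct = [6, 7, 3, 4, 6, 5, 2, 2, 3, 0,
                 9, 5, 3, 6, 6, 7, 3, 3, 1, 5]

# Assumption inferred from table 6: divide by the size of the upper group.
group_size = 15

d_values = [round((u - l) / group_size, 2)
            for u, l in zip(upper_correct, lower_correct)]

all_positive = all(d > 0 for d in d_values)
print(d_values)
print("all items discriminate positively:", all_positive)
```

A negative entry in `d_values` would flag an item whose distracters need to be reviewed. None occurs for this test.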
5. Conclusion
In conclusion, since the KR20 indicates reliable scores, sixty percent of the items have an acceptable difficulty index, and the discrimination index is positive for all items, the overall test is considered valid. Analysis of response options allows educators to fine-tune and improve items they may wish to use again with future classes. If items are too difficult, teachers can adjust the way they teach. The greater the number of plausible distracters, the more accurate, valid and reliable the test becomes.
References
Kubiszyn, T. and Borich, G. (2007). Educational Testing and Measurement: Classroom Application and Practice (8th ed.), pp. 204-326. John Wiley & Sons, Inc., USA.
Matlock-Hetzel, S. (2007). Basic Concepts in Item and Test Analysis. Texas A & M University. Retrieved October 02 2007, from http://ericae.net/ft/tamu/Espy.htm
Special Connections. (2007). Retrieved October 02 2007, from http://www.Specialconnections.ku.edu/cgibin/cgiwrap/cpecconn/print.php?path=page/ass..
Appendix A
Each row lists the key and the responses to one question, in student order within each group of students. Rows with fewer entries than students (Q8, Q10, Q16, Q17 and Q18) include unanswered items. The positions of the blanks are not shown.
Question  Key  Students 1-13              Students 14-16  Students 17-23  Students 24-25
Q1        C    C C C C C C B C C C C C C  C C C           B C D C C B C   C C
Q2        B    B B B B B A B B B B B B B  B B B           B B C B A B B   B B
Q3        D    B D D D D D A D D B D D D  D D D           C B A D D A D   B D
Q4        D    A D D B C D B B A A D D A  A D D           C D D D D B B   A D
Q5        B    C B B B B C B B B B B B B  B B B           B B B B C B B   C B
Q6        C    D D C C C C C C C C C C C  C B C           A A A C C C C   D D
Q7        D    A A D B B A B B D D D D D  D A D           D D B D A B B   A A
Q8        A    A A A A D B D D C A D A    A A A           D D A A D B D   A
Q9        C    D C C C C C D B B D C D C  C B C           C D D C C D B   D C
Q10       B    D B B B D D D C D C B A B  B D B           D C A D D C     D B
Q11       A    A A A A A A A A A A A A A  A A A           A A C C A A A   A A
Q12       C    D C C C C C C C C B C C C  C C C           D C D D C C C   D C
Q13       B    A B B A B A B B B A B A B  B D B           B A A B A B B   A B
Q14       D    A D D D D D D D D D D D D  D A D           D D A D D D D   A D
Q15       A    A A A C A A C A A D A A A  A A A           A A D A A C A   A A
Q16       A    A A A A A A A A A A A A A  A C A           A B A A A A     A A
Q17       C    C C C C A A A C C C C C A  A B C           C B C A A C     C C
Q18       D    B D B B B B D A B D D B B  B D             C B B D B D A   B D
Q19       B    D B D C B D D B D B B B B  B D B           A B A B D D B   D B
Q20       C    B C C C C C C A A C C D C  C D C           D C B C C C A   B C
Appendix B
x         Group    x-Mean    (x-Mean)²
100.00    U        34.21     1170.49
100.00    U        34.21     1170.49
90.00     U        24.21     586.24
90.00     U        24.21     586.24
90.00     U        24.21     586.24
89.47     U        23.69     561.03
85.00     U        19.21     369.12
85.00     U        19.21     369.12
75.00     U        9.21      84.87
70.00     U        4.21      17.74
70.00     U        4.21      17.74
65.00     U        -0.79     0.62
65.00     U        -0.79     0.62
65.00     U        -0.79     0.62
65.00     U        -0.79     0.62
60.00     L        -5.79     33.50
55.00     L        -10.79    116.37
55.00     L        -10.79    116.37
50.00     L        -15.79    249.25
50.00     L        -15.79    249.25
47.06     L        -18.73    350.77
45.00     L        -20.79    432.12
31.58     L        -34.21    1170.23
31.58     L        -34.21    1170.23
15.00     L        -50.79    2579.38
Appendix C
Question    #Correct    #Answered    Proportion correct (p)    Proportion incorrect (q)    pq
q1          21          25           0.84                      0.16                        0.13
q2          22          25           0.88                      0.12                        0.11
q3          17          25           0.68                      0.32                        0.22
q4          12          25           0.48                      0.52                        0.25
q5          21          25           0.84                      0.16                        0.13
q6          17          25           0.68                      0.32                        0.22
q7          11          25           0.44                      0.56                        0.25
q8          12          23           0.52                      0.48                        0.25
q9          13          25           0.52                      0.48                        0.25
q10         8           24           0.33                      0.67                        0.22
q11         23          25           0.92                      0.08                        0.07
q12         19          25           0.76                      0.24                        0.18
q13         15          25           0.60                      0.40                        0.24
q14         21          25           0.84                      0.16                        0.13
q15         20          25           0.80                      0.20                        0.16
q16         22          24           0.92                      0.08                        0.08
q17         15          24           0.63                      0.38                        0.23
q18         8           24           0.33                      0.67                        0.22
q19         13          25           0.52                      0.48                        0.25
q20         16          25           0.64                      0.36                        0.23