Report on Descriptive Statistics and Item Analysis of Objective Test Items

Report on Descriptive Statistics and Item Analysis of Objective Test Items on Data Extracted From the Grade 12 Final English Second Language Exam 2008.

by Stephan Freysen

Prof. T. Kuhn, CIA 722, 7 April 2008


Acknowledgements

I would like to express my sincere gratitude to the Gauteng Department of Education for the professional and cooperative manner in which they dealt with me. The data-gathering for this report would have been far more gruelling had it not been for the selfless assistance that Mr. Y Zafir and Ms. L Bongani provided me with.

I would also like to thank Prof. Knoetze for tabulating the test data, and Prof. Kuhn for setting up a template with formulas; it has been a great help.


Descriptive Abstract

This report is written so that judgement can be passed on the reliability of the multiple-choice test in the Grade 12 English second language final exam.


Table of Contents

Acknowledgements
Descriptive Abstract
List of Tables
List of Figures
Terminology List
1. Introduction and Purpose
2. Test Analysis
   2.1 Descriptive Test Analysis
   2.2 Graphic Representation
   2.3 Reliability Coefficient
3. Item Analysis
   3.1 Difficulty Index
   3.2 Discrimination Index
4. Conclusion
Bibliography
Appendix A: Test Data


List of Tables

Table 2.1: Tabulated Test Scores
Table 2.2: Measure of Central Tendency
Table 2.3: Frequency Distribution
Table 2.4: Test Scores with pq Values
Table 3.1: Item Difficulty Indices
Table 3.2: Item Discrimination Indices


List of Figures

Figure 2.1: Histogram of Frequency
Figure 2.2: Polygon of Frequency
Figure 2.3: Ogive of Frequency
Figure 4.1: Percentage of Acceptability


Terminology List

Descriptive Statistics

The term used to refer to the mode, median and mean.

Difficulty Index

“Proportion of students who answered the item correctly.” Borich & Kubiszyn (2007: 205)

Discrimination Index

“Measure of the extent to which a test item discriminates or differentiates between students who do well on the overall test and those who do not do well on the overall test.” Borich & Kubiszyn (2007: 205)

Mean

The average of a set of numbers

Median

The score that splits the distribution in half. Borich & Kubiszyn (2007: 259)

Mode

The score that appears most frequently in a set of scores. Borich & Kubiszyn (2007: 264)

Quantitative Item Analysis

“A numerical method for analyzing test items employing student response alternatives or options.” Borich & Kubiszyn (2007: 205)

Reliability

Refers to the internal consistency of a test. Borich & Kubiszyn (2007: 318)

Standard Deviation

“The estimate of variability that accompanies the mean in describing a distribution.” Borich & Kubiszyn (2007: 272)


1. Introduction

As we have all experienced, objective test items are a very popular tool for testing knowledge. One of the most popular objective test item types is the multiple-choice format. According to Borich & Kubiszyn (2007: 116), the uniqueness of multiple-choice items is that they allow you to measure knowledge at higher levels of Bloom's taxonomy than other objective test items. This presents a problem, as assessors often do not consider any academic guidelines when setting these questions. The result is that the items differ vastly from one another in difficulty indices and often present unrealistic discrimination indices (Borich & Kubiszyn, 2007: 205).

The purpose of this report is to analyse the multiple-choice test item data that was extracted from the final English second language grammar exam of 2008. This will be achieved through analysis of the measure of central tendency and the variability of the data. The first part of the analysis will consist of the analysis of the test as a whole. The second part will consist of individual item analysis. The data comprise the answers to twenty questions given by twenty-five learners. This is a small sample group, but it should provide enough critique on the multiple-choice section of the exam to offer a detailed overview of the test's reliability. The findings in the report will be used to determine whether the multiple-choice test items present in the exam were of adequate and fair difficulty.


2. Test Analysis

2.1 Descriptive Test Analysis

In quantitative analysis, the first step is to tabulate the raw test scores. According to Borich & Kubiszyn (2007: 204), this type of analysis is ideal for multiple-choice tests. Consider Table 2.1 for the test scores sorted in ascending numerical order.

Table 2.1: Tabulated Test Scores

Learner   Percentage of items correct
L19       15
L1        30
L17       35
L24       40
L7        45
L15       50
L22       50
L6        55
L21       55
L10       60
L18       65
L4        65
L8        65
L9        65
L23       70
L5        70
L12       75
L13       85
L14       85
L20       85
L25       85
L2        90
L3        90
L11       100
L16       100

As depicted in Table 2.1, we can identify the lower, middle and higher scores. Considering the 40% cut-off rate, only three students failed this test, while eight students obtained a distinction. The measure of central tendency for the test scores in Table 2.1 can be seen in Table 2.2.

Table 2.2: Measure of Central Tendency

Mean                65.2
Median              65
Mode                65, 85
Standard deviation  21.7
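These values can be cross-checked directly from the scores in Table 2.1. The snippet below is a minimal sketch using Python's standard library; the score list is simply Table 2.1 retyped.

```python
import statistics

# Percentage scores of the 25 learners, taken from Table 2.1
scores = [15, 30, 35, 40, 45, 50, 50, 55, 55, 60,
          65, 65, 65, 65, 70, 70, 75, 85, 85, 85,
          85, 90, 90, 100, 100]

print(statistics.mean(scores))                # 65.2  (mean)
print(statistics.median(scores))              # 65    (median)
print(statistics.multimode(scores))           # [65, 85] -> bimodal
print(round(statistics.pstdev(scores), 1))    # 21.7  (population standard deviation)
print(round(statistics.variance(scores), 2))  # 490.58 (sample variance, carried into Table 2.4)
```

The population standard deviation matches the 21.7 reported in Table 2.2, and the sample variance matches the 490.58 value used later in the reliability calculation.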

The scores of 65% and 85% occur equally often, which shows that the distribution is bimodal. More scores lie above the mean than below it. The next step is to group the scores in Table 2.1 into intervals in order to determine a simple frequency distribution. In Table 2.3, one can see the intervals, the lower and upper limits of the intervals, the mid values, the frequency and the cumulative frequency.


Table 2.3: Frequency Distribution

Interval   Lower limit   Upper limit   Mid value   Frequency   Cumulative frequency
15-22      15            22            18.5        1            1
23-30      23            30            26.5        1            2
31-38      31            38            34.5        1            3
39-46      39            46            42.5        2            5
47-54      47            54            50.5        2            7
55-62      55            62            58.5        3           10
63-70      63            70            66.5        6           16
71-78      71            78            74.5        1           17
79-86      79            86            82.5        4           21
87-94      87            94            90.5        2           23
95-102     95            102           98.5        2           25
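As a cross-check, the frequency and cumulative frequency columns of Table 2.3 can be reproduced by binning the Table 2.1 scores into the same intervals of width 8. A minimal sketch:

```python
# Reproduce Table 2.3 by binning the percentage scores from Table 2.1
scores = [15, 30, 35, 40, 45, 50, 50, 55, 55, 60,
          65, 65, 65, 65, 70, 70, 75, 85, 85, 85,
          85, 90, 90, 100, 100]

cumulative = 0
for low in range(15, 96, 8):                       # lower limits 15, 23, ..., 95
    high = low + 7                                 # upper limits 22, 30, ..., 102
    frequency = sum(low <= s <= high for s in scores)
    cumulative += frequency
    print(f"{low}-{high}  mid={(low + high) / 2}  f={frequency}  cf={cumulative}")
```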

2.2 Graphic Representation

In Figure 2.1, we can see that one learner scored between 20% and 26%. Three learners scored between 34% and 47%. As the cut-off for passing is 40%, this graph shows that between one and three of these students passed. Two more learners scored between 41% and 47% and another two learners scored between 48% and 54%. Another three learners scored between 55% and 62%. Six learners scored between 63% and 70%. One learner scored between 71% and 78%. Four learners scored between 79% and 86% and four learners scored above that. Between four and eight of the learners achieved distinctions. If we consider Table 2.3 once again, we can see that although the graph is accurate, the detail of the distribution is still unclear, due to the large gap in scores implied by the intervals.

[Figure 2.1: Histogram of Frequency — frequency (0 to 7) plotted against the score intervals 20-26 through 95-102.]

In Figure 2.2, the average (mid value) of each interval is depicted on the horizontal axis. We can see that the graph correlates with Figure 2.1 and can thus trust that the data analysis done in Figure 2.1 is reliable.


[Figure 2.2: Polygon of Frequency — frequency plotted against the interval middle values, with a linear trendline.]

Figure 2.3 concentrates on the upper values of the intervals. This curve also correlates with figures 2.2 and 2.1.

[Figure 2.3: Ogive of Frequency — frequency plotted against the interval upper values, from (22, 1) up to (102, 2).]
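Figures in the style of 2.1 to 2.3 can be regenerated from the Table 2.3 values. The sketch below is a minimal, assumed reconstruction with matplotlib; the interval labels and frequencies are those of Table 2.3, not necessarily the exact bins of the original spreadsheet charts.

```python
import matplotlib.pyplot as plt

labels = ["15-22", "23-30", "31-38", "39-46", "47-54", "55-62",
          "63-70", "71-78", "79-86", "87-94", "95-102"]
mids = [18.5, 26.5, 34.5, 42.5, 50.5, 58.5, 66.5, 74.5, 82.5, 90.5, 98.5]
uppers = [22, 30, 38, 46, 54, 62, 70, 78, 86, 94, 102]
freq = [1, 1, 1, 2, 2, 3, 6, 1, 4, 2, 2]

fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(12, 3))
ax1.bar(labels, freq)                  # histogram of frequency (cf. Figure 2.1)
ax1.tick_params(axis="x", rotation=45)
ax2.plot(mids, freq, marker="o")       # frequency polygon over mid values (cf. Figure 2.2)
ax3.plot(uppers, freq, marker="o")     # frequency curve over upper values (cf. Figure 2.3)
plt.tight_layout()
plt.show()
```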

All three graphs are leptokurtic and negatively skewed. This implies that the sample group did truly well in the multiple-choice test. According to Borich & Kubiszyn (2007: 257), there can be multiple reasons for this: for example, the sample group might have been of high intelligence, the test may have been too easy, or the time constraints for the test may have been too lenient.

2.3 Reliability Coefficient

"Another way of estimating the internal consistency of a test is through one of the Kuder-Richardson methods." Borich & Kubiszyn (2007: 321) For the purpose of this analysis, we will use the KR-20 method, as it is the more accurate way of determining the reliability of a test. Borich & Kubiszyn (2007: 322) The formula for this test is:

$$KR_{20} = \frac{K}{K-1}\left(1 - \frac{\sum pq}{\sigma^2}\right)$$

where K is the number of items, p and q are the proportions of correct and incorrect responses to each item, and σ² is the variance of the test scores. From the data found in Table 2.4, we can determine the reliability coefficient. Substituting K = 20 and the Σpq and variance values from Table 2.4, the template returned

$$KR_{20} \approx 1.05 \times (-0.0000118) \approx -0.0000124$$

The answer is a negative value, and this can be interpreted to mean that the test is not reliable. Since the KR-20 is equal to a very small negative amount, it is safe to assume that the reliability is not far off, but the test is still too easy.

Based on the diminutive magnitude of the KR-20 answer, the KR-21 method was used as well to verify the reliability of the test.


$$KR_{21} = \frac{K}{K-1}\left(1 - \frac{\bar{X}(K-\bar{X})}{K\sigma^2}\right)$$

Substituting K = 20 and the mean of 65.2 into the template gave a small positive value, approximately 0.015.

Since the outcome of this formula is a positive value, it complicates the decision of whether the test is acceptable or not. The reason for this contradiction may lie in the fact that the KR-20 is more accurate than the KR-21 (Borich & Kubiszyn, 2007: 322), and since both formulas provide small answers, it is probably safe to assume that the test lies on the border of reliability. Since this is the case, we will need to analyse the difficulty and discrimination indices of each item individually.
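For comparison, the snippet below is a minimal sketch of the textbook KR-20 and KR-21 formulas applied to the raw scores (out of 20) and the item difficulty values from Table 3.1. Because the result depends on which scale (raw marks or percentages) and which variance convention is used, the figures it prints will not necessarily match the template output quoted above.

```python
import statistics

# Raw scores out of 20 (the Table 2.1 percentages divided by 5)
raw = [3, 6, 7, 8, 9, 10, 10, 11, 11, 12, 13, 13, 13, 13,
       14, 14, 15, 17, 17, 17, 17, 18, 18, 20, 20]

# Item difficulty (p) values from Table 3.1; q = 1 - p
p = [0.84, 0.88, 0.68, 0.48, 0.84, 0.68, 0.44, 0.52, 0.52, 0.33,
     0.92, 0.76, 0.60, 0.84, 0.80, 0.91, 0.62, 0.33, 0.52, 0.64]

k = 20                                  # number of items
var = statistics.variance(raw)          # sample variance of the total scores
mean = statistics.mean(raw)
sum_pq = sum(pi * (1 - pi) for pi in p)

kr20 = (k / (k - 1)) * (1 - sum_pq / var)
kr21 = (k / (k - 1)) * (1 - mean * (k - mean) / (k * var))
print(round(kr20, 2), round(kr21, 2))   # roughly 0.85 and 0.81 under this convention
```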


Table 2.4: Test Scores with pq Values
(1 = correct, 0 = incorrect or omitted; Level: U = upper group, L = lower group)

Learner  Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9 Q10 Q11 Q12 Q13 Q14 Q15 Q16 Q17 Q18 Q19 Q20  Total    %  Level
L1        1  1  0  0  0  0  0  0  0   0   1   0   0   0   1   1   1   0   0   0      6   30    L
L2        1  1  1  1  1  0  0  1  1   1   1   1   1   1   1   1   1   1   1   1     18   90    U
L3        1  1  1  1  1  1  1  1  1   1   1   1   1   1   1   1   1   0   0   1     18   90    U
L4        1  1  1  0  1  1  0  1  1   1   1   1   0   1   0   1   1   0   0   1     14   70    U
L5        1  1  1  0  1  1  0  1  1   0   1   1   1   1   1   1   0   0   1   1     15   75    U
L6        1  0  1  1  0  1  0  0  1   0   1   1   0   1   1   1   0   0   0   1     11   55    U
L7        0  1  0  0  1  1  0  0  0   0   1   1   1   1   0   1   0   1   0   1     10   50    U
L8        1  1  1  0  1  1  0  0  0   0   1   1   1   1   1   1   1   0   1   0     13   65    U
L9        1  1  1  0  1  1  1  0  0   0   1   1   1   1   1   1   1   0   0   0     13   65    U
L10       1  1  0  0  1  1  1  0  0   0   1   0   0   1   0   1   1   1   1   1     12   60    U
L11       1  1  1  1  1  1  1  1  1   1   1   1   1   1   1   1   1   1   1   1     20  100    U
L12       1  1  1  1  1  1  1  0  0   0   1   1   0   1   1   1   1   0   1   0     14   70    U
L13       1  1  1  0  1  1  1  1  1   1   1   1   1   1   1   1   0   0   1   1     17   85    U
L14       1  1  1  0  1  1  1  1  1   1   1   1   1   1   1   1   0   0   1   1     17   85    U
L15       1  1  1  1  1  0  0  1  0   0   1   1   0   0   1   0   0   0   0   0      9   45    L
L16       1  1  1  1  1  1  1  1  1   1   1   1   1   1   1   1   1   1   1   1     20  100    U
L17       0  1  0  0  1  0  1  0  1   0   1   0   1   1   1   0   0   0   0   0      8   40    L
L18       1  1  0  1  1  0  1  0  0   0   1   1   0   1   1   1   1   0   1   1     13   65    U
L19       0  0  0  1  1  0  0  1  0   0   0   0   0   0   0   0   0   0   0   0      3   15    L
L20       1  1  1  1  1  1  1  1  1   0   0   0   1   1   1   1   1   1   1   1     17   85    U
L21       1  0  1  1  0  1  0  0  1   0   1   1   0   1   1   1   0   0   0   1     11   55    U
L22       0  1  0  0  1  1  0  0  0   0   1   1   1   1   0   1   0   1   0   1     10   50    L
L23       1  1  1  0  1  1  0  0  0   0   1   1   1   1   1   1   1   0   1   0     13   65    U
L24       1  1  0  0  0  0  0  0  0   0   1   0   0   0   1   1   1   0   1   0      7   35    L
L25       1  1  1  1  1  0  0  1  1   1   1   1   1   1   1   1   1   1   0   1     17   85    U

p    0.84   0.88   0.68   0.48   0.84   0.68   0.44   0.522  0.52   0.333  0.92   0.76   0.60   0.84   0.80   0.917  0.625  0.333  0.52   0.64   (q = 1 - p)
pq   0.1344 0.1056 0.2176 0.2496 0.1344 0.2176 0.2464 0.2495 0.2496 0.2222 0.0736 0.1824 0.2400 0.1344 0.1600 0.0764 0.2344 0.2222 0.2496 0.2304

Σpq = 3.830336   Variance = 490.5833   Part 1 = 1.052632   Part 2 = 0.993963

3. Item Analysis

3.1 Difficulty Index

When considering Table 3.1, we find that the difficulty indices demonstrate that seven of the twenty questions were unacceptable because they were too easy. These include Questions 1, 2, 5, 11, 14, 15 and 16. Questions 6 and 12 were a bit easy, and the rest of the questions were of acceptable difficulty.

Table 3.1: Item Difficulty Indices

Question  Difficulty  Rating
Q1        .84         Unacceptable (too easy)
Q2        .88         Unacceptable (too easy)
Q3        .68         Acceptable
Q4        .48         Acceptable
Q5        .84         Unacceptable (too easy)
Q6        .68         Easy
Q7        .44         Acceptable
Q8        .52         Acceptable
Q9        .52         Acceptable
Q10       .33         Acceptable
Q11       .92         Unacceptable (too easy)
Q12       .76         Easy
Q13       .60         Acceptable
Q14       .84         Unacceptable (too easy)
Q15       .80         Unacceptable (too easy)
Q16       .91         Unacceptable (too easy)
Q17       .62         Acceptable
Q18       .33         Acceptable
Q19       .52         Acceptable
Q20       .64         Acceptable
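The difficulty index of an item is simply the proportion of test-takers who answered it correctly. As an illustration, the short sketch below recomputes the index for Question 1 from its column of 0/1 responses in Table 2.4.

```python
# Responses to Question 1 for learners L1-L25, copied from Table 2.4 (1 = correct)
q1 = [1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1,
      1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1]

difficulty = sum(q1) / len(q1)
print(difficulty)  # 0.84, matching Q1 in Table 3.1
```

For items with omitted answers (for example Question 8), the indices in Table 3.1 appear to be based on the number of learners who actually attempted the item rather than on all 25.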


3.2 Discrimination Index

In Table 3.2, we can see that there are six items with a low discrimination index. These items will have to be revised. It is also rather interesting to note the correlation between the unacceptable difficulty indices and the unacceptable discrimination indices, as well as the correlation between the acceptable difficulty indices and the acceptable discrimination indices.

Table 3.2: Item Discrimination Indices

Question  Discrimination  Rating
Q1        0.16            Negative
Q2        0.12            Negative
Q3        0.32            Positive
Q4        0.52            Positive
Q5        0.16            Negative
Q6        0.32            Positive
Q7        0.56            Positive
Q8        0.48            Positive
Q9        0.48            Positive
Q10       0.67            Positive
Q11       0.08            Negative
Q12       0.24            Positive
Q13       0.40            Positive
Q14       0.16            Negative
Q15       0.20            Negative
Q16       0.08            Positive
Q17       0.38            Positive
Q18       0.67            Positive
Q19       0.48            Positive
Q20       0.36            Positive
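The discrimination index compares how an item behaves in the upper- and lower-scoring groups. The sketch below shows one common convention, D = p(upper) − p(lower) using the top and bottom halves of the candidates ranked by total score; the grouping used by the course template is not documented here, so this sketch will not necessarily reproduce the values in Table 3.2. The data at the bottom are a small hypothetical set for illustration only.

```python
def discrimination_index(responses, totals):
    """D = proportion correct in the upper half minus proportion correct in the lower half.

    responses: 0/1 answers to one item, one entry per learner
    totals:    total test score of the same learners, in the same order
    """
    order = sorted(range(len(totals)), key=lambda i: totals[i])
    half = len(order) // 2
    lower, upper = order[:half], order[-half:]
    p_upper = sum(responses[i] for i in upper) / len(upper)
    p_lower = sum(responses[i] for i in lower) / len(lower)
    return p_upper - p_lower

# Hypothetical example: six learners, one item; all three strongest learners
# answered correctly, two of the three weakest did not.
item = [1, 0, 1, 1, 0, 1]
totals = [18, 5, 16, 9, 7, 15]
print(round(discrimination_index(item, totals), 2))  # 0.67
```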


4. Conclusion

[Figure 4.1: Percentage of acceptability — pie chart titled "Reliability": Acceptable 76%, remaining 24%.]

In this report on the 2008 Grade 12 English second language exam, the assumption can be made that the multiple-choice test was rather easy. The thorough analysis of the frequency, standard deviation, discrimination indices, difficulty indices and the reliability coefficient clearly supports this assumption. Items 1, 2, 5, 11, 14 and 15 will need revision so that this test may be graded as reliable. Consider that 76% of the test, as seen in Figure 4.1, is reliable, while the other 24% of the test is too easy. The questions mentioned were all rather easy and therefore not really suitable for a final exam.


Bibliography

Borich, G. & Kubiszyn, T. (2007). Educational Testing and Measurement: Classroom Application and Practice. Hoboken, NJ: John Wiley & Sons, Inc.

Knoetze, J. (2007). Test Data. Retrieved April 1, 2008, from http://www.jknoetze.co.za/CIA_722/testdata.xls


Appendix A: Test Data

Key: Q1=C, Q2=B, Q3=D, Q4=D, Q5=B, Q6=C, Q7=D, Q8=A, Q9=C, Q10=B, Q11=A, Q12=C, Q13=B, Q14=D, Q15=A, Q16=A, Q17=C, Q18=D, Q19=B, Q20=C

Responses per question (columns correspond to student numbers; a row with fewer letters than students in the block indicates an omitted answer).

Students 1-13:
Q1:  C C C C C C B C C C C C C
Q2:  B B B B B A B B B B B B B
Q3:  B D D D D D A D D B D D D
Q4:  A D D B C D B B A A D D A
Q5:  C B B B B C B B B B B B B
Q6:  D D C C C C C C C C C C C
Q7:  A A D B B A B B D D D D D
Q8:  A A A A D B D D C A D A
Q9:  D C C C C C D B B D C D C
Q10: D B B B D D D C D C B A B
Q11: A A A A A A A A A A A A A
Q12: D C C C C C C C C B C C C
Q13: A B B A B A B B B A B A B
Q14: A D D D D D D D D D D D D
Q15: A A A C A A C A A D A A A
Q16: A A A A A A A A A A A A A
Q17: C C C C A A A C C C C C A
Q18: B D B B B B D A B D D B B
Q19: D B D C B D D B D B B B B
Q20: B C C C C C C A A C C D C

Students 14-16:
Q1:  C C C
Q2:  B B B
Q3:  D D D
Q4:  A D D
Q5:  B B B
Q6:  C B C
Q7:  D A D
Q8:  A A A
Q9:  C B C
Q10: B D B
Q11: A A A
Q12: C C C
Q13: B D B
Q14: D A D
Q15: A A A
Q16: A C A
Q17: A B C
Q18: B D
Q19: B D B
Q20: C D C

Students 17-23:
Q1:  B C D C C B C
Q2:  B B C B A B B
Q3:  C B A D D A D
Q4:  C D D D D B B
Q5:  B B B B C B B
Q6:  A A A C C C C
Q7:  D D B D A B B
Q8:  D D A A D B D
Q9:  C D D C C D B
Q10: D C A D D C
Q11: A A C C A A A
Q12: D C D D C C C
Q13: B A A B A B B
Q14: D D A D D D D
Q15: A A D A A C A
Q16: A B A A A A
Q17: C B C A A C
Q18: C B B D B D A
Q19: A B A B D D B
Q20: D C B C C C A

Students 24-25:
Q1:  C C
Q2:  B B
Q3:  B D
Q4:  A D
Q5:  C B
Q6:  D D
Q7:  A A
Q8:  A
Q9:  D C
Q10: D B
Q11: A A
Q12: D C
Q13: A B
Q14: A D
Q15: A A
Q16: A A
Q17: C C
Q18: B D
Q19: D B
Q20: B C
