Construction And Development Of A Test Instrument

Carlo Magno
Ateneo de Manila University

Abstract

This study investigated the psychometric properties of a one-unit geography test for grade three students and analyzed its items. The skills and content of the test were based on the first-quarter content indicated in the syllabus. A table of specifications was constructed to distribute the items across three cognitive skills: knowledge, comprehension, and application. The test has 40 items across 10 test types. The items were reviewed by a social studies teacher and an academic coordinator. Split-half reliability was estimated, yielding a correlation of .30. Scores on the test types were intercorrelated, yielding both low and high coefficients. The item analysis showed that most of the items were easy and that most were good items.

Introduction

The purpose of this study is to construct a one-unit geography test for grade three students and to analyze its items. The test measures grade three students' achievement in Philippine geography for the first quarter and served as the quarterly test. Once standardized through validation and reliability testing, the test could be used as a future achievement test in Philippine geography. Such a test needs to be constructed and standardized because none is yet available locally. The test is written in Filipino because of the nature of the subject. The subject covers the following topics: (1) Kapuluan ng Pilipinas (the Philippine archipelago); (2) Malalaki at Maliliit na Pulo ng Bansa (the large and small islands of the country); (3) Mapa at Uri ng Mapa (maps and types of maps); (4) Mga Direksyon (directions); (5) Anyong Lupa at Anyong Tubig (landforms and bodies of water); (6) Simbolong Ginagamit sa Mapa (symbols used on maps); (7) Panahon at Klima (weather and climate); (8) Mga Salik na may Kinalaman sa Klima (factors related to climate); (9) Mga Pangunahing Hanapbuhay sa Bansa (the country's main livelihoods); and (10) Pag-aangkop sa Kapaligiran (adapting to the environment). The topics were based on the lessons in the Elementary Learning Competencies of the Department of Education.
The test aims for students to: (1) identify important concepts and definitions; (2) comprehend and explain the reasons for given situations and phenomena; and (3) use and analyze different kinds of maps to identify important symbols and become familiar with places.

Method

Search for Skills and Content Domain

The skills and content of the test were identified from the topics covered by grade three students in the first quarter, since the test was intended for the first-quarter examination. The skills targeted by the first-quarter topics include identifying concepts and terms, comprehending explanations, applying principles to situations, using and analyzing maps, synthesizing different explanations for a particular event, and evaluating the truth and validity of reasons and statements through inference. Before the items were written, a table of specifications was constructed to plan the distribution of items across topics and the objectives the students are expected to attain.

Table 1. Table of Specifications for a Unit in Philippine Geography for Grade 3

Topic (Nilalaman)                         Number of items
The Philippine archipelago                4
Large and small islands of the country    4
Maps and types of maps                    4
Directions                                6
Landforms and bodies of water             5
Symbols used on maps                      4
Weather and climate                       5 (2 + 3, split across two cognitive levels)
Factors related to climate                2
Main livelihoods of the country           3
Adapting to the environment               3
Total                                     40 (100%)

Items per cognitive level:
Knowledge, identifying important concepts and definitions (Natutukoy ang mahahalagang konsepto at kahulugan): 11 items (27.5%)
Comprehension, understanding the reasons behind important explanations in each situation (Nauunawaan ang mga dahilan sa mahahalagang kapaliwanagan sa bawat sitwasyon): 16 items (40%)
Application, using and analyzing maps to identify important symbols (Nagagamit at nasusuri ang mapa sa pagtukoy ng mga mahahalagang pananda): 13 items (32.5%)
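The percentages in Table 1 are each cognitive level's share of the 40 items. As a hedged illustration, the short Python sketch below recomputes them from the level totals and checks that the topic rows and the level totals describe the same 40-item test; the variable and function names are hypothetical.

```python
# Counts taken from Table 1; the names below are illustrative only.
LEVEL_COUNTS = {"Knowledge": 11, "Comprehension": 16, "Application": 13}
TOPIC_COUNTS = [4, 4, 4, 6, 5, 4, 5, 2, 3, 3]  # one entry per topic, in table order


def level_shares(counts: dict[str, int]) -> dict[str, float]:
    """Return each cognitive level's share of the total number of items, in percent."""
    total = sum(counts.values())
    return {level: 100 * n / total for level, n in counts.items()}


# The topic rows and the level totals should both add up to the same 40 items.
assert sum(TOPIC_COUNTS) == sum(LEVEL_COUNTS.values()) == 40

shares = level_shares(LEVEL_COUNTS)
for level, pct in shares.items():
    print(f"{level}: {LEVEL_COUNTS[level]} items ({pct:.1f}%)")
```

Running it prints 27.5%, 40.0%, and 32.5% for knowledge, comprehension, and application, matching the table.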

Table of Specifications

The table of specifications covers 10 topics that make up a unit on Philippine geography. Of the items, 27.5% were placed at the knowledge level, 40% at the comprehension level, and 32.5% at the application level. Most items were concentrated on comprehension, since the main purpose is for students to understand the unit on Philippine geography, which is the foundation for the rest of the school year's lessons; mastering this base knowledge will help students explain and give reasons in later lessons. The application level received the next-largest share, since students need to learn in practice how to use maps and how they can benefit from the maps and figures in the unit. Fewer items were placed at the knowledge level, since there is less need for students to recall and memorize concepts and terms. The main goal of this unit is for students to gain the ability to explain the principles of Philippine geography and their relationship to our culture.

Item Writing

Forty items were constructed based on the table of specifications (see Table 1). A 40-item test suits grade three students, since it is neither too long nor too short for their capacity. The students' attention span and the time available for testing were also considered in setting the number of items: in the quarterly examination, each subject test is given a time limit of one hour. The items were drawn mainly from what the students gained from classroom discussions, reflections on the topics, exercises, group work, school activities, and the textbook. The items were divided into 10 parts:
Test I: four true-or-false items.
Test II: five matching-type items.
Test III: two multiple-choice items whose stems are based on a figure.
Test IV: four items based on two situations.
Test V: four multiple-choice items answered using a physical map.
Test VI: multiple-choice items on the use of different types of maps.
Test VII: six short-answer items in which students supply the direction asked for, based on a given map.
Test VIII: a five-item interpretive exercise in which situations are given, each with a list of inferences, and students choose the best inference for the situation.
Test IX: three multiple-choice items answered from a figure of a Philippine map on which a weather condition is given.
Test X: a three-point essay question scored on (a) correctness of the answer (1.5 pts), (b) explanation (1 pt), and (c) following instructions (0.5 pt). Two raters scored the answers to the essay.

Content Validation

The test was content-validated and reviewed by a Social Studies teacher from Ateneo de Davao. The suggestions were considered and the test was revised accordingly. Before the final draft was administered, the academic coordinator of the school where the test would be given also checked whether the items were appropriate for grade three students and checked them for typographical errors. For the content validation, the reviewers were given the topics covered and the table of specifications so they could determine whether the items adequately covered the topics studied.

Test Administration

Respondents. Eighty-eight grade 3 students in three sections took the test as their quarterly examination. Of these 88 students, the top 40 were included in the sample. The upper and lower groups, each consisting of 11 students (27% of the sample), were used in the item analysis for difficulty and discrimination.

Procedure. The grade 3 Sibika at Kultura (Civics and Culture) teacher gave direct instructions to the two other teachers who administered the test to the other two sections. Care was taken to keep the testing conditions constant and to control other factors that could affect the students' performance. The test was administered simultaneously to the three classes in the morning, as the first test of the day. The students had one hour to take the test; those who finished early were advised to review their work. When the bell rang, the teacher instructed the students to pass their papers forward. All test papers were collected and checked. After a week, the students were informed of their results, and the 40 students included in the sample were told that their test papers would be used for the study. A letter of request was sent to the parents explaining the purpose of the research and reporting the students' scores; the parents replied positively.

Data Analysis. The scores were tabulated and encoded to simplify computation. The split-half method was used to estimate the internal consistency of the scores: the odd- and even-numbered items were separated and correlated using Pearson's product-moment correlation coefficient (r). The upper and lower groups were the highest- and lowest-scoring 27% of the 40 respondents. The item analysis consisted of computing each item's difficulty index and discrimination index.
Each item was then labeled according to the standards for difficulty and discrimination, that is, whether or not it was a good item. The coefficient of concordance was used to estimate the inter-rater reliability of the essay test; two judges scored the essay part of the test using set criteria.

Results and Discussion

Reliability

The test's reliability was estimated with the split-half method by correlating the odd-numbered and even-numbered items. The resulting coefficient was 0.30, a low but positive correlation between the two halves. The low correlation may be explained by the different topics covered within the 40-item test. A larger pool of items for each of the 10 content topics would have been more appropriate, but 40 items is the school's usual standard for the quarterly test, and the test was built to serve as that quarterly test so that it would be usable. In addition, a split-half correlation reflects the reliability of only half of the test, which further lowers the coefficient. Because the correlation covers only half of the test, it was stepped up with the Spearman-Brown formula. The resulting Spearman-Brown coefficient of 0.46 indicates moderate reliability.
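The study reports correlating odd- and even-numbered items with Pearson's r and then stepping the result up with the Spearman-Brown formula, r_full = 2r_half / (1 + r_half). The Python sketch below shows the conventional student-level version of that procedure, in which each student's odd-item total is correlated with the even-item total. It is a sketch under assumptions, not the author's computation (the text describes correlating 18 item pairs), and the function names and the 0/1 response matrix are hypothetical. The last line reproduces the reported step-up: 2(0.30) / (1 + 0.30) = 0.60 / 1.30 ≈ 0.46.

```python
from math import sqrt


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def split_half_r(responses: list[list[int]]) -> float:
    """Correlate each student's odd-item total with the same student's even-item total.

    `responses` is a hypothetical students-by-items matrix of 0/1 scores.
    Item 1 is stored at index 0, so odd-numbered items sit at even indices.
    """
    odd_totals = [sum(row[0::2]) for row in responses]
    even_totals = [sum(row[1::2]) for row in responses]
    return pearson_r(odd_totals, even_totals)


def spearman_brown(r_half: float) -> float:
    """Step a half-test correlation up to an estimate for the full-length test."""
    return 2 * r_half / (1 + r_half)


# The study's reported split-half coefficient, stepped up to full length.
print(round(spearman_brown(0.30), 2))  # prints 0.46
```

With real data, `split_half_r` would be applied to the students' item scores and its result passed to `spearman_brown`.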

A common rule of thumb is that at least 30 pairs of scores should be correlated, but only 18 pairs were correlated here. The last objective item was left out because it had no partner item; the items after it formed the essay, which was analyzed separately. The low internal consistency may also reflect the variety of test formats used, since respondents may perform differently on each type of test. The test cannot be expected to be homogeneous, because it covers several topics and several response formats. Its 10 test types measure different skills, such as identifying important concepts and definitions, comprehending and explaining the reasons for given situations and phenomena, and using and analyzing different kinds of maps to identify important symbols and become familiar with places. The dilemma is that all of these content domains belong to one general topic, Philippine geography. To examine the consistency among the nine test types, an intercorrelation matrix was computed.

Table 2. Intercorrelations among the Nine Test Types

        I       II      III     IV      V       VI      VII     VIII    IX
I       --      0.13    0.98*   0.18    -0.21   0.19    -0.73   0.07    0.85*
II              --      -1      -0.81*  -0.42   0.58*   0.28    -0.19   -0.58*
III                     --      0.48*   0.47*   0.47*   0.41*   -0.47*  0.48*
IV                              --      0.19    0.6     -0.56*  0.96*   0.15
V                                       --      0.65*   0.73*   0.08    0.97*
VI                                              --      0.24    -0.8    -0.52*
VII                                                     --      0.25    -0.52*
VIII                                                            --      -0.28
IX                                                                      --
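The text does not name the coefficient used in Table 2; assuming Pearson's r (the coefficient used for the split-half analysis), the Python sketch below computes the upper triangle of an intercorrelation matrix from students' totals on each test type. This is the conventional student-level version; the study reports pairing on the minimum number of items per test type, which the sketch does not reproduce. The labels and demo data are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def intercorrelations(scores: dict[str, list[float]]) -> dict[tuple[str, str], float]:
    """Return r for every pair of test types (the upper triangle of the matrix).

    `scores` maps a test-type label (e.g. "I") to the students' totals on
    that test type, listed in the same student order for every label.
    """
    return {(a, b): pearson_r(scores[a], scores[b]) for a, b in combinations(scores, 2)}


# Hypothetical totals for three students on three test types.
demo = {"I": [1, 2, 3], "II": [2, 4, 6], "III": [3, 2, 1]}
for (a, b), r in intercorrelations(demo).items():
    print(f"{a}-{b}: {r:+.2f}")
```

Pairs are produced in label order, so the output follows the layout of the table's upper triangle.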

There is a high correlation between Test I and Test IX: students who scored higher on identifying concepts tended to score higher on comprehending the weather map. A high correlation was also found between Test V and Test IX: students who scored higher on interpreting a physical map tended to score higher on interpreting the weather map. Test IV and Test VIII are likewise highly correlated: students who scored higher on inferences about the Philippine islands tended to score higher on comprehension of weather. Overall, these intercorrelations are crude, because each test type has few items and the test types do not have equal numbers of items; the pairing in the computation was based on the minimum number of items in each test type.

Item Difficulty and Discrimination Index

To evaluate the quality of each item, an item analysis was done by determining each item's difficulty index and discrimination index. The proportion of examinees answering each item correctly (the difficulty index) was interpreted using the following scale:

Difficulty Index    Remark
.76 or higher       Easy item
.25 to .75          Average item
.24 or lower        Difficult item

Source: Lamberte, B. (1998). Determining the Scientific Usefulness of Classroom Achievement Test. Cutting Edge Seminar. De La Salle University.

Table 3 shows each item's difficulty index and discrimination index. Of the 37 objective items, 67.6% (25 items) are easy and 32.4% (12 items) are average; none is difficult. Since the test was constructed for grade three students, the teacher may have pitched the items to the students' capacity and ability. It may also mean that the students had mastered the subject matter, so that most of them answered the items correctly. Note that an item's easiness or difficulty is determined by the proportion of students who answered it correctly; here most respondents answered most items correctly, which is why most items turned out easy. In general, the test was fairly easy, since most items had a difficulty index of .76 or higher.

Table 3 also shows each item's discrimination index. Ten items (27%) were classified as poor: items 2, 4, 9, 13, 15, 30, 31, 32, 33, and 34. These items were rejected because the low group scored nearly as well on them as the high group, so they did not separate the two groups. Only a few items, 8% (3 items), were classified as marginal and flagged for improvement, because the high and low groups scored almost the same on them.
This means that students in both the high and the low groups could answer these items fairly well. Eight items (21.6%) are reasonably good, showing an adequate gap between the high and low groups. Six items (16.2%) are good and nine (24.3%) are very good; on these items the gap between the high group and the low group is wide.

Interrater Reliability

The coefficient of concordance was used to determine the degree of agreement between the two raters who scored the essay item. The essay measures the students' knowledge of farmers' adaptation in farming. The essay was rated with the following criteria: (a) at least two answers are correct (1.5 pts); (b) the answer is explained (1 pt); and (c) the instructions for answering were followed (0.5 pt). The computed W of 0.74 indicates fairly close, though not perfect, agreement between the two raters: they showed some variation in rating the essay answers. This variation can be attributed to differences in the two raters' dispositions. The first rater was the subject teacher, while the second rater was also an Araling Panlipunan (Social Studies) teacher but taught at a higher level. They viewed the answers differently even though they had discussed the rating procedure at the start.

Conclusion

A low internal consistency was obtained because the test covers different subject content and each test type measures different skills. It is difficult to make the test entirely uniform, since the subject content is required by the Department of Education as minimum learning competencies. The listed topics are also the planned focus of the first quarter in the school's budget of subject matter. An intercorrelation analysis was performed to examine the relationships among the test types. Students who scored higher on interpreting a physical map tended to score higher on interpreting the weather map, and students who scored higher on inferences about the Philippine islands tended to score higher on comprehension of weather topics; high correlation coefficients were found for these pairs of test types. These results may not be very accurate, however, since the test types did not have equal numbers of items and only the minimum number of items was used in the analysis. It is recommended that each test type have an equal number of items so that the correlation analysis gives more accurate results. Agreement between the two raters on the essay was also not perfect, since they perceived differently how points should be given for the answers. The item difficulty indices showed that most of the items are easy, possibly because the students had mastered the subject matter.
The discrimination indices were spread across all five categories: poor (27%), marginal (8%), reasonably good (22%), good (16%), and very good (24%).

Table 3. Item Difficulty and Discrimination Indices

Total is the number of the 40 respondents who answered the item correctly; High Group and Low Group are the numbers answering correctly in the upper and lower groups (11 students each); PH and PL are the corresponding proportions.

Item  Total  High   Low    PH     PL     Difficulty  Remark        Discrimination  Remark
No.          Group  Group                Index                     Index
1     32     11     7      1      0.636  0.818       Easy item     0.364           Good item
2     26     7      6      0.636  0.545  0.591       Average item  0.091           Poor item
3     34     11     7      1      0.636  0.818       Easy item     0.364           Good item
4     38     11     10     1      0.909  0.955       Easy item     0.909           Poor item
5     36     11     8      1      0.727  0.864       Easy item     0.273           Reasonably good item
6     34     11     5      1      0.455  0.727       Average item  0.545           Very good item
7     33     10     8      0.909  0.727  0.818       Easy item     0.182           Marginal item
8     34     11     8      1      0.909  0.864       Easy item     0.273           Reasonably good item
9     39     11     10     1      0.634  0.955       Easy item     0.091           Poor item
10    24     9      4      0.818  0.456  0.591       Average item  0.455           Very good item
11    23     9      5      0.818  0.273  0.636       Average item  0.364           Good item
12    22     10     3      0.818  0.818  0.545       Average item  -               Very good item
13    36     11     9      0.909  0.727  0.864       Easy item     0.091           Poor item
14    34     11     8      1      1      0.864       Easy item     0.273           Marginal item
15    39     10     11     1      1      1           Easy item     0               Poor item
16    28     10     5      0.909  0.455  0.682       Average item  0.455           Very good item
17    28     11     5      0.909  0.455  0.682       Average item  0.455           Very good item
18    34     10     7      1      0.636  0.818       Easy item     0.364           Good item
19    24     11     5      0.909  0.455  0.682       Average item  0.455           Very good item
20    37     11     8      1      0.727  0.864       Easy item     0.273           Reasonably good item
21    29     11     5      1      0.455  0.727       Average item  0.545           Very good item
22    26     11     5      1      0.455  0.727       Average item  0.545           Very good item
23    33     11     7      1      0.636  0.818       Easy item     0.364           Good item
24    37     11     8      1      0.727  0.864       Easy item     0.273           Reasonably good item
25    37     7      8      1      0.818  0.864       Easy item     0.273           Reasonably good item
26    24     11     9      1      0.364  0.909       Easy item     0.182           Marginal item
27    37     11     4      0.636  0.818  0.5         Average item  0.273           Reasonably good item
28    35     11     9      1      0.636  0.909       Easy item     0.182           Marginal item
29    39     11     7      1      0.909  0.818       Easy item     0.364           Good item
30    40     11     10     1      1      0.955       Easy item     0.091           Poor item
31    40     11     11     1      1      1           Easy item     0               Poor item
32    40     11     11     1      1      1           Easy item     0               Poor item
33    40     11     11     1      1      1           Easy item     0               Poor item
34    40     11     11     1      1      1           Easy item     0               Poor item
35    27     11     3      1      0.273  0.636       Easy item     0.727           Marginal item
36    24     9      4      0.818  0.364  0.591       Average item  0.455           Very good item
37    36     11     7      1      0.636  0.818       Easy item     0.364           Good item
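The indices in Table 3 follow the usual upper/lower-group definitions: PH and PL are the proportions of the upper and lower groups (11 students each) answering an item correctly, the difficulty index is their mean, and the discrimination index is PH - PL. For item 1, PH = 11/11 = 1 and PL = 7/11 ≈ 0.636, giving a difficulty of about 0.818 and a discrimination of about 0.364. The Python sketch below encodes these formulas, the 27% group split, and the difficulty scale quoted from Lamberte (1998); the function names are hypothetical. Because the text does not state the cutoffs behind its discrimination remarks (poor, marginal, reasonably good, good, very good), the sketch reports D without classifying it.

```python
from dataclasses import dataclass


@dataclass
class ItemStats:
    ph: float              # proportion correct in the upper group
    pl: float              # proportion correct in the lower group
    difficulty: float      # mean of PH and PL
    discrimination: float  # PH - PL
    remark: str            # difficulty remark from the scale quoted in the text


def difficulty_remark(p: float) -> str:
    """Classify a difficulty index with the .76 / .25 cutoffs quoted from Lamberte (1998)."""
    if p >= 0.76:
        return "Easy item"
    if p >= 0.25:
        return "Average item"
    return "Difficult item"


def upper_lower_groups(totals: list[int], fraction: float = 0.27) -> tuple[list[int], list[int]]:
    """Return the indices of the highest- and lowest-scoring `fraction` of examinees.

    Ties at the cutoff are broken by the stable sort order, an assumption
    the study does not address.
    """
    k = round(fraction * len(totals))  # 0.27 * 40 = 10.8, rounded to 11
    order = sorted(range(len(totals)), key=lambda i: totals[i], reverse=True)
    return order[:k], order[-k:]


def item_stats(high_correct: int, low_correct: int, group_size: int) -> ItemStats:
    """Difficulty and discrimination indices for one item from the group counts."""
    ph = high_correct / group_size
    pl = low_correct / group_size
    p = (ph + pl) / 2
    return ItemStats(ph, pl, p, ph - pl, difficulty_remark(p))


# Item 1 in Table 3: 11 of 11 upper-group and 7 of 11 lower-group students were correct.
s = item_stats(11, 7, 11)
print(round(s.difficulty, 3), round(s.discrimination, 3), s.remark)  # 0.818 0.364 Easy item
```

In practice `upper_lower_groups` would be called on the 40 total scores, and the correct answers within each returned group counted per item before calling `item_stats`.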

Table 4. Coefficient of Concordance

Case  R1     Rank   R2     Rank   Sum    D        D²
      score  (R1)   score  (R2)
1     3      12     3      11.5   23.5   -16.5    272.25
2     3      12     3      11.5   23.5   -16.5    272.25
3     3      12     3      11.5   23.5   -16.5    272.25
4     3      12     3      11     23     -17      289
5     3      12     3      11     23     -17      289
6     3      12     3      11     23     -17      289
7     3      12     3      11     23     -17      289
8     3      12     3      11     23     -17      289
9     3      12     3      11     23     -17      289
10    3      12     3      11     23     -17      289
11    3      12     3      11     23     -17      289
12    3      12     3      11     23     -17      289
13    3      12     3      11     23     -17      289
14    3      12     3      11     23     -17      289
15    3      12     3      11     23     -17      289
16    3      12     3      11     23     -17      289
17    3      12     3      11     23     -17      289
18    3      12     3      11     23     -17      289
19    3      12     3      11     23     -17      289
20    3      12     3      11     23     -17      289
21    3      12     2.5    24     36     -4       16
22    3      12     2.5    24     36     -4       16
23    3      12     2.5    24     36     -4       16
24    3      12     2.5    24     36     -4       16
25    2.5    26     3      11     37     -3       9
26    2.5    26     2.5    24     50     10       100
27    2.5    26     3      11     37     -3       9
28    2      29     1.5    28     57     17       289
29    2      29     1.5    28     57     17       289
30    2      29     1      33.5   62.5   22.5     506.25
31    1.5    33.5   1      33.5   67     27       729
32    1.5    33.5   1      33.5   67     27       729
33    1.5    33.5   1      33.5   67     27       729
34    1.5    33.5   2      30.5   64     24       576
35    1.5    33.5   2      30.5   64     24       576
36    1.5    33.5   1.5    28     61.5   21.5     462.25
37    1      37     0      38     75     35       1225
38    0.5    38     0      38     76     36       1296
39    0      39.5   0.5    36     75.5   35.5     1260.25
40    0      39.5   0      38     77.5   37.5     1406.25

Sum of rank sums = 1600.5; mean rank sum = 40.025; sum of D² = 15984.8; W = .74977
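Table 4 applies Kendall's coefficient of concordance, W = 12S / (m²(n³ - n)), where m is the number of raters, n the number of ranked cases, and S the sum of squared deviations of the cases' rank sums from their mean. A minimal Python sketch under those standard definitions follows; the function names are hypothetical, tied scores receive average ranks, and no correction for ties is applied.

```python
def average_ranks(scores: list[float]) -> list[float]:
    """Rank scores from highest (rank 1) to lowest; tied scores share the mean of their ranks."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0.0] * len(scores)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next score ties with the current one.
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of the 1-based ranks start+1 .. end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def kendalls_w(*raters: list[float]) -> float:
    """Kendall's coefficient of concordance W, without the correction for ties."""
    m, n = len(raters), len(raters[0])
    ranked = [average_ranks(r) for r in raters]
    rank_sums = [sum(r[i] for r in ranked) for i in range(n)]
    mean_sum = m * (n + 1) / 2
    s = sum((rs - mean_sum) ** 2 for rs in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n))


# Plugging the totals reported in Table 4 (S = 15984.8, m = 2 raters, n = 40 cases)
# into the same formula:
w_reported = 12 * 15984.8 / (2 ** 2 * (40 ** 3 - 40))
print(round(w_reported, 4))  # 0.7498
```

Ranking the highest score as 1 matches the direction used in Table 4; W does not depend on the direction as long as both raters are ranked the same way. For two raters and ignoring ties, W and the Spearman correlation ρ between the raters are related by W = (ρ + 1)/2, so a W near 0.75 corresponds to ρ near 0.5.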

APPENDIX C

Table 4. Coefficient of Concordance

Case  R1     Rank   R2     Rank   D        D²
      score  (R1)   score  (R2)
1     3      10.5   3      11.5   -1       1
2     3      10.5   3      11.5   -1       1
3     3      10.5   3      11.5   -1       1
4     3      10.5   3      11.5   -1       1
5     3      10.5   3      11.5   -1       1
6     3      10.5   3      11.5   -1       1
7     3      10.5   3      11.5   -1       1
8     3      10.5   3      11.5   -1       1
9     3      10.5   3      11.5   -1       1
10    3      10.5   3      11.5   -1       1
11    3      10.5   3      11.5   -1       1
12    3      10.5   3      11.5   -1       1
13    3      10.5   3      11.5   -1       1
14    3      10.5   3      11.5   -1       1
15    3      10.5   3      11.5   -1       1
16    3      10.5   3      11.5   -1       1
17    3      10.5   3      11.5   -1       1
18    3      10.5   3      11.5   -1       1
19    3      10.5   3      11.5   -1       1
20    3      10.5   3      11.5   -1       1
21    3      10.5   2.5    25     -14.5    210.25
22    3      10.5   2.5    25     -14.5    210.25
23    3      10.5   2.5    25     -14.5    210.25
24    3      10.5   2.5    25     -14.5    210.25
25    2.5    22     3      11.5   10.5     110.25
26    2.5    22     2.5    25     3        9
27    2.5    22     3      11.5   10.5     110.25
28    2      25     1.5    31     -6       36
29    2      25     1.5    31     -6       36
30    2      25     1      34.5   -9.5     90.25
31    1.5    29.5   1      34.5   -5       25
32    1.5    29.5   1      34.5   -5       25
33    1.5    29.5   1      34.5   -5       25
34    1.5    29.5   2      28.5   1        1
35    1.5    29.5   2      28.5   1        1
36    1.5    29.5   1.5    31     -1.5     2.25
37    1      33     0      39     -6       36
38    0.5    34     0      39     -5       25
39    0      35.5   0.5    37     -1.5     2.25
40    0      35.5   0      39     -1.5     12.25
