Project 1

Natapon Kidrai 4436733 SCAL/M SCLG 637 Testing and Evaluation Project on Test Development

English Proficiency Test for M.2 Students at Bangna Demonstration School

The test aims to measure the students' language proficiency. It was administered by Anuchit Nasomboon, an English teacher at Bangna Demonstration School. This private school is located in the Bangna area, so most of its students come from wealthy families. Surprisingly, quite a few students choose to attend this school. There are at least two classes per grade at the primary level, but only one class per grade at the secondary level. Some of the students come from foreign families whose parents work in Thailand. The school is attempting to create its own curriculum for every subject, so Mr. Nasomboon wanted to measure how much his students knew before beginning the lessons. The participants were twenty-six Mathayom Two students. The time allowed for the test was fifty minutes.

Test objective

This test was given to measure the students' background knowledge. The analysis of the test scores will be used to adjust the English curriculum for Mathayom Two at Bangna Demonstration School. At the same time, the scores will be analyzed item by item to see how and where each item needs improvement.

Subjects


Twenty-six Mathayom Two students took the test, all of them from the same class. Among the participants were one student whose family had immigrated from the United States and another whose family came from Japan, which might make the test seem unfair to the Thai students. However, the test scores turned out to be unexpectedly unsatisfactory.

Students      Total      Students       Total
Narit         29         Nattaporn      13
Porntip       25         Chalrmachai    13
Maturot       19         Staporn        13
Wannisa       19         Prakorn        13
Pravee        19         Pawetre        12
Sorratat      19         Phornphan      11
Piyada        17         Witawat        11
Warunya       17         Tanasan        11
Sutthida      16         Kanok-karn     10
Wareewan      16         Julawat        9
Manecha       16         Teerapat       9
Utomphorn     16         Jinnaput       7
Wiliya        15         Chatchai       5
Mean = 14.6154           Standard Deviation = 14.6511

Table 1 Test Score

From the test scores, I then broke the total scores down into scores per item. Table 2 shows how many points each student got on the individual items. At the bottom of the table are the mean and standard deviation of the total scores. From the raw scores, mean, and standard deviation, I then analyzed the item facility of the test. Item facility (IF) is calculated by adding up the number of students who correctly answered a particular item and dividing that sum by the total number of students who took the test (James Brown, 1996: p. 65). The formula can be written like this:


$$IF = \frac{N_{correct}}{N_{total}}$$

where
$N_{correct}$ = number of students answering correctly
$N_{total}$ = number of students taking the test

The result can range from 0.00 to 1.00 for different items. Items left blank are counted as incorrect answers. The IF value indicates how difficult or easy each test item is, and it also feeds into another useful statistic for interpreting the item. Item discrimination (ID) is the degree to which an item separates the students who performed well from those who performed poorly. The ID value helps the teacher contrast the performance of the upper-group students on the test with that of the lower-group students. Tables 1 and 2 show how the students taking the test divide into these groups. The ID value can be calculated with this formula:

$$ID = IF_{upper} - IF_{lower}$$

where
$ID$ = item discrimination for an individual item
$IF_{upper}$ = item facility for the upper group on the whole test
$IF_{lower}$ = item facility for the lower group on the whole test

Table 3 below gives the IF and ID values of the individual items.

IF score and ID score

Part I
Item    IF total    IF upper    IF lower    ID
1       0.69        0.85        0.54        0.31
2       0.65        0.85        0.46        0.38
3       0.65        0.85        0.46        0.38
4       0.58        0.92        0.23        0.69
5       0.62        0.77        0.46        0.31
6       0.58        0.85        0.31        0.54
7       0.50        0.62        0.38        0.23
8       0.46        0.62        0.31        0.31
9       0.62        0.62        0.62        0.00
10      0.62        0.77        0.46        0.31

Part II
Item    IF total    IF upper    IF lower    ID
1       0.35        0.46        0.23        0.23
2       0.35        0.46        0.23        0.23
3       0.27        0.46        0.08        0.38
4       0.77        0.77        0.77        0.00
5       0.38        0.54        0.23        0.31
6       0.50        0.62        0.38        0.23
7       0.54        0.69        0.38        0.31
8       0.38        0.46        0.31        0.15
9       0.38        0.46        0.31        0.15
10      0.38        0.62        0.15        0.46
11      0.69        0.77        0.62        0.15
12      0.38        0.38        0.38        0.00
13      0.58        0.85        0.31        0.54
14      0.31        0.38        0.23        0.15
15      0.50        0.69        0.31        0.38
16      0.46        0.62        0.31        0.31
17      0.27        0.38        0.15        0.23
18      0.35        0.31        0.38        -0.08
19      0.31        0.38        0.23        0.15
20      0.38        0.46        0.31        0.15

Table 3 IF score and ID score of the whole test

Since this proficiency test is a norm-referenced test (NRT), an ideal item should have an IF value of about 0.50 and the highest possible ID. IF values between 0.30 and 0.70 are considered acceptable. Ebel (1979, p. 267) has suggested the following guidelines for making decisions based on ID:

0.40 and up     Very good items
0.30 to 0.39    Reasonably good but possibly subject to improvement
0.20 to 0.29    Marginal items, usually needing and being subject to improvement
Below 0.19      Poor items, to be rejected or improved by revision
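The IF and ID formulas and Ebel's bands can be illustrated with a minimal Python sketch. The function names are hypothetical, and the 0/1 answer lists are assumed data chosen to reproduce item 4 of Part I: the counts 12 of 13 and 3 of 13 are inferred from the reported 0.92 and 0.23 and do not come from the raw answer sheets. Values below 0.20 are treated as "poor" after rounding to two decimals, which is one reading of the "Below 0.19" band.

```python
def item_facility(answers):
    """Share of correct answers; 1 = correct, 0 = incorrect or left blank."""
    return sum(answers) / len(answers)


def item_discrimination(upper, lower):
    """ID = IF of the upper group minus IF of the lower group."""
    return item_facility(upper) - item_facility(lower)


def ebel_label(id_value):
    """Map an ID value (rounded to two decimals) to Ebel's guideline bands."""
    value = round(id_value, 2)
    if value >= 0.40:
        return "very good"
    if value >= 0.30:
        return "reasonably good"
    if value >= 0.20:
        return "marginal"
    return "poor"


# Assumed 0/1 answers for one item: 12 of 13 upper-group students and
# 3 of 13 lower-group students correct (counts inferred, not raw data).
upper_group = [1] * 12 + [0] * 1
lower_group = [1] * 3 + [0] * 10

if_total = item_facility(upper_group + lower_group)      # about 0.58
id_value = item_discrimination(upper_group, lower_group)  # about 0.69
label = ebel_label(id_value)                              # "very good"
```

Under these assumptions the sketch returns the IF total, ID, and band that Table 3 reports for that item. An item with a negative ID, such as item 18 of Part II, falls into the "poor" band.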

The IF and ID values lead on to an analysis of distractor efficiency. The goal of distractor efficiency analysis is to examine the degree to which the distractors attract students who do not know the correct answer, and so the degree to which the distractors function efficiently. As mentioned above, the IF value helps show which items need improvement or elimination. For example, an item might be considered too easy when its ID value is very low. But an easy item is sometimes useful for checking that the students can progress from the simplest item to the harder ones. The proportion of students in each group who chose each option is analyzed. Table 4 below shows the distractor efficiency of the test.

Distractor Efficiency

Part I
Item    IF      ID      Group    + ing    + ed     Notes
1       0.69    0.31    High     0.85*    0.15     Reasonable
                        Low      0.54*    0.46
2       0.65    0.38    High     0.85*    0.15     Reasonable
                        Low      0.46*    0.54
3       0.65    0.38    High     0.15     0.85*    Reasonable
                        Low      0.54     0.46*
4       0.58    0.69    High     0.92*    0.08     Good
                        Low      0.23*    0.77
5       0.62    0.31    High     0.23     0.77*    Reasonable
                        Low      0.54     0.46*
6       0.58    0.54    High     0.15     0.85*    Good
                        Low      0.69     0.31*
7       0.50    0.23    High     0.31     0.69*    Improvement Needed
                        Low      0.62     0.38*
8       0.46    0.31    High     0.62*    0.38     Reasonable
                        Low      0.31*    0.69
9       0.62    0.00    High     0.31     0.69*    Rejected
                        Low      0.38     0.62*
10      0.62    0.31    High     0.85*    0.15     Reasonable
                        Low      0.46*    0.54

*correct option

Part II
Item    IF      ID      Group    A.       B.       C.       D.       Notes
1       0.35    0.23    High     0.00     0.38     0.08     0.46*    Improvement Needed
                        Low      0.31     0.31     0.15     0.23*
2       0.35    0.23    High     0.23     0.23     0.46*    0.08     Improvement Needed
                        Low      0.31     0.23     0.23*    0.23
3       0.27    0.38    High     0.46*    0.23     0.31     0.00     Reasonable
                        Low      0.08*    0.15     0.54     0.23
4       0.77    0.00    High     0.77*    0.08     0.08     0.08     Rejected
                        Low      0.77*    0.08     0.00     0.08
5       0.38    0.31    High     0.08     0.00     0.54*    0.15     Reasonable
                        Low      0.08     0.23     0.23*    0.46
6       0.50    0.24    High     0.31     0.62*    0.00     0.08     Improvement Needed
                        Low      0.31     0.38*    0.23     0.08
7       0.54    0.31    High     0.31     0.00     0.15     0.69*    Reasonable
                        Low      0.08     0.23     0.23     0.38*
8       0.38    0.15    High     0.23     0.15     0.15     0.46*    Rejected
                        Low      0.15     0.31     0.15     0.31*
9       0.38    0.15    High     0.23     0.46*    0.08     0.23     Rejected
                        Low      0.38     0.31*    0.15     0.08
10      0.38    0.47    High     0.62*    0.15     0.00     0.23     Good
                        Low      0.15*    0.31     0.08     0.46
11      0.69    0.15    High     0.15     0.08     0.77*    0.00     Rejected
                        Low      0.08     0.08     0.62*    0.08
12      0.38    0.00    High     0.46     0.08     0.38*    0.00     Rejected
                        Low      0.23     0.31     0.38*    0.00
13      0.58    0.54    High     0.08     0.85*    0.00     0.08     Good
                        Low      0.38     0.31*    0.08     0.15
14      0.31    0.15    High     0.46     0.08     0.08     0.38*    Rejected
                        Low      0.31     0.31     0.15     0.23*
15      0.50    0.38    High     0.00     0.00     0.31     0.69*    Reasonable
                        Low      0.00     0.08     0.54     0.31*
16      0.46    0.31    High     0.23     0.08     0.62*    0.08     Reasonable
                        Low      0.31     0.31     0.31*    0.00
17      0.27    0.23    High     0.38*    0.15     0.31     0.08     Improvement Needed
                        Low      0.15*    0.15     0.54     0.08
18      0.35    -0.07   High     0.31*    0.00     0.46     0.23     Rejected
                        Low      0.38*    0.23     0.15     0.15
19      0.31    0.15    High     0.15     0.38*    0.31     0.15     Rejected
                        Low      0.23     0.23*    0.38     0.08
20      0.38    0.15    High     0.15     0.23     0.46*    0.15     Rejected
                        Low      0.15     0.46     0.31*    0.08

*correct option

Table 4 Distractor Efficiency Analysis

As Table 4 shows, Part I seems very good at discriminating good students from poor students, but Part II does not. Many of the distractors work so well that even good students could not answer correctly. For example, in item 12 the proportion of students choosing the correct answer is nearly equal to the proportions choosing two of the distractors. An item like this is good at distracting students who do not really know the correct answer. On the other hand, it reflects the way the students had been taught, or their retention of previous knowledge. Item 18 is a very bad item and should be the first to be eliminated because it fails to discriminate between good and poor students: poor students answered it correctly more often than good students, which was unexpected. Items 4 and 12 are to be rejected as well, as they cannot differentiate good students from the rest of the class. The rejected items should therefore be replaced by new items. The following are substitutions for those items:

For Part I

9. You speak English very (good/well).

For Part II

4. She sang …………
A. beautiful    B. beautifully    C. beauty    D. beautily

8. He can paint the fence ………
A. fastly    B. fasten    C. fastness    D. fast

9. He is ……….. right.
A. quite    B. quiet    C. quitely    D. quietly

11. He killed a cat ……….. yesterday.
A. accident    B. accidental    C. accidentally    D. accidently

12. The employees were ………. afraid of their new boss.
A. terrifying    B. terrified    C. terrible    D. terribly

14. They entered the room …………… because they were ………..
A. quiet, late    B. quietly, late    C. quietly, lately    D. quiet, lately

18. Our teacher explained things very ………… We all understand him ……..
A. clear, perfect    B. clearly, perfect    C. clear, perfectly    D. clearly, perfectly

19. Please carry the glasses ………… They were very expensive.
A. careful    B. carefully    C. carely    D. care

20. She speaks ……….. She has a ……….. voice.
A. soft, soft    B. softly, soft    C. soft, softly    D. softly, softly
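Returning to the distractor analysis in Table 4, the proportions for each group can be tabulated from raw answer choices along the lines of the Python sketch below. The function names and the three review labels are hypothetical and are not the criteria behind the Notes column. The answer lists are assumed data built to match the proportions reported for item 12 of Part II, with one blank answer per group inferred from the fact that the reported proportions do not add up to 1.00.

```python
from collections import Counter


def option_shares(choices, options="ABCD"):
    """Proportion of the group choosing each option; blanks (None) stay in the total."""
    counts = Counter(c for c in choices if c is not None)
    return {opt: counts[opt] / len(choices) for opt in options}


def review_distractors(high, low, key, options="ABCD"):
    """Attach a rough label to each distractor (every option except the key)."""
    hi = option_shares(high, options)
    lo = option_shares(low, options)
    notes = {}
    for opt in options:
        if opt == key:
            continue
        if hi[opt] == 0 and lo[opt] == 0:
            notes[opt] = "unused"  # attracts nobody
        elif hi[opt] > lo[opt]:
            notes[opt] = "draws more high-group students"
        else:
            notes[opt] = "working"
    return notes


# Assumed choices for 13 high-group and 13 low-group students (None = blank).
high_group = list("AAAAAA") + list("B") + list("CCCCC") + [None]
low_group = list("AAA") + list("BBBB") + list("CCCCC") + [None]

shares_high = option_shares(high_group)
shares_low = option_shares(low_group)
notes = review_distractors(high_group, low_group, key="C")
```

With these assumed answers, option A attracts a larger share of the high group than of the low group and option D attracts no one, which is one way of seeing why the item has an ID of 0.00.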

Reliability of the test

The test may be good at discriminating students from one another, but how reliable is it? A test is assumed to give the same results every time it is used under the same conditions, to measure what it is supposed to measure, and to be practical to use. Every measurement instrument inevitably has flaws that cause inaccuracies. For a language test, there are various ways to examine reliability, depending on the type of test. The English Proficiency Test is, of course, an NRT. Its reliability can be estimated with Kuder-Richardson Formula 20 (K-R20), because this formula avoids the problem of underestimating the reliability of certain language tests. The formula is as follows:


$$K\text{-}R20 = \frac{k}{k-1}\left(1 - \frac{\sum IV}{S_t^2}\right)$$

where
K-R20 = Kuder-Richardson Formula 20
$k$ = number of items
$IV$ = item variance
$S_t^2$ = variance for the whole test (that is, the standard deviation of the test scores squared)

In calculating the K-R20 value, several other variables are involved. Below is the calculation of the item variances.

Calculating Item Variances

Part I
Item number    IF        1-IF      IF(1-IF)
1              0.6923    0.3077    0.2130
2              0.6538    0.3462    0.2263
3              0.6538    0.3462    0.2263
4              0.5769    0.4231    0.2441
5              0.6154    0.3846    0.2367
6              0.5769    0.4231    0.2441
7              0.5000    0.5000    0.2500
8              0.4615    0.5385    0.2485
9              0.6154    0.3846    0.2367
10             0.6154    0.3846    0.2367

Part II
Item number    IF        1-IF      IF(1-IF)
1              0.0385    0.9615    0.0370
2              0.0769    0.9231    0.0710
3              0.1154    0.8846    0.1021
4              0.1538    0.8462    0.1302
5              0.1923    0.8077    0.1553
6              0.2308    0.7692    0.1775
7              0.2692    0.7308    0.1967
8              0.3077    0.6923    0.2130
9              0.3462    0.6538    0.2263
10             0.3846    0.6154    0.2367
11             0.4231    0.5769    0.2441
12             0.4615    0.5385    0.2485
13             0.5000    0.5000    0.2500
14             0.5385    0.4615    0.2485
15             0.5769    0.4231    0.2441
16             0.6154    0.3846    0.2367
17             0.6538    0.3462    0.2263
18             0.6923    0.3077    0.2130
19             0.7308    0.2692    0.1967
20             0.7692    0.2308    0.1775
Variance Total                     6.9127

Table 5 Calculation of Item Variances
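The K-R20 calculation can be sketched in Python as follows. This is an illustration under stated assumptions, not the procedure actually used for this project: the function name `kr20` is hypothetical, the item variance is taken as IF(1-IF), the test variance is the population variance of the total scores (the text does not say whether the population or the sample form was used), and the inputs are the totals from Table 1 and the rounded "IF total" values from Table 3.

```python
from statistics import mean, pvariance


def kr20(item_facilities, total_scores):
    """K-R20 = k/(k-1) * (1 - sum of item variances / variance of total scores)."""
    k = len(item_facilities)
    sum_item_var = sum(p * (1 - p) for p in item_facilities)  # IV = IF(1-IF)
    test_var = pvariance(total_scores)  # population variance (an assumption)
    return (k / (k - 1)) * (1 - sum_item_var / test_var)


# Total scores from Table 1 (26 students).
scores = [29, 25, 19, 19, 19, 19, 17, 17, 16, 16, 16, 16, 15,
          13, 13, 13, 13, 12, 11, 11, 11, 10, 9, 9, 7, 5]

# Rounded "IF total" values from Table 3 (Part I, then Part II).
if_part1 = [0.69, 0.65, 0.65, 0.58, 0.62, 0.58, 0.50, 0.46, 0.62, 0.62]
if_part2 = [0.35, 0.35, 0.27, 0.77, 0.38, 0.50, 0.54, 0.38, 0.38, 0.38,
            0.69, 0.38, 0.58, 0.31, 0.50, 0.46, 0.27, 0.35, 0.31, 0.38]

mean_score = mean(scores)
std_dev = pvariance(scores) ** 0.5
reliability = kr20(if_part1 + if_part2, scores)
```

Run on these inputs, the sketch reproduces the mean of about 14.62 and a sum of item variances close to the 6.9127 total in Table 5. It gives a standard deviation of roughly 5.2 and a K-R20 of roughly 0.77, however, which differ from the printed statistics, so the reported standard deviation and K-R20 are worth rechecking against the raw scores.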

In addition to the values in Table 5, other values are needed. Table 6 gives the rest of the calculation.

Test Statistics
Mean     14.615
S        14.651
K-R20    0.029

Table 6 Test Statistics

The reliability of this test came out to be 0.029, which is quite low. When the test is re-administered with new items in place of the eliminated ones, the reliability will, of course, change. In that case, the same participants will have to retake the test so that the group of test-takers remains consistent. In addition, this test was scored by only one rater, so the question of inter-rater reliability does not arise.

Conclusion


Administering the English Proficiency Test to Mathayom Two students at Bangna Demonstration School produced a wide range of results. This range implies that curriculum and course design development is needed. Although the test was given to prepare the students for the coming year, it indicates the areas where improvement is needed. Many factors may have influenced the results, but the overall scores showed that the students need an extensive course to prepare them for the classroom. Based on the results, the course designer should plan more carefully how to direct and explain the specific skills required. For NRT developers, it is recommended that a test be as long as possible, well designed and carefully written; that it assess relatively homogeneous material; that it have items that discriminate well; that it be normally distributed; and that it be administered to a group of students whose abilities are as wide as logically possible within the context (James Brown, 1996: p. 209).

Behind the judgment that such items should be eliminated and that the scores were unsatisfactory was my assumption that private schools provide better classroom language learning than government schools. This assumption proved false once the test was completed. The reasons lie in curriculum design and lesson planning. The participants were also a factor: they were not ready to take the test, and their concentration was not on it, as the test was taken near the end of the school day.

Reference

Brown, James Dean. (1996). Testing in language programs. New Jersey: Prentice Hall Regents.


Appendix

Circle the appropriate words in the brackets to complete the sentences.

1. I think this film is (bored/boring) ………..
2. I don’t find politics (interested/interesting) ………….
3. Walking makes me (tired/tiring) …………
4. This book is really (excited/exciting) ………..
5. Kate is doing her exams and is (worried/worrying) …………
6. Are you (interested/interesting) ………….. in basketball?
7. Dang always feels (bored/boring) ……………
8. Jan finds computers (confused/confusing) …………..
9. We were all feeling (tired/tiring) ………….
10. What an (excited/exciting) ………………. day.

Circle the appropriate items to complete the sentences.

1. He bought a(n) …………. from the antique shop.
A. rosewood old round table    B. old rosewood round table    C. round old rosewood table    D. old round rosewood table

2. It is a(n) ……………
A. horrifying old mysterious story    B. horrifying mysterious old story    C. old horrifying mysterious story    D. mysterious old horrifying story

3. His voice is …………….
A. loud    B. aloud    C. loudly    D. aloudly

4. The lesson seems …………….
A. interesting    B. interested    C. interestingly    D. interest

5. We arrived at the destination ……………
A. save    B. safe    C. safely    D. safety

6. I am sure the soup tastes …………..
A. well    B. good    C. goodness    D. goodly

7. The ……….. parents scolded the child for his …………. results.
A. disappointing, disappointing    B. disappointed, disappointed    C. disappointing, disappointed    D. disappointed, disappointing

8. Give him that ………..
A. yellow old leather case    B. old leather yellow case    C. leather yellow old case    D. old yellow leather case

9. The curry smells …….. but it doesn’t taste …………
A. well, deliciously    B. good, delicious    C. good, deliciously    D. well, delicious

10. I feel ………… when I think of my housework.
A. bad    B. badly    C. badness    D. worse

11. We were already ………..
A. worry    B. worrying    C. worried    D. worrily

12. The children are ……….. by the animals.
A. frightening    B. frighten    C. frightened    D. frightingly

13. It was a very ………… journey.
A. tired    B. tiring    C. tiresome    D. tireness

14. We were all very ………… in what he said.
A. interesting    B. interest    C. interestingly    D. interested

15. Why do you look so …………. at school?
A. boringly    B. boredom    C. boring    D. bored

16. It was a terribly ………… day.
A. excited    B. excitement    C. exciting    D. excitedly

17. Didn’t you think it was an ………….. play?
A. amusing    B. amusement    C. amused    D. amusingly

18. We had a ………….. trip home.
A. tiring    B. tiredness    C. tired    D. tiredly

19. The last half hour was a …………. time.
A. worry    B. worrying    C. worried    D. worrily

20. I’ve never been so ………… in my life.
A. frightening    B. frighten    C. frightened    D. frighteningly
