Language Testing 2008 25 (4) 495–519
Development of a cognate awareness measure for Spanish-speaking English language learners
Valerie Malabonga and Dorry M. Kenyon, Center for Applied Linguistics, Washington, DC, USA
Maria Carlo, University of Miami, USA
Diane August and Mohammed Louguit, Center for Applied Linguistics, Washington, DC, USA
This paper describes the development and validation of the Cognate Awareness Test (CAT), which measures cognate awareness in Spanish-speaking English Language Learners (ELLs) in fourth and fifth grade. An investigation of differential performance on the two subtests of the CAT (cognates and noncognates) provides evidence that the instrument is sensitive to English–Spanish cognate awareness among elementary school-age Spanish-speaking ELLs. Cognates were highly correlated with the children’s Spanish WLPB-R Picture Vocabulary scores, whereas noncognates were highly correlated with the children’s English WLPB-R Picture Vocabulary scores.
Keywords: applied linguistics, cognates, English language learners, morphological awareness, Spanish-speaking children, vocabulary
Vocabulary knowledge plays a crucial role in the development of reading comprehension for both monolingual, native-English-speaking children (Anderson & Freebody, 1983; Snow, Burns & Griffin, 1998; Snow, Cancini, Gonzales & Shriberg, 1989) and English language learners (ELLs) (August & Hakuta, 1997; Carlisle, Beeman, Davis & Spharim, 1999; Carlisle, Beeman & Shah, 1996; Dufva & Voeten, 1999; Jiménez, García & Pearson, 1996). August, Carlo, Dressler and Snow (2005) report that many ELLs in US schools are deficient in English vocabulary, and that this deficiency impedes their reading comprehension.
Address for correspondence: Valerie Malabonga, Center for Applied Linguistics, 4646 40th Street NW, Suite 200, Washington, DC 20016, USA; email:
[email protected] © 2008 SAGE Publications (Los Angeles, London, New Delhi and Singapore)
DOI:10.1177/0265532208094274
One particular type of vocabulary, cognates, is the focus of the Cognate Awareness Test (CAT) described in this paper. Whitley (2002) defines cognates as words that have similar meaning, spelling and form, and have been inherited from the same ancestor language. In the case of Spanish and English, cognates are descended from earlier derivatives of the Indo-European language family (Anthony, 1954; Lalor & Kirsner, 2000; Schelletter, 2002). Cognate awareness is the perception or knowledge that helps individuals recognize the relationship between an unfamiliar word in one language and a familiar word (cognate) in another, and thus understand the meaning of the unfamiliar word (Cunningham & Graham, 2000; Nagy, Garcia, Durgonoglu & Hancin-Bhatt, 1993). In this paper we describe the development and validation of the CAT as a measure of the ability of Spanish-speaking third through fifth graders to use knowledge of Spanish words to discern the meaning of their English cognates. According to the 2000 US Census, native Spanish speakers (NSSs) constitute 66% of the school-age population of ELLs (Batalova, 2006). The large number of English words with Spanish cognates provides some support in English text comprehension for these ELLs if they are aware of cognate relationships (Jiménez et al., 1996). As Nagy et al. (1993, p. 242) wrote: ‘If Hispanic bilingual children know the Spanish words, and recognize the cognate relationships, their Spanish knowledge should provide them with substantial help in English vocabulary, especially difficult reading vocabulary.’ Spanish and English share thousands of cognates; these often appear in content area academic texts, so increasing children’s cognate awareness is one method of accelerating their English vocabulary development and comprehension of these texts (August & Shanahan, 2006; August et al., 2005). 
An instrument such as the CAT, combined with L1 and L2 vocabulary tests, can assess children’s ability to use L1 vocabulary knowledge to determine the meanings of L2 cognates, and can also measure the effects of interventions designed to build cognate knowledge. Although Nagy et al. (1993) and Cunningham and Graham (2000) developed cognate awareness measures similar to the CAT, their tests have some limitations. Neither investigated test reliability and validity. Furthermore, although Cunningham and Graham piloted their test with native Spanish-speaking as well as native English-speaking adults, their test was developed for native English-speaking children learning Spanish in a two-way immersion program, not for NSS children learning English.
I Development of the test
The CAT was developed as part of a larger study (August, Carlo & Calderón, 2005) whose purpose was to investigate the transfer of reading skills from Spanish to English by Spanish-speaking elementary-school children in transitional Spanish-to-English language programs. In developing the CAT, it was important to identify Spanish–English cognates that NSS children from third through fifth grade were likely to know in Spanish but not in English. Many Spanish–English cognates, such as infirm, castigate, and accompany, are high-frequency words in oral Spanish but low-frequency words in oral English (Cunningham & Graham, 2000). We hypothesized that knowledge of high-frequency Spanish words would help children with high cognate awareness to understand the meaning of low-frequency English words. To test this hypothesis, we designed the CAT using low-frequency English words. Half of the words had Spanish cognates with high frequency in Spanish, and the other half had no Spanish cognates. We used low English frequency as the basis for determining word difficulty because research on English monolingual children indicates that word frequency is a primary basis for the order in which children acquire words (Biemiller & Slonim, 2001). We used Nash’s (1997) dictionary of Spanish cognates and the cognates from Nagy et al. (1993) as starting points. Word frequencies were checked using the corpora of Kucera and Francis (2005), Francis and Kucera (1982), and Davies (2005). Although these corpora are based on materials adults read, we believe they provide a reasonable approximation of the word frequencies that the children in our study would be exposed to in their academic subjects. Bilingual researchers drew up a word list that included nouns, verbs, and adjectives.
We chose only Spanish–English cognates that were ranked low in English frequency but generally high in Spanish frequency and that had high transparency (i.e., had almost identical spellings and the same or a very closely related meaning). From Kucera and Francis (2005) and Francis and Kucera (1982), we selected noncognates whose frequencies matched those of the cognates. We did not include register in our criteria for choosing the words, and the matched words were not always the same part of speech. Table 1 lists the English words chosen to appear on the CAT together with their frequencies per million. The frequency of the cognate and noncognate English words ranged from 1 to 8 (per million), with a mean frequency of 3. On the other hand, the Spanish counterparts of the English cognates ranged from 6 to 227 and had a mean frequency of 63. To add some variability and ensure that the children would not top out of the test, three words with infrequent Spanish cognates were also included: jocoso/a, malévolo/a, and odioso/a (with frequencies of 2, 1, and 3 respectively). In the operational version of the test, we also included eight cognates with high frequencies in both English and Spanish. Frequencies of these English words ranged from 77 to 248 per million, and frequencies of their Spanish cognates ranged from 30 to 196. For each test word, four high-frequency English words or phrases were provided as possible responses, only one of which was related to the test word in meaning. No Spanish words appeared on the instrument. Students were instructed to read each test word, think about what it meant, and then choose the one option that they felt was most closely related to the meaning of the test word.

Table 1 List of cognates, noncognates, and easy words and their frequencies* for the operational version of the CAT

Noncognates          F     English cognates     F     Spanish cognates             F
undermine            8     accompany            8     acompañar                    10
jest                 1     adorn                1     adornar                      12
tattered             5     anterior             5     anterior                     99
rehearse             1     castigate            1     (castigar) castigo(a)        30
clutch               5     converse             5     (conversar) conversación     138
gritty               1     curative             1     (curativo(a)) curar          6
feasibility          3     edifice              3     edificio                     69
strife               6     epoch                6     época                        227
drought              5     imitate              5     imitar                       7
maladroit            1     (impede) impeded     5     impedir                      20
haul                 5     initiate             5     iniciar                      22
hoist                1     jocose               1     jocoso(a)                    2
snug                 2     malevolent           2     malévolo(a)                  1
allot                1     matrimonial          1     matrimonio                   60
brittle              3     pallid               3     pálido(a)                    26
drowsy               1     pensive              1     (pensativo(a)) pensar        159
trustworthy          3     profundity           3     profundidad                  36
(leery) leering      4     obligated            4     obligado(a)                  41
fiend                3     odious               3     odioso(a)                    3
wily                 2     terminus             2     terminar                     65
flee                 1     tranquil             2     tranquilo(a)                 70
pun                  1     valor                1     valor                        107

Eight ‘Easy’ Words   F     Spanish cognates     F
construction         95    construcción         92
idea                 195   idea                 196
literature           133   literatura           114
modern               198   moderno/a            48
permit               77    permitir             30
poet                 99    poeta                71
production           148   producción           139
simple               161   simple               81

*Frequencies are number of words per million.
Sources: English: Kucera, H. & Francis, W. N. Kucera and Francis Word Pool. Retrieved March 3, 2005, from http://memory.psych.upenn.edu/wordpools.php Spanish: Davies, M. Corpus del Español. Retrieved March 3, 2005, from http://www.corpusdelespanol.org/
Notes: 1) Davies (2005) has a corpus of 20 million words from twentieth-century materials, so the frequencies obtained for the Spanish words were first divided by 20 to get the frequencies of the words per million. 2) For castigate, converse, curative, and pensive, an alternate form of the Spanish word was used instead for calculating frequency. 3) For Spanish adjectives, we added the male and female (gender) frequencies (e.g., pálido and pálida).

II Pilot study
Before using the CAT in full-scale studies, we piloted it with 100 Spanish-speaking ELLs in order to gather preliminary information about its reliability and validity. We also collected feedback on the test from the children and their teachers.

1 Participants
The pilot study participants were fourth and fifth graders from four schools in low-income, predominantly Spanish-speaking neighborhoods in a large mid-Atlantic city in the USA. Table 2 provides demographic information on the children.

2 Measure
The pilot version of the CAT had three practice items and 61 test items (30 cognates and 31 noncognates). The following samples indicate the format of the pilot test:

1. initiate    a) clean     b) balance    c) begin       d) gain
2. strife      a) plane     b) choice     c) king        d) fight
3. infirm      a) honest    b) afraid     c) confused    d) sick
During test administration, the researcher wrote the practice items on the board and reviewed them with the children. The children were then instructed to work on their own using their test booklets.
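The per-million frequencies reported in Table 1 rest on simple arithmetic described in the table notes: raw Spanish counts from a 20-million-word corpus are divided by 20, and masculine and feminine adjective forms are summed first. A minimal sketch of that normalization follows; the function names and the raw counts are hypothetical and chosen only for illustration, not taken from the corpus.

```python
def per_million(raw_count, corpus_size_millions):
    """Convert a raw corpus count to occurrences per million words."""
    return raw_count / corpus_size_millions


def adjective_per_million(masc_count, fem_count, corpus_size_millions=20.0):
    """Sum masculine and feminine counts, then normalize.

    The default of 20.0 reflects the 20-million-word corpus mentioned
    in the notes to Table 1.
    """
    return per_million(masc_count + fem_count, corpus_size_millions)


# Invented counts for a masculine/feminine adjective pair (not real data).
example = adjective_per_million(300, 200)
```

With these invented counts, 500 occurrences in 20 million words correspond to 25 per million.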
Table 2 Student demographic information for pilot study

                                             N      %
Ethnicity
  Latino/Hispanic                            100    100%
Grade
  Fourth (9 year olds)                       69     69
  Fifth (10 year olds)                       31     31
Language spoken by child at home
  Only Spanish                               8      8
  Mostly Spanish                             5      5
  Spanish and English                        28     28
  Mostly English                             13     13
  Only English                               1      1
  Missing                                    45     45
Language program in school
  Spanish Dominant Transitional Bilingual    56     56
  English Dominant 80–20 Bilingual           16     16
  English Dominant Regular                   28     28
3 Analysis
Analysis focused on evaluating the items in the pilot version of the CAT in order to determine the final pool of items to be included in the operational version. Following the definition of validity as ‘an integrated evaluative judgment of the degree to which empirical evidence and theoretical rationales support the adequacy and appropriateness of inferences and actions based on test scores or other modes of assessment’ (Messick, 1989, p. 13; italics in original), we examined two types of empirical evidence. First, we looked at the set of all test items together, using the Rasch model to determine whether the CAT was measuring a single construct of vocabulary knowledge. Our assumption was that if all the items fit the Rasch model, we could infer that both cognates and noncognates were measuring a single construct related to English vocabulary knowledge. Second, we examined the test’s construct validity by investigating performance on the cognates and noncognates separately.

We used WINSTEPS software (Linacre & Wright, 2000) to calibrate the difficulty of the items and the ability of test takers on a common interval scale and to provide information about the test’s properties, especially its reliability, scalability, and fit to the Rasch model. We performed three separate calibrations. In the first, all 61 words were calibrated on a single logit scale. This calibration allowed us to determine the difficulty of all of the items, as well as the children’s ability based on their performance on all 61 words. With the difficulty of the items anchored at their original values, a second and third calibration produced measures of the children’s vocabulary ability with respect to the cognates and the noncognates separately. In these separate calibrations, we found that the difficulty values of the cognates and noncognates, anchored to the difficulty value from the calibration of the entire test, were within the normal range. Thus, the difficulty values of the items calibrated separately did not vary much from their difficulty when calibrated together. The range of displacement values for the cognates was −.05 to .04, and the range for the noncognates was −.02 to .08. This finding also supports the view that all 61 items were measuring one underlying construct. Table 3 shows the means, standard deviations and ranges of the scaled scores for cognates, noncognates, and all items in the pilot. (Note: The table shows N = 92 because tests with all correct and no correct answers were discarded following standard Rasch analysis procedure. Rasch logit scores were scaled such that mean item difficulty was 100 and the length of a logit was 20.)

Table 3 Means, standard deviations and ranges of children’s scores on cognates, noncognates and all words for pilot study*

                           Mean     SD       Range    Min      Max
Cognates (30 items)        82.46    14.18    80.60    41.20    121.80
Noncognates (31 items)     83.42    12.45    83.60    32.40    116.00
All words (61 items)       83.44    10.20    71.23    44.14    115.37

*Scaled scores, N = 92

4 Results
a Map of children and items: Figure 1 shows the Rasch map of the pilot study children and the pilot test items on a single scale. The children’s ability covers a range of 3.88 logits, which is wider than the range of 2.62 logits for item difficulty. The map also shows an even spread of cognates and noncognates, with no major gaps except at the lower end. The map shows that, in general, the items on the CAT were spread evenly along the scale, but the mean difficulty for test items (marked ‘M’ by the item names) was well above the mean of the children’s abilities (marked ‘M’ by the #s). This finding suggested that the items were very difficult for the children and that a few less challenging items should be added to expand the range of the test and motivate the children. The separate analyses performed on the cognates and the noncognates also showed that the spread of both children and items was generally even along the scale; however, both subtests were quite difficult for the children.

Figure 1 Rasch map for pilot study. [Person–item map: children and items share one vertical logit scale running from 1.5 (more able children, hard items) down to −3.5 (less able children, easy items); items are labeled C (cognate) or NC (noncognate); each ‘#’ is two persons and ‘.’ is one person.]

b Reliability: The Rasch reliability of the vocabulary measure based on all 61 items was .70. Separately, the reliability for the cognate measure was .63, and for the noncognate measure .50. The lower reliability estimate for the noncognates may indicate that these items did not allow the children’s abilities to be as well differentiated as the cognate items.

c Analysis of fit to model: Fit of the items to the Rasch model was examined through the infit and outfit mean square statistics. Although Linacre (2007, p. 2) suggests that items with mean square values of between .5 and 1.5 are ‘productive of measurement,’ we chose a more conservative approach, flagging as misfitting items with mean square values greater than 1.3. Our analysis indicated that three items in the pilot version of the CAT were misfitting, 10 were problematic in terms of answer options, and two were mischaracterized as noncognates. Likewise, the children’s and the teachers’ feedback, as well as our observations of the children, indicated that the test’s layout was difficult to follow. Lastly, the teachers recommended clearer test instructions. Based on the item analyses conducted in the pilot study, the following revisions were made to the CAT:
• Three misfitting items, ten items which had problematic options, and two items that we erroneously classified as noncognates (discard and frenzied) and their two cognate counterparts (augment and detain) were deleted from the original 61 words.
• For clarity, the wording of the responses for three items was changed.
• Eight new cognate words with high frequencies in both English and Spanish were added to the test: construction, idea, simple, literature, modern, poet, production, and permit. These less challenging words, randomly inserted, were included to improve the children’s motivation to complete the entire test.
• The instructions and the test layout were modified to make the test more user-friendly.
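The fit screening described under ‘Analysis of fit to model’ can be illustrated with the usual textbook definitions of the infit and outfit mean squares for a dichotomous Rasch item. This is a sketch under stated assumptions, not the WINSTEPS implementation: the function names are hypothetical, ability and difficulty estimates are taken as given, and flagging an item when either statistic exceeds 1.3 is an assumption about how the cut-off was applied.

```python
import math


def rasch_prob(theta, b):
    """Probability of a correct answer for ability theta and difficulty b (logits)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def item_fit(responses, thetas, b):
    """Return (infit, outfit) mean squares for one item.

    responses: 0/1 answers, one per child
    thetas:    ability estimates in logits, one per child
    b:         item difficulty in logits
    """
    squared_residuals = 0.0
    model_variance = 0.0
    standardized_sum = 0.0
    for x, theta in zip(responses, thetas):
        p = rasch_prob(theta, b)
        variance = p * (1.0 - p)
        residual_sq = (x - p) ** 2
        squared_residuals += residual_sq
        model_variance += variance
        standardized_sum += residual_sq / variance
    infit = squared_residuals / model_variance    # information-weighted
    outfit = standardized_sum / len(responses)    # unweighted, outlier-sensitive
    return infit, outfit


def is_misfitting(infit, outfit, cutoff=1.3):
    """Assumed rule: flag the item if either mean square exceeds the cut-off."""
    return infit > cutoff or outfit > cutoff
```

Values near 1.0 indicate responses about as noisy as the model expects; a very able child missing a very easy item inflates the outfit statistic in particular.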
III Study 1: Operational version of the CAT, Year 1
The purpose of Study 1 was to obtain information about the reliability and fit of the CAT to the Rasch model, and to determine the test’s validity after revisions were made based on the pilot study. Analysis of the operational CAT focused on investigating the construct validity of the test through the types of calibrations used in the pilot study and through assessment of the relationship between students’ performance on the CAT (for both cognates and noncognates) and their performance on the Picture Vocabulary subtest of the Woodcock Language Proficiency Battery-Revised (WLPB-R/PV) (Woodcock, 1991a, 1991b; Woodcock & Muñoz-Sandoval, 1995).

1 Participants
Participants in the study were 173 Spanish-speaking ELLs who were participants in the larger transfer study. The students were fourth graders in Success for All (SFA) reading programs in three urban schools in predominantly Spanish-speaking neighborhoods in three states in the USA. Some were being instructed only in English by fourth grade, while others were still receiving some instruction in Spanish. The first section of Table 4 presents demographic information on the children.

Table 4 Student demographic information for Study 1 (fourth graders) and Study 2 (fifth graders)

Study 1: Year 1 with fourth graders       N      %
Ethnicity: Latino/Hispanic                173    100%
Language spoken by child at home
  Spanish                                 132    76.3
  English                                 37     21.4
  Missing                                 4      2.3
Language instruction in school
  Still instructed in Spanish             75     43.4
  Fully instructed in English             85     49.1
  Missing                                 13     7.5

Study 2: Year 2 with fifth graders        N      %
Ethnicity: Latino/Hispanic                155    100%
Language spoken by child at home
  Spanish                                 111    71.6
  English                                 33     21.3
  Missing                                 11     7.1
Language instruction in school
  Still instructed in Spanish             62     40.0
  Fully instructed in English             82     52.9
  Missing                                 11     7.1

2 Measures
The operationalized version of the CAT consisted of 52 items: 22 cognates and 22 noncognates that were scored, and eight less challenging items that were added as a result of the pilot study (see Table 1). Figure 2 illustrates the new test format. The WLPB-R/PV was used to measure the children’s English and Spanish vocabulary knowledge. In the WLPB-R/PV, a child sees a picture and is asked to name the object(s) or action(s) in the picture. The WLPB-R is one of very few vocabulary tests that have both English and Spanish versions.

“For each item, you will read the bolded word and think about what it means. After you have thought about the bolded word and what it means, you are supposed to pick the one word that is most closely related to the meaning of the bolded word.”

Cognate                   Noncognate
converse                  jest
O speak with someone      O defend
O fight with someone      O bend
O include someone         O joke
O leave out someone       O observe

Figure 2 Format and sample questions from operationalized CAT

3 Data analysis and results
a Scalability: As in the pilot study, three separate calibrations were performed. The first included all 52 test items. The second and third calibrations were performed on the cognates and noncognates separately, while anchoring their measures on the first calibration. The eight less challenging items were excluded from the second and third calibration. The displacement values were within the normal range: −.11 to .10 for cognates and −.02 to .02 for noncognates. The first section of Table 5 shows the means, standard deviations and ranges of the fourth graders’ scaled scores for cognates, noncognates and the entire test. (Note: The table shows N = 170 because tests with all correct answers and no correct answers were discarded following standard Rasch analysis procedure.)
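To make the anchored calibrations more concrete, the sketch below estimates one child’s ability by maximum likelihood while holding item difficulties fixed, then converts the logit measure to the scaled-score metric described in the text (mean item difficulty at 100, 20 units per logit). It is an illustrative Newton–Raphson routine under the dichotomous Rasch model, not the estimation algorithm WINSTEPS uses; the function names and the example difficulties are hypothetical.

```python
import math


def rasch_prob(theta, b):
    """Probability of a correct answer for ability theta and difficulty b (logits)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def estimate_ability(responses, difficulties, tol=1e-6, max_iter=100):
    """Maximum-likelihood ability in logits, with item difficulties anchored.

    Returns None for all-correct or all-incorrect response strings,
    because no finite estimate exists for such extreme scores.
    """
    raw_score = sum(responses)
    if raw_score == 0 or raw_score == len(responses):
        return None
    theta = 0.0
    for _ in range(max_iter):
        probs = [rasch_prob(theta, b) for b in difficulties]
        gradient = raw_score - sum(probs)               # observed minus expected score
        information = sum(p * (1.0 - p) for p in probs)
        step = gradient / information
        theta += step
        if abs(step) < tol:
            break
    return theta


def to_scaled(logit, center=100.0, units_per_logit=20.0):
    """Rescale a logit measure: 0 logits (mean item difficulty) maps to 100."""
    return center + units_per_logit * logit


# Hypothetical anchored difficulties for two items, in logits.
theta = estimate_ability([1, 0], [-1.0, 1.0])
scaled = to_scaled(theta)
```

The `None` return mirrors the reason given in the note above for the reduced N: response records with all correct or no correct answers cannot be placed on the scale. On this metric, a mean scaled score of about 94 sits roughly 0.3 logits below mean item difficulty.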
Table 5 Means, standard deviations and ranges of cognates, noncognates and all words (Year 1 [fourth graders] and Year 2 [fifth graders])*

Year 1 (Fourth graders)    Mean      SD       Range     Min      Max
Cognates (22 items)        93.89     18.08    100.00    36.60    136.60
Noncognates (22 items)     94.59     13.30    70.80     55.20    126.00
All words (52 items)       94.62     12.23    59.60     63.80    123.40

*Scaled scores, N = 170.

Year 2 (Fifth graders)     Mean      SD       Range     Min      Max
Cognates (22 items)        102.09    20.63    135.20    61.00    196.20
Noncognates (22 items)     98.89     17.08    99.60     39.20    138.80
All words (52 items)       101.03    14.88    75.00     60.60    135.60

*Scaled scores, N = 155.
b Map of children and items: Figure 3 shows the Rasch map for the operational CAT, with children and items on a single scale. The map shows an even spread of cognates and noncognates, similar to the pilot results. The map also shows that the eight less challenging words were all below the mean difficulty for the items as a whole. The difference between mean item difficulty and mean student ability was reduced from one logit in the pilot to just half a logit in the first study, demonstrating that the test had become ‘easier’. The children’s ability range of 2.98 logits was narrower than the range of 3.60 logits for item difficulty. This implies that the test can be used for children with a wider range of abilities than this fourth grade group. The mean difficulty of the items was also slightly higher than the mean ability of the children, indicating that some items were still difficult for some fourth graders, but that such items would be appropriate for older children or fourth graders with higher abilities. This finding was important since the CAT was intended to be used with both fourth and fifth graders.

Figure 3 Rasch map for Study 1 (operational version with fourth graders). [Person–item map: children and items share one vertical logit scale running from 2 (more able children, hard items) down to −3 (less able children, easy items); items are labeled C (cognate), NC (noncognate) or U (the eight less challenging words); each ‘X’ is one person.]

c Reliability: The estimated reliability was a moderate .70 for the entire test and .65 for the cognates, about the same as on the pilot. Reliability for the noncognates was .37, a decrease from the pilot. Since reliability is influenced by the distribution of ability in the group taking an assessment, the decrease in reliability of the noncognates could be explained by the fact that only fourth graders were tested in Study 1, whereas both fourth and fifth graders were tested in the pilot. Also, with the exception of the eight less challenging words, the items were in a restricted range, that is, they were all very low frequency words. Consequently, we considered these reliabilities acceptable.

d Analysis of fit to model: Using the same conservative criteria as in the pilot, we found that 96% of the items in the total test calibration fit the Rasch model; only two items (fiend and allot) were misfitting. In the noncognate calibration, the same two items were found to be misfitting, while none of the cognates was misfitting. Further analysis showed that the two misfitting words were among the most difficult words on the test. Overall, then, all three calibrations showed an acceptable fit of the data to the Rasch model.

e Construct validity: Because the CAT was only one measure among many given to students in the larger study, it was not possible to interview the children to discover what strategies they used to answer the test items. However, we wanted to see if there was any evidence that students might be drawing on their cognate knowledge in taking the CAT. We hypothesized that knowledge of Spanish vocabulary would help students respond correctly to cognate items but not to noncognate items. To investigate the validity of the CAT as a measure of cognate awareness, we assessed the relationship between students’ performance on the CAT and their performance on the WLPB-R/PV. We found a moderate relationship between cognate performance and the Spanish WLPB-R/PV for fourth graders (r = .50, N = 114, p < .01). However, there was no correlation between cognate performance and the English WLPB-R/PV (r = −.13, N = 114, p > .01). On the other hand, English WLPB-R/PV was moderately related to performance on noncognates (r = .41, N = 114, p < .01), but Spanish WLPB-R/PV was not (r = −.14, N = 114, p > .01). These results provided evidence that knowledge of Spanish played a role in children’s performance on CAT cognate items. For additional analyses, we divided the children into four groups using WLPB-R scores in Spanish and English (see Tables 7 and 9):
• Low Spanish, Low English (LSLE)
• Low Spanish, High English (LSHE)
• High Spanish, Low English (HSLE)
• High Spanish, High English (HSHE)
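The four-way grouping listed above can be expressed as a small classification rule, using the cut-off of 80 on the WLPB-R/PV standard score reported in this section (scores below 80 count as low, scores of 80 or more as high). The function name and the example scores are hypothetical and serve only as a sketch.

```python
from collections import Counter


def vocabulary_group(spanish_score, english_score, cutoff=80):
    """Return 'LSLE', 'LSHE', 'HSLE' or 'HSHE' for one child.

    A standard score at or above the cut-off counts as high for that language.
    """
    spanish = "HS" if spanish_score >= cutoff else "LS"
    english = "HE" if english_score >= cutoff else "LE"
    return spanish + english


# Invented (Spanish, English) standard scores for four children.
children = [(95, 102), (80, 79), (79, 80), (60, 70)]
group_sizes = Counter(vocabulary_group(s, e) for s, e in children)
```

Tallying the labels with `Counter` yields the subgroup sizes of the kind that appear as the Ns in the subgroup tables.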
The cut-off score used for the WLPB-R/PV was 80. That is, a child with a standard score lower than 80 on the WLPB-R/PV measure was categorized as low for that language, whereas a child with a standard score equal to or greater than 80 was categorized as high. The cut-off score was chosen based on a scatter plot in order to have a meaningful number of students in each of the four quadrants. (Note: for WLPB-R/PV, the mean obtained for the norming sample of fourth graders was 100 and the standard deviation was 15. Thus, our cut-off score is one and a third standard deviation points lower than the mean of a typical monolingual fourth grader.) To compare children’s cognate and noncognate vocabulary measures on the CAT within each study, and from one study to the next, the logit measures were converted to scaled scores with a mean of 100 and 20 units to a logit. We then compared the mean performance on the cognate and noncognate items by the four groups. We hypothesized that if the CAT was tapping into the construct of cognate awareness, then for cognates, knowledge of Spanish vocabulary would play an important role, but not necessarily knowledge of English vocabulary. We further hypothesized that for noncognates, the opposite would be true: knowledge of English vocabulary would play an important role, but not necessarily knowledge of Spanish vocabulary. Table 6 shows the means and standard deviations of each subgroup’s scores on cognates and noncognates. To check for statistical differences between the mean performances of the four subgroups on the cognates, we conducted a nonparametric Kruskal Wallis test for independent samples because the number of children in each subgroup was unequal. The overall Kruskal Wallis chi square for cognates was significant (!2 (3) " 25.82, p # .01). Results of individual tests are presented in Table 7. 
The upper portion of the table shows that the subgroups with high Spanish consistently outperformed the subgroups with low Spanish on cognates, thus demonstrating their cognate vocabulary knowledge, whereas subgroups with high English did not necessarily perform statistically significantly better than subgroups with low English. The lower section of Table 7 shows the mean and standard deviations of the four subgroups scores on the noncognates. The overall Kruskal Wallis chi square was again significant (!2 (3) " 15.69, p # .01). The table shows that subgroups with high English consistently outperformed subgroups with low English on noncognates. Furthermore, for both cognates and noncognates, the high Spanish, high English subgroup consistently performed better than other
510
Development of a cognate awareness measure
Table 6 Means and standard deviations of subgroups of fourth graders on cognates and noncognates (Study 1) Means and standard deviations of fourth graders’ scaled scores on cognates Low Spanish High Spanish (Picture vocabulary ! 80) (Picture vocabulary " 80) High English (Picture Vocabulary " 80)
M # 87.90 (N # 29) SD #12.87
M # 107.91 (N # 18) SD # 15.98
Low English (Picture Vocabulary!80)
M # 84.85 (N # 12) SD #13.22
M # 100.65 (N # 55) SD # 17.62
Means and standard deviations of fourth graders’ scaled scores on noncognates Low Spanish Picture vocabulary ! 80)
High Spanish (Picture vocabulary " 80)
High English (Picture vocabulary "80)
M # 96.26 (N # 29) SD # 10.23
M # 102.16 (N # 18) SD # 14.29
Low English (Picture vocabulary!80)
M # 93.18 (N # 12) SD # 11.77
M # 88.72 (N # 55) SD # 12.34
Table 7 Means and Kruskal Wallis chi square for paired subgroups (Study 1, fourth graders)

Paired subgroups (N)              Mean ranks         Kruskal Wallis chi-square

Cognates
High Spanish vs. Low Spanish subgroups
HSHE (18) vs. LSHE (29)           33.50    18.10     χ²(1) = 14.08, p < .01
HSHE (18) vs. LSLE (12)           19.97     8.79     χ²(1) = 11.69, p < .01
HSLE (55) vs. LSHE (29)           49.45    29.31     χ²(1) = 13.04, p < .01
HSLE (55) vs. LSLE (12)           37.37    18.54     χ²(1) = 9.27, p < .01
High Spanish vs. High Spanish subgroup
HSHE (18) vs. HSLE (55)           43.67    34.82     χ²(1) = 2.38, NS
Low Spanish vs. Low Spanish subgroup
LSHE (29) vs. LSLE (12)           21.97    18.67     χ²(1) = .65, NS

Noncognates
High English vs. Low English subgroups
HSHE (18) vs. LSLE (12)           17.72    12.17     χ²(1) = 2.89, NS
LSHE (29) vs. HSLE (55)           52.97    36.98     χ²(1) = 8.28, p < .01
HSHE (18) vs. HSLE (55)           51.31    32.32     χ²(1) = 11.02, p < .01
LSHE (29) vs. LSLE (12)           22.29    17.88     χ²(1) = 1.17, NS
High English vs. High English subgroup
HSHE (18) vs. LSHE (29)           27.47    21.84     χ²(1) = 1.89, NS
Low English vs. Low English subgroup
HSLE (55) vs. LSLE (12)           32.89    39.08     χ²(1) = 1.01, NS
subgroups, and this difference was usually (but not always) statistically significant. This finding confirms the CAT as a measure of vocabulary knowledge. In summary, our results consistently show that high Spanish vocabulary knowledge, as measured by the WLPB-R/PV, was helpful in predicting high vocabulary scores on the CAT’s cognate items, but high English knowledge was not. On the other hand, high English vocabulary knowledge, as measured by the WLPB-R/PV, was a good predictor of high vocabulary scores on noncognate items, whereas high Spanish vocabulary knowledge was not. These findings provide support for the claim that cognate items on the CAT appear to tap into some level of cognate awareness for students with high Spanish vocabulary knowledge. Lastly, children with high scores on both the Spanish and English WLPB-R/PV consistently performed at the highest levels on both cognates and noncognates, providing support for the CAT as a vocabulary measure.
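For readers who want to see how the pairwise statistics in Table 7 relate to the reported mean ranks, the sketch below recomputes the Kruskal Wallis H statistic from group sizes and mean ranks alone. It is an illustration, not the authors' analysis: the function name is hypothetical, and the formula omits the correction for tied scores, so its results sit slightly below the published values (presumably because of that correction and the rounding of the mean ranks).

```python
def kruskal_h_from_mean_ranks(groups):
    """Kruskal Wallis H from (group size, mean rank) pairs, without tie correction."""
    total_n = sum(n for n, _ in groups)
    expected_rank = (total_n + 1) / 2.0  # mean rank if the groups did not differ
    between = sum(n * (mean_rank - expected_rank) ** 2 for n, mean_rank in groups)
    return 12.0 * between / (total_n * (total_n + 1))


# Upper-tail critical value of chi-square with 1 degree of freedom at alpha = .01.
CHI2_CRITICAL_DF1_ALPHA_01 = 6.635

# (n, mean rank) pairs copied from Table 7, cognates.
hshe_vs_lshe = [(18, 33.50), (29, 18.10)]
lshe_vs_lsle = [(29, 21.97), (12, 18.67)]

for label, pair in [("HSHE vs. LSHE", hshe_vs_lshe), ("LSHE vs. LSLE", lshe_vs_lsle)]:
    h = kruskal_h_from_mean_ranks(pair)
    print(label, round(h, 2), h > CHI2_CRITICAL_DF1_ALPHA_01)
```

The first comparison yields about 14.01 (reported: 14.08) and exceeds the critical value; the second yields about 0.64 (reported: .65) and does not, matching the significant and NS entries in the table.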
IV Study 2: Operational version of the CAT, year 2
Because the larger transfer study was longitudinal, we were able to investigate the stability of the CAT across two testing occasions. One year after the first administration of the operational CAT, we administered it again to the same cohort of children, now in fifth grade. Due to attrition, only 155 children participated. The second section of Table 4 provides background information on these children. In Study 2, we used the same measures (CAT and WLPB-R/PV) and conducted the same analyses as in Study 1.
1 Results
a Scalability: Twelve items (seven cognates and five noncognates) showed noticeable displacement when their difficulty values were anchored to the item difficulty values from the first calibration. The remaining 32 items did not show any major displacements. The second section of Table 5 shows the means, standard deviations and ranges of the fifth graders’ scaled scores for cognates, noncognates and the entire test. Table 5 clearly indicates that the children’s knowledge of English vocabulary had increased, particularly their cognate scores (from 93.89 as fourth graders to 102.09 as fifth graders).
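As a small worked example, the reported change in mean cognate scaled scores can be expressed on the logit scale, using the 20-units-per-logit scaling described for Study 1. The helper name is hypothetical, and the sketch assumes the same scaling applies to both years' scores.

```python
UNITS_PER_LOGIT = 20.0  # scaling stated for the CAT scaled scores


def scaled_gain_in_logits(before, after, units_per_logit=UNITS_PER_LOGIT):
    """Convert a difference between two scaled scores back to logits."""
    return (after - before) / units_per_logit


# Mean cognate scaled scores reported for fourth and fifth grade.
gain = scaled_gain_in_logits(93.89, 102.09)
print(round(gain, 2))  # 0.41
```

On this reading, the 8.2-point rise in the cognate mean corresponds to roughly four-tenths of a logit.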
Figure 4 Rasch map for Study 2 (operational version with fifth graders)
[Figure not reproduced. The map places children and items on a common logit scale running from about −3 to 2. Children appear on the left, from less able (bottom) to more able (top), with each ‘X’ representing one person; items appear on the right, from easy (bottom) to hard (top), each tagged C, NC, or U. The hardest items shown include anterior–C, wily–NC, and fiend–NC; the easiest include idea–U and simple–U.]
b Map of children and items: Figure 4 shows the Rasch map for the CAT administered to the fifth graders. Unlike Study 1, in which the mean difficulty of the items was higher than the mean ability of the children, in this study the mean examinee ability was slightly above the mean item difficulty. Likewise, the range of 3.75 logits for the ability scores is about the same as the range of 3.60 logits for the item difficulty scores. This finding indicates that the items in the operational version of the CAT are also appropriate for these fifth graders. It likewise indicates that the average vocabulary ability of the children has improved from fourth to fifth grade.
c Reliability: The reliability estimates improved noticeably from the pilot and Year 1 studies, to .80 for the measure based on all 52 items, .70 based on the cognates, and .62 based on the noncognates. Since reliability is a function of the heterogeneity of the sample tested, this increase is not surprising, as the spread of student ability was wider among the children as fifth graders (3.75 on the logit scale) than as fourth graders (2.98 on the logit scale). Overall, these findings indicate that the CAT is appropriate for children with abilities at the fifth-grade level. Likewise, the CAT’s moderate to high internal reliabilities on two different testing occasions provide some indication of its stability.
d Analysis of fit to model: In the total test calibration, 90.4% of the items fit the Rasch model; five out of 52 (jest, malevolent, undermine, curative, and fiend) were misfitting. For the cognate calibration, three words were misfitting (anterior, curative, and malevolent), and for the noncognate calibration, three words were also misfitting (jest, undermine, and pun). Further analysis showed that, as in Study 1, the misfitting items were among the most difficult words on the test. Overall, all three calibrations showed an acceptable fit to the Rasch model.
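Two ideas in the preceding paragraphs lend themselves to a short numerical sketch: what it means for ability to sit above item difficulty on a Rasch map, and why a wider spread of ability tends to raise reliability. The code below assumes the dichotomous Rasch model and one common variance-ratio definition of reliability (true variance over observed variance); the text does not state which coefficient was reported, and the spread and error values used here are hypothetical, chosen only to show the direction of the effect.

```python
import math


def rasch_probability(ability, difficulty):
    """P(correct) under the dichotomous Rasch model; both arguments in logits."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def separation_reliability(observed_sd, rmse):
    """Share of observed person variance that is not measurement error."""
    observed_var = observed_sd ** 2
    true_var = max(observed_var - rmse ** 2, 0.0)
    return true_var / observed_var


# A child whose ability equals an item's difficulty succeeds half the time.
print(rasch_probability(0.0, 0.0))  # 0.5

# HYPOTHETICAL numbers: same measurement error, narrower vs. wider ability spread.
narrow = separation_reliability(observed_sd=0.8, rmse=0.5)
wide = separation_reliability(observed_sd=1.0, rmse=0.5)
print(narrow < wide)  # True

# Share of items fitting the model when 5 of 52 misfit (figures from the text).
print(round(100 * (52 - 5) / 52, 1))  # 90.4
```

When mean ability lies above mean item difficulty, as reported for the fifth graders, the average child has a better-than-even chance on an item of average difficulty.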
e Construct validity: As in Study 1, we divided the children into four groups according to their scores on the WLPB-R/PV: LSLE, LSHE, HSLE, and HSHE. This time the cut-off score used was 85. Table 8 shows the means and standard deviations of each subgroup’s scores on cognates and noncognates. As in Study 1, high Spanish vocabulary knowledge was related to high scores on the CAT cognates, but not on the noncognates. High English vocabulary knowledge was related to stronger performance on the noncognates. The overall Kruskal Wallis chi square for CAT cognate scores remained significant (χ²(3) = 21.37, p < .01). Moreover, as the
Table 8 Means and standard deviations of subgroups of fifth graders on cognates and noncognates (Study 2)

Means and standard deviations of fifth graders’ scaled scores on cognates
                                          Low Spanish (Picture vocabulary < 85)   High Spanish (Picture vocabulary ≥ 85)
High English (Picture vocabulary ≥ 85)    M = 90.48 (N = 24), SD = 13.98          M = 115.42 (N = 28), SD = 17.50
Low English (Picture vocabulary < 85)     M = 98.99 (N = 26), SD = 17.71          M = 105.77 (N = 54), SD = 24.36

Means and standard deviations of fifth graders’ scaled scores on noncognates
                                          Low Spanish (Picture vocabulary < 85)   High Spanish (Picture vocabulary ≥ 85)
High English (Picture vocabulary ≥ 85)    M = 99.47 (N = 24), SD = 18.86          M = 103.90 (N = 28), SD = 14.45
Low English (Picture vocabulary < 85)     M = 93.38 (N = 26), SD = 19.50          M = 93.17 (N = 54), SD = 13.25
upper portion of Table 9 shows, when measured by the cognates, the subgroups with high Spanish tended to outscore the subgroups with low Spanish, whereas subgroups with high English did not perform significantly better than subgroups with low English. The overall Kruskal Wallis chi square for noncognate CAT scores was also significant (χ²(3) = 9.13, p < .01). The lower section of Table 9 shows that the subgroups with high English consistently outscored the subgroups with low English on noncognates, whereas groups with high Spanish did not always perform significantly better than subgroups with low Spanish on these words. As in Study 1, the high Spanish and high English subgroup consistently performed better than the other subgroups, although this difference was not always statistically significant. Finally, as in Study 1, the correlation between CAT cognate vocabulary and the Spanish WLPB-R/PV was moderate (r = .38, N = 132, p < .01). Again, there was no correlation between CAT cognate vocabulary knowledge and English WLPB-R/PV (r = −.01, N = 132, p > .01). However, English vocabulary knowledge as measured by the English WLPB-R/PV was moderately related to CAT noncognate vocabulary (r = .37, N = 132, p < .01), while Spanish
Table 9 Means and Kruskal Wallis chi-square for paired subgroups (Study 2, fifth graders)

Paired subgroups (N)              Mean ranks         Kruskal Wallis chi-square

Cognates
High Spanish vs. Low Spanish subgroups
HSHE (28) vs. LSHE (24)           35.55    15.94     χ²(1) = 21.74, p < .01
HSHE (28) vs. LSLE (26)           33.88    20.63     χ²(1) = 9.62, p < .01
HSLE (54) vs. LSHE (24)           43.87    29.67     χ²(1) = 6.56, p < .01
HSLE (54) vs. LSLE (26)           42.28    36.81     χ²(1) = .98, NS
High Spanish vs. High Spanish subgroup
HSHE (28) vs. HSLE (54)           49.71    37.24     χ²(1) = 5.07, NS
Low Spanish vs. Low Spanish subgroup
LSHE (24) vs. LSLE (26)           21.77    28.94     χ²(1) = 3.06, NS

Noncognates
High English vs. Low English subgroups
HSHE (28) vs. LSLE (26)           31.27    23.44     χ²(1) = 3.36, NS
LSHE (24) vs. HSLE (54)           44.58    37.24     χ²(1) = 1.76, NS
HSHE (28) vs. HSLE (54)           52.50    35.80     χ²(1) = 9.13, p < .01
LSHE (24) vs. LSLE (26)           26.63    24.46     χ²(1) = .28, NS
High English vs. High English subgroup
HSHE (28) vs. LSHE (24)           28.23    24.48     χ²(1) = .80, NS
Low English vs. Low English subgroup
HSLE (54) vs. LSLE (26)           38.66    44.33     χ²(1) = 1.05, NS
vocabulary as measured by the Spanish WLPB-R/PV was not (r = −.05, N = 132, p > .01).
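To give a sense of how far each of the four reported Study 2 correlations is from zero at N = 132, the sketch below applies the Fisher z transformation with a normal approximation. This is a swapped-in textbook approximation, not necessarily the test the authors ran, and the function name and dictionary labels are hypothetical.

```python
import math
from statistics import NormalDist


def fisher_z_p_value(r, n):
    """Two-sided p-value for H0: rho = 0 via the Fisher z transformation."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


N = 132  # sample size reported with each Study 2 correlation
reported = {
    "cognates vs. Spanish PV": 0.38,
    "cognates vs. English PV": -0.01,
    "noncognates vs. English PV": 0.37,
    "noncognates vs. Spanish PV": -0.05,
}
for label, r in reported.items():
    p = fisher_z_p_value(r, N)
    print(f"{label}: r = {r:+.2f}, p {'<' if p < 0.01 else '>'} .01")
```

Under this approximation, the two moderate coefficients (.38 and .37) fall well below the .01 level, while the two near-zero coefficients do not.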
V Discussion
The purpose of this study was to provide empirical evidence (Messick, 1989) for the claim that scores on the cognate subtest of the CAT are sensitive to awareness of cognates in Spanish-speaking ELL children and that scores on the test as a whole are related to first- and second-language vocabulary knowledge. Our findings apparently demonstrate this. The reliability of both the cognate subtest and the entire test improved from the pilot to the operational version, and the internal reliabilities of the cognate subtest and the entire test were consistent on two testing occasions. The internal reliability of the noncognate subtest also improved from one testing occasion to the next. In comparing the children’s scores on the cognate and noncognate subtests, we found that the CAT cognate items appear to tap into a construct of cognate awareness. Higher scores on the cognate items
were consistently related to higher scores on the Spanish WLPB-R/PV but not to higher scores on the English WLPB-R/PV, whereas higher scores on the noncognate items were consistently related to higher scores on the English WLPB-R/PV but not to higher scores on the Spanish WLPB-R/PV. Also, the results indicate that, aside from the two distinct subtests, there appears to be a general vocabulary knowledge involved in doing well on the CAT, because children who had the highest scores on both the English and Spanish WLPB-R/PV performed best on both CAT subtests.
Although the CAT appears to be sensitive to the ability of Spanish-speaking children to use knowledge of Spanish words to discern the meaning of their English cognates, researchers and educators need to be cognizant of its limitations. Because we used word frequencies for adults rather than children, other researchers may want to develop the CAT further by using frequencies for children (e.g., Zeno, Ivens, Millard & Duvvuri, 1995, for English). Perhaps when word frequencies for children are used, the correlations between the CAT and the WLPB-R/PV, currently only moderate, may increase.
At any rate, our results have important theoretical, research, and pedagogical implications. Theoretically, our findings provide some support for positive cross-linguistic transfer of cognate knowledge for Spanish-speaking ELLs with sufficient L1 vocabulary knowledge, but not necessarily for those with insufficient L1 vocabulary knowledge. This finding is consistent with Cummins’ (1979) theory that ELL children first need to reach a threshold or minimum proficiency in their L1 for it to transfer to their L2. Research-wise, the CAT is potentially useful for investigation of the development of cognate knowledge in ELLs.
Future studies might investigate exactly what levels of Spanish vocabulary knowledge children need before that knowledge can help them with English word meanings, and whether children trained on one set of cognates could generalize to unlearned ones. Likewise, research could determine whether certain kinds of words are more susceptible to transfer, possibly by developing a scale of the distance between English and Spanish words based on orthography, morphology, and semantics. Finally, using think-alouds (as in the work of Jiménez et al., 1996) with the CAT would help elucidate how high scorers on the cognate items determine whether a word is a cognate. The think-aloud could query students on what aspects (phonological, orthographic, morphological, semantic, or a combination of these) they think they focus on to decipher a new English word; and the roles they think L1 and L2 proficiency play in their ability to use cognate knowledge.
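One possible starting point for the orthographic part of such a distance scale is a normalized edit distance between spellings. The sketch below is only an illustration of that idea: the function names are hypothetical, the Spanish forms are supplied here for illustration rather than taken from the CAT materials, and the measure ignores morphology and semantics entirely.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # delete char_a
                               current[j - 1] + 1,       # insert char_b
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]


def orthographic_similarity(english, spanish):
    """1.0 for identical spellings, falling toward 0.0 as spellings diverge."""
    longest = max(len(english), len(spanish))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(english, spanish) / longest


# ILLUSTRATIVE pairs; the Spanish spellings are assumptions added for this sketch.
pairs = [("valor", "valor"), ("tranquil", "tranquilo"),
         ("edifice", "edificio"), ("imitate", "imitar")]
for english, spanish in pairs:
    print(english, spanish, round(orthographic_similarity(english, spanish), 2))
```

A fuller scale of the kind proposed would need to combine a spelling measure like this with morphological and semantic information.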
Pedagogically, the CAT could be useful for assessing the effectiveness of interventions designed to build Spanish-speaking children’s English vocabulary. Because English and Spanish share such a large number of cognates, interventions that build cognate awareness may be promising for this purpose. However, because of the CAT’s limitations mentioned above, its lower reliability when compared to standardized tests, and its primary goal of assessing cognate awareness rather than depth of vocabulary knowledge, educators should use the CAT concurrently with other vocabulary measures to provide a more accurate picture of improvements in their students’ overall vocabulary knowledge. The use of reliable and valid assessments will be crucial to assess the effectiveness of such interventions. The CAT takes the first step in providing such an assessment.
Acknowledgements
The Transfer of reading skills in bilingual children study was funded by Grant No. 5-P01-HD39530 from the National Institute for Child Health and Human Development and the Institute of Education Sciences of the US Department of Education to the Center for Applied Linguistics (CAL). We thank the CAL staff and the Language Testing reviewers for their helpful comments.
VI References
Anderson, R. C., & Freebody, P. (1983). Reading comprehension and the assessment and acquisition of word knowledge. In B. Hudson (Ed.), Advances in reading/language research (pp. 231–256). Greenwich, CT: JAI Press.
Anthony, E. (1954). The teaching of cognates. Language Learning, 79–82.
August, D., Carlo, M., & Calderón, M. (2005). Development of literacy in Spanish-speaking English-language learners: Findings from longitudinal study of elementary school children. Perspectives, 31(2), 17–19.
August, D., Carlo, M., Dressler, C., & Snow, C. (2005). The critical role of vocabulary development for English language learners. Learning Disabilities Research & Practice, 20(1), 50–57.
August, D., & Hakuta, K. (1997). Improving schooling for language-minority children: A research agenda. Washington, DC: National Academy Press.
August, D., & Shanahan, T. (Eds.). (2006). Developing literacy in second-language learners. Report of the National Literacy Panel on Language-Minority Children and Youth. Mahwah, NJ: Lawrence Erlbaum.
Batalova, J. (2006). Spotlight on limited English proficient students in the United States. Washington, DC: Migration Policy Institute. Also available at http://www.migrationinformation.org/USfocus/print.cfm?ID=373
Biemiller, A., & Slonim, N. (2001). Estimating root word vocabulary growth in normative and advantaged populations: Evidence for a common sequence of vocabulary acquisition. Journal of Educational Psychology, 93(3), 498–520.
Carlisle, J. F., Beeman, M., Davis, L. H., & Spharim, G. (1999). Relationship of metalinguistic capabilities and reading achievement for children who are becoming bilingual. Applied Psycholinguistics, 20(4), 459–478.
Carlisle, J. F., Beeman, M., & Shah, P. P. (1996). The metalinguistic capabilities and English literacy of Hispanic high school students: An exploratory study. In D. J. Leu, C. K. Kinzer & K. A. Hinchman (Eds.), Literacies for the 21st century: Research and practice (pp. 306–316). Chicago, IL: National Reading Conference.
Cummins, J. (1979). Linguistic interdependence and the educational development of bilingual children. Review of Educational Research, 49, 222–251.
Cunningham, T. H., & Graham, C. R. (2000). Increasing native English vocabulary recognition through Spanish: Cognate transfer from foreign to first language. Journal of Educational Psychology, 92, 37–49.
Davies, M. (2005). Corpus del Español. Retrieved 3 March 2005, from http://www.corpusdelespanol.org/
Dressler, C., & Kamil, M. L. (2006). First- and second-language literacy. In D. August & T. Shanahan (Eds.), Developing literacy in second-language learners. Mahwah, NJ: Lawrence Erlbaum.
Dufva, M., & Voeten, M. J. M. (1999). Native language literacy and phonological memory as prerequisites for learning English as a foreign language. Applied Psycholinguistics, 20(3), 329–348.
Francis, W. N., & Kucera, H. (1982). Frequency analysis of English usage. Boston, MA: Houghton Mifflin.
Garcia, G. E. (1991). Factors influencing the English reading test performance of Spanish-speaking Hispanic children. Reading Research Quarterly, 26(4), 371–392.
Jiménez, R. T., Garcia, G. E., & Pearson, P. D. (1996). The reading strategies of bilingual Latina/o students who are successful English readers: Opportunities and obstacles. Reading Research Quarterly, 31, 283–301.
Kucera, H., & Francis, W. N. (2005). Kucera and Francis Word Pool. Retrieved 3 March 2005, from http://memory.psych.upenn.edu/wordpools.php
Lalor, E., & Kirsner, K. (2000). Cross-lingual transfer effects between English and Italian cognates and noncognates. The International Journal of Bilingualism, 4, 385–398.
Linacre, J. M. (2007). Person and item statistics in misfit order. Retrieved 16 January 2007, from http://www.winsteps.com/winman/table6_1.htm
Linacre, J. M., & Wright, B. D. (2000). Winsteps. Chicago, IL: MESA Press.
McNamara, T. F. (1996). Measuring second language performance. New York: Longman.
Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 13–103). New York: American Council on Education.
Nagy, W., Garcia, G. E., Durgunoğlu, A. Y., & Hancin-Bhatt, B. (1993). Spanish-English bilingual students’ use of cognates in English reading. Journal of Reading Behavior, 25, 241–259.
Nash, R. (1997). Dictionary of Spanish cognates. Chicago, IL: NTC Publishing.
Schelletter, C. (2002). The effect of form similarity on bilingual children’s lexical development. Bilingualism: Language and Cognition, 5, 93–107.
Snow, C. (2006). Cross-cutting themes and future research directions. In D. August & T. Shanahan (Eds.), Developing literacy in second-language learners. Mahwah, NJ: Lawrence Erlbaum.
Snow, C. E., Burns, S., & Griffin, P. (Eds.). (1998). Preventing reading difficulties in young children. Washington, DC: National Academy Press.
Snow, C., Cancini, H., Gonzalez, P., & Shriberg, E. (1989). Giving formal definitions: An oral language correlate of school literacy. In D. Bloome (Ed.), Classrooms and literacy (pp. 233–249). Norwood, NJ: Ablex.
Whitley, M. S. (2002). Spanish/English contrasts: A course in Spanish linguistics (2nd ed.). Washington, DC: Georgetown University Press.
Woodcock, R. W. (1991a). Woodcock language proficiency battery-revised, English and Spanish forms: Examiner’s manual. Itasca, IL: Riverside.
Woodcock, R. W. (1991b). Woodcock language proficiency battery-revised, English form: Test book. Itasca, IL: Riverside.
Woodcock, R. W., & Muñoz-Sandoval, A. F. (1995). Woodcock language proficiency battery-revised, Spanish form: Test book. Itasca, IL: Riverside.
Zeno, S. M., Ivens, S. H., Millard, R. T., & Duvvuri, R. (1995). The educator’s word frequency guide. New York, NY: Touchstone Applied Science.