L2 phonology of Cantonese speakers of English: Voicing and aspiration contrast of stops in onset and coda.¹

Angus Fung
Department of Linguistics
University of Calgary

Supervisor: John Archibald
April 2004

¹ I would like to express my gratitude to all those who helped me to complete this thesis. I am deeply indebted to my supervisor, Prof. Dr. J. Archibald, whose help, suggestions, and encouragement supported me throughout the writing of this thesis. I extend my thanks to him for his countless hours of discussion and commentary.

1. Introduction

Second language acquisition is the phrase used to describe the process that people go through when confronted by a need to use a language other than their native one for communication. People acquire their first and second languages differently. Some of the issues and processes involved in language acquisition include the idea of innateness (is language ability determined genetically?), the relevance of the language input the learner receives, and the nature of early (developmental) grammars (O'Grady et al., 1989).

In this paper, I address a number of issues that have to do with the acquisition of voicing and aspiration contrasts in a second language (L2). My major focus is on what Cantonese-speaking learners do when they are learning English stops. I also look at a few other languages and the acquisition of new stop consonants by their speakers in an L2. Most if not all of the pronunciation problems encountered by Cantonese learners of English may be adequately accounted for by the contrastive differences between the two languages. I therefore also examine the phonological differences between the two languages, including their phoneme inventories, the characteristics and distributions of the phonemes, and syllable structure. At the segmental level, substitution by a related sound in the native language, deletion, and epenthesis are by far the most common strategies.

Cantonese and English are two typologically different languages. Cantonese is one of the major dialects of Chinese and belongs to the Sino-Tibetan language family. It is spoken in Guangdong (including Hong Kong), Macau, and the southern part of Guangxi (Figure 1). English, on the other hand, is a Germanic language belonging to the Indo-European family (Ethnologue, 2004).

Figure 1. Map of Guangdong Province, China (Wertz, 2003)

2. Phonetics of VOT

In this section, I will look at the production and perception of stops in terms of voice onset time, one of the cues to the voicing and aspiration contrasts in stops.

2.1 Articulation

Voice Onset Time (VOT) is the duration of the period between the release of a plosive and the beginning of vocal fold vibration. This period is usually measured in milliseconds (ms). It is useful to distinguish at least three types of plosives with different VOT: voiceless unaspirated, voiceless aspirated, and voiced. Figure 2 shows the waveforms of the two plosives “t” and “d”. The arrows indicate the release burst of the stop consonant and the onset of glottal vibrations for the vowel. Clearly, the VOT is longer for the voiceless than for the voiced stop. This is due to glottal abduction, the opening of the vocal folds for the voiceless stop, and its temporal relationship to the oral closing and opening movements.

Figure 2. Waveforms of English [t] and [d], each followed by the vowel [a]. The y-axis represents amplitude; the x-axis represents time (1.5 s overall) (Morton, 1995).

2.2 Voicing and Aspiration

When a plosive has a fairly long positive VOT (longer than about 50 ms), the air from the lungs travels quite quickly through the vocal tract. It is slowed down neither by the vocal folds, which are open, nor by a constriction in the vocal tract, because the plosive has been released. The rapid airflow creates a weak friction noise, which is heard as aspiration. When a voiceless unaspirated plosive is followed by a vowel, the time when the vocal folds begin vibrating for the vowel coincides almost exactly with the time when the plosive is released (give or take up to 20 milliseconds). After a voiceless aspirated stop, however, the vocal folds do not begin vibrating until well after the plosive is released.

The production of stops is not always uniform in terms of VOT, but when a language has two or more contrasting stops, for example /t/ and /d/ in English, each stop is produced within a particular range of VOT. Figure 3 shows the production of a speaker of American English for words beginning with /d/ or /t/. The production of /d/ ranges from 0 ms to 25 ms and that of /t/ from 50 ms to 80 ms.

Figure 3. VOT production of a single normal adult speaker of American English for words beginning with /d/ and /t/ (Blumstein et al., 1980).
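The two production ranges reported for Figure 3 can be read as a simple lookup. The following is a minimal sketch only: the function name and the dictionary are hypothetical, and the ranges describe one speaker's productions, not general norms for English.

```python
# VOT ranges in milliseconds as reported in the text for a single speaker
# of American English (Figure 3). These are illustrative, not general norms.
ENGLISH_VOT_RANGES_MS = {
    "/d/": (0, 25),
    "/t/": (50, 80),
}


def classify_english_alveolar(vot_ms):
    """Return the category whose reported range contains vot_ms, else None."""
    for category, (low, high) in ENGLISH_VOT_RANGES_MS.items():
        if low <= vot_ms <= high:
            return category
    # The value falls in the gap between the two ranges or outside both.
    return None


for vot in (10, 40, 65):
    print(vot, classify_english_alveolar(vot))
```

A value such as 40 ms falls between the two reported ranges and is left unclassified here, which reflects the gap in the production data rather than any claim about how listeners would hear it.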

These are just two of the possible ways of coordinating the timing between vocal fold vibration and a closure in the mouth. Various languages make use of many points along this VOT continuum. In the diagrams in Figure 4, the top half represents the closing and opening of a plosive in the mouth (lip closure) and the bottom half represents the state of the vocal folds: a straight line means voicelessness and a wavy line means voicing.

1 fully voiced /ba/
2 partially voiced /ba/
3 voiceless unaspirated /pa/
4 aspirated /pʰa/
5 strongly aspirated /pʰa/

Figure 4. Different VOT of stops (Russell, 1997).

Languages that make voicing contrasts usually choose two or three points along this continuum (Abramson & Lisker, 1970). English uses position 2 for its voiced sounds and either 3 or 4 (depending on position in the word or syllable) for voiceless sounds. French uses 1 (fully voiced) and 3 (voiceless unaspirated) (Flege 1987). Cantonese uses 3 (voiceless unaspirated) and 5 (strongly aspirated) (Tsui & Ciocca 2000).

2.3 Perception of VOT

The perception of the voicing and aspiration contrasts (e.g. /p/ vs. /b/, /pʰ/ vs. /p/) in stops depends on acoustic cues such as VOT. We usually do not perceive stimuli categorically (Kess 1992). For example, we do not see a colour spectrum from blue to red as either pure blue or pure red with nothing in between; a colour can be kind of blue and kind of green. Speech is one of the things that people do seem to perceive categorically: a stop cannot be kind of [d] and kind of [t]; it is either a [d] or a [t]. This is called categorical perception because, instead of a percept that is ambiguous, the listener gets a percept that matches an example of a particular category. So even when the physical stimuli change continuously, people still perceive them categorically.

For example, both /b/ and /p/ are stop consonants. To produce them, you close your lips, then open them and release some air, and the vocal cords begin vibrating. The difference between /ba/ and /pa/ lies in the VOT of the two stops. For /b/, VOT is very short; voicing begins at almost the same time as the air is released. For /p/, the onset of voicing is delayed. To show the categorical perception of stops, Pisoni & Tash (1974) used a series of synthetic stimuli spanning the VOT continuum between /ba/ and /pa/. When people were asked to identify these stimuli, they generally had no difficulty: the lower half of the continuum was consistently identified as /ba/ and the upper half as /pa/, as shown in Figure 5. People did not report hearing something that was a bit like [b] and a bit like [p]; rather, they reported hearing either [b] or [p]. Thus, the actual VOT of the individual stimulus appears to be discarded, and all that remains in the percept is category membership.

Because of the categorical perception of speech, it is not easy for people to distinguish all speech sounds. Generally, they can only distinguish the speech sounds that signal meaningful differences in their native language. To investigate infants’ ability to discriminate speech sounds, Eimas et al. (1971) tested two groups of infants who were 1 month and 4 months of age. The results showed that infants at both ages distinguished sounds that were members of separate phonemes (i.e. categories) from one another, but failed to distinguish sounds within a given category. The study also shows that infants can distinguish speech sounds before they can produce them. Figure 6 shows the result of this experiment. Infants perceived stops with VOT of –20 ms and 0 ms as the same stop, and likewise stops with VOT of 60 ms and 80 ms. Stops with 20 ms and 40 ms VOT, however, they perceived as two different stops.
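The same/different pattern described for the infant study can be mimicked with a single category boundary. This is a sketch under a stated assumption: the text only implies that the boundary lies somewhere between 20 ms and 40 ms, so the value of 30 ms below, like the function names, is hypothetical.

```python
# Hypothetical category boundary in ms. The text only implies a boundary
# somewhere between 20 ms and 40 ms; 30 ms is chosen for illustration.
BOUNDARY_MS = 30


def identify(vot_ms, boundary_ms=BOUNDARY_MS):
    """Label a stimulus by category only; the exact VOT value is discarded."""
    return "/b/" if vot_ms < boundary_ms else "/p/"


def discriminate(vot_a_ms, vot_b_ms):
    """Return 'D' if two stimuli fall in different categories, else 'S'."""
    return "D" if identify(vot_a_ms) != identify(vot_b_ms) else "S"


# The three stimulus pairs mentioned in the text, each 20 ms apart.
pairs = [(-20, 0), (20, 40), (60, 80)]
results = [discriminate(a, b) for a, b in pairs]
print(results)  # ['S', 'D', 'S']
```

All three pairs differ by the same 20 ms, yet only the pair that straddles the boundary comes out as different, which is the signature of categorical perception.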


Figure 6. Experimental design of infant discrimination study (Eimas et al., 1971). Note: S = perceived as the same stop; D = perceived as two different stops.

In the next section, we will look at the differences between the English and Cantonese phonological systems, as this will help us to account for the problems and difficulties encountered by Cantonese speakers in the process of learning English pronunciation.

3. Phonology of Syllables

There are 24 consonants in English and 19 consonants in Cantonese. Both languages have six stops at the bilabial, alveolar, and velar places of articulation: /p, b/, /t, d/, /k, g/ in English and /p, pʰ/, /t, tʰ/, /k, kʰ/ in Cantonese. In English, /p, t, k/ are voiceless whereas /b, d, g/ are voiced. In Cantonese, however, there are no voiced plosives; all plosives are voiceless. The feature that distinguishes between the stops is aspiration (/p, t, k/ vs. /pʰ, tʰ, kʰ/).

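Table 1. An overview of English and Cantonese consonants (Chan & Li, 2000). C = Cantonese; E = English. Each row gives the manner of articulation, with the place of articulation named before each consonant.

Plosives      (C)  bilabial p, pʰ; alveolar t, tʰ; velar k, kʰ; labio-velar kw, kwʰ
              (E)  bilabial p, b; alveolar t, d; velar k, g
Fricatives    (C)  labio-dental f; alveolar s; glottal h
              (E)  labio-dental f, v; dental θ, ð; alveolar s, z; palatal-alveolar ʃ, ʒ; glottal h
Affricates    (C)  alveolar ts, tsʰ
              (E)  palatal-alveolar tʃ, dʒ
Nasals        (C)  bilabial m; alveolar n; velar ŋ
              (E)  bilabial m; alveolar n; velar ŋ
Lateral       (C)  alveolar l
              (E)  alveolar l
Approximants  (C)  labio-velar w; palatal j
              (E)  labio-velar w; alveolar r; palatal j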

English has a relatively complex syllable structure. There can be a maximum of three consonants before a vowel and a maximum of four consonants after a vowel. One such example is ‘strengths’ /streŋkθs/. The syllable structure of Cantonese, in contrast, is rather simple; the possible combinations of sounds are restricted. Unlike English, Cantonese has no consonant clusters. Thus, in terms of possible configurations of V and C, English clearly outnumbers Cantonese, the latter being limited to V, CV, VC, and CVC. Examples are given in Table 2 below.

Table 2

Syllable structure   Examples
V                    //      ‘exclamation showing surprise’
CV                   /fu/    ‘husband’
VC                   / an/   ‘late’
CVC                  /sik/   ‘colour’
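The restriction to V, CV, VC, and CVC can be expressed as a check on a word's consonant-vowel skeleton. The sketch below is illustrative only: the vowel set and the function names are assumptions, and every vowel is treated as a single segment, which ignores length and diphthongs.

```python
# Syllable templates the text lists as possible in Cantonese.
CANTONESE_TEMPLATES = {"V", "CV", "VC", "CVC"}

# Hypothetical vowel inventory, large enough only for these examples.
VOWELS = {"a", "e", "i", "o", "u"}


def cv_skeleton(segments, vowels=VOWELS):
    """Map each segment to 'V' or 'C' and join the result into a skeleton."""
    return "".join("V" if seg in vowels else "C" for seg in segments)


def fits_cantonese(segments, vowels=VOWELS):
    """True if the skeleton is one of the four Cantonese templates."""
    return cv_skeleton(segments, vowels) in CANTONESE_TEMPLATES


sik = ["s", "i", "k"]                                 # 'colour', CVC
strengths = ["s", "t", "r", "e", "ŋ", "k", "θ", "s"]  # English 'strengths'

print(cv_skeleton(sik), fits_cantonese(sik))              # CVC True
print(cv_skeleton(strengths), fits_cantonese(strengths))  # CCCVCCCC False
```

The English word shows the maximal three-consonant onset and four-consonant coda mentioned above, neither of which has a counterpart among the Cantonese templates.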

In terms of the distribution of consonants, all the stops in English may occur in syllable-initial or syllable-final position except [], which cannot occur syllable-initially. In contrast, only /p, t, k/ among the Cantonese plosives may occur in syllable-final position, as illustrated in Figure 7. It should be noted that, unlike plosives in English, Cantonese plosives in word-final position are always unreleased, as in the words for ‘duck’ /ap/, ‘prosper’ /fat/, and ‘house’ /uk/. In English, by contrast, unreleased stops occur only in connected speech, when a word-final stop is followed by a word beginning with a stop. For example, the word-final [p] of “cup” is unreleased when it is followed by a consonant.

(1) “cup to” /kptu/
(2) “cup on” /kpn/

Figure 7. Syllable structure of Cantonese and English.
Cantonese: onset (C); rhyme = nucleus V + coda (C), where the coda is one of p, t, k, m, n, ŋ.
English: onset CCC; rhyme = nucleus V + coda CCCC.

4. Explaining L2 Behavior

Having examined the phonological differences between the two languages, I would now like to review the behavior of L2 learners: how do we predict what they will do if the target forms are not found in their native language? Second language researchers have proposed a number of theories to explain why certain target forms are more difficult to acquire than others. One of the earliest was the Contrastive Analysis Hypothesis (Lado, 1957). This hypothesis states that where two languages are similar, positive transfer will occur and those forms will be easy to learn; where they are different, negative transfer or interference will result and those forms will be difficult to acquire. However, it turns out that defining similarity and difference is not always easy. Some researchers (Eckman & Iverson 1993, 1994) suggested that typological markedness be the basis of prediction. Structures that are simple and/or especially common in human language are said to be unmarked, while structures that are more complex or less common are said to be marked. A definition is given in Eckman (1981):

"A phenomenon A in some language is more marked relative to some other phenomenon B if, cross-linguistically, the presence of A in a language necessarily implies the presence of B, but the presence of B does not necessarily imply the presence of A."

In other words, when a language has voiced stops, e.g. [d], we would expect it to have the voiceless counterpart [t], but not vice versa. From that, we can say that voiced stops are more marked than voiceless ones. Sometimes something that is not in the L1 can be easy to acquire. For example, English does not contrast [ʃ] and [ʒ] in word-initial position, but English speakers seem able to make the contrast in French onsets without trouble.

The Markedness Differential Hypothesis (MDH) investigates second language acquisition by comparing the relative markedness of structures in the L1 and L2. In those areas in which the target language (TL) differs from the native language (NL), TL structures will be difficult when they are more marked than the corresponding NL structures, and the degree of difficulty will correspond directly to the degree of markedness. The MDH takes two considerations into account when predicting L2 difficulty:

-The difference between the NL and TL.
-The markedness relationships holding between those areas of difference.

In (3), the presence of nasal vowels implies the presence of oral vowels but not vice versa. There are languages which have [a] and [ã], and languages which have [a] alone, but there are no languages which have [ã] but not [a]. From that, we know that nasal vowels are more marked than oral vowels, and so we would predict that the degree of difficulty is higher when there are nasal vowels in the target language but not in the native language.

(3) [ã] implies [a]
∴ Nasal vowels are more marked than oral vowels.

Hence, the prediction of the MDH would be that nasal vowels are more difficult to acquire. On the other hand, those TL differences that are not more marked will not be difficult. The MDH can explain several major patterns of difficulty found in second language acquisition. Now that we know what kinds of target forms are difficult for L2 learners, we will discuss what L2 learners do when the target forms are difficult to acquire.

4.1 Repair Strategies

Repair is a common phenomenon in second language learning in which an L2 word is modified so that it fits the L1 syllable structure. For example, in Japanese loanwords, “strike” /straik/ becomes /sutoraiki/ because Japanese mainly allows CV syllables. Another example is found in German speakers. When they are learning English, they produce words with syllable-final obstruent devoicing (producing [hæt] for [hæd] “had”) because they have no voicing contrast at the end of words in their L1.

Consonant-vowel (CV) is the least marked syllable structure because it can be found in all languages in the world. In order to pronounce the target English items, Cantonese speakers adopt a number of strategies to break up the more complex, more marked syllabification of English. Epenthesis is one of the strategies Cantonese speakers use: a vowel, usually a schwa /ə/, is inserted within a consonant cluster or after the final consonant of the syllable. Another repair strategy is deletion. In this case, coda consonants or one of the consonants in a cluster are deleted in order to obtain the more optimal syllable (CV). The final type of strategy concerning coda consonants or onset consonant clusters is replacement or substitution. This strategy does not alter the syllable structure, and it appears quite frequently in final voiced stops (Edge, 1991). For Cantonese L2 learners of English, most of the errors found in these items involve the voice feature; devoicing is the most common error in final voiced stops. The following examples illustrate the three strategies mentioned above.

(4) Solutions to syllable structure problems:
a. Epenthesis  /dg/ → /dgə/
b. Deletion    /dg/ → /d/
c. Devoicing   /dg/ → /dk/
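The three repairs in (4) can be written as operations on a list of segments. This is a sketch only: the input form below is a hypothetical CVC sequence ending in a voiced stop, chosen for illustration, and the function names and the devoicing table are assumptions rather than part of any cited analysis.

```python
# Voiced stops and their voiceless counterparts, used for final devoicing.
DEVOICE = {"b": "p", "d": "t", "g": "k"}


def epenthesis(segments, vowel="ə"):
    """Insert a vowel (a schwa by default) after the final consonant."""
    return segments + [vowel]


def deletion(segments):
    """Delete the final (coda) consonant."""
    return segments[:-1]


def devoicing(segments):
    """Replace a final voiced stop with its voiceless counterpart."""
    final = segments[-1]
    return segments[:-1] + [DEVOICE.get(final, final)]


# Hypothetical CVC form ending in a voiced stop.
word = ["d", "ɔ", "g"]

print("".join(epenthesis(word)))  # dɔgə
print("".join(deletion(word)))    # dɔ
print("".join(devoicing(word)))   # dɔk
```

Epenthesis and deletion both remove the closed syllable, while devoicing keeps the CVC shape and changes only the voice feature of the coda.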

Different strategies for syllable structure simplification result in different outcomes: CVC sequences undergoing final consonant deletion or epenthesis surface as CV syllables, whereas repair strategies such as final devoicing and substitution maintain the relatively marked, closed CVC structure. Even though both deletion and epenthesis convert the relatively complex CVC syllable into relatively simple CV syllables, their outcomes differ in the degree of ambiguity they impose on a word: deletion removes a segment of the target form, whereas epenthesis preserves all of its segments. According to Weinberger (1994), recoverability is a principle “subsumed under a theory of universal grammar” by which languages, native speakers, and language learners avoid or minimize ambiguity. Young children frequently delete segments in both onset and coda position but very rarely make use of epenthesis. This is because their phonetic ability is low and their functional knowledge (in terms of the recoverability principle) is not yet developed. Adults learning L2s seem to exhibit far more instances of epenthesis than children acquiring their L1. The reason why epenthesis is a more common simplification strategy in adult L2 acquisition is, according to Weinberger (1994), that even though adults’ phonetic skills in the target language lag behind those of native speakers, they do have access to the recoverability principle.

To learn more about recoverability in L2 learners, Abrahamsson (2003) conducted a study of Chinese-Swedish interphonology. Three Chinese subjects were included in this longitudinal study of the L2 acquisition of Swedish. Recordings were made at 3- to 5-week intervals from August 1990 to May 1991. The experiment tested his hypotheses about the developmental aspect of L2 learners’ production and about their selection of repair strategies with regard to grammatical and functional factors. He predicted that error frequencies would be relatively low in the initial stages, higher at a later stage, and relatively low again at even later stages of acquisition. He also predicted that epenthetic forms would be relatively infrequent in early phases of development but more frequent in later phases. Figure 8 shows the overall error frequencies in the experiment. The results agreed with his prediction that learners’ acquisition of codas can be characterized by the following four phases: (a) an initial phase with relatively high error rates, followed by a rapid decrease in error frequency; (b) a linear increase in error frequency; (c) a stable plateau phase of relatively high error frequencies; and (d) a possible decrease in error rates as acquisition proceeds.

Figure 8. Overall error frequencies (deletion + epenthesis), development over time.

Figure 9 summarizes the pattern that emerges when the mean epenthesis proportions for the autumn semester of 1990 are compared with those for the spring semester of 1991. Subject C1 already used epenthesis more than twice as often as deletion during the autumn semester (epenthesis-deletion proportion: 2.13) and almost three times as often during the spring semester (proportion: 2.87). C2’s use of epenthesis was barely half as frequent as his use of deletion during the first semester (proportion: 0.44), but there was a significant increase in his use of epenthesis, which was almost as frequent as deletion during the second semester (proportion: 0.88). Finally, C4 increased her use of epenthesis from a level nearly as frequent as deletion during the autumn of 1990 (proportion: 0.75) to a level almost three times as frequent during the spring of 1991 (proportion: 2.77). This was a significant change.
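The proportions quoted above read as ratios of epenthesis errors to deletion errors, so a value above 1 means epenthesis was the more frequent repair. The sketch below makes that reading explicit and computes how much each subject's ratio grew between semesters. The function name is hypothetical, and the underlying error counts are not given in the text, so only the reported proportions are used.

```python
def epenthesis_deletion_ratio(n_epenthesis, n_deletion):
    """Ratio of epenthesis errors to deletion errors (assumed definition)."""
    if n_deletion == 0:
        raise ValueError("ratio is undefined when there are no deletion errors")
    return n_epenthesis / n_deletion


# Proportions reported in the text: (autumn 1990, spring 1991).
REPORTED = {
    "C1": (2.13, 2.87),
    "C2": (0.44, 0.88),
    "C4": (0.75, 2.77),
}

for subject, (autumn, spring) in REPORTED.items():
    growth = spring / autumn
    print(f"{subject}: {autumn:.2f} -> {spring:.2f} (x{growth:.2f})")
```

On these figures C2's ratio doubles while remaining below 1, whereas C4's ratio crosses 1, meaning epenthesis overtakes deletion as her preferred repair.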

Figure 9. Proportion of epenthesis to deletion errors, development over time.

The functional or grammatical role of the coda also determines the use of different repair strategies. According to Abrahamsson’s (2003) hypothesis, word-final codas that are relatively more important for the retention of semantically relevant information will generate lower overall frequencies of simplification, greater epenthesis-deletion proportions, or both, than will codas containing information that is more recoverable (or predictable) from other segments or features in the context. In Swedish, a coda /r/ can serve as a plural marker or a tense marker, and it also occurs in noninflected words. According to Abrahamsson, if the final consonant of a noninflected word is deleted, the lost information is generally not expressed by other explicit markers or features in the context, and it can be argued that deletion of the stem-final /r/ results in much greater lexical-semantic ambiguity than the partial deletion of an inflectional morpheme. It may therefore also be argued that the retention of final /r/ is more beneficial in noninflected words. To test the hypothesis, inflected words that ended in either the present-tense morpheme -r/-er or the plural morpheme -r/-ar/-er/-or were compared with noninflected words with stems that ended in a single /r/. Figure 10 shows the proportions of epenthesis to deletion. Although the differences again appear to be very small, all subjects produced significantly more epentheses for noninflected forms than for inflected forms.

Figure 10. Proportion of epenthesis to deletion errors, inflectional vs. lexical /r/ codas.

Figure 11. Proportion of epenthesis to deletion errors, present tense vs. plural /r/ codas.

Two pairs of word classes were compared with respect to epenthesis and deletion. The first comparison is between present tense and plural. As can be seen in Figure 11, there is no consistency between the three subjects: C1 used epenthesis significantly more often for present-tense codas (proportion: 0.1) than for plural codas (0.02), whereas subject C2 did not differentiate his use of epenthesis between the two inflectional categories in any significant way. The other comparison deals with differences between open- and closed-class words. Since the word-final /r/ of Swedish open-class words is less recoverable from the context, it should be pronounced more accurately, with a lower overall error frequency and a higher proportion of epenthesis, than the more recoverable or predictable /r/ of closed-class words. The result is shown in Figure 12.

Figure 12. Proportion of epenthesis to deletion errors, closed-class vs. open-class /r/ codas.

It is generally believed that L2 learners achieve greater accuracy in their production of singleton consonants as style becomes more formal (Schmidt, 1987). However, Lin (2001) found that in the case of consonant clusters, it is the learners’ choice of repair strategy, not the error rate, that varies with the style of speech. Twenty Chinese adults took part in her study of the production of English onset consonant clusters in four different types of tasks. The tasks ranged from the most formal, “reading of minimal pairs”, through “word list reading” and “sentence reading”, to the least formal, “conversation”, as shown in Figure 13.

Figure 13. Task types, from most formal to least formal: reading of minimal pairs, word list reading, sentence reading, conversation.


The error rates support her hypothesis and do not conform to the general prediction that L2 learners produce target items more accurately as the style becomes more formal. No significant difference was found in the students’ error rates across the four speech tasks, as shown in Figure 14.

Figure 14. Overall error rates in the four tasks (Lin 2000).

Her study also showed that the use of epenthesis increased as the style of the task became more formal, and that the percentages of deletion and replacement became higher in less formal tasks. The proportion of epenthesis to deletion should likewise be greater in tasks without linguistic context than in tasks with linguistic context. In tasks that were more formal, or that required more attention to form or pronunciation than to content, the use of epenthesis increased. On the other hand, when the tasks became less formal, or as more attention was paid to content than to form, deletion and replacement were preferred. The results of her experiment indicate that what shifts with style is the learners’ choice of repair strategy rather than their accuracy rate.


Figure 15. Percentages of the three strategies in the four tasks (Lin 2000). Note: MP = minimal pair; WL = word list; S = sentence; C = conversation.

5. Phonetics of L2 Learners

So, can L2 learners acquire new VOT? In this section, I will review existing studies of the acquisition of L2 stops that differ from those of the learners’ L1.

Curtin et al. (1998)

Curtin et al. (1998) studied the acquisition of Thai voice and aspiration by English and French speakers. Thai has a three-way phonemic contrast in stops: voiced, voiceless unaspirated, and voiceless aspirated. English also has three phonetically different stops, but only two phonemically different ones. Aspiration is not contrastive in English, so there is no lexical distinction between aspirated and unaspirated stops. Still, there is a phonetic difference between the [p] in “spin” and the aspirated [pʰ] in “pin”. Underspecification means that underlying representations are not fully specified and that predictable information is not underlyingly present. Underspecification theory expresses this by assuming that underlyingly both p's are unspecified for aspiration. In this study, Curtin et al. (1998) wanted to find out whether allophonic aspiration in English ([p] vs. [pʰ]) aids in the acquisition of contrastive aspiration in Thai (/p/ vs. /pʰ/). They also wanted to compare the developmental progression of the English learners with that of native speakers of French. Like English, French has a two-way voicing contrast phonemically. Phonetically, however, it makes only a voicing contrast, with no aspiration contrast: French has voiced and voiceless stops, but no aspirated stops.

Some cross-language speech perception research (Abramson and Lisker, 1970; Strange, 1972; Pisoni et al., 1982) has shown that English speakers find it easier to distinguish aspirated-unaspirated segments (e.g. /pʰ/ vs. /p/) perceptually than voiced-voiceless segments (e.g. /p/ vs. /b/) in synthetic VOT studies. In Curtin et al.’s (1998) study, however, the results showed the opposite in one of the tasks: English speakers did better at distinguishing voiced-voiceless segments than aspirated-unaspirated segments in a minimal pair task. Curtin et al. (1998) claimed that these contradictory orders of acquisition (aspiration contrasts are perceptually easier for English speakers to distinguish, yet the English subjects did better on the voicing contrast in this study) can be explained by the generative phonological distinction between lexical and surface representations: responses on the minimal pair task must be made on the basis of lexically stored representations. The details and results of the experiment are discussed later in this section.

Aspiration is not part of the lexical representation in English; all voiceless stops are stored as unaspirated in the lexicon, and aspiration emerges only in the fully specified phonetic representation. The aspiration feature in [pʰin] is specified later by a context-sensitive rule that applies at the beginning of a syllable; aspiration does not apply in other contexts, as in [spin].

(5)
Lexical representation:   /pæt/    /spæt/   /bæt/
Aspiration rule:          [pʰæt]   n/a      n/a
Surface representation:   [pʰæt]   [spæt]   [bæt]
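The derivation in (5) can be sketched as a rule applied to a lexical form. This is a simplified illustration under stated assumptions: it handles only monosyllables, so "beginning of a syllable" reduces to "first segment of the word", and the function name and segment lists are hypothetical.

```python
# English voiceless stops, stored without aspiration in the lexicon.
VOICELESS_STOPS = {"p", "t", "k"}


def apply_aspiration(lexical):
    """Aspirate a voiceless stop that is the first segment of the syllable.

    Simplification: the input is assumed to be a single syllable, so the
    syllable-initial position is just index 0.
    """
    if lexical and lexical[0] in VOICELESS_STOPS:
        return [lexical[0] + "ʰ"] + lexical[1:]
    return list(lexical)  # the rule does not apply; surface = lexical


for form in (["p", "æ", "t"], ["s", "p", "æ", "t"], ["b", "æ", "t"]):
    surface = apply_aspiration(form)
    print("/" + "".join(form) + "/", "->", "[" + "".join(surface) + "]")
```

Only the first form changes, because the stop in the second form is not syllable-initial and the stop in the third is voiced, matching the three columns of (5).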

Triads of words that differ minimally in both voice and aspiration are found in Thai. Neither of these features is predictable, so both voice and aspiration are represented lexically.

(6) /bèt/ ‘fishhook’    /pèt/ ‘duck’    /pʰèt/ ‘spicy’

The first task of the study was a Minimal Pair Task. Nine Canadian English speakers, 8 Canadian French speakers, and 10 native speakers of Thai (controls) were asked to choose between pictures of words in a minimal pair relationship when presented with one word aurally. The pictures of the minimal pair were accompanied by a picture of a foil that differed phonetically from the other words in more than one segment. Subjects heard a word and responded by pressing a key corresponding to the position of the appropriate picture on the screen. This task was used to study the development of lexical representations: to find out whether the subjects could lexically contrast both voice and aspiration, and whether they could access the correct lexical entry when they heard a word.

The second task was an ABX Task. In this task, a minimal pair ‘AB’ is presented aurally, followed by a third word ‘X’ that matches either A or B. The tokens used for A, B, and X were each produced by a different speaker. There were 72 trials: 16 each of Aspiration–Voiceless, Voiced–Voiceless, and Aspiration–Voiced, and 24 Place controls. Subjects were asked to indicate whether X matched A or B.

The results of the Minimal Pair task show that aspirated–unaspirated minimal pairs were discriminated by both the English and French groups at a level only slightly better than chance, whereas performance on the voicing contrast was better (Figure 16). The experiment lasted for 11 days, and results were collected on days 2, 4, and 11. The results from the last day (day 11) show a developmental difference between some of the English and French subjects. This suggests that the presence of surface aspiration in English might facilitate the establishment of a lexical aspiration contrast in the L2 acquisition of Thai. Because of this, Curtin et al. (1998) suggested that L1 surface features can be lexicalized in L2 acquisition.

Figure 16. Minimal Pair Task: proportion correct (Curtin et al. 1998)

French has only a voicing contrast in both lexical and surface representations, so, as expected, French speakers performed better on the voice contrast than on aspiration in the ABX task (Figure 17), just as they did in the Minimal Pair task. English speakers performed similarly on the voicing and aspiration contrasts in the ABX task, as shown in Figure 17. These ABX results were quite different from the English speakers’ results in the Minimal Pair task, in which their performance on aspiration was significantly worse than on voice.

Figure 17. ABX Task- proportion correct (Curtin et al. 1998)


Curtin et al. (1998) claimed that the Minimal Pair task accesses lexical representations, which lack aspiration in English, while the discrimination task accesses surface representations, which contain aspiration in English. The results of the ABX discrimination task show that the English subjects did better than the French subjects on aspiration. L2 learners initially construct lexical representations that make use of only those features that are present lexically in the L1, even though they may be able to discriminate other L2 contrasts on the basis of surface features, and may eventually lexicalize these surface features.

To summarize, aspirated–unaspirated minimal pairs were better discriminated by the English speakers than by the French speakers, and the French speakers performed better on the voice contrast than on aspiration. In the task that accesses lexical representations, English learners did not discriminate aspiration well, while in the task that accesses surface representations they did better on aspiration. This is supported by the results of the discrimination task, in which the English subjects did better than the French subjects.

Flege and Eefting (1988)

Flege and Eefting (1988) examined the imitation of a VOT continuum ranging from /da/ to /ta/ (-60 to +90 ms) by subjects differing in age and/or linguistic experience. Subjects were native speakers of English, native speakers of Spanish and bilingual speakers of both. Spanish and English use different phonetic categories to implement the contrast between /t/ and /d/. In Spanish, [d] is used to implement /d/ and [t] implements /t/. The Spanish categories [d] and [t] yield stops with VOT values of approximately –80 ms and 20 ms respectively in word-initial position. In English, /d/ is implemented by [d] and [t], and /t/ is implemented by [th]. English [d] and [t] likewise have VOT values of about –80 ms and 20 ms. The rule used to implement [th] yields VOT values of approximately 80 ms
(Flege and Eefting, 1986). Figure 18 illustrates how English and Spanish speakers divide up a VOT continuum based on their native language categories.

Figure 18. Identification of a VOT continuum by English and Spanish speakers (the /d/ and /t/ regions are shown separately for English and Spanish on a VOT axis in ms running from -80 to 80)

In the experiment, subjects were asked to identify the stimuli before imitating them. The stimuli, which consisted of a 16-member continuum ranging from /da/ to /ta/, were presented twice on each trial. Results showed that, regardless of the properties of the acoustic input, children and adults who spoke only Spanish produced only lead and short-lag VOT responses, which correspond to the phonetic categories of their L1, and that they perceived each stimulus on the VOT continuum as a member of one of their L1 categories (Figure 19). English speakers also tended to reproduce the categories of their L1: they produced stops with only short-lag and long-lag VOT values (Figure 20). On the other hand, native speakers of Spanish who spoke English produced stops with VOT values falling into three modal VOT ranges (Figure 21). They had acquired a new phonetic category that is not in their L1.
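
The three modal VOT ranges discussed above can be made concrete with a small sketch. The two boundary values below (0 ms and 40 ms) are illustrative assumptions chosen so that the approximate values cited in this section (–80, 20 and 80 ms) fall into separate ranges; they are not boundaries reported by Flege and Eefting. The treatment of long-lag tokens by Spanish monolinguals as /t/ is likewise an assumption, based on the observation that these speakers assign every stimulus to one of their two L1 categories.

```python
# Illustrative range boundaries in ms (assumptions, not values from the study).
LEAD_BOUNDARY_MS = 0       # VOT below this value counts as lead (prevoicing)
LONG_LAG_BOUNDARY_MS = 40  # VOT at or above this value counts as long-lag


def vot_range(vot_ms):
    """Classify a VOT value as lead, short-lag or long-lag."""
    if vot_ms < LEAD_BOUNDARY_MS:
        return "lead"
    if vot_ms < LONG_LAG_BOUNDARY_MS:
        return "short-lag"
    return "long-lag"


# Phoneme implemented by each VOT range, following the description in the text:
# Spanish /d/ = lead, /t/ = short-lag; English /d/ = lead or short-lag, /t/ = long-lag.
PHONEME_BY_RANGE = {
    "Spanish": {"lead": "/d/", "short-lag": "/t/"},
    "English": {"lead": "/d/", "short-lag": "/d/", "long-lag": "/t/"},
}


def identify(language, vot_ms):
    """Return the L1 phoneme a monolingual listener would assign to a stimulus."""
    categories = PHONEME_BY_RANGE[language]
    # Assumption: a range with no L1 category (Spanish long-lag) is assigned to /t/.
    return categories.get(vot_range(vot_ms), "/t/")
```

Under these assumptions a 20 ms token is identified as /t/ by a Spanish listener but as /d/ by an English listener, which is the cross-language mismatch that Figure 18 depicts.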


Figure 19. The frequency of VOT values produced by the native Spanish subjects. SA=Spanish adult SC=Spanish children

Figure 20. The frequency of VOT values produced by the native English subjects. EA= English adult EC= English children


Figure 21. The frequency of VOT values produced by the native Spanish speakers of English. LCB= late childhood bilinguals. ECB= early childhood bilinguals BC= bilingual children

6. Phonology of L2 Learners

After looking at the phonetics of L2 learners, we will now consider what needs to be acquired in the domain of phonology. In this section, we review literature that examines the segmental level, which has to do with phonological segments (consonants), and the prosodic level, which has to do with syllabification in L2 phonology.

Eckman & Iverson (1993)

Even when the L1 has no clusters, some clusters are easier to acquire than others; e.g., [pl] is easier for L2 learners to acquire than [fl]. To explain this phenomenon, Broselow & Finer (1991) proposed that a Minimal Sonority Distance (MSD) parameter predicts the acquisition of L2 consonant clusters in syllable onsets. The basis for the markedness of the clusters in Broselow & Finer’s (1991) study is the Sonority Index shown in (7) and the proposed MSD parameter.

(7) Sonority Index
Class        Scale
Stops        1
Fricatives   2
Nasals       3
Liquids      4
Glides       5

The function of the MSD parameter is to provide a characterization of the consonant clusters allowed in a language. Languages can be constrained by the minimal difference on the Sonority Index allowed in syllable onsets. Other things being equal, languages that require a greater difference in sonority between adjacent segments will have fewer kinds of consonant clusters in the onset. E.g., a stop-liquid cluster [pr] (a sonority difference of 3) would be less marked than a stop-fricative cluster [ps] (a difference of 1). But Eckman & Iverson (1993) argued that it is typological markedness rather than sonority distance which better explains

L2 learners’ knowledge of English clusters in syllable onsets. They suggested the sequential markedness principle as the better explanation: “For any two segments A and B and any given context X_Y, if A is less marked than B, then XAY is less marked than XBY.” On this assumption, since [p] is less marked than [f], [pr] clusters are less marked than [fr] clusters and are predicted to cause less IL difficulty than [fr] clusters. Eckman & Iverson (1993) conducted an experiment with 11 subjects: 4 Japanese, 4 Korean, and 3 Cantonese speakers. They studied the production of English onset consonant clusters (CCV). The criterion for acquisition was that a subject is said to have an onset in the IL if the subject produces that onset cluster at least 80% of the time on at least 4 attempts. The data were collected on 8 occasions in casual conversations lasting between 5 and 10 minutes. No attempt was made to control the vocabulary used by the subject. They claimed that whenever a more marked cluster is present in a subject’s IL, the corresponding less marked cluster is present as well. There were 55 potential tests of their claim (five sets of onsets per subject × 11 subjects). Of these, the data allow 50 to be tested (91%); the other five yield no result because the subject did not produce at least four tokens of the relevant clusters. Four of these 50 instances appeared to go against what typological markedness would predict, so in 92% of the cases the subject’s performance obeyed the markedness predictions. Of the four cases which ostensibly violated the markedness predictions, two were from Cantonese-speaking subjects who had acquired the two clusters [br] and [fr] but not [pr]. Since [p] is less marked than [b] and [f], we would expect that [pr] would also be less marked than [br] and [fr]. Analysis of the actual errors from these two subjects showed that both of them substituted [ph] onsets for the intended [pr] onsets.
In order to explain this, Eckman & Iverson (1993) assumed that, on the basis of similarities in VOT, the two subjects associate their NL /p/ with the TL /b/, and their NL /ph/ with the TL /p/.

(8) Mapping of the NL obstruents onto the TL.

NL         TL
/p/   →   /b/     Short-lag VOTs
/ph/  →   /p/     Long-lag VOTs
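
The effect of the mapping in (8) on the markedness prediction can be sketched as follows. This is only an illustration: the pairwise markedness relations are the ones stated in this section, while the function names `violations` and `relabel` and the choice to represent each onset by its first segment are assumptions made for the example.

```python
# Pairs (a, b) meaning "a is less marked than b", as stated in the text.
LESS_MARKED_THAN = [("p", "b"), ("p", "f"), ("p", "ph"), ("f", "ph")]

# Mapping (8) read from the TL side: TL /b/ is produced with NL /p/,
# and TL /p/ is produced with NL /ph/. Other segments are left unchanged.
TL_TO_NL = {"b": "p", "p": "ph"}


def violations(present):
    """List pairs where the more marked onset is present but the less marked is absent."""
    return [(a, b) for a, b in LESS_MARKED_THAN if b in present and a not in present]


def relabel(present):
    """Re-express acquired TL onsets in terms of the NL categories in (8)."""
    return {TL_TO_NL.get(segment, segment) for segment in present}


# The two Cantonese subjects: [br] and [fr] acquired, [pr] not acquired.
acquired_targets = {"b", "f"}

apparent = violations(acquired_targets)            # counted in TL terms
remapped = violations(relabel(acquired_targets))   # counted after applying (8)
```

Counted in TL terms, the subjects appear to violate two markedness relations; once their onsets are relabeled according to (8), no violation remains, because the missing onset is the aspirated, most marked one.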

With this assumption, the subjects’ production agrees with the markedness prediction, because aspirated stops are typologically more marked than unaspirated stops. Hence, the [ph]-liquid onset is more marked than the [p]-liquid onset and the [f]-liquid onset. Eckman & Iverson’s explanation raises the question of whether Cantonese speakers might have this kind of mapping.

Edge (1991)

This is a replication and extension of Eckman’s (1981) study on the production of English word-final voiced obstruents by native speakers of Japanese and Cantonese. In Edge’s (1991) study, data from native speakers of English were included to account for native devoicing and epenthesis. This was done to avoid classifying native-like articulation as evidence of IL rules, since devoicing, vowel epenthesis, and the deletion of final voiced obstruents all characterize spoken English. Seven Japanese, 7 Cantonese and 4 native speakers of English were the subjects of this study. The tasks in this study included (1) a picture-elicited storytelling task which contained words with voiced obstruents, (2) an oral reading of a short story and (3) an oral reading of 41 randomly ordered words. The productions of the voiced obstruents were classified as target, deletion, glottal stop substitution, devoicing, epenthesis, fricativization or other consonant substitution. In Eckman’s model, while the surface phonetic forms are influenced by language-specific processes, the underlying processes, such as terminal devoicing, are universal. Edge’s data from the Cantonese speakers provide evidence for an IL rule of terminal devoicing and support Eckman’s hypothesis. For the Cantonese subjects, 67% of the non-target variants were devoiced, and deletion appeared to be more frequent in connected speech. When compared to deletion in the native English subjects’ data, deletion by the Cantonese subjects is quite different in its distribution. While deletion of /v/ in function words


(fond of playing) rarely occurred, deletion of final /g/, as in dog, and of /d/ after a diphthong, as in beside, occurred across phonetic environments. The results of this experiment indicate that, across the three tasks, devoicing was the strategy most frequently used by Cantonese speakers. It is also important to take native speech into account in formulating rules for a language learner’s IL production.

Cichocki, et al. (1999)

Cichocki, et al. (1999) studied the acquisition of French consonants by native speakers of Cantonese in onset and coda positions. The two consonant inventories differ in several ways. French allows more consonants in both onset and coda position. The number of consonants differs greatly between the two languages in coda position, since Cantonese only allows unreleased stops /p, t, k/ and nasals in the coda. Cantonese does not have the voicing contrast found in French stops but does have an aspiration contrast that is implemented as voiceless unaspirated versus voiceless aspirated. There were 6 subjects in this study, and their level of proficiency in French was at the upper beginner and lower intermediate levels. In the first task, the subjects were asked to read a passage. In the second task, subjects were given English and Cantonese translations of the items and were asked to give the French equivalent. The 37 words were expected to be well known. Only five words were unknown to some of the subjects, and in only three cases were the target words read and repeated after the fieldworker. In judging whether a response was acceptable or unacceptable, they followed several principles. A response was judged acceptable when it was, or contained, a merely sub-phonemic inaccuracy even though it contained a wrong nucleus, e.g. [ph] was treated as acceptable for initial /p/. They also judged a response acceptable when it ended in a non-nuclear element agreeing with the target phoneme even though it contained a wrong nucleus, e.g.
[sz] for /s/ and [sz] for /z/. Finally, they also judged as acceptable


when the response contained an allophone of the target but ended in a wrong phonetype, e.g. [p] for /p/. As we can see from the table below (Figure 17), focusing on the results for stops, Cantonese speakers had considerable difficulty producing the French initial voiceless stops /p, t, k/, with accuracy around 50%, even though their native language has the equivalent phone types. They made errors by producing the stops with prevoicing and sometimes with a schwa-like vowel inserted after the consonant. In learning to produce onset /p, t, k/, about 40% of their productions were voiced [b, d, g], 35% were voiceless aspirated [ph, th, kh], and only about 20% were voiceless unaspirated [p, t, k]. This contradicts the MDH, because these French stops have Cantonese counterparts and one might expect them to be easily learned. In coda position, the results of this experiment are as expected: Cantonese speakers have more difficulty with voiced stops than with voiceless stops. The voiced stops are nearly always devoiced in final position. As Figure 18 shows, of all the errors made in the production of stops, 95% involved the presence or absence of the voice feature.

Figure 17. Cichocki, et al. (1999)


To account for the difficulties with French onset stops in Cantonese speakers’ production, Cichocki, et al. (1999) suggested that we could look at the patterns of difficulty found in first language acquisition, which show that voiceless initial stops are more difficult than voiced initial stops (Ingram, 1978). Cichocki, et al. also noted that one of the problems in this study is that all the subjects were learning French as a second foreign language, because English is taught in all Hong Kong schools and is the medium of instruction in many. The possibility of interference from English cannot be neglected when we look at the data obtained in this study. My prediction is that English speakers would not have this trouble, because English speakers have the voicing contrast in their L1. Cantonese speakers may have difficulties contrasting voiced stops and voiceless unaspirated stops.


Figure 18. Cichocki, et al. (1999)

7. Discussion

Based on a comparison and contrast of the major differences between the English and Cantonese phonological systems in this paper, we have examined some difficulties that Cantonese speakers may have when learning English pronunciation. It is argued that most of the Cantonese ESL learners’ difficulties with English pronunciation may be accounted for by reference to fundamental differences between the phoneme inventories of the two languages, the characteristics and distribution of the phonemes, and the permissible syllable structures of the two languages in question. In this section, we look at differences between the acquisition of stops in onset and coda position, and at the different repair strategies that are used under different circumstances.

Onset vs. Coda

In the data from Cantonese speakers of English collected by Eckman (1981), Cantonese speakers exhibit a voice contrast in word-initial, word-medial and word-final position. However, devoicing occurred in some voiced stops in coda position but not in onset or word-medial position. Although voiced stops are absent in the L1 phonology, Cantonese speakers seem to have no difficulty with onset voiced stops. Since the coda is a

more marked position than the onset, we would expect that people would have more difficulties in coda position. As with the English and Spanish speakers in Flege & Eefting’s (1988) study, Cantonese speakers judge tokens of [p, t, k] in their L1 and tokens of [b, d, g] to be realizations of the same phonetic categories in coda position, even though they can detect auditorily the acoustic differences between corresponding L1 and L2 stops. The results of the study by Cichocki, et al. (1999) show that Cantonese speakers had fewer problems in the production of onset voiced stops in the acquisition of French. This happened only in the onset and not in coda position. Since voiced stops are more marked than voiceless stops, this is not what we would expect from the prediction of the MDH. Comparable to the results in Eckman (1981), subjects in this study also had more difficulties with coda voiced stops. Apart from the fact that voiceless initial stops are more difficult than voiced initial stops in L1 acquisition studies, the reason why French voiceless onsets are difficult for Cantonese speakers to acquire may also be due to the perception of the voicing contrast. Cantonese subjects may mistime the realization of the phonological units (phonemes) that distinguish words. Voiced stops in French are easier for Cantonese speakers to distinguish; as Flege (1987) stated, all other things being equal, we actually learn L2 sounds which are dissimilar to the sounds in our L1 more easily than their less dissimilar counterparts.

Repair strategies

To determine which repair strategies Cantonese speakers will choose in the acquisition of English voiced stops, we need to look at proficiency, formality and the grammatical and functional aspects of the speech. In Abrahamsson’s (2003) study, the data show that coda deletion is low in the initial phase of development, increases during the early phase and decreases during later phases.
The proportion of epenthesis to deletion will increase over time, which means that the use of epenthesis would be relatively low at the early stage and increase later on in L2 development. The error rate increases because fluency also increases


considerably with higher L2 proficiency. Fluent speech is characterized by more focus on content and less focus on form, and so the increase in deletion and epenthesis would be found in the early phase of L2 development. Another factor that affects an individual L2 learner’s use of epenthesis versus deletion is the need to avoid ambiguity and facilitate recoverability. As suggested by Lin (2001), it appears that the epenthesis-deletion distribution for consonant clusters correlates positively with increased formality of the speech task, such that epenthesis is frequently employed in formal tasks (e.g., word-list or minimal-pair reading) but less frequently in less formal tasks (e.g., sentence, text, and story reading or natural conversation), where deletion is the dominant simplification strategy. In addition, one aspect of recoverability from the context is whether the coda is a crucial part of a noninflected lexical form or whether it is part of an inflectional morpheme. It can be argued that the reduction of lexical forms generally increases lexical ambiguity, and this might particularly be the case for content words. In contrast, the information expressed by inflectional morphemes is usually redundantly expressed by other formal markers or otherwise predictable from the context, and it might be argued that inflectional information is more easily recoverable from the context than the underlying form of a reduced lexical stem. It is therefore more likely that word-final codas that are part of a lexical stem will be pronounced more accurately than word-final codas that are part of an inflectional morpheme.


References:

Abrahamsson, N. (2003). Development and recoverability of L2 codas: A longitudinal study of Chinese/Swedish interphonology. Studies in Second Language Acquisition, 25(3), 313-349.
Abramson, A. & Lisker, L. (1970). Discriminability along the voicing continuum: Cross-language tests. In Hala, B., Romportl, M. & Janota, P. (Eds.), Proceedings of the Sixth International Congress of Phonetic Sciences. Prague: Academia, 569-573.
Blumstein, S., Cooper, W., Goodglass, H., Statlender, S. & Gottlieb, J. (1980). Production deficits in aphasia: A voice-onset time analysis. Brain and Language, 9, 153-170.
Chan, A.Y.M. & Li, D.C.S. (2000). English and Cantonese phonology in contrast: Explaining Cantonese ESL learners’ English pronunciation problems. Language, Culture and Curriculum, 13, 67-85.
Cichocki, W., House, A.B., Kinloch, A.M. & Lister, A.C. (1999). Cantonese speakers and the acquisition of French consonants. Language Learning, 49, 95-121.
Curtin, S., Goad, H. & Pater, J. (1998). Phonological transfer and levels of representation: The perceptual acquisition of Thai voice and aspiration by English and French. Second Language Research, 14(4), 389-405.
Eckman, F. (1981). On predicting phonological difficulty in second language acquisition. Studies in Second Language Acquisition, 4, 18-30.
Eckman, F. & Iverson, G. (1993). Sonority and markedness among onset clusters in the interlanguage of ESL learners. Second Language Research, 9(3), 234-252.
Eckman, F. & Iverson, G. (1994). Pronunciation difficulties in ESL: Coda consonants in English interlanguage. In M. Yavas (Ed.), First and Second Language Phonology. San Diego: Singular Publishing Company, 251-265.
Edge, B.A. (1991). The production of word-final voiced obstruents in English by L1 speakers of Japanese and Cantonese. Studies in Second Language Acquisition, 13, 377-393.
Eimas, P.D., Siqueland, E.R., Jusczyk, P.W. & Vigorito, J. (1971). Speech perception in infants. Science, 171, 303-306.
Ethnologue. Website of the Summer Institute of Linguistics. http://www.ethnologue.com/ (Jan, 2004)
Flege, J. (1987). The production of "new" and "similar" phones in a foreign language: Evidence for the effect of equivalence classification. Journal of Phonetics, 15, 47-65.
Flege, J.E. & Eefting, W. (1988). Imitation of a VOT continuum by native speakers of English and Spanish: Evidence for phonetic category formation. Journal of the Acoustical Society of America, 83, 729-740.
Hansen, J. (2001). Linguistic constraints on the acquisition of English syllable codas by native speakers of Mandarin Chinese. Applied Linguistics, 22, 338-365.
Kess, J.F. (1992). Psycholinguistics: Psychology, Linguistics, and the Study of Natural Language. Amsterdam: John Benjamins Publishers BV.
Lado, R. (1957). Linguistics across cultures. Ann Arbor: University of Michigan Press.
Lin, Y.H. (2001). Syllable simplification strategies: A stylistic perspective. Language Learning, 51(4), 681-718.
Morton, K. (1995). Kate Morton's Image Resource. http://www.essex.ac.uk/speech/material/kate/k-images.html
O'Grady, W., Dobrovolsky, M. & Aronoff, M. (1989). Contemporary Linguistics. New York: St. Martin's Press.
Pisoni, D. & Tash, J. (1974). Reaction times to comparisons within and across phonetic categories. Perception and Psychophysics, 15(2), 285-290.
Pisoni, D., Aslin, R., Perey, A. & Hennessy, B. (1982). Some effects of laboratory training on identification and discrimination of voicing contrasts in stop consonants. Journal of Experimental Psychology: Human Perception and Performance, 8, 297-314.
Radwanska-Williams, J. & Yam, J.P.S. (2001). The acquisition of English plosives by Chinese learners. In Phonetics Teaching & Learning Conference 2001.
Russell, K. (1997). Narrower transcriptions of English: Aspiration (and Voice Onset Time). http://www.umanitoba.ca/linguistics/russell/138/2001/notes.html
Schmidt, R. (1987). Sociolinguistic variation and language transfer in phonology. In G. Ioup & S.H. Weinberger (Eds.), Interlanguage phonology, 365-377. Rowley, MA: Newbury House Publishers.
Strange, W. (1972). The effects of training on the perception of synthetic speech sounds: Voice onset time. Doctoral dissertation, University of Minnesota.
Tsui, I.Y.H. & Ciocca, V. (2000). The perception of aspiration and place of articulation of Cantonese initial stops by normal and sensorineural hearing-impaired listeners. The International Journal of Language and Communication Disorders, 35, 507-525.
Wertz, R.R. (2003). Geographical Database: Map of Guangdong Province. http://www.ibiblio.org/chinesehistory/images/atlas/provincial/guangdong.html
