Analysis of Italian children's English pronunciation
The Wayback Machine - https://web.archive.org/web/20070527173328/http://www.eee.bham.ac.uk:80/russellm/ItalianEnglishReport/ItalianEnglish_report_v2.htm


  • Some remarks about learning to speak a second language
    The variety of English discussed in this report is that often known as Standard British English (SBE). By this I mean English that betrays little of the geographical or social origins of the speaker. There is of course a very wide range of regional (and personal) variation within English, and where appropriate I will mention it.

    When a person learns to speak a second language some time after they have acquired their native tongue (rather than at the same time, as in bilingual children), we would expect aspects of their native language to interfere with the pronunciation of the new language. There are two principal ways in which this might manifest itself. Firstly, speakers may try to apply the spelling-to-sound rules of their native language to the new language, possibly resulting in an inappropriate choice of target phoneme string. Secondly, even if they are aiming at the correct phoneme, their native phonological system might interfere with their production of sounds in the different system. This happens because when speakers of one language hear a sound in a foreign language they tend to interpret it in terms of their native phonological system, and when they try to produce a foreign sound they usually produce one that is similar to a phoneme in their own language. If the foreign language makes a distinction that is not made in their native tongue, learners often find it difficult to perceive the distinction, and still harder to produce it. In this study we shall see that the English vowel system is much richer than the Italian one; consequently, Italian speakers of English tend to have problems making the appropriate distinctions. In addition, phonological processes in the native language, such as devoicing of word-final consonants, may creep into the pronunciation of the target language.

    In this report I will call the first of these mispronunciation, and in simple terms it can be thought of as "aiming at the wrong target". The second will be referred to as phonological interference, and this can be thought of as "missing the correct target".

    Of course, distinguishing between these is not always straightforward. When an Italian speaker, whose phonological system does not contain the vowel /I/, consistently produces an /i/ as a substitute, then this is phonological interference. But if a speaker who can and does produce the /aI/ phoneme in the appropriate places pronounces the vowel in the English word child as /I/, then we can be fairly sure that this is a mispronunciation. However, if a speaker pronounces the middle vowel in the word banana as /a/, when Standard British English would have /A:/, then it is hard to tell whether the Italian's pronunciation is a mispronunciation, influenced by the spelling (the letter 'a' is always /a/ in Italian), or whether they are trying to produce the English phoneme /A:/, but have actually produced the (very similar!) Italian phoneme /a/ instead. In fact it is probably a bit of both, as, of course, young speakers of a new language probably do not have in their minds an explicit phonological representation of either their native language or the new one.

    These two types of interference typically result in different types of pronunciation error. Mispronunciations tend to be more catastrophic in terms of intelligibility, in that it can be impossible to work out what the intended word was (though the context of the utterance often helps to resolve even the most disastrous of attempts!).

    Phonological interference results in rather more subtle errors, as the sounds produced generally approximate the "correct" ones in (usually) predictable ways. On the whole this does not result in words becoming unintelligible, but instead gives the resulting speech its "foreign accent". Sometimes, however, this can result in a word becoming confusable with another, so it would be misleading to say that these are trivial errors. An example in the Italian children's speech would be the almost universal substitution of /i/ for /I/, so that the word lip sounds like leap.

  • Mispronunciations
    Most mispronunciations can be seen as a result of a speaker using the rules that they use in their native tongue (mostly subconsciously) to convert from the written string of letters into the spoken string of sounds that make up the words. This is only true if speakers are using a predominantly text-based approach to learning the target language. Learners who have learnt by imitation, without access to the written form of the new language, will clearly not be influenced by the spelling. However, the children's speech in this study was read, so it is valid to assume that some degree of "letter-to-sound" conversion is going on. The extent to which speakers rely on this conversion depends on how familiar they are with the words presented to them. Pronunciation of the more familiar words is likely to have been internalised, and so bypasses the "letter-to-sound" rules.

    The relationship between spelling and pronunciation in Italian is generally very simple, with one letter corresponding to one sound. Where there are alternatives, these are always determined by context. In contrast, English spelling often bears very little relation to how the word is pronounced. So, in Italian the letter 'a' is always pronounced /a/. In English it can be pronounced /{/ as in fat; /eI/ as in fate; /A:/ as in fast; or /@/ as in about, and there are at least six ways of pronouncing the letter combination 'ough'. It is, in general, very difficult to determine which pronunciation is correct for any given word; why is it /A:/ in class, but /{/ in classic, for example? (But note that many English accents have /{/ in class.)

    Here are some examples from the Italian children's data where Italian spelling rules have been applied:

    • the English letter 'u' can be pronounced /u/, but more often it is /3/ or /V/, while in Italian it is always /u/; turn (SBE /t3:n/) turn_E.wav was pronounced [turn] turn_I.wav by 6 out of 12 children; and cup (SBE /kVp/) cup_E.wav was pronounced [kup] cup_u_I.wav by one speaker;
    • the letter 'g' before 'e' or 'i' is /dZ/ in Italian; finger (SBE /"fINg@/) finger_E.wav was pronounced ["findZ@r] finger_I.wav by 4 out of 13 children;
    • the letter 'c' before 'e' or 'i' is /tS/ in Italian; voices (SBE /"vOIsIz/) voices_E.wav was pronounced ["vOItSes] voices_I.wav by 2 out of 13 children;
    • the letters 'ch' in Italian are always /k/; child (SBE /tSaIld/) child_E.wav was pronounced [ki@ld] child_I.wav by 1 out of 13 children;
    • the letter 'h' is silent in Italian, and there was a great deal of variability in whether or not it was pronounced. However, it should be noted that 'h-dropping' is a feature of many English accents, and even within those, there is a great deal of variation in which words it applies to.
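    The spelling rules listed above can be sketched as a small letter-to-sound function. This is only an illustration, not the procedure used in the study: the function name italian_reading is hypothetical, the hard readings of 'c' and 'g' in other contexts are my own assumption about Italian, and vowels are simply passed through as letters, so the output only roughly approximates the transcriptions reported above.

```python
def italian_reading(word: str) -> str:
    """Read an English spelling using a few Italian letter-to-sound rules.

    Output uses SAMPA-style symbols for consonants; vowel letters are
    copied unchanged (an intentional simplification).
    """
    word = word.lower()
    out = []
    i = 0
    while i < len(word):
        ch = word[i]
        nxt = word[i + 1] if i + 1 < len(word) else ""
        if ch == "c" and nxt == "h":
            out.append("k")  # 'ch' is always /k/ in Italian
            i += 2
            continue
        if ch == "c":
            # /tS/ before 'e' or 'i'; hard /k/ elsewhere (assumption)
            out.append("tS" if nxt in ("e", "i") else "k")
        elif ch == "g":
            # /dZ/ before 'e' or 'i'; hard /g/ elsewhere (assumption)
            out.append("dZ" if nxt in ("e", "i") else "g")
        elif ch == "h":
            pass  # 'h' is silent in Italian
        else:
            out.append(ch)  # every other letter is read at face value
        i += 1
    return "".join(out)


if __name__ == "__main__":
    for w in ("turn", "cup", "finger", "voices", "child"):
        print(w, "->", italian_reading(w))
```

    For turn and cup the sketch gives the same strings as the children's [turn] and [kup]; for finger, voices and child it reproduces the consonant errors but not the vowel qualities the children actually used.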

    Furthermore, with the exception of 'h', all written letters are pronounced in Italian. English has numerous 'silent' letters, or letter pairs/triples that are pronounced as one. Here are some examples of where the Italian children made errors by trying to pronounce all of the letters.

    • the 'l' in walk (SBE /wO:k/) was pronounced by all 13 children, [wQlk] walk_I.wav (though the quality of the vowel and pronunciation of the initial 'w' varied);
    • seat (SBE /si:t/) seat_E.wav was pronounced as two syllables, ["si.at] seat_I.wav, by 3 out of 12 children;
    • the letter combination 'gu' was pronounced [gw], so guards (SBE /gA:dz/) guards_E.wav was pronounced [gwardz] by 5 out of 13 Italian children; and guide (SBE /gaId/) was pronounced [gwid] by 6 out of 13 children gu-gw_I.wav;
    • boiled, which in SBE is a single syllable, /bOIld/, was pronounced with two syllables, ["bOIl.Ed] boiled_I.wav, by 6 out of 12 children;
    • friend (SBE /frEnd/) friend_E.wav was pronounced ["fri.End] friend_I.wav by 5 out of 12 children;
    • puzzle (SBE /"pVzl=/) was pronounced ["pus.le] puzzle_I.wav by 2 out of 13 children.

    Other problems arose where English has letters that Italian does not have. Although it is usually fairly straightforward to learn these, they can be a source of uncertainty. The letters 'j', 'k', 'x', 'y' and 'w' do not occur in native Italian words, though they do appear in foreign words, and 'w' in particular gave rise to errors. Some of the Italian children pronounced it correctly and consistently, as /w/, while others pronounced it both /w/ and /v/, sometimes in the same word, so

    • window (SBE /"wInd@U/) window_E.wav was pronounced ["windQv], ["vindo] or ["vindQv] by 8 out of 25 children; and yellow (SBE /"jEl@U/) was pronounced ["jElQv] by 3 out of 13 children w-v_I.wav.

    Some words caused nearly all the children problems:

    • bird and birthday, in which the 'ir' vowel (SBE /3:/) was pronounced correctly by fewer than half the children, the most common attempts being [9r], [Vr], [er] and [ir]. So birthday (SBE /"b3TdeI/) birthday_E.wav was very variable in the Italians' English birthday_I.wav;
    • the vowel in chair and hair (SBE /e@/) chair_E.wav was also very variable; although more than half produced an acceptable variant ([e@r] or [er]), many made it two syllables, [ai.@r] or [ei.@r] being the most common chair_I.wav;
    • the most common pronunciation of ears (SBE /I@z/) ears_E.wav was [erz], but [irs] was also common, and many added an initial /h/ ears_I.wav;
    • the vowel in coat, road and boat (SBE /@U/) oa_E.wav was subject to a wide range of pronunciations, ranging from [Q:], the most common, through [A:], [O], [o.a], [{] and [u] coat_I.wav, boat_I.wav, road_I.wav.
  • Phonological Interference

    When investigating the influence of a speaker's native tongue on their pronunciation of English, it makes sense to start with a comparison of the phoneme systems of the two languages in question in order to provide a framework for describing any potential problems. Therefore, in what follows I have compared the structures of the English and Italian phoneme systems, and then picked out a few examples from the data to illustrate how the phonological structure of Italian has influenced the children's English pronunciation.

    Standard British English is usually regarded as having 44 phonemes, while Italian has 50. In addition, the two systems differ in how they are made up. Italian has a very simple 7-vowel system, while English has 20 vowels. Italian has a richer consonant system. Table 1 summarises the two consonant systems and also shows how many of the consonants are shared by both, as well as how many are unique to English and Italian. Table 2 provides the same information for the vowels.

    Consonants

    Standard British English has 24 consonants: 17 obstruents (6 plosives, 9 fricatives and 2 affricates), and 7 sonorants (3 nasals, 2 liquids and 2 semivowel glides).

    The Italian system consists of 43 consonants: 29 obstruents (12 plosives, 9 fricatives, and 8 affricates) and 14 sonorants (6 nasals, 6 liquids and 2 semivowels).

    Between them English and Italian have 48 consonants, of which 19 (40%) are shared. Table 1 below breaks this down into consonant types, and also shows how many consonants are common to both languages. The low percentage of shared consonants is due largely to the existence in Italian of geminate forms of nearly all the consonants. A geminate consonant is written with a double letter in the spelling; it is pronounced much more forcefully and is usually longer than a single one.

    Consonants   English   Unique to English   Italian   Unique to Italian   Eng+Ita   Shared   % shared
    Plosives     6         0                   12        6                   12        6        50.00
    Fricatives   9         4                   9         4                   13        5        38.46
    Affricates   2         0                   8         6                   8         2        25.00
    Nasals       3         1                   6         4                   7         2        28.57
    Liquids      2         0                   6         4                   6         2        33.33
    Semivowels   2         0                   2         0                   2         2        100.00
    Total        24        5                   43        24                  48        19       39.58

    Table 1 Summary of structure of English and Italian consonant systems
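    The derived columns of Table 1 follow from three counts per row (English, Italian, shared), so the table can be cross-checked with a few lines of arithmetic. The sketch below is a consistency check only; the names TABLE_1, derive_row and totals are hypothetical, and the counts are copied from the table.

```python
# (English, Italian, shared) counts per consonant class, copied from Table 1.
TABLE_1 = {
    "Plosives": (6, 12, 6),
    "Fricatives": (9, 9, 5),
    "Affricates": (2, 8, 2),
    "Nasals": (3, 6, 2),
    "Liquids": (2, 6, 2),
    "Semivowels": (2, 2, 2),
}


def derive_row(english: int, italian: int, shared: int):
    """Return (unique to English, unique to Italian, Eng+Ita, % shared)."""
    unique_english = english - shared
    unique_italian = italian - shared
    combined = english + italian - shared  # size of the union of both systems
    pct_shared = round(100 * shared / combined, 2)
    return unique_english, unique_italian, combined, pct_shared


def totals(table):
    """Sum the English, Italian and shared counts over all rows."""
    english = sum(row[0] for row in table.values())
    italian = sum(row[1] for row in table.values())
    shared = sum(row[2] for row in table.values())
    return english, italian, shared


if __name__ == "__main__":
    for name, row in TABLE_1.items():
        print(name, derive_row(*row))
    print("Total", derive_row(*totals(TABLE_1)))
```

    Run on these counts, the derived values agree with every row of Table 1, including the overall figure of 19 shared consonants out of 48.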

     

    Obstruents

    Obstruent sounds are produced by obstructing the vocal tract at some point, and then either releasing the blockage (resulting in sounds known as plosives), or forcing air past it (resulting in frication). Both English and Italian obstruents are traditionally classified in pairs as voiceless and voiced. Italian also has single/geminate pairs of all the obstruents.

    Plosives

    The plosive systems of the two languages appear very similar, in that both languages have plosives at bilabial /p, b/, dental/alveolar /t, d/ and velar /k, g/ positions, and voiceless/voiced pairs. However, this superficial similarity disguises the fact that the sounds are produced differently in the two languages. In particular the distinction between voiceless and voiced relies on different cues. In fact, in English the presence of voicing is not actually a reliable cue, and they are perhaps better classified as fortis (strong, roughly equivalent to voiceless) and lenis (weak, roughly equivalent to voiced). In Italian, however, the main cue to voicing is the presence of vocal cord vibration. This can be seen in Figure 1 where the closure of the initial /d/ in the word dog is clearly voiced in the Italian child's production.

     

     

     

     

     

    Figure 1 Spectrograms of the word dog spoken by a native English child (left) and an Italian child (right) showing the difference in the initial /d/, where the Italian version has a clear voice bar in the closure. Note also a strong vocalic offset that looks like a second syllable (see Page * ).

    Aspiration is the main cue to the fortis-lenis distinction in English plosives, with the release of fortis (voiceless) plosives being accompanied by an audible period of frication, whose duration and intensity vary with context. Italian speakers often fail to produce the aspiration for the fortis plosives, and this can cause confusion between these and the lenis plosives. This can be seen in Figure 2, and in this data it did make some words confusable with other words.

     

     

     

     

     

    Figure 2 Spectrograms of the word town spoken by a native English child (left) and by an Italian child (right) showing the lack of aspiration of the initial /t/ in the Italian version.

    Italian also has geminate plosives corresponding to the single ones, /pp bb tt dd kk gg/, and these tend to have a longer closure and more forceful release than their single counterparts. It is possible that Italians may use the geminate form when pronouncing English words spelt with double letters, under the influence of the orthography. However, there are not enough words spelt with double letters in the data to allow this to be examined.

    • The only geminate plosive in this data was in the word apple , and in all cases the closure was at least 50% longer than the average English /p/ closure.

    Fricatives

    The fricative systems of the two languages overlap to some extent, with both having unvoiced and voiced labio-dental /f, v/ and alveolar /s, z/ fricatives, and unvoiced palato-alveolar /S/. English also has a voiced palato-alveolar /Z/ and an unvoiced glottal fricative /h/, which are not present in Italian. Italian, however, has geminate versions of all except /z/, and the comments made for the plosives may apply in this respect. There are also a number of ways in which the systems differ that may be of significance.

    Italian has no equivalent of the English dental fricatives /T/ and /D/, so these sounds could be a source of error, as the Italian children tended to produce approximations that depended on context. Many of the children in this data did produce satisfactory /T/ and /D/, but there was a great deal of variation in this, both between speakers and even for different words uttered by the same speaker.

    The /T/ sound was pronounced either [t] or [f], so we see:

    • the word three (SBE /Tri:/) three_E.wav pronounced [fri] three_f_I.wav by 7 children; but [tri] by 5 children. Note that both these result in alternative words ( free and tree ), so are potentially non-trivial errors.

    Some even used different pronunciations in the same word on different occasions:

    • The word three was pronounced [Tri] when pronounced in isolation, but [fri] when part of the phrase three old books by 2 of the 12 speakers.

    If the /D/ sound was not pronounced correctly, then it was most often mispronounced [d].

    • The word-initial cluster 'sm', common in English words, e.g. small small_E.wav, is not found in Italian, and most of the children voiced the /s/ to some extent, small_I.wav.

    Affricates

    English has only one pair of affricates, /tS, dZ/ while Italian has two pairs, the English ones plus /ts, dz/ as well as the single-geminate distinction. There were no problems associated with the affricates in this data.

    Sonorants

    All sonorants are, by definition, voiced. Again Italian has both single and geminate versions of all the sonorants, except for the semivowels, and again it is likely that there will be some geminate pronunciations where the spelling has a double letter.

    Nasals

    The nasal system in the two languages is slightly different, but there are similarities, as English and Italian both have bilabial /m/ and alveolar /n/ nasals. But note the following differences:

    In Italian the velar nasal /N/ is an allophone of /n/ before a velar, whereas in English it is considered to have full phonemic status. However, this caused few problems. The most frequent place where the velar nasal occurs in English is in the affix '-ing', and even within native English speakers there is a great deal of variation in whether this is actually pronounced [n] or [N]. This was also the case for the Italian speakers.

    However, they also tended to use /n/ for /N/ in other situations, where SBE would always have /N/:

    • SBE king is always [kIN] king_E.wav, but half of the Italian children said [kin] king_n_I.wav.
    • There was also a strong tendency for the Italian children to have a strong audible release of the velar nasal, [Ng] king_g_I.wav. This is also a feature of some English accents, so does not seem particularly alien.

    Liquids

    Apart from the existence of Italian geminate liquids, and the usual comments that apply to those, the two systems are broadly comparable, both having a lateral /l/ and alveolar /r/.

    In acoustic phonetic terms the difference between English and Italian /r/ is substantial. The realisation of the sound traditionally given the symbol /r/ in English in fact varies greatly between English speakers, and across phonemic contexts. Many speakers of Standard British English have either a tap, represented in SAMPA by /4/ (IPA [ɾ]), or an approximant /r\/ (IPA [ɹ]), while other accents may have a trill /r/. In Italian /r/ is always a trill. Acoustically, a trilled /r/ is quite different from a tapped one; there are usually a number of very short stops (silences) corresponding to the hits of the tongue against the roof of the mouth, while in an English tap the closure is rarely complete enough to show a single stop. This difference can be seen clearly in Figure 3, and heard in the word parrot; compare SBE parrot_E.wav with Italian English parrot_I.wav. Although this causes no intelligibility problems for a human listener, ASR models that are based on the English variant may fail on the Italian version (and, incidentally, would also fail on those accents of English where /r/ is trilled).
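    Because the transcriptions in this report are written in SAMPA rather than IPA, a small converter can help when comparing them with other sources. The following is a minimal sketch covering only symbols that appear in this report; the names SAMPA_TO_IPA and sampa_to_ipa are hypothetical, and symbols not in the table (including the plain letters) are passed through unchanged.

```python
# Partial SAMPA -> IPA table, limited to symbols used in this report.
SAMPA_TO_IPA = {
    "r\\": "ɹ",  # alveolar approximant
    "4": "ɾ",    # alveolar tap
    '"': "ˈ",    # primary stress
    ":": "ː",    # length mark
    "I": "ɪ",
    "E": "ɛ",
    "{": "æ",
    "A": "ɑ",
    "Q": "ɒ",
    "O": "ɔ",
    "U": "ʊ",
    "V": "ʌ",
    "3": "ɜ",
    "@": "ə",
    "T": "θ",
    "D": "ð",
    "S": "ʃ",
    "Z": "ʒ",
    "N": "ŋ",
}

# Try longer symbols first so that "r\" is matched before "r" is copied.
_SYMBOLS = sorted(SAMPA_TO_IPA, key=len, reverse=True)


def sampa_to_ipa(text: str) -> str:
    """Convert a SAMPA string to IPA, copying unknown characters as-is."""
    out = []
    i = 0
    while i < len(text):
        for symbol in _SYMBOLS:
            if text.startswith(symbol, i):
                out.append(SAMPA_TO_IPA[symbol])
                i += len(symbol)
                break
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


if __name__ == "__main__":
    for s in ('"fINg@', "tSaIld", "bA:T", "r\\", "4"):
        print(s, "->", sampa_to_ipa(s))
```

    The table is deliberately partial; a full SAMPA or X-SAMPA converter would need many more entries and handling for diacritics such as the syllabicity mark.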

    In SBE and many other accents an 'r' after a vowel in the orthography is not pronounced, so star is /stA:/ star_E.wav. In Italian it is always pronounced, and all the children in this data pronounced it to some extent, so star is [star] star_I.wav. However, Scottish and Irish, and many variants of American English do pronounce a post-vocalic 'r', so if any Italian speaker does this it will not be misunderstood. For this reason, these are not counted as full errors in the scoring system used. In addition, it is worth noting that ASR models based on US English will probably already be accommodating this difference.

     

     

     

     



    Figure 3 Spectrograms showing intervocalic /r/ in the word parrot. On the left is a native English speaker, showing no closure; on the right is an Italian speaker, clearly showing the trill as a series of rapid stops. Note also the difference in the initial /p/.

    In SBE there are two allophonic (context-dependent) variants of /l/, often referred to as light and dark /l/. Usually a light /l/ occurs syllable-initially, as in light or leave , and a dark /l/ syllable-finally or before a consonant, as in pool or milk and (for some speakers) before a back vowel, as in look . In Italian a light /l/ generally is used in all contexts. This can mean that Italian /l/ sounds a little odd to English ears. In this data this is particularly noticeable in the word child , where most accents of English would use a dark /l/. But the differences are acoustically subtle, and once again there are accents of English that do not have this allophonic variation, so it can not be viewed as a serious error.

    Semivowels

    Both Italian and English have two semivowels, /w/ and /j/. There are no geminate versions. They are very similar in both languages, and were not the source of any errors in the children's speech.

    Vowels

    Table 2 summarises the two systems, while the rest of this section describes them in more detail. As can be seen from this table there is very little overlap between the vowel systems, with only 11.5% of vowels shared by both languages. In addition, 19 vowels are unique to English.

    It is in the vowels that I expect the greatest problems to arise for Italian speakers of English. The Italian vowel system is very simple, consisting of seven monophthongal vowels, /i e E a O o u/. In contrast, the English system consists of twenty-two vowels, comprising fourteen monophthongs /i: i I E { A: Q O: U u: u V 3: @/ and eight diphthongs /eI aI OI aU @U I@ e@ U@/, though it can be hard to draw a firm line between the two. So the Italian children have to learn to make many more distinctions than they are used to. This tends to lead to some of them using the same Italian vowel to cover many of the English vowels.

    Vowels       English   Unique to English   Italian   Unique to Italian   Eng+Ita   Shared   % shared
    Monoph       14        11                  7         4                   18        3        16.67
    Diphthongs   8         8                   0         0                   8         0        0.00
    Total        22        19                  7         4                   26        3        11.54

    Table 2 Summary of structure of English and Italian vowel systems
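    The figures in Table 2 can be reproduced from the vowel inventories listed above using set operations. The sketch below assumes that two vowels count as shared only when their SAMPA symbols are identical (so English /O:/ and Italian /O/ are treated as different); that matching rule is my reading of the table rather than something the report states, and the names used are hypothetical.

```python
# Vowel inventories in SAMPA, as listed in the text above.
ENGLISH_MONOPHTHONGS = {
    "i:", "i", "I", "E", "{", "A:", "Q", "O:", "U", "u:", "u", "V", "3:", "@",
}
ENGLISH_DIPHTHONGS = {"eI", "aI", "OI", "aU", "@U", "I@", "e@", "U@"}
ITALIAN_VOWELS = {"i", "e", "E", "a", "O", "o", "u"}  # all monophthongs


def overlap(english: set, italian: set) -> dict:
    """Summarise the overlap between two inventories by exact symbol match."""
    shared = english & italian
    combined = english | italian
    return {
        "english": len(english),
        "unique_english": len(english - italian),
        "italian": len(italian),
        "unique_italian": len(italian - english),
        "combined": len(combined),
        "shared": sorted(shared),
        "pct_shared": round(100 * len(shared) / len(combined), 2),
    }


if __name__ == "__main__":
    print("Monoph    ", overlap(ENGLISH_MONOPHTHONGS, ITALIAN_VOWELS))
    print("Diphthongs", overlap(ENGLISH_DIPHTHONGS, set()))
    print("Total     ", overlap(ENGLISH_MONOPHTHONGS | ENGLISH_DIPHTHONGS,
                                ITALIAN_VOWELS))
```

    Under this matching rule the three shared vowels are /i/, /E/ and /u/, and the counts and percentages agree with every row of Table 2.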

     

    Monophthongs

    English is often described as having five long monophthong vowels, /i:/, /A:/, /O:/, /u:/ and /3:/. However, because of the contextual effect on duration of a following consonant (a notional short vowel before a lenis consonant can in fact be longer than a so-called long vowel before a fortis one), the length distinction is somewhat spurious. The Italian counterparts of three of these, /i/, /O/ and /u/, are therefore in general accurate enough pronunciations not to be noticeable as foreign.

    English has two unrounded vowels in close (or near-close) front position: /i:/ and /I/. This distinction is not made in Italian, and almost universally /I/ is pronounced [i]:

    • lip (SBE /lIp/) lip_E.wav sounded like leap [lip] lip_I.wav for all but one of the children (who pronounced it [laIp]);
    • king (SBE /kIN/) king_E.wav was pronounced [kiN] king_g_I.wav or sounded like keen [kin] king_n_I.wav if they also failed to produce a velar nasal.

    A similar distinction between the two close (or near-close) rounded back vowels /u:/ and /U/ is also made in most varieties of English, but not in Italian, and once again /u/ is most often the sound used for both. However, there are accents of English that also do not make the /u:/-/U/ distinction (Scottish accents in particular), so while this substitution makes a person's speech sound non-standard, it would not make it unintelligible, or even sound particularly foreign, so it does not constitute a serious pronunciation error.

    • The words foot (SBE /fUt/) foot_E.wav and book (SBE /bUk/) book_E.wav were pronounced [fut] foot_I.wav and [buk] book_I.wav, respectively, by all children.

    One significant difference is that Italian appears to have no central vowels, with all the vowels being at the edges of the "vowel space". English has at least two central vowels, /3:/ and /@/. In English the /3:/ vowel always occurs before the letter 'r', which in SBE is not pronounced. In Italian (in common with many accents of English) such an 'r' is always pronounced. Again, this is not considered a serious error, as it never compromises intelligibility.

    • The most common rendering of /3:/, as in bird (SBE /b3:d/) bird_E.wav, by the Italian children was [b3rd] bird_I.wav.

    In addition, in modern SBE the vowel /V/ is also more central than back, and it is most often pronounced /a/ by the Italian children.

    • cup (SBE /kVp/) cup_E.wav is [kap] cup_I.wav for 11 of the 13 speakers;
    • mother (SBE /"mVD@/) mother_E.wav is ["mad@r] or ["maD@r] mother_I.wav for all but one of the children.

    English has 2 unrounded vowels in open position, front /{/ and back /A:/. Italian has neither, and tends to use /a/ (slightly more open than /{/, and more front than /A:/) for both.

    • bath (SBE /bA:T/) bath_E.wav was pronounced [baT] bath_f_I.wav by 5 of the 13 children; the others used [{] bath_t_I.wav, which is a quite acceptable accent variation of the vowel in this word;
    • car (SBE /kA:/) car_E.wav was pronounced [kar] car_I.wav by all the children.

    Sometimes [E] was used for /{/, so

    • bag (SBE /b{g/) bag_E.wav sounds like beg [bEg] bag_I.wav in the speech of 7 of the 13 children;
    • for all but 2 of the children man (SBE /m{n/) man_E.wav sounds like men [mEn] man_I.wav.
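    The way these substitutions make distinct English words confusable can be illustrated by applying them to a small lexicon and looking for words that end up with the same pronunciation. This is a sketch only: the substitution map is a simplification of the tendencies described so far (/I/ and /i:/ both to [i], /{/ to [E], /T/ to [f]), the SBE transcriptions of leap, beg, men and free are my own additions, and the names used are hypothetical.

```python
# Simplified substitutions reported for the Italian children (SAMPA symbols).
SUBSTITUTIONS = {"I": "i", "i:": "i", "{": "E", "T": "f"}

# Target SBE pronunciations as lists of phonemes.
LEXICON = {
    "lip": ["l", "I", "p"],
    "leap": ["l", "i:", "p"],
    "bag": ["b", "{", "g"],
    "beg": ["b", "E", "g"],
    "man": ["m", "{", "n"],
    "men": ["m", "E", "n"],
    "three": ["T", "r", "i:"],
    "free": ["f", "r", "i:"],
}


def accented(phonemes):
    """Apply the substitutions phoneme by phoneme."""
    return tuple(SUBSTITUTIONS.get(p, p) for p in phonemes)


def confusable_groups(lexicon):
    """Group words whose accented pronunciations become identical."""
    groups = {}
    for word, phonemes in lexicon.items():
        groups.setdefault(accented(phonemes), []).append(word)
    return sorted(sorted(words) for words in groups.values() if len(words) > 1)


if __name__ == "__main__":
    for group in confusable_groups(LEXICON):
        print(" / ".join(group))
```

    With this toy lexicon every word falls into a confusable pair (bag/beg, free/three, leap/lip, man/men), which is why these substitutions cannot be dismissed as trivial even though each one is phonetically small.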

    The vowel /Q/ in dog was generally pronounced accurately, in spite of Italian having no equivalent. Where it wasn't, either the unrounded counterpart /A/ or the Italian /a/ tended to be used.

    Where the /O:/ vowel occurred before an 'r' in the spelling, for example in the word door, SBE /dO:/ door_E.wav, the Italian children tended to shorten it, and of course pronounced the /r/, sometimes producing an /Q/-like vowel door_I.wav. Where /O:/ wasn't followed by 'r' the Italian children's pronunciation was accurate, though the length tended to be shorter than that generally heard in SBE.

    Many of the features described above of the Italian children's pronunciation of the English vowels are illustrated in the vowel plots in Figure 4. A vowel plot is a scatter graph of the frequency of the second formant (F2) against the first (F1), and it shows graphically the distribution of the vowels within the vowel space. In particular this plot shows where the vowels the Italian children produced were close to the target English pronunciation, and where there are differences.

    Figure 4 Vowel plot showing the average formant values for native English and Italian speakers of English

    In this plot each vowel is represented by a different colour; the English children's vowels are shown as diamonds, while the Italian children's values are shown as asterisks.

    The plot shows that for three of the English vowels that have (near) Italian equivalents, /i/ as in fleece (dark blue), /E/ as in dress (gold) and /A/ as in start (red), the Italian children's pronunciation is very close to that of the native English speakers. For the vowel /O:/ (thought, brown), which has an Italian counterpart in /O/, the Italian values for both F1 and F2 are a little lower than the English ones. This may be due to the fact that the majority of words containing the /O/ vowel are spelled with a following 'r'. The English children do not pronounce the /r/, while the Italian children do; the effect of an /r/ is to lower the surrounding formants, and this may be why the Italian values are lower. The same may apply to the /3/ (nurse, black) vowel, although there was a great deal of variation in how this vowel was pronounced.

    The failure of the Italian children to distinguish between /i/ (fleece, dark blue) and /I/ (kit, pink), mentioned above, can be seen clearly: the Italian /i/ and /I/ both cluster in the same place, with the English /i/.

    Similarly, the Italians' lack of distinction between /u/ (goose, green) and /U/ (foot, orange), also mentioned above, is evident in that the green and orange asterisks are very close to each other. They are not, however, close to the English values for either of these vowels. This is not because the Italians' pronunciation is inaccurate; rather, it reflects coarticulatory effects that cause the second formant to be higher in these vowels for the English speakers than it would be in careful isolated speech. The coarticulatory effects tend to be stronger in the native speakers because they are not focussing on their pronunciation, and some of them spoke the words very rapidly. The Italian speakers tended to speak more slowly, and were focussing on their pronunciation.

    Diphthongs

    There are 8 diphthongs in English, while Italian has none. However, the exact quality of the sounds in English may be very variable, both between phonetic context and speaker/accent group. As well as a range of starting and ending positions being acceptable for these vowels, for some speakers and in certain contexts there may be no movement within the diphthong, making it phonetically a monophthong. However, the majority of native English speakers do not perceive these differences, so if an Italian speaker produces a monophthong in place of a diphthong this may go unnoticed.

    Of the rising diphthongs, /aI/ and /OI/ are relatively stable, and posed few problems for the Italian speakers. The /eI/ diphthong is often realised by native English speakers as /e:/, particularly in northern dialects; this is very similar to the Italian /e/, and this is the vowel that was used by many children:

    • half the children pronounced cake, SBE /keIk/ cake_E.wav, as [ke:k] cake_I.wav;
    • a third of them used an [e] vowel in one or other of the syllables of railway.

    When the centring diphthongs occur before /r/, there is a great deal of regional (and personal) variation in how they are pronounced, with rhotic accents having a monophthong with a quality close to the starting point of the diphthong, followed by /r/. As the post-vocalic /r/ is also pronounced in Italian, it is not surprising that the children tended to produce a monophthong plus /r/.

    • the vowel sound in bear (SBE /e@/, but increasingly pronounced [e:] even by speakers of the standard accent) was rendered [er] by 10 of the 25 children, and a similar picture was seen for wear;
    • for the word near (SBE /nI@/) nearandopen_E.wav the most common pronunciation was [nir] near_ir_I.wav, which was produced by 7 out of 12 children. A further 3 said [ner] near_er_I.wav.

    The falling diphthong /aU/ was generally pronounced acceptably, although the final quality was sometimes more [u]-like. This is a very subtle difference and unlikely to cause problems. The other falling diphthong, /@U/, was more variable, as indeed it is in English.

    • old (SBE /@Uld/ old_E.wav) was pronounced [Qld] old_I.wav by 9 out of 12 children;
    • the initial vowel of the word open, SBE /?@Up@n/ nearandopen_E.wav, was pronounced [Q] near_ir_I.wav by a quarter of the speakers, and [o] by a further quarter near_er_I.wav.

    Phonological processes

    Reduction

    English is often described as a stress-timed language, as there is a tendency for stressed syllables to occur at regular intervals. Because there can be a very variable number of unstressed syllables between these stressed syllables, some of them tend to get reduced in order to fit them into the rhythm. Reduction can lead to vowels being centralised, that is, the articulators do not reach their full positions. Many unstressed vowels end up as [@], also known as schwa, or may even disappear altogether, a process known as elision .

    Italian, on the other hand, is often described as a syllable-timed language, which means that there is a tendency for syllables to occur at regular intervals. This means that there is less pressure on some syllables to fit into a shorter interval, so reduction is not so widespread as it is in English, and Italian vowels tend to keep their full articulation. When Italians speak English, one of the things they find most difficult to master is the appropriate amount of reduction. So vowels that would be rendered with a schwa-type vowel maintain their full sound, and it is this that gives Italian English its particular rhythmic and melodic qualities. This was evident to some extent in nearly every polysyllabic word in this data, but here are some examples:

    • lemon was pronounced ["lEmQn] lemon_I.wav rather than the typical SBE ["lEm@n] lemon_E.wav;
    • television, which in SBE is usually /"tElIvIZ@n/ television_E.wav, was most frequently pronounced [tEle"viZQn] television_I.wav (note also that the stress was placed on the wrong syllable by all but one of the children);
    • parrot, SBE /"p{r@t/ parrot_E.wav, was ["p{rot] parrot_Q_I.wav;
    • intelligent, which in SBE is /In"tElIdZ@nt/ intelligent_E.wav, was commonly pronounced [in"tElidZEnt] intelligent_I.wav;
    • the Italian children tended to pronounce words ending in "er", "ar" or "or", which in English are generally pronounced with a /@/, with a full vowel, and usually the /r/ was pronounced too, so water (SBE /"wO:t@/ water_E.wav) is pronounced ["wOtEr] water_I.wav, and sugar (SBE /SUg@/ sugar_E.wav) is ["Sugar] sugar_I.wav.

    Dictionaries of English represent this reduction in the pronunciations they give. If these are used unmodified in an ASR system for Italian speakers, there is a strong likelihood that the speakers' pronunciations will not match the system's representation, which expects a shorter, less intense schwa-type vowel.
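
    One way to picture the mismatch is to generate, for each dictionary entry, extra variants in which schwa is replaced by a full vowel. The following is only a minimal sketch under stated assumptions: the function name unreduced_variants and the candidate vowel set are hypothetical, chosen from the examples above (lemon, parrot, intelligent, sugar), and are not taken from any particular ASR toolkit or from the system used in this study.

```python
from itertools import product

# Full vowels observed in place of schwa in the examples above.
# This set is an illustrative assumption, not an exhaustive inventory.
FULL_VOWELS = ("Q", "o", "E", "a")


def unreduced_variants(phones, full_vowels=FULL_VOWELS):
    """Return the dictionary form plus variants with each "@" replaced by a full vowel.

    `phones` is a list of SAMPA symbols, e.g. ['"', 'l', 'E', 'm', '@', 'n'].
    """
    # For every position, list the symbols allowed there:
    # schwa may stay as it is or become any of the full vowels.
    options = [(p,) + tuple(full_vowels) if p == "@" else (p,) for p in phones]
    return [list(choice) for choice in product(*options)]


lemon = ['"', "l", "E", "m", "@", "n"]
for variant in unreduced_variants(lemon):
    print("".join(variant))
```

    The first variant printed is the unmodified dictionary form; the others include ["lEmQn], the pronunciation reported above for lemon. A word with several schwas produces every combination, so in practice the variant list would probably need pruning.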

    Syllabic consonants

    A further consequence of the reduction process in English is the appearance of syllabic consonants. Usually a syllable must have a vowel at its centre, but if that vowel becomes elided through extreme reduction, then a consonant may take on the role of syllabic nucleus. Although almost any consonant can become syllabic in this way, the most frequent syllabic consonants are nasals, especially /n/, and liquids, such as /l/, since these already share a vowel-like structure. Again, English dictionaries include syllabic /n=/ and /l=/ in their pronunciations, and this may well be a source of mismatches, as many of the Italian children tended to pronounce a full, stressed vowel in such situations.

    • the final syllable in table and bottle was most frequently [bVl], but also [bQl] and [bUl], where in English it would generally be /bl=/. Compare native English /"teIbl=/ table_E.wav and /"bQtl=/ bottle_E.wav with Italian English table_I.wav and bottle_I.wav;
    • seven, which in SBE is /"sEvn=/ seven_E.wav, was pronounced ["sEvEn] seven_I.wav by all the speakers.
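
    The examples above suggest how a dictionary entry containing a syllabic consonant might be expanded into vowel-plus-consonant variants. This is a rough sketch only: the name expand_syllabic and the table of inserted vowels are hypothetical, and the vowels listed are simply those reported above for table, bottle and seven.

```python
# Vowels the children inserted before a syllabic consonant in the examples
# above: [V], [Q] or [U] before /l/, and [E] before /n/ (illustrative only).
INSERTED_VOWELS = {"l=": ("V", "Q", "U"), "n=": ("E",)}


def expand_syllabic(phones, inserted=INSERTED_VOWELS):
    """Return the dictionary form plus variants where the first syllabic
    consonant is replaced by a full vowel followed by the plain consonant."""
    phones = list(phones)
    variants = [phones]                      # keep the dictionary form first
    for i, p in enumerate(phones):
        if p in inserted:
            plain = p.rstrip("=")            # "l=" -> "l", "n=" -> "n"
            for vowel in inserted[p]:
                variants.append(phones[:i] + [vowel, plain] + phones[i + 1:])
            break                            # only the first one, for simplicity
    return variants


for variant in expand_syllabic(['"', "s", "E", "v", "n="]):
    print("".join(variant))
```

    For seven this yields the dictionary form and ["sEvEn], the form produced by all the speakers. Handling only the first syllabic consonant keeps the sketch short; words with more than one would need the same combinatorial treatment as any other multi-site variation.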

    Final devoicing

    In many languages there is a tendency to devoice utterance-final consonants. In the English spoken by the Italian children, many phonemically voiced consonants were devoiced, even within a phrase. I don't know whether this is a consequence of an often not very fluent reading style (so that words within a phrase were pronounced with noticeable pauses between them), or whether this is a feature of Italian which is leaking into their English.

    Some comments on supra-segmental features

    I have already mentioned the stress-timed nature of English, in contrast with syllable-timed Italian, as giving Italian English its characteristic rhythmic properties. More detailed comment on the intonation patterns is not possible due to the nature of this data. Reading lists of isolated words and short, more-or-less meaningless phrases is not conducive to the production of natural intonation patterns, irrespective of whether the speaker is native or non-native.

    However, one observation can be made about the influence of Italian word structure. Italian words typically end in a vowel, so for some speakers there was a tendency to add a strong vocalic offset to final consonants. See Figure 1 above and compare SBE dog_E.wav with Italian dog_I.wav, and SBE food_E.wav with Italian food_I.wav. This was particularly common in the isolated words, but did also occur within phrases. Where this offset is strong enough to be construed as another syllable, it could cause ASR errors, even though it would be unlikely to pose any problem to a human listener.

  • Summary: the four differences most likely to cause ASR problems
  • Many of the features of the Italian children's pronunciation of English discussed above are acoustically quite subtle, but obvious to a human ear, while other differences may be acoustically large but go unnoticed by human listeners. It is also possible that differences which seem significant to a human listener are not so for an ASR system, and vice versa. The following are the features of Italian English that I consider likely to be the most problematic for ASR systems.

    • Trilled /r/, whose acoustic pattern is quite different from the English tapped /r/.
    • Non-aspiration of fortis plosives, leading to mismatch with models that are based on plosives with a distinct period of strong aspiration.
    • Vocalic offsets can look like another syllable, which won't be accounted for by the models.
    • Lack of reduction making unstressed vowels longer and of different quality from those in English.
  • Scoring the children's pronunciation
  • A word-by-word analysis of each child's data was carried out, and an error penalty was calculated for each child which took into account the number and severity of the errors they made. Table 3 shows the scores derived for each child. Words were scored as "correct", shown in green, "acceptable variant", shown in blue, "significant error", shown in black, or "mispronunciation", shown in red. The entries in the table are listed from best to worst.

    The scoring system

    Words were scored as correct when the child achieved a realisation that was very close to the SBE pronunciation. For instance, if they pronounced the so-called English long vowels /i: A: O: u: 3:/ as their short counterparts /i A O u 3/, this was not penalised, as the distinction is not an absolute one in English, and is subject to considerable accent and personal variation. Note that some pronunciations may still sound slightly "foreign" due to prosodic (intonational and rhythmic) factors, which are difficult to analyse meaningfully for read lists of words and short phrases.

    A pronunciation was considered an acceptable variant if it could be considered the result of normal phonological processes (e.g. devoicing before a voiceless consonant, assimilation, etc.), or if it represented a (non-extreme) accent variant (e.g. pronunciation of post-vocalic /r/, use of /{/ for /A:/ in certain words, and the precise realisation of the diphthongs). In other words, these are pronunciations that, while non-standard, would not be looked upon as obviously foreign.

    Where a speaker was aiming at the correct pronunciation, but what they produced sounded foreign, these were scored as significant errors. This included where an Italian vowel was used in place of an English one (e.g. /i/ for /I/ in words like kid and city , or /a/ for /V/ in sun) and where lack of aspiration or other compensatory gesture led to fortis plosives being perceived as lenis (e.g. if the initial sound in town sounded like a /d/).

    The term mispronunciation was reserved for errors that resulted in the word being unintelligible (e.g. /ki@ld/ for child , or /ba:t/ for boat ), or in the word being confusable with another (common) English word (e.g. lip sounding like leap , or coat like caught ). These were considered the most severe, as they compromise intelligibility.

    When calculating the error penalty, acceptable variants scored 0.5, significant errors 1, and mispronunciations 2. This gives rise to an error penalty that is weighted in favour of near misses, but against unintelligible words, and which corresponds to an intuitive judgement of the relative impact of the different kinds of error.

    I have attempted to give each child a "final score", which is the result of dividing the number correct by the error penalty. This has the intuitively sensible effect of giving those with a high number of correct words and a low error penalty a high overall score, and vice versa.
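
    The arithmetic can be sketched in a few lines of code. The weights are those given above; the class and function names are hypothetical. One point is an inference rather than something stated in the text: the "Error penalty" column of Table 3 appears to be the weighted error sum divided by the total number of words, and the final score appears to be the number correct divided by the weighted error sum. Both assumptions reproduce the figures for speakers m0203 and f0203 in Table 3.

```python
from dataclasses import dataclass

# Weights stated in the text for each non-correct category.
WEIGHTS = {"acceptable": 0.5, "significant": 1.0, "mispronunciation": 2.0}


@dataclass
class Counts:
    correct: int
    acceptable: int
    mispronunciation: int
    significant: int

    @property
    def total(self) -> int:
        return (self.correct + self.acceptable
                + self.mispronunciation + self.significant)


def weighted_errors(c: Counts) -> float:
    """Weighted sum of the three error categories."""
    return (WEIGHTS["acceptable"] * c.acceptable
            + WEIGHTS["significant"] * c.significant
            + WEIGHTS["mispronunciation"] * c.mispronunciation)


def error_penalty(c: Counts) -> float:
    # Assumption inferred from Table 3: penalty is normalised by total words.
    return weighted_errors(c) / c.total


def final_score(c: Counts) -> float:
    # Number correct divided by the (unnormalised) weighted error sum.
    # Undefined for a child with no errors at all (division by zero).
    return c.correct / weighted_errors(c)


# Counts for speaker m0203, taken from the first row of Table 3.
m0203 = Counts(correct=35, acceptable=39, mispronunciation=3, significant=42)
print(round(error_penalty(m0203), 2), round(final_score(m0203), 2))  # 0.57 0.52
```

    Because the same total divides both the proportion correct and the normalised penalty, the final score is unchanged whether it is computed from raw counts or from the percentages.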

    Notes on particular children

    Some of the children exhibited factors that may have influenced their pronunciation, and I have noted these in the final column.

    Speakers f0202 and m0211 sounded as if they had a cold.

    Speaker m0216 had a very much deeper voice than the others. He was also not a very fluent reader.

    Some of the speakers were very breathy, particularly before and after each utterance. Speakers f0201, f0203 and m0206 had especially shaky breaths and may have been nervous.

    Some speakers (f0202, m0208 and m0211) had a whistle, mainly on /s/ sounds. This did not affect intelligibility, but may have an effect on the performance of an ASR system.

    Speaker m0205 had a slight alveolar lisp, affecting /s/ and /z/ sounds.

    Speaker m0217 had a slight lateral lisp, making /l/ sound like a fricative: IPA /ɬ/, /K/ in SAMPA (like Welsh "ll").

    Speaker m0218 had quite a severe palatal lisp, affecting all sounds produced on the roof of the mouth. This may be a natural feature, or he may have been wearing a tooth brace, the framework of which fits into the roof of the mouth, and can cause this type of speech.

    Speaker  Total  Correct           Acceptable        Mispronunciation  Significant       Error    Final  Notes
             words                    variant                             errors            penalty  score
                    count  %          count  %          count  %          count  %
    m0203    119    35     29.41%     39     32.77%     3      2.52%      42     35.29%     0.57     0.52
    f0203    118    37     31.36%     24     20.34%     5      4.24%      52     44.07%     0.63     0.50   shaky breaths; nervous?
    m0202    117    34     29.06%     30     25.64%     6      5.13%      47     40.17%     0.63     0.46
    m0214    119    36     30.25%     23     19.33%     10     8.40%      50     42.02%     0.68     0.44
    m0208    119    35     29.41%     22     18.49%     10     8.40%      52     43.70%     0.70     0.42   /s/ whistle
    f0205    115    32     27.83%     24     20.87%     9      7.83%      50     43.48%     0.70     0.40
    f0204    117    34     29.06%     24     20.51%     14     11.97%     45     38.46%     0.73     0.40
    m0217    118    36     30.51%     14     11.86%     16     13.56%     52     44.07%     0.77     0.40   slight lateral lisp - /l/
    m0218    118    34     28.81%     17     14.41%     15     12.71%     52     44.07%     0.77     0.38   severe palatal lisp /s z S Z l/
    f0206    120    31     25.83%     24     20.00%     6      5.00%      59     49.17%     0.69     0.37
    f0202    118    34     28.81%     19     16.10%     17     14.41%     48     40.68%     0.78     0.37   cold? Slight /s/ whistle
    m0207    117    31     26.50%     24     20.51%     10     8.55%      52     44.44%     0.72     0.37
    m0211    117    31     26.50%     20     17.09%     17     14.53%     49     41.88%     0.79     0.33   cold? /s/ whistle
    f0201    127    31     24.41%     18     14.17%     8      6.30%      70     55.12%     0.75     0.33
    m0205    119    29     24.37%     21     17.65%     11     9.24%      58     48.74%     0.76     0.32   shaky breaths; nervous?
    m0210    121    29     23.97%     22     18.18%     10     8.26%      60     49.59%     0.75     0.32
    m0201    117    26     22.22%     28     23.93%     6      5.13%      57     48.72%     0.71     0.31
    m0216    120    30     25.00%     19     15.83%     17     14.17%     54     45.00%     0.81     0.31   very deep voice; non-fluent reader
    m0204    118    27     22.88%     23     19.49%     8      6.78%      60     50.85%     0.74     0.31
    m0213    119    30     25.21%     25     21.01%     22     18.49%     42     35.29%     0.83     0.30
    m0206    118    28     23.73%     21     17.80%     13     11.02%     56     47.46%     0.78     0.30   shaky breaths; nervous?
    m0209    119    29     24.37%     19     15.97%     16     13.45%     55     46.22%     0.81     0.30
    m0215    118    27     22.88%     26     22.03%     12     10.17%     53     44.92%     0.76     0.30   slight lisp?
    f0207    119    27     22.69%     26     21.85%     12     10.08%     54     45.38%     0.76     0.30
    m0212    120    28     23.33%     18     15.00%     17     14.17%     57     47.50%     0.83     0.28

    Table 3 Children's pronunciation scores

  • List of phoneme symbols used
  • This list contains all the English and Italian phoneme symbols, and shows both the SAMPA and IPA symbol, as well as a reference word for each language, as appropriate. Where I have used symbols from other languages to express Italianate pronunciations of English words, these are shown after the main sections.

    Vowels

                                                      Reference words
    SAMPA  IPA  Description                           English        Italian

    i:     iː   close front unrounded (long)          fleece         -
    i      i    close front unrounded (unstressed)    happy          mite
    I      ɪ    near-close near-front unrounded       kit            -
    e      e    close-mid front unrounded             -              rete
    E      ɛ    open-mid front unrounded              dress          meta
    {      æ    near-open front unrounded             trap           -
    A:     ɑː   open back unrounded (long)            start          -
    a      a    open front unrounded                  -              rata
    Q      ɒ    open back rounded                     lot            -
    O:     ɔː   open-mid back rounded (long)          thought        -
    O      ɔ    open-mid back rounded                 -              moto
    o      o    close-mid back rounded                -              dove
    U      ʊ    near-close near-back rounded          foot           -
    u:     uː   close back rounded (long)             goose          -
    u      u    close back rounded (unstressed)       influenza      muto
    V      ʌ    open-mid near-back unrounded          strut          -
    3:     ɜː   open-mid central unrounded            nurse          -
    @      ə    schwa; mid-central (unstressed)       comma, letter  -
    aI     aɪ   rising diphthong                      price          -
    eI     eɪ   rising diphthong                      face           -
    OI     ɔɪ   rising diphthong                      choice         -
    @U     əʊ   falling diphthong                     goat           -
    aU     aʊ   falling diphthong                     mouth          -
    I@     ɪə   centring diphthong                    near           -
    e@     eə   centring diphthong                    square         -
    U@     ʊə   centring diphthong                    cure           -

    Symbol from another language:

    9      œ    open-mid front rounded (rounded /E/)  German plötzlich

    Consonants

                                                              Reference word
    SAMPA  IPA  Description                                   English   Italian

    p      p    voiceless/fortis bilabial plosive             pin       pane
    b      b    voiced/lenis bilabial plosive                 bin       banco
    t      t    voiceless/fortis alveolar/dental plosive      tin       tana
    d      d    voiced/lenis alveolar/dental plosive          din       danno
    k      k    voiceless/fortis velar plosive                kin       cane
    g      g    voiced/lenis velar plosive                    give      gamba
    pp     pp   voiceless bilabial geminate plosive           -         coppa
    bb     bb   voiced bilabial geminate plosive              -         gobba
    tt     tt   voiceless dental geminate plosive             -         zitto
    dd     dd   voiced dental geminate plosive                -         cadde
    kk     kk   voiceless velar geminate plosive              -         nokka
    gg     gg   voiced velar geminate plosive                 -         fugga

    f      f    voiceless labiodental fricative               fine      fame
    v      v    voiced labiodental fricative                  vine      vano
    T      θ    voiceless dental fricative                    thin      -
    D      ð    voiced dental fricative                       this      -
    s      s    voiceless alveolar fricative                  sit       sano
    z      z    voiced alveolar fricative                     zoo       sbaglio
    S      ʃ    voiceless palato-alveolar fricative           shine     scendo
    Z      ʒ    voiced palato-alveolar fricative              pleasure  -
    h      h    voiceless glottal fricative                   hit       -
    ff     ff   voiceless labiodental geminate fricative      -         beffa
    vv     vv   voiced labiodental geminate fricative         -         bevvi
    ss     ss   voiceless alveolar geminate fricative         -         cassa
    SS     ʃʃ   voiceless palato-alveolar geminate fricative  -         ascia

    ts     ts   voiceless alveolar affricate                  -         zitto
    dz     dz   voiced alveolar affricate                     -         zona
    tS     tʃ   voiceless palato-alveolar affricate           chip      cena
    dZ     dʒ   voiced palato-alveolar affricate              gin       gita
    tts    tts  voiceless alveolar geminate affricate         -         bozza
    ddz    ddz  voiced alveolar geminate affricate            -         mezzo
    ttS    ttʃ  voiceless palato-alveolar geminate affricate  -         braccio
    ddZ    ddʒ  voiced palato-alveolar geminate affricate     -         oggi

    m      m    bilabial nasal                                mind      molla
    n      n    alveolar nasal                                nine      nocca
    J      ɲ    palatal nasal                                 -         gnocco
    N      ŋ    velar nasal                                   sing      -
    mm     mm   geminate bilabial nasal                       -         grammo
    nn     nn   geminate alveolar nasal                       -         panna
    JJ     ɲɲ   geminate palatal nasal                        -         bagno
    n=     n̩    syllabic /n/                                  button    -

    l      l    alveolar lateral                              lip       lama
    L      ʎ    palatal lateral                               -         gli
    ll     ll   geminate alveolar lateral                     -         colla
    LL     ʎʎ   geminate palatal lateral                      -         foglia
    l=     l̩    syllabic /l/                                  bottle    -

    4      ɾ    alveolar tap/flap                             rain      -
    r\     ɹ    alveolar approximant                          -         -
    r      r    alveolar trill                                -         rete
    rr     rr   geminate alveolar trill                       -         ferro
    w      w    labial-velar approximant                      wine      uomo
    j      j    palatal approximant                           young     ieri

    Symbols from other languages:

    B      β    voiced bilabial fricative                     Spanish cabo
    K      ɬ    voiceless alveolar lateral fricative          Welsh llan
    j\     ʝ    voiced palatal fricative

    Prosodic

    SAMPA  IPA  Description

    :      ː    length mark
    "      ˈ    primary stress
    %      ˌ    secondary stress
    .      .    syllable boundary; only used where I need to mark this explicitly (e.g. between two vowels that might otherwise be construed as a diphthong).
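
    Since the transcriptions in this report are given in SAMPA, a small conversion routine may help readers who prefer IPA. The sketch below is illustrative only: the function name sampa_to_ipa is hypothetical, the mapping covers just a subset of the symbols listed above, and it relies on the standard SAMPA-to-IPA correspondences. It uses a greedy longest-match scan, so two adjacent vowels that happen to spell a diphthong symbol would be merged unless separated by the syllable boundary mark.

```python
# SAMPA -> IPA pairs for a subset of the symbols listed above.
SAMPA_TO_IPA = {
    # two-character symbols (tried first)
    "i:": "iː", "A:": "ɑː", "O:": "ɔː", "u:": "uː", "3:": "ɜː",
    "aI": "aɪ", "eI": "eɪ", "OI": "ɔɪ", "@U": "əʊ", "aU": "aʊ",
    "I@": "ɪə", "e@": "eə", "U@": "ʊə", "tS": "tʃ", "dZ": "dʒ",
    # one-character symbols that differ between SAMPA and IPA
    "I": "ɪ", "E": "ɛ", "{": "æ", "A": "ɑ", "Q": "ɒ", "O": "ɔ",
    "U": "ʊ", "V": "ʌ", "3": "ɜ", "@": "ə", "T": "θ", "D": "ð",
    "S": "ʃ", "Z": "ʒ", "N": "ŋ", "J": "ɲ", "L": "ʎ", "4": "ɾ", "K": "ɬ",
    # prosodic marks
    '"': "ˈ", "%": "ˌ", ":": "ː",
}


def sampa_to_ipa(text):
    """Convert a SAMPA string to IPA using greedy longest-match lookup.

    Symbols not in the table (p, b, t, m, n, ...) are the same in both
    alphabets and are copied through unchanged.
    """
    out = []
    i = 0
    while i < len(text):
        for length in (2, 1):                # prefer two-character symbols
            chunk = text[i:i + length]
            if chunk in SAMPA_TO_IPA:
                out.append(SAMPA_TO_IPA[chunk])
                i += len(chunk)
                break
        else:                                # no table entry: copy as is
            out.append(text[i])
            i += 1
    return "".join(out)


print(sampa_to_ipa('"lEm@n'))   # ˈlɛmən
print(sampa_to_ipa("keIk"))     # keɪk
```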

    Diacritics

    Where SAMPA (or extended SAMPA) already has a symbol, I have used this, but generally put it in superscript. E.g. SAMPA recommends using -0 to indicate that a phonemically voiced sound is devoiced; I have used superscript -0 for this.

    -h to indicate that the sound (usually a fortis plosive) is not aspirated [t-hOI] (toy)

    -0 to indicate devoicing [irz-0] (ears)

    +v to indicate a strong vocalic offset to a consonant [dQg+v] (dog-uh)

    +g to indicate the release of the velar nasal ["ridIN+g] (reading)