Analysis of Italian children's English pronunciation
Some remarks about learning to speak a second language
The variety of English discussed in this report is that often known as Standard British English (SBE). By this I mean English that betrays little of the geographical or social origins of the speaker. There is of course a very wide range of regional (and personal) variation within English, and where appropriate I will mention these.
When a person learns to speak a second language some time after they have acquired their native tongue (rather than at the same time, as in bilingual children), we would expect that aspects of their native language would interfere with the pronunciation of the new language. There are two principal ways in which this might manifest itself. Firstly, speakers may try to apply the spelling-to-sound rules of their native language to the new language, possibly resulting in an inappropriate choice of target phoneme string. Secondly, even if they are aiming at the correct phoneme, the influence of their native phonological system might interfere with their production of sounds in the different system. This happens because when speakers of one language hear a sound in a foreign language they tend to interpret it in terms of their native phonological system, and when they try to produce a foreign sound they usually produce one that is similar to a phoneme in their own language. If the foreign language makes a distinction that is not made in their native tongue, learners often find it difficult to perceive the distinction, and still harder to produce it. In this study we shall see that the English vowel system is much richer than the Italian one; consequently, Italian speakers of English tend to have problems making the appropriate distinctions. In addition, phonological processes in the native language, such as devoicing of word-final consonants, may creep into the pronunciation of the target language.
In this report I will call the first of these
mispronunciation
, and in simple terms it can be thought of as "aiming at the wrong target". The second will be referred to as
phonological interference
, and this can be thought of as "missing the correct target".
Of course, distinguishing between these is not always straightforward. When an Italian speaker, whose phonological system does not contain the vowel /I/, consistently produces an /i/ as a substitute, this is phonological interference. But if a speaker who can and does produce the /aI/ phoneme in the appropriate places pronounces the vowel in the English word
child
as /I/, then we can be fairly sure that this is a mispronunciation. However, if a speaker pronounces the middle vowel in the word
banana
as /a/, when standard British English would have /A:/, then it is hard to tell whether the Italian's pronunciation is a mispronunciation, influenced by the spelling (the letter 'a' is always /a/ in Italian), or whether they are trying to produce the English phoneme /A:/, but have actually produced the (very similar!) Italian phoneme /a/ instead. In fact it is probably a bit of both, as, of course, young speakers of a new language probably do not have in their minds an explicit phonological representation of either their native language or the new one.
These two types of interference typically result in different types of pronunciation error. Mispronunciations tend to be more catastrophic in terms of intelligibility, in that it can be impossible to work out what the intended word was (though the context of the utterance often helps to resolve even the most disastrous of attempts!).
Phonological interference results in rather more subtle errors, as the sounds produced generally approximate the "correct" ones in (usually) predictable ways. On the whole this does not result in words becoming unintelligible, but instead gives the resulting speech its "foreign accent". Sometimes, however, this can result in a word becoming confusable with another, so it would be misleading to say that these are trivial errors. An example in the Italian children's speech would be the almost universal substitution of /i/ for /I/, so that the word
lip
sounds like
leap.
Mispronunciations
Most mispronunciations can be seen as a result of a speaker using the rules that they use in their native tongue (mostly subconsciously) to convert from the written string of letters into the spoken string of sounds that make up the words. This is only true if speakers are using a predominantly text-based approach to learning the target language. Learners who have learnt by imitation, without access to the written form of the new language, will clearly not be influenced by the spelling. However, the children's speech in this study was read, so it is valid to assume that some degree of "letter-to-sound" conversion is going on. The extent to which speakers rely on this conversion depends on how familiar they are with the words presented to them. Pronunciation of the more familiar words is likely to have been internalised, and so bypasses the "letter-to-sound" rules.
The relationship between spelling and pronunciation in Italian is generally very simple, with one letter corresponding to one sound. Where there are alternatives, these are always determined by context. In contrast, English spelling often bears very little relation to how the word is pronounced. So, in Italian the letter 'a' is always pronounced /a/. In English it can be pronounced /{/ as in
fat
; /eI/ as in
fate
; /A:/ as in
fast
, or /@/ as in
about,
and there are at least six ways of pronouncing the letter combination 'ough'. It is, in general, very difficult to determine which pronunciation is correct for any given word; why is it /A:/ in
class
, but /{/ in
classic
, for example? (But note that many English accents have /{/ in
class.
)
Here are some examples from the Italian children's data where Italian spelling rules have been applied:
- the English letter 'u' can be pronounced /u/, but more often it is /3/ or /V/, while in Italian it is always /u/;
turn
(SBE /t3:n/)
turn_E.wav
was pronounced [turn]
turn_I.wav
by 6 out of 12 children; and
cup
(SBE /kVp/)
cup_E.wav
was pronounced [kup]
cup_u_I.wav
by one speaker;
- the letter 'g' before 'e' or 'i' is /dZ/ in Italian;
finger
(SBE /"fINg@/)
finger_E.wav
was pronounced ["findZ@r]
finger_I.wav
by 4 out of 13 children;
the letter 'c' before 'e' or 'i' is /tS/ in Italian;
voices
(SBE /"vOIsIz/)
voices_E.wav
was pronounced ["vOItSes]
voices_I.wav
by 2 out of 13 children;
the letters 'ch' in Italian are always /k/;
child
(SBE /tSaIld/)
child_E.wav
was pronounced [ki@ld]
child_I.wav
by 1 out of 13 children;
- The letter 'h' is silent in Italian, and there was a great deal of variability in whether or not it was pronounced. However, it should be noted that 'h-dropping' is a feature of many English accents, and even within those, there is a great deal of variation in which words it applies to.
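The spelling rules listed above can be sketched as a toy letter-to-sound converter. This is only an illustration: the rule set, the function name `italian_reading` and the one-letter-one-sound default are assumptions made for the example, and the output is a naive "Italian reading" in SAMPA-like symbols, not a reproduction of the forms the children actually produced.

```python
# Toy Italian-style letter-to-sound rules (illustrative assumptions only).
SOFTENERS = ("e", "i")  # letters that soften a preceding 'c' or 'g'


def italian_reading(word: str) -> str:
    """Return a naive SAMPA-like string for `word` read with Italian spelling rules."""
    word = word.lower()
    out = []
    i = 0
    while i < len(word):
        ch = word[i]
        nxt = word[i + 1] if i + 1 < len(word) else ""
        if ch == "c" and nxt == "h":
            out.append("k")  # 'ch' is always /k/
            i += 2
            continue
        if ch == "c":
            out.append("tS" if nxt in SOFTENERS else "k")
        elif ch == "g":
            out.append("dZ" if nxt in SOFTENERS else "g")
        elif ch == "h":
            pass  # 'h' is silent
        else:
            out.append(ch)  # default: one letter, one sound
        i += 1
    return "".join(out)
```

For example, `italian_reading("cup")` gives `kup` and `italian_reading("finger")` gives `findZer`, which can be compared with the attested [kup] and ["findZ@r] above. The sketch ignores vowel quality, stress and silent-letter effects entirely.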
Furthermore, with the exception of 'h', all written letters are pronounced in Italian. English has numerous 'silent' letters, or letter pairs/triples that are pronounced as one. Here are some examples of where the Italian children made errors by trying to pronounce all of the letters.
- The 'l' in
walk
(SBE /wO:k/) was pronounced by all 13 children [wQlk]
walk_I.wav
(though the quality of the vowel and pronunciation of the initial 'w' varied);
seat_E.wav
was pronounced as two syllables by 3 out of 12 children ["si.at]
seat_I.wav
;
the letter combination
'gu'
was pronounced [gw], so
guards
(SBE /gA:dz/)
guards_E.wav
was pronounced [gwardz] by 5 out of 13 Italian children; and
guide
(SBE /gaId/) was pronounced [gwid] by 6 out of 13 children
gu-gw_I.wav
;
boiled
which in SBE is a single syllable, /bOIld/ was pronounced with two syllables, ["bOIl.Ed]
boiled_I.wav
by 6 out of 12 children;
friend
(SBE /frEnd/)
friend_E.wav
was pronounced ["fri.End]
friend_I.wav
by 5 out of 12 children;
puzzle
(SBE /"pVzl=/) was pronounced ["pus.le]
puzzle_I.wav
by 2 out of 13 children.
Other problems arose where English has letters that Italian does not have. Although it is usually fairly straightforward to learn these, they can be a source of uncertainty. The letters 'j', 'k', 'x', 'y' and 'w' do not occur in native Italian words, though they do appear in foreign words, and 'w' in particular gave rise to errors. Some of the Italian children pronounced it correctly and consistently, as /w/; while others pronounced it both /w/ and /v/, sometimes in the same word, so
window_E.wav
was pronounced ["windQv], ["vindo] or ["vindQv] by 8 out of 25 children; and
yellow
(SBE /"jEl@U/) was pronounced ["jElQv] by 3 out of 13 children
w-v_I.wav
.
Some words caused nearly all the children problems:
- bird
and
birthday,
in which the 'ir' vowel (SBE /3:/) was pronounced correctly by fewer than half the children, the most common attempts being [9r], [Vr], [er], and [ir]. So,
birthday
(SBE /"b3TdeI/)
birthday_E.wav
was very variable in the Italians' English
birthday_I.wav
;
The vowel in
chair
and
hair
(SBE /e@/)
chair_E.wav
was also very variable; although more than half produced an acceptable variant ([e@r] or [er]), many made it two syllables; [ai.@r], or [ei.@r] being the most common
chair_I.wav
;
The most common pronunciation of
ears
(SBE /I@z/)
ears_E.wav
was [erz], but [irs] was also common, and many added an initial /h/
ears_I.wav
;
The vowel in
coat, road
and
boat
(SBE /@U/)
oa_E.wav
was subject to a wide range of pronunciations, ranging from [Q:], the most common, through [A:], [O], [o.a], [{] and [u]
coat_I.wav
,
boat_I.wav
,
road_I.wav
.
Phonological Interference
When investigating the influence of a speaker's native tongue on their pronunciation of English, it makes sense to start with a comparison of the phoneme systems of the two languages in question in order to provide a framework for describing any potential problems. Therefore, in what follows I have compared the structures of the English and Italian phoneme systems, and then picked out a few examples from the data to illustrate how the phonological structure of Italian has influenced the children's English pronunciation.
Standard British English is usually regarded as having 44 phonemes, while Italian has 50. In addition, the two systems differ in how they are made up. Italian has a very simple 7-vowel system, while English is usually described as having 20 vowels (the inventory used in Table 2 counts 22, because it lists the short vowels /i/ and /u/ separately). Italian has the richer consonant system. Table 1 summarises the two consonant systems and also shows how many of the consonants are shared by both, as well as how many are unique to English and Italian. Table 2 provides the same information for the vowels.
Consonants
Standard British English has 24 consonants: 17 obstruents (6 plosives, 9 fricatives and 2 affricates), and 7 sonorants (3 nasals, 2 liquids and 2 semivowel glides).
The Italian system consists of 43 consonants: 29 obstruents (12 plosives, 9 fricatives, and 8 affricates) and 14 sonorants (6 nasals, 6 liquids and 2 semivowels).
Between them English and Italian have 48 consonants, of which 19 (40%) are shared. Table 1 below breaks this down into consonant types, and also shows how many consonants are common to both languages. The low percentage of shared consonants is due largely to the existence in Italian of geminate forms of nearly all the consonants. A geminate consonant is written with a double letter in the spelling; it is pronounced much more forcefully and is usually longer than a single one.
| Consonants | English | Unique to English | Italian | Unique to Italian | Eng+Ita | Shared | % shared |
|------------|---------|-------------------|---------|-------------------|---------|--------|----------|
| Plosives   | 6       | 0                 | 12      | 6                 | 12      | 6      | 50.00    |
| Fricatives | 9       | 4                 | 9       | 4                 | 13      | 5      | 38.46    |
| Affricates | 2       | 0                 | 8       | 6                 | 8       | 2      | 25.00    |
| Nasals     | 3       | 1                 | 6       | 4                 | 7       | 2      | 28.57    |
| Liquids    | 2       | 0                 | 6       | 4                 | 6       | 2      | 33.33    |
| Semivowels | 2       | 0                 | 2       | 0                 | 2       | 2      | 100.00   |
| Total      | 24      | 5                 | 43      | 24                | 48      | 19     | 39.58    |
Table 1
Summary of structure of English and Italian consonant systems
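The derived columns of Table 1 follow from three counts per consonant class: the English count, the Italian count and the number shared. The combined inventory is English + Italian - shared, and the percentage shared is shared divided by combined. The sketch below recomputes them; the counts are copied from the table, while the names `COUNTS` and `summarise` are just illustrative.

```python
# (English, Italian, shared) consonant counts per class, copied from Table 1.
COUNTS = {
    "Plosives": (6, 12, 6),
    "Fricatives": (9, 9, 5),
    "Affricates": (2, 8, 2),
    "Nasals": (3, 6, 2),
    "Liquids": (2, 6, 2),
    "Semivowels": (2, 2, 2),
}


def summarise(english: int, italian: int, shared: int) -> dict:
    """Derive the remaining table columns from the three basic counts."""
    combined = english + italian - shared  # size of the union of the two inventories
    return {
        "unique_english": english - shared,
        "unique_italian": italian - shared,
        "combined": combined,
        "percent_shared": round(100 * shared / combined, 2),
    }


# One summary per consonant class, plus the column totals.
rows = {name: summarise(*counts) for name, counts in COUNTS.items()}
totals = summarise(*(sum(column) for column in zip(*COUNTS.values())))
```

With these inputs `totals` comes out as 5 consonants unique to English, 24 unique to Italian, 48 combined and 39.58% shared, matching the Total row.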
Obstruents
Obstruent sounds are produced by obstructing the vocal tract at some point, and then either releasing the blockage (resulting in sounds known as plosives), or forcing air past it (resulting in frication). Both English and Italian obstruents are traditionally classified in pairs as voiceless and voiced. Italian also has single/geminate pairs of all the obstruents.
Plosives
The plosive systems of the two languages appear very similar, in that both languages have plosives at bilabial /p, b/, dental/alveolar /t, d/ and velar /k, g/ positions, and voiceless/voiced pairs. However, this superficial similarity disguises the fact that the sounds are produced differently in the two languages. In particular the distinction between voiceless and voiced relies on different cues. In fact, in English the presence of voicing is not actually a reliable cue, and they are perhaps better classified as fortis (strong, roughly equivalent to voiceless) and lenis (weak, roughly equivalent to voiced). In Italian, however, the main cue to voicing is the presence of vocal cord vibration. This can be seen in Figure 1 where the closure of the initial /d/ in the word
dog
is clearly voiced in the Italian child's production.
Figure 1
Spectrograms of the word
dog
spoken by a native English child (left) and an Italian child (right) showing the difference in the initial /d/, where the Italian version has a clear voice bar in the closure. Note also a strong vocalic offset that looks like a second syllable (see Page
*
).
Aspiration is the main cue to the fortis-lenis distinction in English plosives, with the release of fortis (voiceless) plosives being accompanied by an audible period of frication, whose duration and intensity vary with context. Italian speakers often fail to produce the aspiration for the fortis plosives, and this can cause confusion between these and the lenis plosives. An example can be seen in Figure 2; in this data the lack of aspiration did make some words confusable with other words.
Figure 2
Spectrograms of the word
town
spoken by a native English child (left) and by an Italian child (right) showing the lack of aspiration of the initial /t/ in the Italian version.
Italian also has geminate plosives corresponding to the single ones, /pp bb tt dd kk gg/, and these tend to have a longer closure and more forceful release than their single counterparts. It is possible that Italians may use the geminate form when pronouncing English words spelt with double letters, under the influence of the orthography. However, there are not enough words spelt with double letters in the data to allow this to be examined.
- The only geminate plosive in this data was in the word
apple
, and in all cases the closure was at least 50% longer than the average English /p/ closure.
Fricatives
The fricative systems of the two languages overlap to some extent, with both having unvoiced and voiced labio-dental /f, v/ and alveolar /s, z/ fricatives, and unvoiced palato-alveolar /S/. English also has a voiced palato-alveolar /Z/ and an unvoiced glottal fricative /h/, which are not present in Italian. However, Italian has geminate versions of all except /z/, and the comments made for the plosives may apply in this respect. There are also a number of ways in which the systems differ that may be of significance.
Italian has no equivalent of the English dental fricatives /T/ and /D/, so these sounds could be a source of error, as the Italian children tended to produce approximations that depended on context. Many of the children in this data did produce satisfactory /T/ and /D/, but there was a great deal of variation in this, both between speakers and even for different words uttered by the same speaker.
The /T/ sound was pronounced either [t] or [f], so we see:
- the word
three
(SBE /Tri:/)
three_E.wav
pronounced [fri]
three_f_I.wav
by 7 children; but [tri] by 5 children. Note that both these result in alternative words (
free
and
tree
), so are potentially non-trivial errors.
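Whether a substitution such as [f] or [t] for /T/ turns one word into another can be checked mechanically against a pronunciation lexicon. The sketch below does this with a tiny hand-made SAMPA lexicon; the lexicon entries and the function name `confusable_words` are hypothetical and chosen only to illustrate the idea.

```python
# Tiny illustrative lexicon (word -> SAMPA pronunciation); not taken from the report's data.
LEXICON = {
    "three": "Tri:",
    "free": "fri:",
    "tree": "tri:",
    "bath": "bA:T",
    "thin": "TIn",
    "fin": "fIn",
    "tin": "tIn",
}


def confusable_words(word, target, substitute, lexicon=LEXICON):
    """List words whose pronunciation equals `word` after replacing `target` by `substitute`.

    Plain string replacement is enough here only because none of the SAMPA
    symbols in this toy lexicon overlap with the target symbol.
    """
    altered = lexicon[word].replace(target, substitute)
    return sorted(w for w, pron in lexicon.items() if pron == altered and w != word)
```

Here `confusable_words("three", "T", "f")` returns `["free"]` and `confusable_words("three", "T", "t")` returns `["tree"]`, whereas `confusable_words("bath", "T", "f")` returns an empty list because the altered form matches no other entry in this small lexicon.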
Some even used different pronunciations of the same word on different occasions:
- The word
three
was pronounced [Tri] when pronounced in isolation, but [fri] when part of the phrase
three old books
by 2 of the 12 speakers.
If the /D/ sound was not pronounced correctly, then it was most often mispronounced [d].
- The word-initial cluster 'sm', common in English words, e.g.
small
small_E.wav
is not found in Italian, and most of the children voiced the /s/ to some extent,
small_I.wav
.
Affricates
English has only one pair of affricates, /tS, dZ/ while Italian has two pairs, the English ones plus /ts, dz/ as well as the single-geminate distinction. There were no problems associated with the affricates in this data.
Sonorants
All sonorants are, by definition, voiced. Again Italian has both single and geminate versions of all the sonorants, except for the semivowels, and again it is likely that there will be some geminate pronunciations where the spelling has a double letter.
Nasals
The nasal system in the two languages is slightly different, but there are similarities, as English and Italian both have bilabial /m/ and alveolar /n/ nasals. But note the following differences:
In Italian the velar nasal /N/ is an allophone of /n/ before a velar, whereas in English it is considered to have full phonemic status. However, this caused few problems. The most frequent place where the velar nasal occurs in English is in the affix '-ing', and even among native English speakers there is a great deal of variation in whether this is actually pronounced [n] or [N]. This was also the case for Italian speakers.
However, they also tended to use /n/ for /N/ in other situations, where SBE would always have /N/:
king_E.wav
, but half of the Italian children said [kin]
king_n_I.wav
.
- There was also a strong tendency for the Italian children to have a strong audible release of the velar nasal [Ng]
king_g_I.wav
. This is also a feature of some English accents, so does not seem particularly alien.
Liquids
Apart from the existence of Italian geminate liquids, and the usual comments that apply to those, the two systems are broadly comparable, both having a lateral /l/ and alveolar /r/.
In acoustic phonetic terms the difference between English and Italian /r/ is substantial. The realisation of the sound traditionally given the symbol /r/ in English in fact varies greatly between English speakers, and across phonemic contexts. Many speakers of Standard British English have either a tap, represented in SAMPA by /4/ (IPA [ɾ]), or an approximant /r\/ (IPA [ɹ]), while other accents may have a trill /r/. In Italian /r/ is always a trill. Acoustically, a trilled /r/ is quite different from a tapped one; there are usually a number of very short stops (silences) corresponding to the hits of the tongue against the roof of the mouth, while in an English tap the closure is rarely complete enough to show a single stop. This difference can be seen clearly in Figure 3, and heard here in the word
parrot
; compare SBE
parrot_E.wav
with Italian English
parrot_I.wav
. Although this causes no intelligibility problems for a human listener, ASR models that are based on the English variant may fail on the Italian version (and incidentally, would also fail on those accents of English where /r/ is trilled).
In SBE and many other accents an 'r' after a vowel in the orthography is not pronounced, so
star
is /stA:/
star_E.wav
. In Italian it is always pronounced, and all the children in this data pronounced it to some extent, so
star
is [star]
star_I.wav
. However, Scottish and Irish accents, and many varieties of American English, do pronounce a post-vocalic 'r', so if any Italian speaker does this it will not be misunderstood. For this reason, these are not counted as full errors in the scoring system used. In addition, it is worth noting that ASR models based on US English will probably already be accommodating this difference.
Figure 3
Spectrograms showing intervocalic /r/ in the word
parrot.
On the left is a native English speaker, showing no closure; on the right is an Italian speaker, clearly showing the trill as a series of rapid stops. Note also the difference in the initial /p/.
In SBE there are two allophonic (context-dependent) variants of /l/, often referred to as light and dark /l/. Usually a light /l/ occurs syllable-initially, as in
light
or
leave
, and a dark /l/ syllable-finally or before a consonant, as in
pool
or
milk
and (for some speakers) before a back vowel, as in
look
. In Italian a light /l/ generally is used in all contexts. This can mean that Italian /l/ sounds a little odd to English ears. In this data this is particularly noticeable in the word
child
, where most accents of English would use a dark /l/. But the differences are acoustically subtle, and once again there are accents of English that do not have this allophonic variation, so it cannot be viewed as a serious error.
Semivowels
Both Italian and English have two semivowels, /w/ and /j/. There are no geminate versions. They are very similar in both languages, and were not the source of any errors in the children's speech.
Vowels
Table 2 summarises the two systems, while the rest of this section describes them in more detail. As can be seen from this table there is very little overlap between the vowel systems, with only 11.5% of vowels shared by both languages. In addition, 19 vowels are unique to English.
It is in the vowels that I expect the greatest problems to arise for Italian speakers of English. The Italian vowel system is very simple, consisting of seven monophthongal vowels, /i e E a O o u/. In contrast, the English system consists of twenty-two vowels, comprising fourteen monophthongs /i: i I E { A: Q O: U u: u V 3: @/ and eight diphthongs /eI aI OI aU @U I@ e@ U@/, though it can be hard to draw a firm line between the two. So the Italian children have to learn to make many more distinctions than they are used to. This tends to lead to some of them using the same Italian vowel to cover many of the English vowels.
| Vowels     | English | Unique to English | Italian | Unique to Italian | Eng+Ita | Shared | % shared |
|------------|---------|-------------------|---------|-------------------|---------|--------|----------|
| Monoph     | 14      | 11                | 7       | 4                 | 18      | 3      | 16.67    |
| Diphthongs | 8       | 8                 | 0       | 0                 | 8       | 0      | 0.00     |
| Total      | 22      | 19                | 7       | 4                 | 26      | 3      | 11.54    |
Table 2
Summary of structure of English and Italian vowel systems
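The totals in Table 2 can be reproduced by treating the two vowel inventories listed above as sets of SAMPA symbols. The sketch below assumes that a vowel counts as shared only when its SAMPA symbol is identical in both lists, so that English /O:/ and Italian /O/ are treated as different. That assumption happens to reproduce the table, but it is an interpretation rather than something the text states.

```python
# Inventories as listed in the text (SAMPA symbols).
ENGLISH_MONOPHTHONGS = {"i:", "i", "I", "E", "{", "A:", "Q", "O:", "U", "u:", "u", "V", "3:", "@"}
ENGLISH_DIPHTHONGS = {"eI", "aI", "OI", "aU", "@U", "I@", "e@", "U@"}
ITALIAN_VOWELS = {"i", "e", "E", "a", "O", "o", "u"}

english = ENGLISH_MONOPHTHONGS | ENGLISH_DIPHTHONGS
shared = english & ITALIAN_VOWELS            # symbols present in both inventories
combined = english | ITALIAN_VOWELS          # the "Eng+Ita" column
unique_english = english - ITALIAN_VOWELS
unique_italian = ITALIAN_VOWELS - english
percent_shared = round(100 * len(shared) / len(combined), 2)
```

Under this assumption the shared set is /i E u/, giving 3 of 26 vowels shared (11.54%), with 19 unique to English and 4 unique to Italian.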
Monophthongs
English is often described as having five long monophthongs, /i:/, /A:/, /O:/, /u:/ and /3:/. However, a following consonant affects vowel duration (a notionally short vowel before a lenis consonant can in fact be longer than a so-called long vowel before a fortis one), so the length distinction is somewhat spurious. The Italian counterparts of three of these, /i/, /O/ and /u/, are therefore in general accurate enough pronunciations not to be noticeable as foreign.
English has two unrounded vowels in close (or near-close) front position: /i:/ and /I/. This distinction is not made in Italian, and almost universally /I/ is pronounced [i]:
lip_E.wav
sounded like
leap
[lip]
lip_I.wav
for all but one of the children (who pronounced it [laIp]);
king
(SBE /kIN/)
king_E.wav
was pronounced [kiN]
king_g_I.wav
or sounded like
keen
[kin]
king_n_I.wav
if they also failed to produce a velar nasal.
A similar distinction between the two close (or near close) rounded back vowels /u:/ and /U/ is also made in most varieties of English, but not in Italian, and once again /u/ is most often the sound used for both. However, there are accents of English that also do not make the /u:/-/U/ distinction (Scottish accents in particular), so while this substitution makes a person's speech sound non-standard, it would not make it unintelligible, or even sound particularly foreign, so it does not constitute a serious pronunciation error.
- The words
foot
(SBE /fUt/)
foot_E.wav
and
book
(SBE /bUk/)
book_E.wav
were pronounced [fut]
foot_I.wav
and [buk]
book_I.wav
, respectively by all children;
One significant difference is that Italian appears to have no central vowels, with all the vowels being at the edges of the "vowel space". English has at least two central vowels, /3:/ and /@/. In English the /3:/ vowel always occurs before the letter 'r', which in SBE is not pronounced. In Italian (in common with many accents of English) such an 'r' is always pronounced. Again, this is not considered a serious error, as it never compromises intelligibility.
- The most common rendering of /3:/, as in
bird
, SBE /b3:d/
bird_E.wav
by the Italian children was [b3rd]
bird_I.wav
.
In addition, in modern SBE the vowel /V/ is also more central than back, and it was most often pronounced /a/ by the Italian children:
cup_E.wav
is [kap]
cup_I.wav
for 11 of the 13 speakers;
mother_E.wav
is ["mad@r] or ["maD@r]
mother_I.wav
for all but one of the children.
English has 2 unrounded vowels in open position, front /{/ and back /A:/. Italian has neither, and tends to use /a/ (slightly more open than /{/, and more front than /A:/) for both.
bath_E.wav
was pronounced [baT],
bath_f_I.wav
by 5 of the 13 children; the others used [{],
bath_t_I.wav
which is a quite acceptable accent variation of the vowel in this word;
car
(SBE /kA:/ )
car_E.wav
was pronounced [kar]
car_I.wav
by all the children.
Sometimes [E] was used for /{/, so
bag_E.wav
sounds like
beg
[bEg]
bag_I.wav
in the speech of 7 of the 13 children;
for all but 2 of the children
man
(SBE /m{n/)
man_E.wav
sounds like
men
[mEn]
man_I.wav
.
The vowel /Q/ in
dog
was generally pronounced accurately, in spite of Italian having no equivalent. Where it wasn't, either the unrounded counterpart /A/ was used, or the Italian /a/ tended to be used.
Where the /O:/ vowel occurred before an 'r' in the spelling, for example in the word
door
, SBE /dO:/
door_E.wav
, the Italian children tended to shorten it, and of course pronounced the /r/, sometimes producing an /Q/-like vowel
door_I.wav
. Where /O:/ wasn't followed by 'r' the Italian children's pronunciation was accurate, though the length tended to be shorter than that generally heard in SBE.
Many of the features described above of the Italian children's pronunciation of the English vowels are illustrated in the vowel plots in Figure 4. A vowel plot is a scatter graph of the frequency of the second formant (F2) against the first (F1), and it shows graphically the distribution of the vowels within the vowel space. In particular this plot shows where the vowels the Italian children produced were close to the target English pronunciation, and where there are differences.
Figure 4
Vowel plot showing the average formant values for native English and Italian speakers of English
In this plot each vowel is represented by a different colour; the English children's vowels are shown as diamonds, while the Italian children's values are shown as asterisks.
The plot shows that for three of the English vowels that have (near) Italian equivalents, /i/ as in fleece (dark blue), /E/ as in dress (gold) and /A/ as in start (red), the Italian children's pronunciation is very close to that of the native English speakers. For the vowel /O:/ (thought - brown), which has an Italian counterpart in /O/, the Italian values for both F1 and F2 are a little lower than the English ones. This may be due to the fact that the majority of words containing the /O/ vowel are spelled with a following 'r'. The English children do not pronounce the /r/, while the Italian children do; the effect of an /r/ is to lower the surrounding formants, and this may be why the Italian values are lower. The same may apply to the /3/ (nurse - black) vowel, although there was a great deal of variation in how this vowel was pronounced.
The failure of the Italian children to distinguish between /i/ (fleece - dark blue) and /I/ (kit - pink), mentioned above, can be seen clearly: the Italian /i/ and /I/ both cluster in the same place, with the English /i/.
Similarly, the Italians' lack of distinction between /u/ (goose - green) and /U/ (foot - orange), also mentioned above, is evident in that the green and orange asterisks are very close to each other. They are not, however, close to the English values for either of these vowels. This is not because the Italians' pronunciation is inaccurate; rather, it reflects coarticulatory effects that cause the second formant to be higher in these vowels for the English speakers than it would be in careful isolated speech. The coarticulatory effects tend to be stronger in the native speakers because they are not focussing on their pronunciation, and some of them spoke very rapidly. The Italian speakers tended to speak more slowly, and were focussing on their pronunciation.
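The comparison behind a vowel plot of this kind can be sketched as follows: average F1 and F2 per vowel and speaker group, then measure how far apart the two group means lie. The formant values in the example are invented placeholders, not measurements from this study, and the function names are hypothetical. Distance in raw Hz is also a crude measure; a perceptual scale such as Bark could be substituted.

```python
from collections import defaultdict
from math import hypot


def mean_formants(tokens):
    """Average F1/F2 per (group, vowel).

    `tokens` is an iterable of (group, vowel, f1_hz, f2_hz) tuples.
    Returns {(group, vowel): (mean_f1, mean_f2)}.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for group, vowel, f1, f2 in tokens:
        acc = sums[(group, vowel)]
        acc[0] += f1
        acc[1] += f2
        acc[2] += 1
    return {key: (s1 / n, s2 / n) for key, (s1, s2, n) in sums.items()}


def group_distance(means, vowel, group_a="English", group_b="Italian"):
    """Euclidean distance (Hz) between two groups' mean (F1, F2) for one vowel."""
    (a1, a2), (b1, b2) = means[(group_a, vowel)], means[(group_b, vowel)]
    return hypot(a1 - b1, a2 - b2)


# Placeholder values for illustration only (NOT data from the report).
TOKENS = [
    ("English", "I", 400.0, 2000.0),
    ("English", "I", 420.0, 2100.0),
    ("Italian", "I", 300.0, 2500.0),
    ("Italian", "I", 320.0, 2600.0),
]
means = mean_formants(TOKENS)
```

A large value of `group_distance(means, "I")` relative to the other vowels would correspond to the English and Italian markers for that vowel lying far apart on the plot.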
Diphthongs
There are 8 diphthongs in English, while Italian has none. However, the exact quality of the sounds in English may be very variable, both between phonetic context and speaker/accent group. As well as a range of starting and ending positions being acceptable for these vowels, for some speakers and in certain contexts there may be no movement within the diphthong, making it phonetically a monophthong. However, the majority of native English speakers do not perceive these differences, so if an Italian speaker produces a monophthong in place of a diphthong this may go unnoticed.
Of the rising diphthongs, /aI/ and /OI/ are relatively stable, and posed few problems for the Italian speakers. The /eI/ diphthong is often realised by native English speakers as /e:/, particularly in northern dialects; this is very similar to the Italian /e/, and this is the vowel that was used by many children:
- half the children pronounced
cake
, SBE /keIk/
cake_E.wav
, as [ke:k]
cake_I.wav
;
a third of them used an [e] vowel in one or other of the syllables of
railway
.
When the centring diphthongs occur before /r/, there is a great deal of regional (and personal) variation in how they are pronounced, with rhotic accents having a monophthong with a quality close to the starting point of the diphthong, followed by /r/. As the post-vocalic /r/ is also pronounced in Italian, it is not surprising that the children tended to produce a monophthong plus /r/.
- the vowel sound in
bear
(SBE /e@/, but increasingly pronounced [e:] even by speakers of the standard accent) was rendered [er] by 10 of the 25 children, and a similar picture was seen for
wear
;
- for the word
near
, (SBE /nI@/)
nearandopen_E.wav
the most common pronunciation was [nir]
near_ir_I.wav
, which was produced by 7 out of 12 children. A further 3 said [ner]
near_er_I.wav
.
The falling diphthong /aU/ was generally pronounced acceptably, although sometimes the final quality was more [u]-like; this is a very subtle difference, and unlikely to cause problems. The other falling diphthong, /@U/, was more variable, as indeed it is in English.
old_E.wav
was pronounced [Qld]
old_I.wav
by 9 out of 12 children;
the initial vowel of the word
open
, SBE /"@Up@n/
nearandopen_E.wav
was pronounced [Q]
near_ir_I.wav
by a quarter of the speakers, and [o] by a further quarter
near_er_I.wav
.
Phonological processes
Reduction
English is often described as a stress-timed language, as there is a tendency for stressed syllables to occur at regular intervals. Because there can be a very variable number of unstressed syllables between these stressed syllables, some of them tend to get reduced in order to fit them into the rhythm.
Reduction
can lead to vowels being centralised, that is, the articulators do not reach their full positions. Many unstressed vowels end up as [@], also known as schwa, or may even disappear altogether, a process known as
elision
.
Italian, on the other hand, is often described as a syllable-timed language, which means that there is a tendency for syllables to occur at regular intervals. This means that there is less pressure on some syllables to fit into a shorter interval, so reduction is not so widespread as it is in English, and Italian vowels tend to keep their full articulation. When Italians speak English, one of the things they find most difficult to master is the appropriate amount of reduction. So vowels that would be rendered with a schwa-type vowel maintain their full sound, and it is this that gives Italian English its particular rhythmic and melodic qualities. This was evident to some extent in nearly every polysyllabic word in this data, but here are some examples:
- lemon
was pronounced ["lEmQn]
lemon_I.wav
rather than the typical SBE ["lEm@n]
lemon_E.wav
;
television
, which in SBE is usually /"tElIvIZ@n/
television_E.wav
was most frequently pronounced [tEle"viZQn]
television_I.wav
(note also that the stress was placed on the wrong syllable by all but one of the children);
parrot
, SBE /"p{r@t/
parrot_E.wav
was ["p{rot]
parrot_Q_I.wav
;
intelligent
, which in SBE is /In"tElIdZ@nt/
intelligent_E.wav
, was commonly pronounced [in"tElidZEnt]
intelligent_I.wav
;
words ending in "er", "ar" or "or", which in English are generally pronounced with /@/, tended to be given a full vowel by the Italian children, and usually the /r/ was pronounced too, so
water
(SBE /"wO:t@/
water_E.wav
) is pronounced ["wOtEr]
water_I.wav
, and
sugar
(SBE /"SUg@/
sugar_E.wav
) is ["Sugar]
sugar_I.wav
.
Dictionaries of English represent this reduction in the pronunciations they give, and if these are used unmodified in an ASR system for Italian speakers, there is a strong likelihood that the speakers' pronunciations will not match the system's representation, which is expecting a shorter, less intense schwa-type vowel.
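One way to picture the mismatch is as a lexicon that holds only the reduced dictionary form for each word. The following is a minimal sketch, not a description of any particular ASR system: the function name `add_variants` and the dictionary layout are hypothetical, and the SAMPA strings are taken from the examples listed above. It simply stores the observed unreduced forms as alternative pronunciations alongside the dictionary form.

```python
def add_variants(canonical, observed):
    """Return a lexicon mapping each word to a list of pronunciations.

    `canonical` maps word -> dictionary (reduced) SAMPA pronunciation.
    `observed` maps word -> list of unreduced pronunciations heard from learners.
    Both structures are illustrative assumptions, not a real ASR lexicon format.
    """
    merged = {word: [pron] for word, pron in canonical.items()}
    for word, variants in observed.items():
        prons = merged.setdefault(word, [])
        for variant in variants:
            if variant not in prons:  # avoid duplicate entries
                prons.append(variant)
    return merged


# SBE dictionary forms and Italian-English forms quoted in this report.
canonical = {"lemon": '"lEm@n', "parrot": '"p{r@t', "water": '"wO:t@'}
observed = {"lemon": ['"lEmQn'], "parrot": ['"p{rot'], "water": ['"wOtEr']}

lexicon = add_variants(canonical, observed)
for word, prons in lexicon.items():
    print(word, prons)
```

Whether adding variants in this way actually helps recognition is an empirical question that this report does not address.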
Syllabic consonants
A further consequence of the reduction process in English is the appearance of syllabic consonants. Usually a syllable must have a vowel at its centre, but if that vowel becomes elided through extreme reduction, then a consonant may take on the role of syllabic nucleus. Although almost any consonant can become syllabic in this way, the most frequent syllabic consonants are nasals, especially /n/, and liquids, such as /l/, since these already share a vowel-like structure. Again, English dictionaries include syllabic /n=/ and /l=/ in their pronunciations, and this may well be a source of mismatches, as many of the Italian children tended to pronounce a full vowel in such situations.
- the final syllable in
table
and
bottle
was most frequently [bVl], but also [bQl] and [bUl], where in English it would generally be /bl=/. Compare native English /"teIbl=/
table_E.wav
and /"bQtl=/
bottle_E.wav
with Italian English
table_I.wav
and
bottle_I.wav
;
seven
, which in SBE is /"sEvn=/
seven_E.wav
was pronounced ["sEvEn]
seven_I.wav
by all the speakers.
Final devoicing
In many languages there is a tendency to devoice utterance-final consonants. In the English spoken by these Italian children, many phonemically voiced consonants were devoiced, even within a phrase. I don't know whether this is a consequence of an often not very fluent reading style (so that words within a phrase were pronounced with noticeable pauses between them), or whether this is a feature of Italian which is leaking into their English.
Some comments on supra-segmental features
I have already mentioned the stress-timed nature of English, in contrast with syllable-timed Italian, as giving Italian English its characteristic rhythmic properties. More detailed comment on the intonation patterns is not possible due to the nature of this data. Reading lists of isolated words and short, more-or-less meaningless phrases is not conducive to the production of natural intonation patterns, irrespective of whether the speaker is native or non-native.
However, one observation can be made about the influence of Italian word structure. Most Italian words end in a vowel, so for some speakers there was a tendency to have a strong vocalic offset to final consonants. See Figure 1 above and compare SBE
dog_E.wav
with Italian
dog_I.wav
, and SBE
food_E.wav
with Italian
food_I.wav
. This was particularly common in the isolated words, but did also occur within phrases. Where this offset is strong enough to be construed as another syllable, then it could cause ASR errors, even though it would be unlikely to pose any problem to a human listener.
Summary: the four differences most likely to cause ASR problems
Many of the features of the Italian children's pronunciation of English discussed above are acoustically quite subtle, but obvious to a human ear, while other differences may be acoustically large but go unnoticed by human listeners. It is also possible that differences which seem significant to a human listener are not so for an ASR system, and vice versa. The following are the features of Italian English that I consider likely to be the most problematic for ASR systems.
- Trilled /r/, whose acoustic pattern is quite different from the English tapped /r/.
- Non-aspiration of fortis plosives, leading to mismatch with models that are based on plosives with a distinct period of strong aspiration.
- Vocalic offsets can look like another syllable, which won't be accounted for by the models.
- Lack of reduction making unstressed vowels longer and of different quality from those in English.
Scoring the children's pronunciation
A word-by-word analysis of each child's data was carried out, and an error penalty was calculated for each child which took into account the number and severity of the errors they made. Table 3 shows the scores derived for each child. Words were scored as "correct", shown in green, "acceptable variant", shown in blue, "significant error", shown in black, or "mispronunciation", shown in red. The entries in the table are listed from best to worst.
The scoring system
Words were scored as correct when the child achieved a realisation that was very close to the SBE pronunciation. For instance if they pronounced the so-called English long vowels /i: A: O: u: 3:/ as their counterparts /i A O u 3/ this was not penalised, as the distinction is not an absolute one in English, and is subject to considerable accent and personal variation. Note that some pronunciations may still sound slightly "foreign" due to prosodic (intonational and rhythmic) factors, which are difficult to analyse meaningfully for read lists of words and short phrases.
A pronunciation was considered an acceptable variant if it could be considered the result of normal phonological processes (e.g. devoicing before a voiceless consonant, assimilation, etc.), or if it represented a (non-extreme) accent variant (e.g. pronunciation of post-vocalic /r/, use of /{/ for /A:/ in certain words, and the precise realisation of the diphthongs). In other words, these are pronunciations that, while non-standard, would not be looked upon as obviously foreign.
Where a speaker was aiming at the correct pronunciation, but what they produced sounded foreign, these were scored as significant errors. This included where an Italian vowel was used in place of an English one (e.g. /i/ for /I/ in words like
kid
and
city
, or /a/ for /V/ in sun) and where lack of aspiration or other compensatory gesture led to fortis plosives being perceived as lenis (e.g. if the initial sound in town sounded like a /d/).
The term mispronunciation was reserved for errors that resulted in the word being unintelligible (e.g. /ki@ld/ for
child
, or /ba:t/ for
boat
), or in the word being confusable with another (common) English word (e.g.
lip
sounding like
leap
, or
coat
like
caught
). These were considered the most severe, as they compromise intelligibility.
When calculating the error penalty, acceptable variants scored 0.5, significant errors 1, and mispronunciations 2. This gives rise to an error penalty that is weighted in favour of near misses, but against unintelligible words, and which corresponds to an intuitive judgement of the relative impact of the different kinds of error.
I have attempted to give each child a "final score", which is the result of dividing the number correct by the error penalty. This has the intuitively sensible effect of giving those with a high number correct and a low error penalty a high overall score, and vice versa.
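The arithmetic can be sketched as follows. The weights are those stated above; the normalisation is an inference rather than something the text states. The figures in Table 3 appear consistent with the error penalty being the weighted error count divided by the total number of words, and with the final score being the proportion of words correct divided by that penalty. The function names are illustrative only.

```python
# Weights stated in the report for each error category.
WEIGHTS = {"acceptable": 0.5, "significant": 1.0, "mispronunciation": 2.0}


def error_penalty(total, acceptable, significant, mispronunciation):
    """Weighted error count per word (normalisation inferred from Table 3)."""
    weighted = (
        WEIGHTS["acceptable"] * acceptable
        + WEIGHTS["significant"] * significant
        + WEIGHTS["mispronunciation"] * mispronunciation
    )
    return weighted / total


def final_score(total, correct, acceptable, significant, mispronunciation):
    """Proportion of words correct divided by the error penalty."""
    penalty = error_penalty(total, acceptable, significant, mispronunciation)
    return (correct / total) / penalty


# Speaker m0203 from Table 3: 119 words, 35 correct, 39 acceptable variants,
# 42 significant errors, 3 mispronunciations.
penalty = error_penalty(119, acceptable=39, significant=42, mispronunciation=3)
score = final_score(119, correct=35, acceptable=39, significant=42, mispronunciation=3)
print(round(penalty, 2), round(score, 2))  # 0.57 0.52
```

The total number of words cancels in the final score, so it equals the number correct divided by the unnormalised weighted error count (35 / 67.5 for m0203).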
Notes on particular children
Some of the children exhibited factors that may have influenced their pronunciation, and I have noted these in the final column.
Speakers f0202 and m0211 sounded as if they had a cold.
Speaker m0216 had a very much deeper voice than the others. He was also not a very fluent reader.
Some of the speakers were very breathy, particularly before and after each utterance. Speakers f0201, f0203 and m0206 had especially shaky breaths and may have been nervous.
Some speakers (f0202, m0208 and m0211) had a whistle, mainly on /s/ sounds. This did not affect intelligibility, but may have an effect on the performance of an ASR system.
Speaker m0205 had a slight alveolar lisp, affecting /s/ and /z/ sounds.
Speaker m0217 had a slight lateral lisp, making /l/ sound like the fricative IPA /ɬ/, which is /K/ in SAMPA (like Welsh "ll").
Speaker m0218 had quite a severe palatal lisp, affecting all sounds produced on the roof of the mouth. This may be a natural feature, or he may have been wearing a tooth brace, the framework of which fits into the roof of the mouth, and can cause this type of speech.
| Speaker | Total words | Correct (absol) | Correct (%) | Acceptable variant (absol) | Acceptable variant (%) | Mispronunciation (absol) | Mispronunciation (%) | Significant errors (absol) | Significant errors (%) | Error penalty | Final score | Notes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| m0203 | 119 | 35 | 29.41% | 39 | 32.77% | 3 | 2.52% | 42 | 35.29% | 0.57 | 0.52 | |
| f0203 | 118 | 37 | 31.36% | 24 | 20.34% | 5 | 4.24% | 52 | 44.07% | 0.63 | 0.50 | shaky breaths; nervous? |
| m0202 | 117 | 34 | 29.06% | 30 | 25.64% | 6 | 5.13% | 47 | 40.17% | 0.63 | 0.46 | |
| m0214 | 119 | 36 | 30.25% | 23 | 19.33% | 10 | 8.40% | 50 | 42.02% | 0.68 | 0.44 | |
| m0208 | 119 | 35 | 29.41% | 22 | 18.49% | 10 | 8.40% | 52 | 43.70% | 0.70 | 0.42 | /s/ whistle |
| f0205 | 115 | 32 | 27.83% | 24 | 20.87% | 9 | 7.83% | 50 | 43.48% | 0.70 | 0.40 | |
| f0204 | 117 | 34 | 29.06% | 24 | 20.51% | 14 | 11.97% | 45 | 38.46% | 0.73 | 0.40 | |
| m0217 | 118 | 36 | 30.51% | 14 | 11.86% | 16 | 13.56% | 52 | 44.07% | 0.77 | 0.40 | slight lateral lisp - /l/ |
| m0218 | 118 | 34 | 28.81% | 17 | 14.41% | 15 | 12.71% | 52 | 44.07% | 0.77 | 0.38 | severe palatal lisp /s z S Z l/ |
| f0206 | 120 | 31 | 25.83% | 24 | 20.00% | 6 | 5.00% | 59 | 49.17% | 0.69 | 0.37 | |
| f0202 | 118 | 34 | 28.81% | 19 | 16.10% | 17 | 14.41% | 48 | 40.68% | 0.78 | 0.37 | cold? Slight /s/ whistle |
| m0207 | 117 | 31 | 26.50% | 24 | 20.51% | 10 | 8.55% | 52 | 44.44% | 0.72 | 0.37 | |
| m0211 | 117 | 31 | 26.50% | 20 | 17.09% | 17 | 14.53% | 49 | 41.88% | 0.79 | 0.33 | cold? /s/ whistle |
| f0201 | 127 | 31 | 24.41% | 18 | 14.17% | 8 | 6.30% | 70 | 55.12% | 0.75 | 0.33 | |
| m0205 | 119 | 29 | 24.37% | 21 | 17.65% | 11 | 9.24% | 58 | 48.74% | 0.76 | 0.32 | shaky breaths; nervous? |
| m0210 | 121 | 29 | 23.97% | 22 | 18.18% | 10 | 8.26% | 60 | 49.59% | 0.75 | 0.32 | |
| m0201 | 117 | 26 | 22.22% | 28 | 23.93% | 6 | 5.13% | 57 | 48.72% | 0.71 | 0.31 | |
| m0216 | 120 | 30 | 25.00% | 19 | 15.83% | 17 | 14.17% | 54 | 45.00% | 0.81 | 0.31 | v deep voice; non-fluent reader |
| m0204 | 118 | 27 | 22.88% | 23 | 19.49% | 8 | 6.78% | 60 | 50.85% | 0.74 | 0.31 | |
| m0213 | 119 | 30 | 25.21% | 25 | 21.01% | 22 | 18.49% | 42 | 35.29% | 0.83 | 0.30 | |
| m0206 | 118 | 28 | 23.73% | 21 | 17.80% | 13 | 11.02% | 56 | 47.46% | 0.78 | 0.30 | shaky breaths; nervous? |
| m0209 | 119 | 29 | 24.37% | 19 | 15.97% | 16 | 13.45% | 55 | 46.22% | 0.81 | 0.30 | |
| m0215 | 118 | 27 | 22.88% | 26 | 22.03% | 12 | 10.17% | 53 | 44.92% | 0.76 | 0.30 | slight lisp? |
| f0207 | 119 | 27 | 22.69% | 26 | 21.85% | 12 | 10.08% | 54 | 45.38% | 0.76 | 0.30 | |
| m0212 | 120 | 28 | 23.33% | 18 | 15.00% | 17 | 14.17% | 57 | 47.50% | 0.83 | 0.28 | |

Table 3: Children's pronunciation scores
List of phoneme symbols used
This list contains all the English and Italian phoneme symbols, and shows both the SAMPA and IPA symbol, as well as a reference word for each language, as appropriate. Where I have used symbols from other languages to express Italianate pronunciations of English words, these are shown after the main sections.
Vowels

| SAMPA | IPA | Description | English reference word | Italian reference word |
|---|---|---|---|---|
| i: | iː | close front unrounded (long) | fleece | |
| i | i | close front unrounded (unstressed) | happy | mite |
| I | ɪ | near-close near-front unrounded | kit | |
| e | e | close-mid front unrounded | | rete |
| E | ɛ | open-mid front unrounded | dress | meta |
| { | æ | near-open front unrounded | trap | |
| A: | ɑː | open back unrounded (long) | start | |
| a | a | open front unrounded | | rata |
| Q | ɒ | open back rounded | lot | |
| O: | ɔː | open-mid back rounded (long) | thought | |
| O | ɔ | open-mid back rounded | | moto |
| o | o | close-mid back rounded | | dove |
| U | ʊ | near-close near-back rounded | foot | |
| u: | uː | close back rounded (long) | goose | |
| u | u | close back rounded (unstressed) | influenza | muto |
| V | ʌ | open-mid near-back unrounded | strut | |
| 3: | ɜː | open-mid central unrounded | nurse | |
| @ | ə | schwa; mid-central (unstressed) | comma, letter | |
| aI | aɪ | rising diphthong | price | |
| eI | eɪ | rising diphthong | face | |
| OI | ɔɪ | rising diphthong | choice | |
| @U | əʊ | falling diphthong | goat | |
| aU | aʊ | falling diphthong | mouth | |
| I@ | ɪə | centring diphthong | near | |
| e@ | eə | centring diphthong | square | |
| U@ | ʊə | centring diphthong | cure | |

Vowel symbol from another language:

| SAMPA | IPA | Description | Reference word |
|---|---|---|---|
| 9 | œ | open-mid front rounded (rounded /E/) | German plötzlich |
Consonants

| SAMPA | IPA | Description | English reference word | Italian reference word |
|---|---|---|---|---|
| p | p | voiceless/fortis bilabial plosive | pin | pane |
| b | b | voiced/lenis bilabial plosive | bin | banco |
| t | t | voiceless/fortis alveolar/dental plosive | tin | tana |
| d | d | voiced/lenis alveolar/dental plosive | din | danno |
| k | k | voiceless/fortis velar plosive | kin | cane |
| g | g | voiced/lenis velar plosive | give | gamba |
| pp | pp | voiceless bilabial geminate plosive | | coppa |
| bb | bb | voiced bilabial geminate plosive | | gobba |
| tt | tt | voiceless dental geminate plosive | | zitto |
| dd | dd | voiced dental geminate plosive | | cadde |
| kk | kk | voiceless velar geminate plosive | | nokka |
| gg | gg | voiced velar geminate plosive | | fugga |
| f | f | voiceless labiodental fricative | fine | fame |
| v | v | voiced labiodental fricative | vine | vano |
| T | θ | voiceless dental fricative | thin | |
| D | ð | voiced dental fricative | this | |
| s | s | voiceless alveolar fricative | sit | sano |
| z | z | voiced alveolar fricative | zoo | sbaglio |
| S | ʃ | voiceless palato-alveolar fricative | shine | scendo |
| Z | ʒ | voiced palato-alveolar fricative | pleasure | |
| h | h | voiceless glottal fricative | hit | |
| ff | ff | voiceless labiodental geminate fricative | | beffa |
| vv | vv | voiced labiodental geminate fricative | | bevvi |
| ss | ss | voiceless alveolar geminate fricative | | cassa |
| SS | ʃʃ | voiceless palato-alveolar geminate fricative | | ascia |
| ts | ts | voiceless alveolar affricate | | zitto |
| dz | dz | voiced alveolar affricate | | zona |
| tS | tʃ | voiceless palato-alveolar affricate | chip | cena |
| dZ | dʒ | voiced palato-alveolar affricate | gin | gita |
| tts | tts | voiceless alveolar geminate affricate | | bozza |
| ddz | ddz | voiced alveolar geminate affricate | | mezzo |
| ttS | ttʃ | voiceless palato-alveolar geminate affricate | | braccio |
| ddZ | ddʒ | voiced palato-alveolar geminate affricate | | oggi |
| m | m | bilabial nasal | mind | molla |
| n | n | alveolar nasal | nine | nocca |
| J | ɲ | palatal nasal | | gnocco |
| N | ŋ | velar nasal | sing | |
| mm | mm | geminate bilabial nasal | | grammo |
| nn | nn | geminate alveolar nasal | | panna |
| JJ | ɲɲ | geminate palatal nasal | | bagno |
| n= | n̩ | syllabic /n/ | button | |
| l | l | alveolar lateral | lip | lama |
| L | ʎ | palatal lateral | | gli |
| ll | ll | geminate alveolar lateral | | colla |
| LL | ʎʎ | geminate palatal lateral | | foglia |
| l= | l̩ | syllabic /l/ | bottle | |
| 4 | ɾ | alveolar tap/flap | rain | |
| r\ | ɹ | alveolar approximant | | |
| r | r | alveolar trill | | rete |
| rr | rr | geminate alveolar trill | | ferro |
| w | w | labial-velar approximant | wine | uomo |
| j | j | palatal approximant | young | ieri |

Consonant symbols from other languages:

| SAMPA | IPA | Description | Reference word |
|---|---|---|---|
| B | β | voiced bilabial fricative | Spanish cabo |
| K | ɬ | voiceless alveolar lateral fricative | Welsh llan |
| j\ | ʝ | voiced palatal fricative | |
Prosodic

| SAMPA | IPA | Description |
|---|---|---|
| : | ː | length mark |
| " | ˈ | primary stress |
| % | ˌ | secondary stress |
| . | . | syllable boundary; only used where I need to mark this explicitly (e.g. between two vowels that might otherwise be construed as a diphthong) |
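The correspondence between the two notations in the symbol lists above can be applied mechanically. The following is a minimal sketch, assuming a greedy longest-match reading of the SAMPA string; the function name `sampa_to_ipa` is illustrative, the mapping covers only symbols whose SAMPA and IPA forms differ, and the diacritics described in the next section are not handled.

```python
# SAMPA symbols whose IPA form differs; anything else is passed through as-is.
SAMPA_TO_IPA = {
    # vowels
    "I": "ɪ", "E": "ɛ", "{": "æ", "A": "ɑ", "Q": "ɒ", "O": "ɔ",
    "U": "ʊ", "V": "ʌ", "3": "ɜ", "@": "ə", "9": "œ",
    # consonants
    "T": "θ", "D": "ð", "S": "ʃ", "Z": "ʒ", "N": "ŋ", "J": "ɲ",
    "L": "ʎ", "4": "ɾ", "r\\": "ɹ", "B": "β", "K": "ɬ", "j\\": "ʝ",
    # syllabic consonants (base letter + combining syllabic mark U+0329)
    "n=": "n\u0329", "l=": "l\u0329",
    # prosodic marks
    ":": "ː", '"': "ˈ", "%": "ˌ",
}

# Try longer symbols first so that "r\" and "n=" win over "r" and "n".
_KEYS = sorted(SAMPA_TO_IPA, key=len, reverse=True)


def sampa_to_ipa(text):
    """Convert a SAMPA transcription to IPA by greedy longest match."""
    out = []
    i = 0
    while i < len(text):
        for key in _KEYS:
            if text.startswith(key, i):
                out.append(SAMPA_TO_IPA[key])
                i += len(key)
                break
        else:
            # Symbol is identical in both notations (e.g. p, t, e, a).
            out.append(text[i])
            i += 1
    return "".join(out)


print(sampa_to_ipa('"tElIvIZ@n'))  # ˈtɛlɪvɪʒən
print(sampa_to_ipa('"wO:t@'))      # ˈwɔːtə
```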
Diacritics
Where SAMPA (or extended SAMPA) already has a symbol, I have used this, but generally put it in superscript. E.g. SAMPA recommends using -0 to indicate that a phonemically voiced sound is devoiced; I have used superscript -0 for this.
- -h to indicate that the sound (usually a fortis plosive) is not aspirated: [t-hOI] (toy)
- -0 to indicate devoicing: [irz-0] (ears)
- +v to indicate a strong vocalic offset to a consonant: [dQg+v] (dog-uh)
- +g to indicate the release of the velar nasal: ["ridINg] (reading)