Article Open Access

The weight of VOT and f0 in English stop differentiation: 2L1 bilinguals versus monolinguals

Published/Copyright: May 26, 2025
From the journal Folia Linguistica

Abstract

This study examines the differences and similarities between 2L1 bilinguals and monolinguals in the weights of VOT and f0 as cues in English stop differentiation. The paper considers both Japanese-English and Mandarin-English 2L1 bilinguals, as f0 plays different roles in stop differentiation in Japanese and Mandarin. The statistical models indicate that both kinds of 2L1 bilinguals show differences from the English monolinguals that cannot be explained by cross-linguistic interaction. Efficiency can satisfactorily explain these observed differences. Neither 2L1 bilingual group differs from the English monolingual group in the weights of both VOT and f0 within a single task. The Japanese-English 2L1 bilingual group relies less on VOT than the English monolingual group in word list reading and more on f0 than the English monolingual group in text reading, although VOT and f0 have similar weights as cues for stop differentiation in Japanese and English. The Japanese-English 2L1 bilingual group uses one cue differently from the English monolingual group to distinguish between their two first languages. The Mandarin-English 2L1 bilingual group relies on f0 as much as the monolingual group in English stop differentiation although f0 has a lighter weight in Mandarin stop differentiation; this may make the differentiation between Mandarin and English easier.

1 Introduction

Since its proposal in Lisker and Abramson (1964), voice onset time (hereafter VOT) has been discussed in numerous studies. For example, a large body of research focuses on VOT durational differences between monolinguals and bilinguals with two first languages (henceforth 2L1 bilinguals) to explore whether cross-linguistic interaction (hereafter CLI) exists and whether it can explain these differences (Caramazza et al. 1973; Hazan and Boulakia 1993; Macleod and Stoel-Gammon 2005; Magloire and Green 1999; Nathan et al. 1987; Olson 2016). CLI refers to the impact that arises from both the similarities and differences between the target language and any previously (and perhaps imperfectly) acquired language (see e.g., Odlin 1989; Ringbom 1987; Sharwood Smith and Kellerman 1986). A related example is that adult learners of a second language often produce VOT values that fall between those of their native language and the target language, indicating CLI from their native language to the target language (Flege 1991; Flege and Port 1980; Laeufer 1997; Zampini 2013). However, studies on 2L1 bilinguals have reached divergent conclusions: (i) 2L1 bilinguals have VOT durations in each language similar to those of their monolingual peers (Magloire and Green 1999); (ii) 2L1 bilinguals have VOT durations somewhere between those of the two monolingual groups (Williams 1977); and (iii) 2L1 bilinguals have more extreme VOT durations than both monolingual groups (Whitworth 2000). In this study, a 2L1 bilingual is defined as a bilingual speaker who has two first languages, both acquired naturally before the age of three and in which they are fluent (Genesee and Nicoladis 2007; McLaughlin 1978). In addition to VOT, fundamental frequency at vowel onset following a stop (henceforth f0) has also received much attention. However, research from this perspective has not given a conclusive answer either: some studies claim that 2L1 bilinguals have f0 values similar to those of monolinguals (Altenberg and Ferrand 2006; Chang and Mandock 2019), while others claim that CLI plays a role (Lee and Iverson 2012).

Since studies looking at VOT duration and f0 value have not reached a consensus, a different and less explored perspective may offer a fresh viewpoint. In this study, we first discuss differences in the weights of VOT and f0 in stop differentiation between English monolinguals and Japanese-English 2L1 bilinguals. Here, weight refers to the extent to which an acoustic cue is utilized in English stop differentiation. Both Japanese and English use VOT as the primary cue and f0 as a secondary cue in stop differentiation. Thus, if CLI is the only factor at work, Japanese-English 2L1 bilinguals are not expected to differ significantly from English monolinguals in the weights of VOT and f0 in English stop differentiation. If a significant difference still emerges, it suggests that the difference lies outside CLI. However, one kind of 2L1 bilingual alone may not be sufficient to support a convincing conclusion. Therefore, this paper also discusses Mandarin-English 2L1 bilinguals to further examine the conclusion drawn from the comparison between English monolinguals and Japanese-English 2L1 bilinguals. The choice of Mandarin-English 2L1 bilinguals is based on a difference between Mandarin and the other two languages: the role of f0 as a cue in stop differentiation is significantly diminished in Mandarin as f0 is involved in lexical tone differentiation (Francis et al. 2006; Gandour 1974; Hombert 1977); accordingly, under CLI from Mandarin, Mandarin-English 2L1 bilinguals would be predicted to put less weight on f0 in English stop production.

This paper is structured as follows. Section 2 discusses the weight of VOT and f0 as stop differentiation cues in English and Japanese. Section 3 reports results concerning the weight of VOT and f0 in stop differentiation by 2L1 bilinguals in previous studies. Section 4 discusses the details concerning the participants, acoustic analysis, and predictions of this study. Section 5 gives the weight of VOT and f0 by English monolinguals and Japanese-English 2L1 bilinguals and discusses similarities and differences between them. Section 6 compares Mandarin-English 2L1 bilinguals with the same English monolinguals to further examine the conclusion drawn from Section 5. Section 7 compares the three groups and further discusses similarities and differences among them. Section 8 concludes the whole paper.

2 Weight of VOT and f0 in stop differentiation in English and Japanese

This section briefly discusses the weight of VOT and f0 in stop differentiation in English and Japanese respectively.

2.1 Weight of VOT and f0 in stop differentiation in English

It is generally agreed that VOT is the primary cue in distinguishing voiceless and voiced word-initial stops in English (Abramson and Lisker 1985; Lisker 1978; Raphael 2021), and that f0 is a secondary cue (Abramson and Lisker 1985; Dmitrieva et al. 2015; Francis et al. 2008; Schertz et al. 2020; Whalen et al. 1990, 1993; Yu 2022; but see Francis and Nusbaum 2002, Kwon 2014 arguing that f0 has only a marginal role, and Repp 1982, Shultz et al. 2012, Yu 2022 for the tradeoff relation between VOT and f0).

The voiceless-voiced distinction for English word-initial stop consonants is mainly realized as differences in VOT: the interval between a stop release and the onset of vocal fold vibration (Lisker and Abramson 1964). VOT is considered negative when voicing begins prior to the stop release. Conversely, VOT is considered positive when voicing starts after the stop release. When the voicing onset occurs 25 milliseconds (ms) or more before the release, it is termed voicing lead (Laver 1994; Lisker and Abramson 1964). Short-lag VOT refers to instances where voicing onset begins approximately 20 ms or less after the release (Laver 1994; Lisker and Abramson 1964). In contrast, long-lag VOT indicates that the voicing onset occurs 25 ms or more after the stop release (Laver 1994; Lisker and Abramson 1964). In English, although both voiceless and voiced stops may have positive VOTs, voiceless stops belong to the long-lag category and voiced stops to the short-lag category (Klatt 1975; Kong et al. 2012; Lisker and Abramson 1964). Voiced stops may also have voicing-lead VOT.
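
The three VOT categories described above can be sketched as a small classifier. This is only an illustration of the thresholds quoted in the text: the function name is hypothetical, treating a VOT of exactly 0 ms as short lag is an assumption, and values that fall between the quoted thresholds are left unlabeled because the text assigns them no category.

```python
def classify_vot(vot_ms: float) -> str:
    """Label a VOT value (in ms) using the thresholds quoted in the text.

    Negative values mean voicing starts before the stop release.
    """
    if vot_ms <= -25.0:
        return "voicing lead"   # voicing 25 ms or more before the release
    if 0.0 <= vot_ms <= 20.0:
        return "short lag"      # about 20 ms or less after the release (0 ms included by assumption)
    if vot_ms >= 25.0:
        return "long lag"       # 25 ms or more after the release
    return "unclassified"       # gaps between the quoted thresholds


# Hypothetical example values, not measurements from the study.
for value in (-40.0, 10.0, 60.0, 22.0):
    print(value, classify_vot(value))
```

Under these thresholds, English voiceless stops would typically be labeled long lag, and English voiced stops either short lag or voicing lead.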

In terms of f0, the general pattern is that voiceless stops are associated with higher f0 and voiced stops with lower f0, irrespective of the specific phonetic realization of the voicing contrast (Dmitrieva et al. 2015; Hanson 2009; House and Fairbanks 1953; Lehiste and Peterson 1961; Ohde 1984; Whalen et al. 1993). This tendency has been found not only in English, but also in Burmese (Shimizu 1996), French (Kirby and Ladd 2016), German (Kohler 1982), Hindi (Shimizu 1996), Italian (Kirby and Ladd 2016), Japanese (Shimizu 1996), Spanish (Dmitrieva et al. 2015), and Thai (Gandour 1974; Shimizu 1996), among other languages.

2.2 Weight of VOT and f0 in stop differentiation in Japanese

Japanese has both voiceless (/p/, /t/, /k/) and voiced stops (/b/, /d/, /g/) (Itoh et al. 1980; Shimizu 1989; Vance 1987). These stops are found only in word-initial and word-medial positions (Okada 1999; Vance 1987). Regarding the voiced stops, Itoh et al. (1980) report only negative VOTs for these consonants (see also Kobayashi 1981; Shimizu 1989, 1996, 1999). However, a recent trend among Japanese speakers born after the 1930s involves devoicing word-initial voiced stops. This results in two variants: a voiceless unaspirated variant with short-lag VOT and a variant with voicing-lead VOT. In Tokyo Japanese, fewer than 25 % of voiced stops exhibit voicing-lead VOTs (Kong et al. 2012; Liu and Takeda 2024; Takada 2004, 2011). As in English, VOT is the primary cue in distinguishing the voiceless stops from the voiced ones in Japanese (Byun 2019; Gao and Arai 2019; Lisker 1978; Raphael 2021; Shimizu 1996, 1999).

F0 has been identified as a secondary cue for distinguishing between the two sets of stops in Japanese (Homma 1980; Shimizu 1993, 1996, 1999; Takada 2011; Takada et al. 2015). While debates persist regarding the precise f0 contours in various accent contexts, there is a general consensus that Japanese tends to exhibit high f0 at the onset of a vowel after a voiceless stop and low f0 after a voiced stop similarly to English (Gao et al. 2019; Kawasaki 1983; Mizuguchi and Tateishi 2018). The onset f0 in Japanese is shaped by the voicing contrast at the phonological level rather than by VOT, also similarly to English (Gao and Arai 2019; Ishihara 1998; Kawasaki 1983; Shimizu 1993, 1996, 1999; Takada 2011).

3 Previous studies on the weight of VOT and f0 in stop differentiation by 2L1 bilinguals

Previous studies have given mixed results concerning the weight of VOT and f0 in stop differentiation by 2L1 bilinguals. Some research claims that 2L1 bilinguals are similar to monolinguals in this respect. For example, McCarthy et al. (2014) focus on Sylheti-English sequential bilingual children to examine whether any significant differences exist between them and monolingual English children in the perception and production of English bilabial and velar stops. McCarthy et al. (2014) report that, after one year at school, the bilingual children do not differ significantly from English monolingual children in VOT in either perception or production. Similarly, Pan et al. (2022) take English-Mandarin 2L1 bilingual children in Singapore as their participants and report that their perceived VOT boundaries for English and Mandarin are different. Their perception of VOT boundaries aligns with the typical patterns observed in monolingual speakers of each language, i.e., these 2L1 bilingual children have different phonetic categories for VOT in English and Mandarin. However, as McCarthy et al. (2014) and Pan et al. (2022) focus on young children, the question remains whether a similar conclusion can be drawn for adult 2L1 bilinguals.

Certain studies report mixed results from 2L1 bilinguals. For example, Schertz et al. (2020) compare English monolingual, Spanish monolingual, and early English-Spanish bilingual listeners in the perception of the voicing contrast in Spanish and English stops, and report that the early bilingual listeners use the three secondary cues (f0, first formant onset frequency, and stop closure duration) similarly in English and Spanish, while they perceive and interpret VOT differently depending on the language being spoken. This suggests that these bilinguals may show a certain level of CLI in the secondary cues in stop differentiation while having distinct perceptual boundaries for VOT in each of their languages. Surprisingly, as far as we are aware, very few studies have presented results exclusively supporting CLI. Since previous studies do not agree, we conducted an experiment to compare a Japanese-English 2L1 bilingual group with an English monolingual group.

4 Experiment 1: Japanese-English 2L1 bilinguals versus English monolinguals

All participants received the two word lists and text from the PAC project three weeks prior to recording to acquaint themselves with the material.[1] The word lists and text contain various phonological phenomena, such as word-initial voiceless and voiced stops, vowel length, and so on. The word lists include both monosyllabic and disyllabic words, but the majority of words with word-initial stops are monosyllabic. Hence, for this study, we extracted 64 monosyllabic words with word-initial stops, e.g., ‘bird’, ‘doll’, and ‘put’, from the lists. The text, entitled Christmas Interview of a Television Evangelist, is a two-page passage adapted from a newspaper article. The text contains 94 relevant words. Labov (1972, 2001) suggests that speech style can influence language use (see also Eckert and Rickford 2001; Sato 2011; but see Sato 1985 for an opposing view). Our previous study also demonstrated that 2L1 bilinguals perform more like monolinguals when they can control their speech production carefully, such as when they read isolated words (Liu and Takeda 2024). Therefore, we incorporated both word list reading and text reading tasks into this study and examined the results in these two speech styles separately. Intuitively, the word list reading task gives the participants more control over their production, and text reading seems to be closer to natural speech production than word list reading.

Participants were directed to practice reading the word lists and text in their natural voice, at a pace that felt comfortable, ensuring fluency in both the words and text. Instructions emphasized the importance of repeating words or sentences in case of errors. Recordings took place using the Voice Memo Application on the second author’s iPhone in quiet rooms at the participants’ homes (Maryn et al. 2017; Uloza et al. 2021; van der Woerd et al. 2020). The recordings, originally in m4a format, were later converted to wav for acoustic analysis using Praat.

4.1 Participants

We recorded six monolingual American English speakers (three female and three male), all born and raised in the United States (hereafter the monolingual group). Of these, three were from California and three from Texas. Additionally, six Japanese-American English 2L1 bilinguals (four female and two male; henceforth the JE group) were recorded. Born in the western part of Japan, they moved to America during infancy and spent their formative years there. Two completed their college education in America, while the other four did so in Japan. Although they were residents of Japan at the time of recording, they are fluent in both Japanese and American English. All participants in the JE group acquired both languages naturally before the age of three, and are classified as 2L1 bilinguals (Genesee and Nicoladis 2007; McLaughlin 1978). To minimize the influence of diverse English accents, we exclusively selected American English speakers for the monolingual and JE groups. At the time of recording, all participants were approximately 30–35 years old.

4.2 Acoustic analysis

The first author segmented and annotated the recordings using speech waveforms and spectrograms in Praat, following the guidelines in Lisker and Abramson (1964), Lavoie (2001), and Riney et al. (2007). For voiceless stops, the interval between the stop release and the onset of glottal vibration marked the VOT. Oral closure was identified by the absence of acoustic energy in the formant frequency band, while oral release was indicated by a rapid onset of energy in the formant frequency range. For voiced stops, VOT values were measured as negative if voicing preceded the release. Stops without audible glottal vibration in word-initial positions during text reading were omitted, since it was difficult to determine the exact onsets of such stops and thus to measure their exact VOT durations. The stops focused on were word-initial /p/, /t/, /k/, /b/, /d/, and /g/. F0, measured in Hertz, was determined at the onset of the vowel following a word-initial stop on narrowband spectrograms in Praat with a window length of 0.025 s. Tokens without a pitch track at the vowel onset were excluded from consideration since it was impossible to measure their onset f0s. In total, 97 data points in the word list reading task and 69 data points in the text reading task were excluded from consideration. The final dataset consisted of 1,730 samples, 885 from the monolingual group and 845 from the JE group. The breakdown was as follows: (i) in the word list reading task, 339 tokens from the monolingual group and 332 from the JE group; and (ii) in the text reading task, 546 and 513 from the two groups respectively.
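
The sign convention for VOT described above can be illustrated with a minimal sketch. It assumes that each token has two annotated time points in seconds, the stop release and the onset of voicing; the function name and the example times are hypothetical and are not taken from the study's annotations.

```python
def vot_seconds(release_time: float, voicing_onset_time: float) -> float:
    """Return VOT as voicing onset minus stop release (both in seconds).

    Positive: voicing starts after the release (lag).
    Negative: voicing starts before the release (lead).
    """
    return voicing_onset_time - release_time


# Hypothetical annotation times for two tokens.
lag_token = vot_seconds(release_time=1.200, voicing_onset_time=1.265)
lead_token = vot_seconds(release_time=1.200, voicing_onset_time=1.120)

print(f"lag token:  {lag_token * 1000:.0f} ms")
print(f"lead token: {lead_token * 1000:.0f} ms")
```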

4.3 Prediction of the present study

Both English and Japanese use VOT as the primary cue and f0 as the secondary cue in word-initial stop differentiation to distinguish voiceless stops from voiced ones. According to CLI, the English speech production of the JE group may be influenced by Japanese. However, since the two languages rely on VOT and f0 in stop differentiation similarly, it was predicted that the monolingual and JE groups would show no statistically significant differences in the weights of VOT and f0 in stop differentiation in either the word list or the text reading task.

5 Statistical analysis: JE group versus monolingual group

Two logistic regression models were fitted, one to the data from the word list reading task and one to the data from the text reading task. The alpha level was set at 0.05. In both models, the dependent variable is voicedness (voiceless, voiced), with voiceless as the base level. The independent variables are group (monolingual group, JE group; with the monolingual group as the reference level), VOT duration, f0, VOT_group, and f0_group. The variables VOT duration and f0 contain the measurements from both groups. The variable VOT_group was generated using the Compute Variable function on the Transform menu in SPSS Statistics for Windows (IBM 2020; hereafter SPSS) by combining the existing columns VOT duration and group. It therefore carries two pieces of information, the VOT duration and the group of the speaker, and allows for a clearer analysis of how VOT differs between the monolingual and JE groups. Similarly, the f0_group variable was computed by combining f0 and group with the same function, enabling an examination of f0 in relation to the participants’ group membership.
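
One way to picture this model is the sketch below. It rests on assumptions that the text does not state: group is dummy-coded (0 = monolingual, 1 = bilingual), the combined variables are products (VOT × group, f0 × group), and VOT is in seconds. The coefficient values are invented for illustration and are not the estimates reported in the tables; all function names are hypothetical.

```python
import math


def design_row(vot: float, f0: float, group: int) -> dict:
    """Build one row of predictors (group: 0 = monolingual, 1 = bilingual; assumed coding)."""
    return {
        "group": group,
        "vot": vot,
        "f0": f0,
        "vot_group": vot * group,  # assumed product term
        "f0_group": f0 * group,    # assumed product term
    }


def prob_voiced(row: dict, coefs: dict, constant: float) -> float:
    """Logistic model: P(voiced) = 1 / (1 + exp(-(constant + sum of B * x)))."""
    logit = constant + sum(coefs[name] * row[name] for name in coefs)
    return 1.0 / (1.0 + math.exp(-logit))


# Made-up coefficients, for illustration only.
coefs = {"group": 0.5, "vot": -100.0, "f0": -0.02, "vot_group": -20.0, "f0_group": 0.01}
constant = 8.0

short_vot = prob_voiced(design_row(0.010, 200.0, 0), coefs, constant)
long_vot = prob_voiced(design_row(0.070, 200.0, 0), coefs, constant)
print(f"P(voiced | VOT = 10 ms) = {short_vot:.3f}")
print(f"P(voiced | VOT = 70 ms) = {long_vot:.3f}")
```

Under this assumed coding, the product terms are zero for the reference group, so the VOT and f0 coefficients describe the monolingual group and the product-term coefficients describe how the bilingual group departs from it. How the sign of a product-term coefficient should be read depends on the actual coding used in SPSS, which is not specified here.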

5.1 Weights of VOT and f0 in word list reading

Table 1 presents the results of the logistic regression model fitted to the word list reading data from the monolingual and JE groups.

Table 1: The logistic regression model for the word list reading task.

Omnibus tests of model coefficients (word list)
         Chi-square   df   Sig.
Model    440.692      5    <0.001

Variables in the equation (word list)
Variable       Level                      B          S.E.     Wald     df   Sig.
Group          Monolingual (reference)
               JE bilingual               6.314      1.803    12.260   1    <0.001
VOT duration   Numeric data               −97.185    11.557   70.721   1    <0.001
F0             Numeric data               −0.027     0.006    18.096   1    <0.001
VOT_Group      Numeric data               −68.962    12.039   32.815   1    <0.001
F0_Group       Numeric data               −0.013     0.007    3.173    1    0.075
Constant                                  10.496     1.705    37.883   1    <0.001

In Table 1, the omnibus tests of model coefficients indicate that at least one of the independent variables is significantly associated with the dependent variable voicedness in the word list reading task (p < 0.001). The variables group, VOT duration, f0, and VOT_group are each significantly associated with voicedness (p < 0.001 in all four cases). For the variable group, the positive coefficient suggests that the JE group tends to produce stops with characteristics that are more indicative of voicing than the monolingual group does (B = 6.314). This implies a systematic difference in the speech production patterns of the monolingual and JE groups, particularly regarding the differentiation of stops based on VOT and f0. The negative coefficient of VOT duration suggests that a longer VOT is associated with a decreased likelihood of the stop being produced as voiced (B = −97.185). This finding aligns with the phonetic characteristics of English, where voiceless stops typically exhibit longer VOTs than voiced stops. The negative coefficient of f0 indicates that stops followed by higher f0s are less likely to be produced as voiced stops (B = −0.027), which aligns with the tendency for f0s following voiceless stops to be higher than those following voiced stops. The negative coefficient of the VOT_group variable indicates that VOT has a stronger influence on stop classification in the monolingual group than in the JE group (B = −68.962): VOT has a heavier weight as a cue in differentiating the stops in the monolingual group’s speech production than in the JE group’s in the word list reading task. The variable f0_group has not emerged as statistically significant (p = 0.075), which suggests that the two groups do not differ significantly in using f0 as a cue to differentiate English stops in word list reading.
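
For readers who want to see how the columns of Table 1 relate to one another, the sketch below recomputes a Wald statistic as (B / S.E.)² and converts a Wald value with one degree of freedom into a p-value. The function names are hypothetical; the inputs are the rounded values printed in Table 1, so small discrepancies from the printed Wald values are expected.

```python
import math


def wald_statistic(b: float, se: float) -> float:
    """Wald chi-square for a single coefficient: (B / S.E.) squared."""
    return (b / se) ** 2


def wald_p_value(wald: float) -> float:
    """Upper-tail probability of a chi-square distribution with 1 df."""
    return math.erfc(math.sqrt(wald / 2.0))


# Group row of Table 1: B = 6.314, S.E. = 1.803 (printed Wald = 12.260).
w_group = wald_statistic(6.314, 1.803)

# F0_Group row of Table 1: printed Wald = 3.173, printed Sig. = 0.075.
p_f0_group = wald_p_value(3.173)

print(f"Wald (group)      = {w_group:.3f}")
print(f"p-value (f0_group) = {p_f0_group:.3f}")
```

The same two functions can be applied to any row of the logistic regression tables, which offers a quick consistency check on the reported values.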

5.2 Weights of VOT and f0 in text reading

Table 2 presents the results for the logistic regression model based on the data of the text reading task from the monolingual and JE groups.

Table 2: The logistic regression model for the text reading task.

Omnibus tests of model coefficients (text)
         Chi-square   df   Sig.
Model    1,095.590    5    <0.001

Variables in the equation (text)
Variable       Level                      B           S.E.     Wald     df   Sig.
Group          Monolingual (reference)
               JE bilingual               −0.010      1.188    0.000    1    0.993
VOT duration   Numeric data               −155.816    16.742   86.621   1    <0.001
F0             Numeric data               0.004       0.004    0.654    1    0.419
VOT_Group      Numeric data               −16.424     21.471   0.585    1    0.444
F0_Group       Numeric data               0.011       0.005    3.904    1    0.048
Constant                                  5.131       0.997    26.502   1    <0.001

Table 2 shows that the variables VOT duration and f0_group are significantly associated with stop differentiation in the text reading task (p < 0.001 and p = 0.048, respectively). In terms of VOT duration, the longer the VOT duration of a stop, the more likely it is produced as voiceless, in line with the general tendency of VOT duration in English (B = −155.816). The variable VOT_group has not emerged as statistically significant (p = 0.444). This indicates that the two groups do not differ significantly in the weight of VOT in distinguishing stops in text reading. The variable f0_group is significant (p = 0.048), suggesting that f0 has a different weight as a cue to distinguish the stops in the two groups: f0 has a heavier weight in the JE group than in the monolingual group in the text reading task (B = 0.011).

5.3 Comparison of Tables 1 and 2

A comparison of Table 1 with Table 2 reveals differences in cue weighting between the two groups across tasks: (i) the monolingual group utilizes VOT as a cue more than the JE group in the word list reading task, while the JE group utilizes f0 more in text reading; and (ii) the two groups do not differ significantly in the weight of the f0 cue in word list reading or in the weight of the VOT cue in text reading. This demonstrates that significant differences as well as some similarities exist between the monolinguals and the JE 2L1 bilinguals.
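
The pattern summarized above can be read directly off the Sig. column of the two group-related terms in Tables 1 and 2. The sketch below is only a bookkeeping aid: it stores those p-values (using 0.001 as an upper bound where the tables report <0.001) and lists, for each task, which terms fall below the alpha level of 0.05.

```python
ALPHA = 0.05

# Sig. values of the group-related terms in Tables 1 and 2.
# "<0.001" is represented by its upper bound, 0.001.
p_values = {
    ("word list", "VOT_group"): 0.001,
    ("word list", "f0_group"): 0.075,
    ("text", "VOT_group"): 0.444,
    ("text", "f0_group"): 0.048,
}

# Collect, per task, the terms on which the two groups differ significantly.
significant_terms = {}
for (task, term), p in p_values.items():
    significant_terms.setdefault(task, [])
    if p < ALPHA:
        significant_terms[task].append(term)

for task, terms in significant_terms.items():
    print(task, "->", terms)
```

In each task exactly one of the two terms is significant, which matches the observation that the groups differ in one cue per task.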

Liu and Takeda (2024) have noted that the monolingual and JE groups do not have significant differences in VOT duration in the word list reading task. The present study reports that they have significant differences in terms of the weight of the VOT cue in word list reading. The difference in the weighting of the VOT cue between the monolingual and JE groups should not be attributed to CLI. Both Japanese and English use VOT as the primary cue and f0 as a secondary cue for distinguishing voiceless and voiced stops. The VOT durations in Japanese are generally shorter than those in English (Chodroff and Wilson 2017; Gao and Arai 2019; Hunnicutt and Morris 2016; Lisker and Abramson 1964; Shimizu 1996). If CLI were the main or sole factor, statistically significant differences in VOT durations should have been witnessed between the two groups, rather than in the weighting of VOT as a cue. Therefore, the difference in VOT weighting is more likely due to bilinguals’ heightened sensitivity to VOT as a cue, rather than CLI from the phonetic system of Japanese. The difference in VOT weighting could indicate that the JE group is more attuned to utilizing VOT as a cue for stop differentiation, even though their VOT durations remain comparable to those of the monolingual group. This could result from the need to maintain a clear contrast between the two languages that both rely on VOT as the primary cue by slightly allocating more cognitive resources or articulatory control toward relying on VOT as the primary cue for English stop differentiation. In short, the strategy chosen by the JE group seems to result from efficiency, as they adapt their use of VOT as a cue to achieve distinctiveness between the two languages.

In text reading, Liu and Takeda (2024) have shown that the VOT duration difference between the monolingual and JE groups is smaller than that in word list reading. This reduction in the VOT duration difference in text reading may make it challenging for the JE group to heavily depend on VOT to differentiate between voiceless and voiced stops. Consequently, they shift their reliance more towards f0 as a cue in text reading.

Liu and Takeda (2024) have reported that the JE group is never completely the same as the monolingual group either in terms of VOT duration, VOT polarity, or f0 value: the JE group has employed an extreme approach to positive VOTs, where they must distinguish among all six stops in Japanese and English, though they adopt an intermediate approach to negative VOTs, where they just need to distinguish the three voiced stops. In the present study, the JE group has different weights for VOT or f0 compared with the monolingual group. However, the JE group has never assigned statistically significant different weights to both VOT and f0 in either word list reading or text reading. It appears that efficiency can explain the choice of strategy by the JE group: emphasizing one cue over the other is enough to distinguish the stops. Therefore, the JE group has only assigned a different weight from the monolingual group to one cue in each task.

In summary, while CLI may play a role in bilingual language processing, it does not sufficiently explain the differences observed in this study. The variations in cue weighting of VOT and f0 appear to stem more from the bilinguals’ adaptive strategies aimed at efficiency in managing phonetic distinctions than from direct CLI from the phonetic system of Japanese. Initially, it was predicted that there would be no statistically significant differences in the utilization of VOT and f0 cues for stop differentiation between the monolingual and JE groups. However, our results do not support this prediction. This suggests that there may be noteworthy distinctions in speech production strategies between monolinguals and 2L1 bilinguals, even in areas where the two languages exhibit similar phonetic patterns. One plausible explanation is that 2L1 bilinguals need to distinguish between their two languages, which leads them to create differences in areas where the two languages are similar. In the subsequent sections, we compare another type of 2L1 bilingual with the monolingual and JE groups to further examine the conclusion drawn in this section.

6 Experiment 2: Mandarin-English 2L1 bilinguals versus English monolinguals

In this experiment, we focus on Mandarin-American English 2L1 bilinguals for the following two reasons: (i) in contrast to English and Japanese, Mandarin has no voiced stops in its phonemic inventory; and (ii) the role of f0 as a cue in stop differentiation is significantly diminished in Mandarin compared with English and Japanese, as f0 is involved in cuing lexical tones (Francis et al. 2006; Gandour 1974; Hombert 1977). Instead of a voicing contrast, Mandarin has two series of voiceless stops distinguished by aspiration: voiceless unaspirated /p, t, k/ and voiceless aspirated /pʰ, tʰ, kʰ/.

Concerning the Mandarin-American English 2L1 bilingual group (henceforth the ME group), we make the following two predictions, based on the concept of efficiency, the key notion that emerged from the results of Experiment 1. Firstly, we predict that the ME group relies less heavily on VOT than the monolingual group as a cue for differentiating voiceless and voiced stops in English, as this is an efficient strategy for distinguishing between Mandarin and English. Secondly, we predict that the ME group shows no statistically significant difference in f0 cue utilization compared to the monolingual group. Mandarin does not depend much on f0 to distinguish stops, so the ME group does not need to rely less on f0 than the monolingual group to distinguish their two languages. Conversely, if significant differences in VOT and f0 as cues are observed between the monolingual and ME groups, specifically if the ME group relies more on VOT and/or less on f0 than the monolingual group, this may indicate an influence from Mandarin, suggesting CLI in their cue weighting.

6.1 Participants and acoustic analysis

As in the experiment in Section 4, the same word lists and text from the PAC project were given to participants three weeks before recording for them to familiarize themselves with the material. Six 2L1 bilingual speakers of Mandarin and American English, four female and two male, were recorded (the ME group). They were all born in China and moved to America as infants. They were residents of California and around 30–35 years old at the time of recording. All of them are college graduates. The procedures of recording, segmentation, and acoustic analysis are the same as those for the first experiment in Section 4.

6.2 Statistical analysis: ME group versus monolingual group

As in Section 5, two logistic regression models were fitted, one to the word list reading data and one to the text reading data. The dependent variable is again voicedness (voiceless, voiced), with voiceless as the reference level. The independent variables are group (monolingual group, ME group; with the monolingual group as the reference level), VOT duration, f0, VOT_group, and f0_group. The data from the monolingual group are the same as those used in the statistical analysis in Section 5. The variables VOT_group and f0_group were also generated using the Compute Variable function on the Transform menu in SPSS by combining the column VOT duration or f0 with the column group. Altogether, 826 samples were collected from the ME group, 338 from word list reading and 488 from text reading. Tables 3 and 4 report the results of the logistic regression models for the word list reading data and the text reading data respectively.
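
The token counts reported for the three groups can be tallied as in the sketch below. The per-group figures are those given in Sections 4.2 and 6.2; the pooled sizes for Experiment 2 are simple sums and rest on the assumption that every retained token from both groups enters the corresponding model.

```python
# Retained tokens per group and task, as reported in Sections 4.2 and 6.2.
tokens = {
    "monolingual": {"word list": 339, "text": 546},
    "JE": {"word list": 332, "text": 513},
    "ME": {"word list": 338, "text": 488},
}

# Total number of tokens per group.
group_totals = {group: sum(by_task.values()) for group, by_task in tokens.items()}


def pooled(task: str, group_a: str, group_b: str) -> int:
    """Number of tokens available when two groups are analyzed together."""
    return tokens[group_a][task] + tokens[group_b][task]


exp2_word_list = pooled("word list", "monolingual", "ME")
exp2_text = pooled("text", "monolingual", "ME")

print(group_totals)
print("Experiment 2 word list:", exp2_word_list)
print("Experiment 2 text:", exp2_text)
```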

Table 3: The logistic regression model for word list reading.

Omnibus tests of model coefficients (word list)

| | Chi-square | df | Sig. |
|---|---|---|---|
| Model | 447.357 | 5 | <0.001 |

Variables in the equation (word list)

| Variable | Level | B | S.E. | Wald | df | Sig. |
|---|---|---|---|---|---|---|
| Group | Monolingual | | | | | |
| | ME bilingual | 0.585 | 0.330 | 3.140 | 1 | 0.076 |
| VOT duration | Numeric data | −1.072 | 8.716 | 0.015 | 1 | 0.902 |
| F0 | Numeric data | 0.002 | 0.001 | 1.356 | 1 | 0.244 |
| VOT_Group | Numeric data | −27.151 | 7.878 | 11.877 | 1 | <0.001 |
| F0_Group | Numeric data | 0.002 | 0.001 | 1.356 | 1 | 0.244 |
| Constant | | 3.597 | 0.727 | 24.475 | 1 | <0.001 |

Table 4: The logistic regression model for text reading.

Omnibus tests of model coefficients (text)

| | Chi-square | df | Sig. |
|---|---|---|---|
| Model | 1,062.046 | 5 | <0.001 |

Variables in the equation (text)

| Variable | Level | B | S.E. | Wald | df | Sig. |
|---|---|---|---|---|---|---|
| Group | Monolingual | | | | | |
| | ME bilingual | 0.461 | 0.198 | 5.415 | 1 | 0.020 |
| VOT duration | Numeric data | −127.615 | 15.492 | 67.853 | 1 | 0.000 |
| F0 | Numeric data | −0.007 | 0.004 | 4.010 | 1 | 0.045 |
| VOT_Group | Numeric data | −11.777 | 4.372 | 7.257 | 1 | 0.007 |
| F0_Group | Numeric data | 0.000 | 0.001 | 0.019 | 1 | 1.000 |
| Constant | | 4.680 | 0.742 | 39.813 | 1 | <0.001 |

The logistic regression model for the word list data in Table 3 shows that VOT_group is the sole statistically significant variable (p < 0.001), suggesting that VOT has a heavier weight in stop differentiation in the monolingual group than in the ME group (B = −27.151), which conforms to our prediction. There is no significant difference in terms of f0_group (p = 0.244). Our provisional explanation is that, since VOT has a heavier weight than f0 in Mandarin, the ME group relies less on VOT in English stop differentiation, which makes distinguishing Mandarin from English easier and more efficient. The two groups do not differ significantly in the weight of f0 as a cue, although f0 has a smaller role in stop differentiation in Mandarin. This cannot be explained by CLI: if CLI were the main factor here, the ME group would be expected to use f0 as a cue less than the monolingual group. However, the result seems to be in line with efficiency: since f0 is not a major cue for stop differentiation in Mandarin, relying on f0 in English stop differentiation as much as the monolingual group does requires no unnecessary adjustments. We now turn to the text reading data in Table 4.
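For readers who wish to relate the B, S.E., Wald, and Sig. columns of Table 3 to one another, the following sketch may help. It relies on the standard definition of the Wald statistic as (B / S.E.)² and on its chi-square reference distribution with one degree of freedom; the function names are ours, and small discrepancies from the reported values are expected because the tabled B and S.E. are rounded.

```python
import math


def wald_statistic(b, se):
    """Wald chi-square for a single coefficient: (B / S.E.) squared."""
    return (b / se) ** 2


def wald_p_value(wald):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(wald / 2.0))


# (B, S.E., reported Wald) copied from Table 3.
table3 = {
    "Group": (0.585, 0.330, 3.140),
    "VOT_Group": (-27.151, 7.878, 11.877),
    "Constant": (3.597, 0.727, 24.475),
}

for name, (b, se, reported) in table3.items():
    recomputed = wald_statistic(b, se)
    p = wald_p_value(reported)
    print(f"{name}: Wald = {recomputed:.3f} (reported {reported}), p = {p:.4f}")
```

The recomputed values agree with the reported Wald statistics to within rounding, and the p-value obtained for VOT_Group falls below 0.001, as in the table.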

In Table 4, the variables group, VOT duration, f0, and VOT_group are statistically significant (p = 0.020, p < 0.001, p = 0.045, and p = 0.007, respectively). The positive coefficient of the variable group indicates that the ME group tends to produce stops that show more signs of voicing than the monolingual group in text reading (B = 0.461). This implies a systematic difference between the ME and monolingual groups in distinguishing stops based on VOT and f0. The negative coefficient of VOT implies that a stop with a long VOT is unlikely to be produced as a voiced stop, consistent with the observation that voiceless stops in English generally have longer VOTs than voiced stops (B = −127.615). In a similar vein, the negative coefficient of f0 indicates that stops followed by higher f0s are less likely to be produced as voiced stops, in line with the tendency for f0s following voiceless stops to be higher than those following voiced stops (B = −0.007). The negative coefficient of the VOT_group variable indicates that the ME group exhibits a different pattern in the production of stops from the monolingual group, with potentially reduced reliance on VOT as a distinguishing cue (B = −11.777). In addition, the variable f0_group is not statistically significant (p = 1.000). This suggests that the ME and monolingual groups do not differ significantly in the weight of f0 as a cue in English stop differentiation in text reading.
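The direction of the VOT and f0 main effects can be made concrete with a small sketch that converts the Table 4 coefficients into model-implied probabilities. It covers reference-group (monolingual) tokens only and assumes that the group dummy and both combined variables are zero for that group. The measurement units are not stated with the table, so the input values below (VOT in seconds, f0 in Hz) are assumptions chosen purely for illustration.

```python
import math

# Main-effect coefficients copied from Table 4 (text reading).
CONSTANT = 4.680
B_VOT = -127.615
B_F0 = -0.007


def p_voiced_reference(vot, f0):
    """Model-implied probability of 'voiced' for a reference-group token.

    Assumes the group dummy and both interaction columns are 0 for the
    monolingual group, and that vot and f0 are on the scales used when
    the model was fitted (the units used below are an assumption).
    """
    logit = CONSTANT + B_VOT * vot + B_F0 * f0
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical tokens: only the cue of interest changes between calls.
short_vot = p_voiced_reference(vot=0.015, f0=200.0)
long_vot = p_voiced_reference(vot=0.070, f0=200.0)
higher_f0 = p_voiced_reference(vot=0.015, f0=240.0)

print(f"short VOT: {short_vot:.3f}")
print(f"long VOT:  {long_vot:.3f}")
print(f"higher f0: {higher_f0:.3f}")
```

Because both coefficients are negative, lengthening VOT or raising f0 while holding the other cue constant lowers the probability that the model assigns to the voiced category.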

A comparison of Tables 3 and 4 first reveals a clear contrast between the results for word list reading and text reading: while only the VOT_group variable is statistically significant for the word list reading data in Table 3, four variables are statistically significant for the text reading data in Table 4. A possible explanation is that the ME group can meticulously regulate their speech when reading word lists, and thus resemble monolingual speakers more closely. Conversely, because maintaining such precise control during text reading is challenging, greater distinctions between the monolingual and ME groups emerge. Both Tables 3 and 4 suggest that the two groups differ in the weight of VOT as a cue in stop differentiation, in line with our prediction based on the concept of efficiency. The negative coefficients of the VOT_group variable in both tables suggest that the monolingual group relies more on VOT than the ME group in both word list reading and text reading (B = −27.151 and B = −11.777, respectively). The two groups do not differ significantly in the weight of f0 in either word list or text reading (p = 0.244 and p = 1.000, respectively).

Looking at Tables 3 and 4 together, although VOT is a major cue for stop differentiation in both Mandarin and English, the ME group places less emphasis on VOT than the monolingual group does. Furthermore, while the role of f0 in stop differentiation is notably reduced in Mandarin, the ME group does not show significant differences from the monolingual group in how they use f0 as a cue in either word list or text reading in English. These results from the ME group, concerning both VOT and f0, seem to challenge CLI: if CLI were the primary factor, the ME group would be expected to rely on VOT more than the monolingual group and to use f0 less than the monolingual group. Instead, the concept of efficiency seems to offer a more fitting explanation: like the JE group, the ME group differs from the monolingual group in the use of only one cue in both word list and text reading, suggesting that this distinction is sufficient to differentiate their two languages. More specifically, the ME group relies less on VOT because they can utilize f0 just as much as the monolingual group does, reducing the need to heavily depend on VOT. In this context, efficiency seems to be the key principle guiding their speech production.

Furthermore, like the results in Tables 1 and 2, the results in Tables 3 and 4 reveal both differences and similarities between the monolingual and ME groups. Neither the JE nor the ME group differs from the monolingual group in the weights of both VOT and f0 at once. While a difference may appear in either VOT or f0, differences do not occur in both at the same time. This holds true for both word list and text reading.

7 Comparison of the monolingual, JE, and ME groups

As in English, f0 is a secondary cue in stop differentiation in Japanese. As a result, we predicted that the monolingual and JE groups would not differ significantly in the weights of VOT and f0. However, the two groups differ significantly in VOT weighting in word list reading, as shown in Table 1, and in f0 weighting in text reading, as shown in Table 2. Following the conclusion from the comparison between the monolingual and JE groups, we hypothesized that the ME group gives less weight to VOT than the monolingual group and a similar weight to f0 in English stop differentiation, as the role of f0 in stop differentiation in Mandarin is relatively small. As predicted, our results suggest that the two groups differ in the weight of VOT as a cue in both word list and text reading and do not differ in the weight of f0 in either task. CLI cannot explain the differences just noted between the two 2L1 bilingual groups and the monolingual group. Efficiency appears to explain at least part of the puzzle: the 2L1 bilingual groups need to distinguish their two first languages, so they use VOT or f0 as a cue differently from the monolingual group. Specifically, the roles of VOT and f0 in stop differentiation in Japanese are similar to those in English. Therefore, the JE group has to differ from the monolingual group in at least one cue to distinguish between Japanese and English. This explains why the JE group differs from the monolingual group in the weight of VOT as a cue in word list reading and in the weight of f0 as a cue in text reading. Since f0 has only a relatively small role in stop differentiation in Mandarin, it is relatively easy for the ME group to distinguish English from Mandarin by relying on f0 as a cue for English stop differentiation, as the monolingual English group does. This may explain why the ME group does not differ from the monolingual group in the weight of f0 in either word list or text reading. Since VOT is the primary cue in Mandarin stop differentiation, the ME group relies less on VOT in English stop differentiation.

A one-way ANOVA with post hoc Bonferroni-corrected comparisons was also conducted on the VOT duration data in word list reading from the monolingual, JE, and ME groups: the differences between the monolingual and JE groups and between the JE and ME groups are not significant (p = 0.12 and p = 0.086, respectively), while the difference between the monolingual and ME groups is (p < 0.001). The same analysis carried out on the VOT duration data in text reading gives significant differences between the monolingual and JE groups and between the monolingual and ME groups (p < 0.001 in both cases), but not between the JE and ME groups (p = 0.491). Overall, the two 2L1 bilingual groups differ less from the monolingual group in word list reading than in text reading in terms of VOT in English stop differentiation. The smaller differences observed in word list reading indicate that the three groups regulate their speech production more similarly in this task.
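The logic of a one-way ANOVA followed by Bonferroni-corrected pairwise comparisons can be sketched as follows. This is an illustration only, not a reproduction of the SPSS procedure: the VOT values and the unadjusted p-values are invented, and multiplying each pairwise p-value by the number of comparisons is one common way of applying the Bonferroni correction.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


def bonferroni(p_values):
    """Multiply each p-value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Toy VOT durations in ms (hypothetical values, not data from the study).
mono = [70.0, 72.0, 68.0, 74.0]
je = [66.0, 69.0, 65.0, 68.0]
me = [58.0, 60.0, 57.0, 61.0]

f_stat, df_b, df_w = one_way_anova_f([mono, je, me])
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")

# Hypothetical unadjusted p-values for the three pairwise comparisons.
print(bonferroni([0.004, 0.03, 0.40]))
```

With three groups there are three pairwise comparisons, so each unadjusted p-value is multiplied by three before it is compared with the significance threshold.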

8 Conclusions

VOT and f0 as cues in stop differentiation have similar weights in English and Japanese. Therefore, the JE group was predicted to perform similarly to the monolingual English group in the weights of VOT and f0 following the assumption of CLI. On the other hand, f0 has different weights in stop differentiation between English and Mandarin. Thus, following the assumption of CLI, the ME group was expected to differ from the monolingual English group in terms of the weight of f0. Our results, however, were not in conformity with the predictions based on CLI.

Efficiency may be a key word here: (i) the JE and ME groups differ from the monolingual group in certain respects, while neither the JE nor the ME group differs from the monolingual group in all respects; (ii) the JE group differs from the monolingual group in only one cue in each of word list and text reading, not in both cues simultaneously; and (iii) the ME group does not differ significantly from the monolingual group in f0 as a cue in either word list or text reading. Contrary to the predictions based on CLI, the JE group differs from the monolingual group in one cue in both word list and text reading. Our explanation is that the JE group needs to differentiate the two languages and thus uses VOT or f0 differently from the monolingual group, although VOT and f0 have similar weights in stop differentiation in English and Japanese. Additionally, the ME group makes use of f0 as a cue in English stop differentiation as much as the monolingual group does. This appears to be due to the diminished role of f0 in Mandarin: it is easy for the ME group to distinguish English from Mandarin by relying more on f0 as a cue in English.

This study may also challenge the role of CLI, since CLI appears to explain less than previously thought regarding this small aspect of the speech production of 2L1 bilinguals. However, it would be premature to reject CLI completely and to claim, on the basis of this study alone, that efficiency is the sole factor explaining the speech production patterns of 2L1 bilinguals. Further research on other types of 2L1 bilinguals and on other aspects of speech production will help elucidate the intricacies of language processing efficiency and its implications for 2L1 bilingual speech production.


Corresponding author: Sha Liu, Fukuoka Institute of Technology, Fukuoka, Japan, E-mail:

Funding source: This work was funded by Scholarship Fund for Women Researchers (2025) from the Promotion and Mutual Aid Corporation for Private Schools of Japan.

Acknowledgments

For help in getting this article to its final form, our special gratitude goes to Prof. Jacques Durand for his advice on acoustic analysis, to Prof. Eiji Yamada for advice and discussion, to Prof. Stephen Mark Howe and Prof. Long III Robert William for editing our paper, and to the anonymous reviewers and the editors of the present journal for detailed and helpful feedback. All remaining errors are our own responsibility.


Received: 2024-04-03
Accepted: 2025-04-27
Published Online: 2025-05-26

© 2025 the author(s), published by De Gruyter, Berlin/Boston

This work is licensed under the Creative Commons Attribution 4.0 International License.

Downloaded on 18.3.2026 from https://www.degruyterbrill.com/document/doi/10.1515/flin-2025-2017/html