Article Open Access

Perception of English semivowels by Japanese-speaking learners of English

  • Wei William Zhou, Mineharu Nakayama and Atsushi Fujimori
Published/Copyright: August 20, 2024

Abstract

This study compared the perception of English semivowels /j/ and /w/ and their corresponding vowels /i/ and /u/ by two groups of Japanese-speaking learners of English: foreign language (FL) learners in Japan and second language learners who were initially FL learners (FL-L2). The phonological targets were /i/ and /u/ with and without preceding /j/ and /w/, respectively (/ji/-/i/ and /wu/-/u/). Discrimination and identification results showed that both groups performed comparably for /i/ with and without /j/, indicating their reliance on native phonological sensitivity to /i/ and /j/, which closely resemble Japanese /i/ and /j/. However, important differences emerged for /u/ with and without /w/, possibly due to articulatory differences in lip rounding for /u/ and /w/ between the two languages. Notably, while FL-L2 learners were equally competent in both the /u/-/wu/ and the /i/-/ji/ discriminations, FL learners were much less capable of the /u/-/wu/ discrimination than of the /i/-/ji/ discrimination. Moreover, FL-L2 learners were better at identifying /u/ than their FL peers, suggesting that L2 exposure may have facilitated their acquisition of the articulatory details associated with /u/. Overall, the study showed that engaging in L2 immersion following FL learning is beneficial for non-native phonological development among adult learners.

1 Introduction

Language learners acquire their additional languages in highly diverse contexts. For example, individuals can learn additional languages in a second language (L2) or a foreign language (FL) context (e.g., Ellis 2015; Klein 1986; Littlewood 1984; Raphan and Gertner 1990; VanPatten et al. 1987). In the L2 context, where the language acquired is predominantly spoken, learners have opportunities to receive naturalistic input and use the language to complete daily tasks. On the other hand, in the FL context, the additional language is not the dominant language, and learners usually lack access to naturalistic input and often depend heavily on structured instructional settings such as FL classrooms. Research has shown that acquisition context is important for learners’ phonological acquisition and development in the additional language (e.g., Albuquerque and Alves 2022; Best and Tyler 2007; Flege and Bohn 2021; Leung 2012, 2014; Tyler 2019). For instance, compared to L2 learners, FL learners often receive foreign-accented input in the classroom (e.g., from instructors and peers), which can contribute to an inaccurate phonological representation of non-native sounds (Tyler 2019). However, it is important to note that language acquisition context is not static. Many learners begin by acquiring an additional language as an FL and later transition to learning it as an L2 after relocating to a place where the language is predominantly spoken, such as when studying abroad (Institute of International Education 2022). Despite this, there is limited understanding of how changes in acquisition context influence non-native phonological acquisition and development.

Speech sounds are grouped into two broad categories: consonants and vowels (e.g., Ashby 2005; Ladefoged and Disner 2012). Both of these categories have been the subject of extensive research in the field of non-native phonological acquisition (e.g., Bohn and Flege 1990; Flege and MacKay 2004; Gorba and Cebrian 2021; Guion et al. 2000). In this context, semivowels represent an intriguing phonological category, as they occupy an “intermediate” position between vowels and consonants. Acoustically, they exhibit high resonance and share similarities with vowels, but functionally, they serve as consonants by gliding onto vowels or diphthongs (Borden and Harris 1980). Semivowels include the palatal semivowel /j/ (as in year) and the labial-velar semivowel /w/ (as in wooze). Notably, the semivowels /j/ and /w/ have their vowel counterparts, the unrounded high front vowel /i/ (as in ear) and the rounded high back vowel /u/ (as in oozy), respectively (Maddieson and Emmorey 1985). In English, /j/ and /w/ can precede /i/ and /u/, creating contrasts with the vowels alone (as in year vs. ear and woozy vs. oozy). Impressionistic accounts have suggested that these English contrasts pose significant challenges for learners from different first language (L1) backgrounds, including Japanese (e.g., Avery and Ehrlich 1992; Nagase 1978; Tsujimura 2014), Chinese (Chan 2023), and Russian (Mentcher 1979). To the best of our knowledge, the study by Zhou and Nakayama (2023) is the first empirical investigation into this acquisition phenomenon. The study focused on the perception of English semivowels /j/ and /w/ by Japanese-speaking learners who initially learned English as a foreign language (FL) in Japan before moving to the United States, where they transitioned to becoming L2 learners of English (“FL-L2” learners).
Through AX discrimination and identification tasks, the study revealed that FL-L2 learners encountered difficulties in perceiving both /j/ and /w/, with the latter presenting more challenges. This was attributed to the higher degree of phonetic and articulatory similarity shared between the Japanese and English /j/ in comparison to the Japanese and English /w/. Overall, the results of the study highlighted the influence of L1-L2 phonetic and articulatory differences on L2 sound perception.

The current study is an extension of Zhou and Nakayama’s (2023) research and investigates how Japanese-speaking learners, studying English as a foreign language (FL) in Japan, perceive English semivowels /j/ and /w/ and their corresponding high vowel counterparts /i/ and /u/. The primary goal is to compare the perception of FL learners with that of FL-L2 learners in Zhou and Nakayama’s (2023) study, shedding light on how shifts in acquisition context impact non-native phonological acquisition. Another goal is to understand how shifts in acquisition context impact the perception of sounds with varying degrees of L1-L2 phonetic and articulatory differences within the same phonological category.

The paper is organized as follows: Section 2 presents the theoretical and language background as well as the purpose of the study with specific research questions. Sections 3 and 4 discuss the experimental design and results, respectively. Section 5 provides a general discussion and Section 6 discusses the limitations of the current study. Finally, our concluding remarks are stated in Section 7.

2 Background and motivation

2.1 Phonological acquisition in the L2 and FL contexts

The contexts in which an additional language is acquired are hypothesized to impact the phonological acquisition of the language. In the case of L2 acquisition, the Perceptual Assimilation Model-L2 (PAM-L2) was proposed to explain how individuals come to perceive and categorize L2 speech sounds (Best and Tyler 2007). It is an extension of the original PAM, which predicts speech sound perception among naïve listeners (Best 1994, 1995). According to the PAM-L2, listeners perceive unfamiliar non-native phones based on the perceived resemblance of articulatory gestures used to produce the non-native phones and those used to produce the closest sounds in their L1. The PAM-L2 predicts that the challenge in distinguishing between L2 phones is determined by the way these phonemes are perceptually grouped and how well they fit the phonetic categories of the learner’s L1. Specifically: (a) When two L2 phonemes are assimilated into distinct L1 categories (“two-category”), their discrimination should be good. For example, in the case of L1 Japanese learners of L2 Australian English (AusE), the AusE /i:/-/ɪ/ contrast represents a two-category assimilation scenario, with AusE /i:/ assimilated into the Japanese long (two-mora) /ii/, and AusE /ɪ/ assimilated into the Japanese short (one-mora) /i/ (Bundgaard-Nielsen et al. 2011a, 2011b). (b) If two L2 phones are assimilated into a single L1 category (“single-category”), their discrimination should be poor. In the same context of L1 Japanese learners acquiring L2 AusE, the AusE /i:/-/ɪə/ contrast represents a case of single-category assimilation, where both sounds are assimilated into the Japanese long /ii/ (Bundgaard-Nielsen et al. 2011a, 2011b). (c) When two L2 phonemes are assimilated into a single category in the L1 but are perceived as differing in their “fit” to that category (“category-goodness”), their discrimination will be better than single-category pairs but worse than two-category pairs. 
For example, the widely investigated English /r/-/l/ contrast acquired by L1 Japanese learners constitutes a case of category-goodness assimilation, where both English /r/ and /l/ are assimilated into Japanese /r/, but English /l/ appears to be more similar to Japanese /r/ than English /r/ does (Aoyama et al. 2004; Guion et al. 2000; Hattori and Iverson 2009; Tyler 2021). In addition, the PAM-L2 also predicts that if two L2 phones are recognized as speech sounds but cannot be consistently grouped into any specific L1 category, their discrimination will depend on their phonetic similarity and their perceptual resemblances to the nearest L1 categories.
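The three assimilation scenarios described above can be summarized as a small lookup, shown here as a minimal sketch. The dictionary and function names are hypothetical and introduced only for illustration; the labels "good", "intermediate", and "poor" paraphrase the relative ordering stated in the text and are not quantities defined by the PAM-L2.

```python
# Relative discrimination predicted for each assimilation type, as summarized
# in the text. The three labels only express an ordering (assumption: they are
# shorthand for "better than / worse than", not measured values).
PREDICTED_DISCRIMINATION = {
    "two-category": "good",
    "category-goodness": "intermediate",
    "single-category": "poor",
}

# Example contrasts cited in the text for L1 Japanese listeners.
EXAMPLE_CONTRASTS = {
    ("AusE /i:/", "AusE /ɪ/"): "two-category",
    ("AusE /i:/", "AusE /ɪə/"): "single-category",
    ("English /r/", "English /l/"): "category-goodness",
}


def predict_discrimination(contrast):
    """Look up the predicted discrimination level for a cited contrast."""
    assimilation_type = EXAMPLE_CONTRASTS[contrast]
    return PREDICTED_DISCRIMINATION[assimilation_type]


for contrast, assimilation_type in EXAMPLE_CONTRASTS.items():
    print(contrast, assimilation_type, predict_discrimination(contrast))
```

Contrasts whose members cannot be consistently assigned to any L1 category are left out of the lookup, since the model ties their discrimination to phonetic similarity rather than to a fixed level.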

In contrast to L2 contexts, the acquisition of phonology in an FL environment presents unique challenges. This necessitates adjustments to the PAM-L2 to adequately explain how phoneme acquisition takes place in an FL classroom, leading to the development of the PAM-FL (Tyler 2019). According to the PAM-FL, the predictions for two-category assimilations remain identical to those of the PAM-L2: learners possess the necessary phonological sensitivity from their L1 to effectively acquire such FL phonemes. However, the acquisition of category-goodness assimilations in an FL classroom tends to be less successful than in the L2 immersion context, particularly when the phonetic differences between the two phonemes are small, because learners may receive foreign-accented input from their instructors. As for single-category assimilations, their successful acquisition becomes even more unlikely within the confines of an FL classroom, given that they already present challenges within L2 contexts.

Traditionally, the distinction between L2 and FL has centered on the acquisition’s context and purpose; however, the reality of acquisition contexts is far more nuanced and dynamic (Albuquerque and Alves 2022; Cook 2009, 2010; Leung 2012, 2014; Kramsch 2002). For instance, many non-native English speakers initially learn English as an FL in their home countries before pursuing education, employment, and other opportunities in countries where English is spoken as the primary language. In such contexts, these learners may also continue to learn and improve their English as an L2. This type of acquisition context shift is quite common. For instance, in the academic year of 2021/22, a total of 948,519 international students pursued their studies in various higher education institutions in the United States alone, and the majority of these students were not native English speakers (Institute of International Education 2022). Many of these students had likely started learning English as an FL in their home countries. However, neither the PAM-L2 nor the PAM-FL has addressed the unique circumstances faced by these learners. That is, the development of phonology in light of the acquisition context shift is unclear. To shed light on this issue, this study presents a case involving the acquisition of English semivowels /j/ and /w/ as well as vowels /i/ and /u/ by Japanese-speaking learners in different acquisition contexts. Given that these English sounds have equivalents in Japanese yet display various degrees of phonetic and articulatory differences between the two languages, examining them can provide insights into how phonetic and articulatory differences influence non-native sound acquisition.

2.2 Semivowels and their corresponding vowels in English and Japanese

Semivowels, including the palatal /j/ and the labial-velar /w/, fall between vowels and consonants. Acoustically, they are highly resonant, closely resembling vowels; however, they are produced with a certain degree of constriction and are categorized as consonants because of their role in releasing vowels and diphthongs (Borden and Harris 1980). These two semivowels also have their high vowel counterparts /i/ and /u/, but they differ from these vowels by their increased degree of constriction and shorter durations (e.g., Catford 1977; Padgett 2008).

The semivowels /j/ and /w/ are cross-linguistically prevalent. Maddieson (1984) reported that in over 85 % of the world’s languages, there exists a voiced palatal /j/ or a similar sound, and in over 75 % of languages, there is a voiced labial-velar /w/ or a similar sound. Similarly, high vowels /i/ and /u/ are also common in the majority of world languages. However, the articulatory characteristics and phonotactic rules for semivowels and vowels vary across languages. For example, Japanese /j/ and /i/ are highly similar to English /j/ and /i/. Japanese /w/ and /u/ are also broadly similar to English /w/ and /u/, but these two Japanese phonemes exhibit crucial phonetic and articulatory differences from their English counterparts. Specifically, Japanese /w/ involves lip-spreading, whereas English /w/ requires lip rounding. Moreover, Japanese /u/ is an unrounded vowel, whereas English /u/ is a rounded vowel (Nishi et al. 2008; Tsujimura 2014).

Japanese semivowels also differ from their English counterparts in terms of phonotactic rules. In English, /j/ and /w/ can precede /i/ and /u/, respectively, giving rise to the /ji/ and /wu/ sequences. These sequences and the vowels /i/ and /u/ in isolation form minimally contrastive pairs, as in year versus ear, yeast versus east, woof versus oof, and woozy versus oozy. However, in standard Japanese, /j/ and /w/ cannot precede /i/ and /u/, making the /ji/ and /wu/ sequences impossible (Labrune 2012; Vance 2008).

The challenges faced by Japanese learners when acquiring English semivowels have been documented. According to Nagase (1978), Japanese speakers learning English often omit the sound /w/ when it precedes other vowels because in Japanese, /w/ is only followed by /a/. Similarly, due to the restriction in Japanese where /j/ cannot be followed by /i/, Japanese learners of English tend to omit /j/ before /i/. This issue has also been discussed in works on L2/FL English pronunciation (e.g., Avery and Ehrlich 1992; Kenworthy 1987). More recently, Tsujimura (2014) also noted that English /wu/ and /ji/ are very difficult for Japanese learners, who tend to omit English /w/ in /wu/ and replace it with the Japanese /u/. As a result, the word wool [wʊl] may be pronounced as [uul]. Similarly, they will perceive [ji] as [i], making it difficult to differentiate between yeast and east in perception and production.

Although these observations are primarily based on impressions, Zhou and Nakayama’s (2023) study offered firsthand empirical evidence regarding the perception of /j/ and /w/ by Japanese-speaking learners. In the study, Japanese-speaking L2 learners of English with an initial FL background were asked to perceive artificial English words containing /i/ and /u/, both with and without preceding /j/ and /w/. Between the two semivowels under examination, /j/ exhibited a high degree of phonetic and articulatory similarity between Japanese and English, while /w/ largely demonstrated a high phonetic and articulatory resemblance but featured a crucial difference in lip-rounding, with Japanese requiring lip-spreading and English requiring lip-rounding. The study’s findings revealed that Japanese FL-L2 learners encountered greater difficulty with /w/ in comparison to /j/, suggesting that phonetic and articulatory differences between the learners’ L1 and L2 play a significant role in the perception of L2 sounds.

Given our incomplete understanding of non-native phonological acquisition across various contexts (as discussed in Section 2.1) and the limited empirical investigation on the acquisition of English semivowels in the existing literature, the current study extends Zhou and Nakayama’s (2023) research by delving into the crucial factor of language acquisition context. While Zhou and Nakayama’s study focused on a group of FL-L2 English learners who initially acquired the language as an FL in Japan and later transitioned to L2 acquisition in the United States, our study targets a different group of learners who are studying English solely as an FL in Japan and do not have any L2 experience. This contrast allows us to explore how differences in acquisition context can impact perceptual acquisition of non-native phonology. Particularly, because the two semivowels exhibit various degrees of phonetic and articulatory differences between Japanese and English, they provide an interesting case for investigating how context influences non-native phonological acquisition. Therefore, our research questions (RQs) are as follows:

RQ1:

How do language acquisition contexts (FL vs. FL-L2) influence Japanese-speaking learners’ perception of English semivowels before their vowel counterparts?

RQ2:

How do cross-linguistic (L1 and L2/FL) phonetic and articulatory differences influence Japanese-speaking FL and FL-L2 learners’ perception of English semivowels before their vowel counterparts?

The predictions are as follows. Given that Japanese /j/ and /i/ are highly similar to English /j/ and /i/ (Nishi et al. 2008; Vance 1987), Japanese speakers can rely on their L1 phonological sensitivity to perceive /j/ and /i/. Therefore, FL learners should be able to perceive /j/ and /i/ at a comparable level to that of FL-L2 learners reported in Zhou and Nakayama’s (2023) study. On the other hand, as Japanese /w/ and /u/ and English /w/ and /u/ are similar but differ significantly in lip-rounding (Nishi et al. 2008; Tsujimura 2014), Japanese-speaking FL learners are expected to be less successful in distinguishing /wu/ and /u/ compared to /ji/ and /i/. Moreover, due to the challenging nature of the lip-rounding feature, FL learners are expected to be less accurate in perceiving /w/ and /u/ than their FL-L2 counterparts in Zhou and Nakayama’s study, who benefited from naturalistic input in an L2 environment.

3 Methods

To address our research questions, the current study followed the methodology from Zhou and Nakayama (2023) in order to draw a direct comparison between FL and FL-L2 learners.

3.1 Participants

Thirty Japanese-speaking FL learners of English in Japan (average age 19.80, SD = 0.41, 3 males and 27 females) participated in this study. They were all students at a Japanese university who received their primary education in Japanese. All the FL participants had taken the institutional Test of English for International Communication (TOEIC) for listening and reading. Their average total TOEIC score was 581.67 (SD = 15.88, corresponding to the Common European Framework of Reference B1), with sub-scores of 322.83 (SD = 17.45) for listening and 258.83 (SD = 17.10) for reading. These participants were recruited through advertisements in college-wide English as a Foreign Language (EFL) classes and received extra credit in their EFL classes for their participation. The specific consent procedure is elaborated in Section 3.3 Procedure.

In Zhou and Nakayama (2023), the participants were 22 Japanese-speaking FL-L2 learners of English (average age 23.50, SD = 3.24, 6 males and 16 females) and 10 L1 American English speakers (average age 20.90, SD = 2.56, 4 males, 4 females, and 2 unclassified) serving as a control group. The Japanese-speaking FL-L2 English learners all studied English as an FL in Japan initially and then came to the U.S. for further education, subsequently studying English as an L2. At the time of the experiment, they were all matriculated, degree-seeking students at U.S. universities. On average, they had resided in the U.S. for 2.53 years (SD = 1.92), with an average age of arrival at 21.68 years (SD = 3.91). The L2 participants had followed various pathways to meet their English language proficiency requirements before enrolling in their respective universities, including taking assessments such as the Test of English as a Foreign Language (e.g., iBT 79 or above), the International English Language Testing System (e.g., 6.5 or above), the TOEIC, or L2 English courses at their universities.

Both the FL participants in the current study and the FL-L2 participants in Zhou and Nakayama’s (2023) study also self-reported their daily percentage of English use. The FL participants in the present study used English daily at a rate of only 5 % (SD = 3.70), whereas the FL-L2 participants in Zhou and Nakayama’s study used English at a rate of 72.50 % (SD = 23.06) daily. An independent samples t-test indicated a significant difference in the self-reported daily use of English percentage between the two groups, t = -15.267, p < 0.001. Cohen’s d was 4.35, with a 95 % CI [3.311, 5.374], indicating a large effect size.

Overall, the FL participants in this study and the FL-L2 participants in Zhou and Nakayama’s (2023) study all began learning English during their primary education in Japan and experienced a comparable curriculum as FL learners in schools until the FL-L2 participants came to the U.S. for further studies.

3.2 Stimulus materials

The stimuli were identical to those used in Zhou and Nakayama (2023). They consisted of 60 disyllabic non-words following English phonotactic rules, comprising 40 critical words (e.g., eecha /ˈi:ʧə/, yeecha /ˈji:ʧə/, oocha /ˈu:ʧə/, and woocha /ˈwu:ʧə/) and 20 filler words that contained the phoneme /l/ (e.g., leecha /ˈli:ʧə/, loocha /ˈlu:ʧə/). Non-words were used to mitigate lexical influences, such as frequency. The phoneme /l/ was used because it shares acoustic similarities with /j/ and /w/, but words with an initial /l/ followed by /i/ or /u/ are more easily distinguishable from those without (e.g., lease vs. ease, loops vs. oops). Disyllabic words were chosen because two is the most common syllable count for English words (Oh 2015). The second syllable contained a consonant or a consonant cluster followed by a schwa, the most central and neutral vowel (Oostendorp 1995; Silverman 2011). All the words adhere to the strong-weak syllable pattern in English (Cutler and Carter 1987). The words were recorded by an L1 American English male speaker in a sound studio. The speaker read each word five times and was recorded with an Audio-Technica AT2035 microphone at a 44,100 Hz sampling rate. For the experiment, two good tokens were selected out of the five elicitations for each word. The intensity of all tokens was normalized to 68 dB in PRAAT (Boersma and Weenink 2020). The durations were not manipulated since duration is a crucial property that distinguishes between semivowels and vowels (Burgdorf and Tilsen 2021; Catford 1977). On average, the durations were 485 milliseconds (ms) for /i/ words, 558 ms for /ji/ words, 504 ms for /li/ words, 494 ms for /u/ words, 530 ms for /wu/ words, and 528 ms for /lu/ words. See Appendix A for all the stimulus words.

3.3 Procedure

The experimental procedure was identical to that in Zhou and Nakayama (2023), ensuring a reliable comparison of the data. The experiment was hosted on Qualtrics XM (Qualtrics 2022) using custom JavaScript for media control purposes. The procedure for the experiment was as follows:

3.3.1 Consent

Participation was entirely voluntary. Prior to any involvement, participants provided consent, which included completing a consent form detailing the research goals, procedures, time commitment, compensation, risks and benefits, and participant rights. Participants were informed of procedures regarding maintenance of confidentiality and privacy, as well as their right to withdraw from the study without penalty. Participants had opportunities to ask questions and address any concerns they may have had. Only those who provided explicit consent proceeded to participate in the study. They received extra credits in their EFL classes for their participation.

3.3.2 Background survey

Participants began with a background survey, which gathered demographic information and details about their language experiences.

3.3.3 Headphone check and screening

After the survey, participants underwent a headphone screening test (Woods et al. 2017). This test was designed to exclude participants not using headphones. It consisted of six questions, each containing three pure tones. Participants had to identify the softest tone among the three in each question. They had to answer at least five of the six questions correctly in order to continue with the experiment. The test is known to be easy with headphones but challenging with loudspeakers due to phase cancellation.
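The pass criterion described above (at least five of six questions correct) can be expressed as a short check. This is only an illustrative sketch: the function name, the response coding (position 1 to 3 of the tone judged softest), and the example answer key are hypothetical and are not taken from the Qualtrics implementation.

```python
N_QUESTIONS = 6      # number of screening questions, as stated in the text
PASS_THRESHOLD = 5   # minimum number of correct answers required to continue


def passes_headphone_check(responses, answer_key):
    """Return True if enough screening questions were answered correctly.

    Both arguments are sequences of length N_QUESTIONS giving, for each
    question, the position (1-3) of the tone judged or known to be softest.
    """
    if len(responses) != N_QUESTIONS or len(answer_key) != N_QUESTIONS:
        raise ValueError("expected one response and one key entry per question")
    n_correct = sum(r == k for r, k in zip(responses, answer_key))
    return n_correct >= PASS_THRESHOLD


# Hypothetical answer key and two hypothetical participants.
key = [2, 1, 3, 3, 1, 2]
print(passes_headphone_check([2, 1, 3, 3, 1, 1], key))  # one error
print(passes_headphone_check([2, 1, 1, 3, 2, 1], key))  # three errors
```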

3.3.4 Word presentation

After passing the headphone screening, participants watched a short presentation of the stimulus words one by one, accompanied by their pronunciation in the background. They were instructed to watch the entire presentation and pay attention to the words without the need to memorize or take notes. This exposure aimed to familiarize participants with the grapheme-phoneme correspondence in the stimulus words. They viewed the word presentation just once.

3.3.5 AX discrimination task

The AX discrimination task consisted of 120 trials, which included 80 critical trials and 40 filler trials. Each trial consisted of two words, which could be either the same word (e.g., eecha-eecha) or different words (e.g., eecha-yeecha). Notably, in the “same” trials, the two tokens of the same word were always from two different recordings, preventing participants from relying on recording-specific idiosyncrasies. The trials were presented in a pseudorandomized order across four sections, with 30 trials in each. In this task, participants were asked to determine whether the two words in each trial were the same or different. They had 3 s to respond after listening to each trial, with optional short breaks offered between sections.

3.3.6 Two-alternative forced-choice identification task

The identification task consisted of 120 trials, which included 80 critical trials and 40 filler trials. Each trial presented a word, with each of the 60 unique words appearing twice, each time from different recordings. The trials were presented in a pseudorandomized order across four sections, each containing 30 trials. In this task, participants were asked to choose the word they heard from two provided options in each trial. They had 3 s to respond to each given trial, with optional short breaks between sections.
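The trial arithmetic for the identification task (60 words, two tokens each, four sections of 30) can be sketched as follows. The word labels are hypothetical placeholders for the stimuli in Appendix A, and a plain seeded shuffle stands in for the pseudorandomization, whose exact constraints are not specified here.

```python
import random

SECTION_SIZE = 30  # trials per section, as stated in the text


def build_identification_trials(critical_words, filler_words, seed=0):
    """Return the trial list split into sections of SECTION_SIZE trials."""
    labelled = [(w, "critical") for w in critical_words]
    labelled += [(w, "filler") for w in filler_words]

    # Each word appears twice, once per recorded token.
    trials = [
        {"word": word, "kind": kind, "token": token}
        for word, kind in labelled
        for token in (1, 2)
    ]

    # Assumption: a simple seeded shuffle approximates the pseudorandom order.
    random.Random(seed).shuffle(trials)

    return [trials[i:i + SECTION_SIZE] for i in range(0, len(trials), SECTION_SIZE)]


# Hypothetical placeholder labels for the 40 critical and 20 filler words.
critical = [f"critical_{i:02d}" for i in range(40)]
fillers = [f"filler_{i:02d}" for i in range(20)]

sections = build_identification_trials(critical, fillers)
n_critical = sum(t["kind"] == "critical" for s in sections for t in s)
print(len(sections), [len(s) for s in sections], n_critical)
```

With 40 critical and 20 filler words, this construction yields the 80 critical and 40 filler trials reported above.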

The experiment took about 45 min.

3.4 Data analysis

The data analysis was conducted in R (R Core Team 2022). For both the discrimination task and the identification task, descriptive statistics, including mean accuracies and standard deviations for the FL group, the FL-L2 group, and the L1 English group, were calculated.

Logistic regressions with mixed-effects modeling were used to fit the accuracy data for the two learner groups using lme4 (Bates et al. 2015). In the discrimination task, the model predicted discrimination accuracy from two factors: word group (Group 1: /i/-/ji/; Group 2: /u/-/wu/) and acquisition context (Context 1: FL; Context 2: FL-L2). The initial model for this task was constructed with a maximal random effects structure (Barr et al. 2013), which included random intercepts for subject and item, a by-subject random slope for word group, and a by-item random slope for acquisition context. However, this model failed to converge, indicating that it was overly complex. The model was therefore simplified by removing the by-item random slope for acquisition context, which led to convergence. The final model included word group and acquisition context as fixed factors, by-subject and by-item random intercepts, and a by-subject random slope for word group. The main effects of word group and acquisition context were assessed with a log-likelihood ratio comparison test. Post-hoc pairwise comparisons were performed using emmeans (Lenth et al. 2023).

In the identification task, the model predicted identification accuracy from three factors: word group (Group 1: /i/ with/without preceding /j/; Group 2: /u/ with/without preceding /w/), sound type (Type 1: vowel; Type 2: semivowel-vowel sequence), and acquisition context (Context 1: FL; Context 2: FL-L2). The model with the maximal random effects structure again failed to converge. It was therefore simplified by removing the random slopes, resulting in a converged model that included random intercepts for both subject and item. The main effects of word group and acquisition context were again assessed with a log-likelihood ratio comparison test, and post-hoc pairwise comparisons were again conducted using emmeans.

4 Results

4.1 Discrimination task

4.1.1 Descriptive statistics

Table 1 displays a summary of the average accuracy percentages for trials containing two different words by the FL group, the FL-L2 group, and the L1 English control group, with standard deviations shown in parentheses. Note that the FL-L2 and L1 data are from Zhou and Nakayama (2023).

Table 1:

Discrimination accuracy percentages by acquisition context.

Group    /i/-/ji/       /u/-/wu/
FL       78.7 (24.0)    51.3 (26.2)
FL-L2    77.5 (22.2)    61.1 (29.8)
L1       99.5 (1.6)     97.0 (4.2)

As Table 1 shows, the L1 English group had a nearly perfect performance. The FL group and the FL-L2 group showed comparable discrimination abilities between /i/ and /ji/, achieving accuracy percentages of 78.7 % and 77.5 %, respectively. However, the FL group was outperformed by the FL-L2 group in discriminating between /u/ and /wu/, with accuracy percentages of 51.3 % and 61.1 % respectively. The full set of the results, including those from the filler items, is provided in Appendix B.

4.1.2 Regression analysis

As reported in Section 3.4, the converged logistic mixed-effects model included acquisition context and word group as fixed factors, with by-subject and by-item random intercepts, and a by-subject random slope for word group. To assess main effects, a log-likelihood ratio comparison test was performed. The result revealed a significant main effect of word group, χ²(1) = 12.74, p < 0.001, suggesting that both learner groups were more accurate in distinguishing between /i/ and /ji/ than in differentiating between /u/ and /wu/.
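For readers who wish to check the reported significance level, the p-value of a likelihood ratio statistic with one degree of freedom can be obtained in closed form. The sketch below is not the authors' R code; it relies only on the identity that a chi-square variable with 1 df is a squared standard normal, and the function name is hypothetical.

```python
import math


def chi2_upper_tail_df1(statistic):
    """Upper-tail probability of a chi-square statistic with 1 degree of freedom.

    For df = 1, P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)),
    where Z is a standard normal variable.
    """
    if statistic < 0:
        raise ValueError("a chi-square statistic cannot be negative")
    return math.erfc(math.sqrt(statistic / 2.0))


# Statistic reported for the main effect of word group.
p_value = chi2_upper_tail_df1(12.74)
print(f"p = {p_value:.6f}; below 0.001: {p_value < 0.001}")
```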

Pairwise comparisons using emmeans (Lenth et al. 2023) revealed that FL learners achieved significantly higher discrimination accuracy between /i/ and /ji/ compared to their discrimination between /u/ and /wu/ (p < 0.001). In contrast, there were no significant differences observed between FL-L2 learners’ discrimination abilities for /i/ and /ji/ and for /u/ and /wu/. These findings indicate that while FL-L2 learners could perceive /j/ and /w/ comparably, FL learners could perceive /j/ much better than /w/. Given that /j/ is extremely similar between English and Japanese, both groups of learners possibly enjoyed an L1 advantage in their perception of /j/, but since /w/ is more different between the two languages, the FL learners found it more difficult to perceive /w/ than /j/.

These results provide insights into our two RQs, which explore the effects of acquisition contexts and phonetic/articulatory differences. Specifically, the perception of non-native sounds is influenced by their phonetic and articulatory similarity with L1 sounds. When non-native sounds can be easily mapped to L1 categories, as is the case with /j/, learners can perceive them competently, regardless of their acquisition contexts, since they already have the necessary L1 phonological sensitivity for those sounds. However, when the non-native sounds from the same phonological categories are similar to their L1 counterparts but still exhibit some important phonetic and articulatory differences, the FL-L2 learners could maintain a consistent level of perceptual performance whereas the FL learners appeared to have encountered some difficulty.

Furthermore, it is worth considering the characteristics of the specific stimulus items and the task adopted. The /i/-/ji/ stimulus items had a durational difference of 73 ms., while the /u/-/wu/ items had a difference of 36 ms. Thus, the FL learners may have used this cue and found it easier to distinguish between /i/ and /ji/ than between /u/ and /wu/.[1] While the relative contribution of phonetic/articulatory differences and duration is not easy to tease apart in the current task, it is expected that the importance of duration will diminish in the subsequent identification task, where each stimulus item was presented in isolation.

4.2 Identification task

4.2.1 Descriptive statistics

Table 2 provides a summary of the average accuracy percentages for different words by the FL group and the FL-L2 group in the identification task, along with the corresponding standard deviations in parentheses. The FL-L2 and L1 data are from Zhou and Nakayama (2023).

Table 2:

Accuracy percentages by word groups and acquisition context.

Group /i/ /ji/ /u/ /wu/
FL 89.0 (13.0) 64.3 (21.7) 48.0 (34.5) 81.2 (15.6)
FL-L2 84.3 (17.8) 68.0 (20.2) 63.9 (28.8) 81.8 (16.1)
L1 100.0 (0.0) 99.5 (1.6) 99.5 (1.6) 99.5 (1.6)

As Table 2 shows, the L1 English group achieved perfect or near-perfect accuracy in all the word groups. The FL group had an accuracy rate of 89.0 % for the /i/ words while the FL-L2 group scored slightly lower at 84.3 %. For the /ji/ and /u/ items, the FL group had lower accuracy rates compared to the FL-L2 group. Specifically, for /ji/, the FL group achieved 64.3 % correct, while the FL-L2 group achieved 68.0 %. For /u/, the FL group achieved 48.0 % correct, which was below chance, while the FL-L2 group had 63.9 %. For the /wu/ items, the difference in accuracy between the two groups was minimal (81.2 % versus 81.8 %). The full set of the results, including those from the filler items, is included in Appendix C.

4.2.2 Regression analysis

As reported in Section 3.4, the converged logistic mixed-effects model included word group, sound type, and acquisition context as fixed factors, along with random intercepts for both subject and item. A log-likelihood comparison test confirmed a significant three-way interaction between word group, sound type, and acquisition context, χ²(1) = 14.99, p < 0.001. This interaction indicates that the perception of the two types of sounds (vowels only and semivowel-vowel sequences) is dependent on the acquisition context and the given word groups being considered. In other words, participants’ acquisition context significantly shapes their perception of sound types within each word group.

Pairwise comparisons based on acquisition context were conducted using emmeans (Lenth et al. 2023). The crucial results indicated the following: (a) There were no differences between the FL group and the FL-L2 group in their perception of /i/ and /ji/; (b) The FL group was outperformed by the FL-L2 group in perceiving /u/ (p < 0.03), but there was no difference in their perception of /w/; (c) Between /i/ and /ji/, both the FL group and the FL-L2 group had higher accuracy in identifying /i/ compared to /ji/ (p < 0.001 for both groups), suggesting an /i/ bias; and (d) Between /u/ and /wu/, both learner groups were more accurate in identifying /wu/ than /u/ (p < 0.001 for both groups), indicating a /wu/ bias.

These results shed further light on our RQs on acquisition contexts and phonetic/articulatory differences. First, the null results between the two groups (FL 89.0 % and FL-L2 84.3 %) for words involving /i/ in (a) lend support to the claim that high L1-L2 phonetic and articulatory similarity is beneficial for acquiring non-native sounds. However, for words involving /ji/, where the two groups likewise did not differ (FL 64.3 % and FL-L2 68.0 %), accuracy rates were much lower, yet still above chance. The above-chance accuracies for /ji/ suggest that the presence of /j/ and /i/ as separate phonemes in Japanese can somewhat facilitate participants’ perception of /ji/. However, the lower accuracies for /ji/ suggest that the phonotactic constraint of */ji/ in Japanese (Labrune 2012; Vance 2008) still makes /ji/ a less favorable candidate than /i/ for both groups of learners. Second, the FL-L2 group’s better performance in perceiving /u/ in (b) demonstrates that when L1 and non-native sounds are similar but exhibit some phonetic and articulatory distance, having an L2 experience is advantageous for acquiring those sounds. Furthermore, although there is no difference between the two groups in the perception of /w/, there is an important caveat: The FL group achieved an accuracy rate of 81.2 % in perceiving /wu/, but this came at the cost of a below-chance accuracy rate of 48.0 % in perceiving /u/. Since the task was a two-alternative forced-choice task, this implies that the FL learners frequently mistook /u/ for /wu/. In comparison, although the FL-L2 group also exhibited some bias toward /wu/, it still achieved an above-chance accuracy for /u/. This suggests that FL-L2 learners did not overgeneralize /u/ as /wu/ to the same extent as the FL learners. Further discussion about these biases will be provided in the next section.
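The bias argument can be made concrete with a small calculation. In a two-alternative task, the overall share of responses favoring one option follows from the two accuracy rates, provided both stimulus types were presented equally often. That equal-trial-count condition is an assumption here, as are the function and variable names; the sketch is illustrative rather than part of the reported analysis.

```python
def response_share(acc_favored: float, acc_other: float) -> float:
    """Share of all responses that chose the favored option.

    acc_favored: accuracy (%) on trials where the favored option was correct.
    acc_other:   accuracy (%) on trials where the other option was correct.
    Assumes (hypothetically) equal numbers of both trial types.
    """
    hits = acc_favored / 100.0              # favored option chosen correctly
    false_alarms = 1.0 - acc_other / 100.0  # favored option chosen in error
    return (hits + false_alarms) / 2.0


# Mean accuracies (%) from Table 2.
table2 = {
    "FL":    {"/i/": 89.0, "/ji/": 64.3, "/u/": 48.0, "/wu/": 81.2},
    "FL-L2": {"/i/": 84.3, "/ji/": 68.0, "/u/": 63.9, "/wu/": 81.8},
}

for group, acc in table2.items():
    share_i = response_share(acc["/i/"], acc["/ji/"])
    share_wu = response_share(acc["/wu/"], acc["/u/"])
    print(f"{group}: /i/ responses {share_i:.1%}, /wu/ responses {share_wu:.1%}")
```

A share of 50 % would indicate no bias. Under the stated assumption, both groups sit above that mark for /i/ and for /wu/, and the FL group leans further toward /wu/ than the FL-L2 group does.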

5 Discussions

The current study, in conjunction with Zhou and Nakayama’s (2023) study, examined the perception of English semivowels and their vowel counterparts by Japanese-speaking learners. Specifically, this study asked how language acquisition contexts (RQ1), as well as phonetic/articulatory differences between the L1 and L2/FL sounds (RQ2), influence learners’ perception. The results showed that when the L1 and non-native sounds share high phonetic and articulatory similarities, language acquisition contexts do not exhibit facilitative effects; rather, learners may rely on their L1 phonological sensitivity to perceive these sounds, irrespective of their acquisition contexts. However, when the L1 and L2/FL sounds involve crucial phonetic/articulatory differences, such as lip-rounding, the L2 context proved more advantageous for learners compared to the FL context.

First and foremost, in the AX discrimination task, there was a significant main effect of word group. Post-hoc pairwise comparisons indicate that the significance of the word group was largely driven by the FL group’s better accuracy for /i/-/ji/ discrimination than /u/-/wu/ discrimination, as the FL-L2 group’s accuracies for /i/-/ji/ and /u/-/wu/ were not statistically different. These results suggest that while the FL-L2 group’s discrimination ability for the two groups of words was comparable, the FL group had less trouble with /i/-/ji/ discrimination than with /u/-/wu/ discrimination. It is possible that the high similarity between /j/ and /i/ in Japanese and English could leverage Japanese-speaking learners’ L1 phonological sensitivity, facilitating the discrimination between the two for both learner groups. On the other hand, due to the difference in the lip-rounding feature of /w/ and /u/ between the two languages, the FL group, which presumably had less exposure to rich, authentic language input, had greater difficulty in discriminating between /wu/ and /u/. In addition to the phonetic and articulatory differences, another cue that was available to the participants in this task was duration. It is possible that FL learners’ better success in /i/-/ji/ discrimination could be due to the larger duration difference exhibited by /i/-/ji/, as opposed to /u/-/wu/. Moreover, although there was no main effect of acquisition context in this task, it is worth noting that the accuracy difference between the two groups for /i/-/ji/ was much smaller than that for /u/-/wu/ (a 1.2 % versus a 9.8 % difference). It should also be noted that both FL and FL-L2 groups had good accuracy in /i/-/ji/ discrimination. However, neither group performed nearly as well as the L1 English speakers in the control group, suggesting that established L2 phonetic categories are not identical to those possessed by native speakers (e.g., Flege and Bohn 2021; Flege et al. 2021).

Similarly, the results of the identification task further confirmed the effects of L1 phonological sensitivity and L2 experience on non-native sound perception. Essentially, L2 immersion did not shape the perception of sounds with high L1-L2 similarity. When it comes to perceiving /i/ with and without the preceding semivowel /j/, the FL-L2 group did not outperform the FL group, confirming the prediction that Japanese-speaking learners can rely on their L1 sensitivity to Japanese /j/ and /i/ to identify the corresponding phonemes in English. In contrast, the FL-L2 group achieved significantly higher accuracy than the FL group in perceiving /u/, a phoneme that exhibits phonetic and articulatory differences between English and Japanese in terms of lip-rounding. This difference suggests an L2 advantage. Moreover, both the FL-L2 and the FL groups exhibited some perceptual biases in identification. Specifically, both groups showed a preference for the vowel /i/ over the semivowel-vowel sequence /ji/, while the trend was reversed for /u/ and /wu/, as both groups preferred the semivowel-vowel sequence /wu/ over the single vowel /u/. A possible explanation for these biases could be the varying frequency of phoneme occurrences in English. The frequencies of the phonemes of interest in this study are presented in Table 3. These frequencies are based on a database containing 103,887 instances of phoneme occurrences in conversational English (Mines et al. 1978).

Table 3:

Frequency of occurrence of phonemes /i/, /u/, /j/ and /w/ in conversational English.

Phoneme % # of occurrences
/i/ 3.69 3,831
/w/ 2.77 2,878
/u/ 1.13 1,175
/j/ 1.09 1,134

Although the phonological contexts used in this study were specific, it is evident that the accuracy rates in the identification task, where /i/ was more accurately identified than /u/, and /w/ was identified more accurately than /j/, are correlated with the frequency of occurrence of these phonemes in the conversation corpus. Specifically, strong positive correlations were observed between phoneme frequency and identification accuracy for both the FL-L2 learners (r = 0.97) and the FL learners (r = 0.93). This suggests that higher phoneme frequencies are associated with greater accuracy in both contexts. This observation aligns with the tenets of a frequency-based account in language acquisition, where the acquisition outcomes of specific linguistic components are shaped by the extent of learners’ exposure to those components (e.g., Ellis 2002, 2012).
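The correlations can be approximated from Table 2 and Table 3. The sketch below pairs each word group with the frequency of its distinguishing phoneme (/i/ with /i/, /ji/ with /j/, /u/ with /u/, /wu/ with /w/); this pairing is an assumption, since the text does not spell it out, and the helper `pearson_r` is written here purely for illustration. With only four data points per group, the coefficients are best read as descriptive, and they may differ slightly from the published values depending on rounding and on the exact inputs used.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation for two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Phoneme frequencies (%) from Table 3, ordered /i/, /j/, /u/, /w/.
frequency = [3.69, 1.09, 1.13, 2.77]

# Identification accuracies (%) from Table 2, ordered /i/, /ji/, /u/, /wu/.
accuracy = {
    "FL-L2": [84.3, 68.0, 63.9, 81.8],
    "FL": [89.0, 64.3, 48.0, 81.2],
}

for group, scores in accuracy.items():
    print(f"{group}: r = {pearson_r(frequency, scores):.2f}")
```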

In addition to frequency, other factors could contribute to these biases. For example, the Japanese-speaking learners’ preference for /i/ over /ji/ could also be due to their implicit knowledge of the phonological and phonotactic rules of their L1, in which */ji/ is not a legal combination. This constraint could have biased them to choose /i/ over /ji/ when uncertain (e.g., Dupoux et al. 1999; Kilpatrick et al. 2019). However, this argument does not appear to hold for learners’ preference for /wu/ over /u/, as the former is an illegal combination in Japanese. It is possible that due to their unfamiliarity with lip-rounding, which is a rare feature in their language, Japanese-speaking learners overgeneralized /u/ as /wu/, the latter of which involves two rounded phonemes (rather than one in the former). In other words, these learners simply selected the option presumed to possess a higher degree of roundedness. If this is true, it suggests that in this case, more emphasis was placed on articulatory features than on phonotactic constraints, possibly because the articulatory details embedded in the feature can make the phonotactic violations not easily recognizable or noticeable. This interpretation, however, should be approached with great caution until more empirical evidence is obtained.

The findings of this study, along with those of Zhou and Nakayama (2023), highlight a significant phenomenon: As FL learners transition into an environment where their FL becomes their L2, their non-native phonology can undergo changes. This demonstrates the advantages of L2 immersion in L2 phonological development across diverse contexts. Both PAM-L2 (Best 1994, 1995; Best and Tyler 2007) and PAM-FL (Tyler 2019) have previously predicted that in cases where the L1 and the non-native language have substitutable phonemes (e.g., two-category assimilations), learners can use their L1 phonological sensitivity to acquire the non-native phonemes. However, this study has provided evidence that the situation is even more nuanced. Specifically, for vowels /i/ and /u/ and the semivowels /j/ and /w/, which exhibit a perfect one-to-one phonological correspondence between English and Japanese, the extent of L1-L2 phonetic/articulatory differences can significantly influence non-native perception. This further suggests that phonological development and refinement can occur through increased exposure to naturalistic input in the L2 context, even for adults (Flege and Liu 2001).

Moreover, this study has provided new empirical evidence to support the long-standing impressionistic observations on the acquisition of English semivowels by L1 Japanese learners (Avery and Ehrlich 1992; Kenworthy 1987; Nagase 1978; Tsujimura 2014). Together with Zhou and Nakayama’s (2023) study, our research has demonstrated that perceiving English semivowels remains extremely challenging for Japanese-speaking learners, even when they have been immersed in an L2, English-speaking environment. However, when comparing the FL learners to the FL-L2 learners in Zhou and Nakayama (2023), it is evident that having L2 experience still provides benefits for enhancing phonological sensitivity to specific sounds such as /w/. Considering the complex phonological contexts chosen for examination, i.e., /j/ and /w/ before /i/ and /u/, respectively, it is plausible that difficulties in perceiving semivowels would persist even in simpler phonological contexts. For instance, similar challenges could arise when /j/ and /w/ appear before vowels other than their high-vowel counterparts, as seen in words such as yes, you, we, and win. Given the abundance of words involving /j/ and /w/ in English, mastering them is critical for learners to engage in daily communication and routines.[2] From a more universal perspective, English /j/ and /w/ appear to pose great difficulties for learners of different L1 backgrounds, such as those who speak Chinese (Chan 2023) and Russian (Mentcher 1979). However, to our knowledge, this study is the only existing research that provides empirical evidence for this acquisition phenomenon.

6 Limitations and future directions

Since the focus of this study is on the perception of /j/ and /w/ before their vowel counterparts /i/ and /u/, which are particularly challenging phonological environments, it does not include other phonological contexts involving the same semivowels followed by non-high vowels. Given that /j/ and /w/ followed by non-high vowels are prevalent in English, as seen in words such as yes ([j]+[ɛ]), yam ([j]+[æ]), web ([w]+[ɛ]), and wax ([w]+[æ]), future studies can explore this aspect to include a wider range of phonological environments to improve our understanding of the overall acquisition of semivowels in English.

Furthermore, a central area of interest in this study is to shed light on phonological development across different acquisition contexts. To achieve this goal, the study closely compared Japanese participants studying in Japan with those studying in the U.S. in Zhou and Nakayama (2023). However, one’s English learning circumstances can vary considerably. While the language background survey conducted and the mixed-effects models used for the data are attempts to account for these individual variations, one’s English learning experience is much more dynamic and individualized than this study can fully encompass. To minimize participant variations, future studies could potentially adopt a within-subject design, involving the same participants in both a pre-test (when they were FL learners) and a post-test (after an extended period of immersion in an L2 environment). Conducting such a longitudinal study could offer additional insights into the dynamic nature of phonological development over time in an L2 environment.

It should also be noted that language acquisition contexts are very diverse and complex, extending far beyond the scope of this study. For example, Leung (2012, 2014) found that children who were exposed to English-speaking, Filipino domestic helpers in Hong Kong had more robust phonological categories for English than their peers without such exposure at home. Albuquerque and Alves (2022) found that even for learners in the same macro acquisition context, their speech intelligibility and comprehensibility are influenced by their duration of exposure to the immersion context, their language learning format (e.g., formal vs. informal), as well as the personal experiences of their listeners (e.g., contact with foreigners). These findings emphasize the nuanced and dynamic nature of acquisition contexts. Additional research is needed to improve our understanding in this area.

Building upon Zhou and Nakayama’s (2023) work, this current study offers initial insights into phonological development across various acquisition contexts, and the results suggest a promising direction for gathering more evidence for the PAM model to account for FL-L2 phonological acquisition. While this study focused on only two pairs of sounds, it is of theoretical interest to expand the scope of research within the PAM model by considering various sound assimilation scenarios, including single-category assimilation, two-category assimilation, and category-goodness assimilation (or replacing a category with a more fine-grained feature). This can involve testing a variety of languages across contexts. Considering the increasing global mobility of people today, it is beneficial to study and understand the dynamic processes involved in phonological acquisition and development within diverse acquisition contexts.

7 Conclusions

The present study compared Japanese-speaking FL English learners’ perception of English semivowels with that of FL-L2 learners in Zhou and Nakayama (2023). The specific sounds under examination were the semivowels /j/ and /w/ preceding their corresponding vowels /i/ and /u/, as well as the vowels in isolation (/ji/ vs. /i/ and /wu/ vs. /u/). Among these sounds, Japanese /j/ and /i/ closely resemble their English counterparts, while Japanese /w/ and /u/ differ from their English counterparts in phonetic and articulatory features involving lip-rounding. The results showed that both FL and FL-L2 learners exhibited a reliable ability to perceive /ji/ and /i/, suggesting that they may have relied on their L1 phonological sensitivity to perceive these sounds. However, FL and FL-L2 learners exhibited some important differences in their perception of /wu/ and /u/. In the discrimination task, it was found that unlike their FL-L2 peers, FL learners had more difficulty perceiving the difference between /wu/ and /u/ than /ji/ and /i/. In the identification task, FL-L2 learners demonstrated a superior ability to perceive /u/ compared to FL learners, highlighting a facilitative effect of L2 immersion in phonological acquisition. Taken together, the results show that post-FL L2 immersion yields benefits for L2 phonological development, even among adults.


Corresponding author: Wei William Zhou, Department of East Asian Languages and Literatures, The Ohio State University, 398 Hagerty Hall, 1775 College Rd, Columbus, 43210-1132, OH, USA, E-mail:
We would like to express our gratitude to Yuichi Ono, Cynthia G. Clopper, Hunter Klie, Nicole Nicholson, and Marjorie K.M. Chan for their feedback and discussions on the earlier version of this paper. We also thank the IRAL editors and reviewers for their helpful comments. Any shortcomings remain ours. This research project was partially supported by the Nissen Chemitec America Scholarship from The Ohio State University Department of East Asian Languages and Literatures for the first author, a small grant from The Ohio State University College of Arts and Sciences and Department of East Asian Languages and Literatures for the second author, and the Japan Society for the Promotion of Science Grant-in-Aid for Scientific Research (C) 20K00806 for the third author. We gratefully acknowledge their support.

Funding source: The Ohio State University

Award Identifier / Grant number: College of Arts and Sciences Small Grants

Award Identifier / Grant number: Nissen Chemitec America Scholarship

Funding source: Japan Academic Promotion Foundation

Award Identifier / Grant number: KAKENHI grant C20K00806

Appendix A: Word list

/i/ words IPA /ji/ words IPA /li/ words IPA
eeba ˈi:bə yeeba ˈji:bə leeba ˈli:bə
eecha ˈi:ʧə yeecha ˈji:ʧə leecha ˈli:ʧə
eega ˈi:ɡə yeega ˈji:ɡə leega ˈli:ɡə
eeja ˈi:ʤə yeeja ˈji:ʤə leeja ˈli:ʤə
eepra ˈi:prə yeepra ˈji:prə leepra ˈli:prə
eesca ˈi:skə yeesca ˈji:skə leesca ˈli:skə
eesha ˈi:ʃə yeesha ˈji:ʃə leesha ˈli:ʃə
eeta ˈi:tə yeeta ˈji:tə leeta ˈli:tə
eetha ˈi:θə yeetha ˈji:θə leetha ˈli:θə
eeza ˈi:zə yeeza ˈji:zə leeza ˈli:zə
/u/ words IPA /wu/ words IPA /lu/ words IPA
ooba ˈu:bə wooba ˈwu:bə looba ˈlu:bə
oocha ˈu:ʧə woocha ˈwu:ʧə loocha ˈlu:ʧə
ooga ˈu:ɡə wooga ˈwu:ɡə looga ˈlu:ɡə
ooja ˈu:ʤə wooja ˈwu:ʤə looja ˈlu:ʤə
oopra ˈu:prə woopra ˈwu:prə loopra ˈlu:prə
oosca ˈu:skə woosca ˈwu:skə loosca ˈlu:skə
oosha ˈu:ʃə woosha ˈwu:ʃə loosha ˈlu:ʃə
oota ˈu:tə woota ˈwu:tə loota ˈlu:tə
ootha ˈu:θə wootha ˈwu:θə lootha ˈlu:θə
ooza ˈu:zə wooza ˈwu:zə looza ˈlu:zə

Appendix B: Accuracy rates (%) with standard deviations (SD) for all stimulus types in the discrimination task

Trial Language Target(s) Type Accuracy (SD)
Critical L1 /i/-/ji/ & /ji/-/i/ different 99.5 (1.6)
Critical L1 /i/-/i/ & /ji/-/ji/ same 96.5 (3.4)
Critical FL-L2 /i/-/ji/ & /ji/-/i/ different 77.5 (22.2)
Critical FL-L2 /i/-/i/ & /ji/-/ji/ same 74.3 (21.5)
Critical FL /i/-/ji/ & /ji/-/i/ different 78.7 (24.0)
Critical FL /i/-/i/ & /ji/-/ji/ same 79.5 (12.3)
Critical L1 /u/-/wu/ & /wu/-/u/ different 97.0 (4.2)
Critical L1 /u/-/u/ & /wu/-/wu/ same 94.5 (5.0)
Critical FL-L2 /u/-/wu/ & /wu/-/u/ different 61.1 (29.8)
Critical FL-L2 /u/-/u/ & /wu/-/wu/ same 84.3 (15.5)
Critical FL /u/-/wu/ & /wu/-/u/ different 51.3 (26.2)
Critical FL /u/-/u/ & /wu/-/wu/ same 90.8 (10.1)
Filler L1 /i/-/li/, /ji/-/li/, /li/-/i/, & /li/-/ji/ different 100.0 (0.0)
Filler L1 /li/-/li/ same 98.0 (4.2)
Filler FL-L2 /i/-/li/, /ji/-/li/, /li/-/i/, & /li/-/ji/ different 97.7 (5.3)
Filler FL-L2 /li/-/li/ same 90.0 (15.4)
Filler FL /i/-/li/, /ji/-/li/, /li/-/i/, & /li/-/ji/ different 97.7 (5.7)
Filler FL /li/-/li/ same 90.7 (14.1)
Filler L1 /u/-/lu/, /wu/-/lu/, /lu/-/u/, & /lu/-/wu/ different 100.0 (0.0)
Filler L1 /lu/-/lu/ same 99.0 (3.2)
Filler FL-L2 /u/-/lu/, /wu/-/lu/, /lu/-/u/, & /lu/-/wu/ different 98.6 (4.7)
Filler FL-L2 /lu/-/lu/ same 90.5 (14.3)
Filler FL /u/-/lu/, /wu/-/lu/, /lu/-/u/, & /lu/-/wu/ different 98.7 (7.3)
Filler FL /lu/-/lu/ same 92.3 (13.0)

Appendix C: Accuracy rates (%) with standard deviations (SD) for all stimulus types in the identification task

Trial Language Target Accuracy (SD)
Critical L1 /i/ 100.0 (0.0)
Critical L1 /ji/ 99.5 (1.6)
Critical FL-L2 /i/ 84.3 (17.8)
Critical FL-L2 /ji/ 68.0 (20.2)
Critical FL /i/ 89.0 (13.0)
Critical FL /ji/ 64.3 (21.7)
Critical L1 /u/ 99.5 (1.6)
Critical L1 /wu/ 99.5 (1.6)
Critical FL-L2 /u/ 63.9 (28.8)
Critical FL-L2 /wu/ 81.8 (16.1)
Critical FL /u/ 48.0 (34.5)
Critical FL /wu/ 81.2 (15.6)
Filler L1 /li/ 100.0 (0.0)
Filler FL-L2 /li/ 97.7 (5.5)
Filler FL /li/ 99.3 (2.9)
Filler L1 /lu/ 100.0 (0.0)
Filler FL-L2 /lu/ 98.4 (4.5)
Filler FL /lu/ 99.5 (1.5)

References

Albuquerque, Jeniffer Imaregna Alcantara de & Ubiratã Kickhöfel Alves. 2022. Dynamic paths of intelligibility and comprehensibility: Implications for pronunciation teaching from a longitudinal study with Haitian learners of Brazilian Portuguese. In Ubiratã Kickhöfel Alves & Jeniffer Imaregna Alcantara de Albuquerque (eds.), Second language pronunciation, 107–144. Berlin, Germany: De Gruyter. https://doi.org/10.1515/9783110736120-005.

Aoyama, Katsura, James E. Flege, Susan G. Guion, Reiko Akahane-Yamada & Tsuneo Yamada. 2004. Perceived phonetic dissimilarity and L2 speech learning: The case of Japanese /r/ and English /l/ and /r/. Journal of Phonetics 32(2). 233–250. https://doi.org/10.1016/S0095-4470(03)00036-6.

Ashby, Patricia. 2005. Speech sounds, 2nd edn. London, UK: Routledge.

Avery, Peter & Susan Ehrlich. 1992. Teaching American English pronunciation. Oxford, UK: Oxford University Press.

Baese-Berk, Melissa Michaud. 2019. Interactions between speech perception and production during learning of novel phonemic categories. Attention, Perception, & Psychophysics 81(4). 981–1005. https://doi.org/10.3758/s13414-019-01725-4.

Barr, Dale J., Roger Levy, Christoph Scheepers & Harry J. Tily. 2013. Random effects structure for confirmatory hypothesis testing: Keep it maximal. Journal of Memory and Language 68(3). 255–278. https://doi.org/10.1016/j.jml.2012.11.001.

Bates, Douglas, Martin Mächler, Ben Bolker & Steve Walker. 2015. Fitting linear mixed-effects models using lme4. Journal of Statistical Software 67(1). 1–51. https://doi.org/10.18637/jss.v067.i01.

Best, Catherine T. 1994. The emergence of native-language phonological influences in infants: A perceptual assimilation model. In Judith C. Goodman & Howard C. Nusbaum (eds.), The development of speech perception: The transition from speech sounds to spoken words, 167–224. Cambridge, MA: MIT Press.

Best, Catherine T. 1995. A direct realist view of cross-language speech perception. In Winifred Strange (ed.), Speech perception and linguistic experience: Issues in cross-language research, 171–204. Timonium, MD: York Press.

Best, Catherine T. & Michael D. Tyler. 2007. Nonnative and second-language speech perception: Commonalities and complementarities. In Ocke-Schwen Bohn & Murray J. Munro (eds.), Language experience in second language speech learning: In honor of James Emil Flege, 13–34. Amsterdam, The Netherlands: John Benjamins. https://doi.org/10.1075/lllt.17.07bes.

Boersma, Paul & David Weenink. 2020. Praat: Doing phonetics by computer. Available at: http://www.praat.org/.

Bohn, Ocke-Schwen & James E. Flege. 1990. Interlingual identification and the role of foreign language experience in L2 vowel perception. Applied Psycholinguistics 11(3). 303–328. https://doi.org/10.1017/S0142716400008912.

Borden, Gloria J. & Katherine S. Harris. 1980. Speech science primer: Physiology, acoustics, and perception of speech. Baltimore, MD: Williams & Wilkins.

Bundgaard-Nielsen, Rikke L., Catherine T. Best & Michael D. Tyler. 2011a. Vocabulary size matters: The assimilation of second-language Australian English vowels to first-language Japanese vowel categories. Applied Psycholinguistics 32(1). 51–67. https://doi.org/10.1017/S0142716410000287.

Bundgaard-Nielsen, Rikke L., Catherine T. Best & Michael D. Tyler. 2011b. Vocabulary size is associated with second-language vowel perception performance in adult learners. Studies in Second Language Acquisition 33(3). 433–461. https://doi.org/10.1017/S0272263111000040.

Burgdorf, Dan Cameron & Sam Tilsen. 2021. Temporal differences between high vowels and glides are more robust than spatial differences. Journal of Phonetics 88. 1–47. https://doi.org/10.1016/j.wocn.2021.101073.

Catford, John Cunnison. 1977. Fundamental problems in phonetics. Bloomington, IN: Indiana University Press.

Chan, Marjorie K.M. 2023. The zero initial in Chinese: A preliminary exploration into D2 and L2 acquisition. In Mineharu Nakayama, Marjorie K.M. Chan & Zhiguo Xie (eds.), Buckeye East Asian linguistics, vol. 6, 1–14. Columbus, OH: The Ohio State University.

Cook, Vivian. 2009. Language user groups and language teaching. In Vivian Cook & Wei Li (eds.), Contemporary applied linguistics, 54–74. London, UK: Continuum.

Cook, Vivian. 2010. Prolegomena to second language learning. In Paul Seedhouse, Steve Walsh & Chris Jenks (eds.), Conceptualising “learning” in applied linguistics, 6–22. London, UK: Palgrave Macmillan. https://doi.org/10.1057/9780230289772_2.

Cutler, Anne & David M. Carter. 1987. The predominance of strong initial syllables in the English vocabulary. Computer Speech & Language 2(3–4). 133–142. https://doi.org/10.1016/0885-2308(87)90004-0.

Dupoux, Emmanuel, Kazuhiko Kakehi, Yuki Hirose, Christophe Pallier & Jacques Mehler. 1999. Epenthetic vowels in Japanese: A perceptual illusion? Journal of Experimental Psychology: Human Perception and Performance 25(6). 1568–1578. https://doi.org/10.1037/0096-1523.25.6.1568.

Ellis, Nick C. 2002. Frequency effects in language processing: A review with implications for theories of implicit and explicit language acquisition. Studies in Second Language Acquisition 24(2). 143–188. https://doi.org/10.1017/S0272263102002024.

Ellis, Nick C. 2012. Formulaic language and second language acquisition: Zipf and the phrasal teddy bear. Annual Review of Applied Linguistics 32. 17–44. https://doi.org/10.1017/S0267190512000025.

Ellis, Rod. 2015. Understanding second language acquisition, 2nd edn. Oxford, UK: Oxford University Press.

Escudero, Paola. 2007. Second-language phonology: The role of perception. In Martha C. Pennington (ed.), Phonology in context, 109–134. London, UK: Palgrave Macmillan. https://doi.org/10.1057/9780230625396_5.

Flege, James E., Katsura Aoyama & Ocke-Schwen Bohn. 2021. The revised speech learning model (SLM-r) applied. In Ratree Wayland (ed.), Second language speech learning, 84–118. Cambridge, UK: Cambridge University Press.10.1017/9781108886901.003Search in Google Scholar

Flege, James E. & Ocke-Schwen Bohn. 2021. The revised speech learning model (SLM-r). In Ratree Wayland (ed.), Second language speech learning, 3–83. Cambridge, UK: Cambridge University Press.10.1017/9781108886901.002Search in Google Scholar

Flege, James E. & Serena Liu. 2001. The effect of experience on adults’ acquisition of a second language. Studies in Second Language Acquisition 23(4). 527–552. https://doi.org/10.1017/S0272263101004041.Search in Google Scholar

Flege, James E. & Ian R. A. MacKay. 2004. Perceiving vowels in a second language. Studies in Second Language Acquisition 26(1). 1–34. https://doi.org/10.1017/S0272263104026117.Search in Google Scholar

Gorba, Celia & Juli Cebrian. 2021. The role of L2 experience in L1 and L2 perception and production of voiceless stops by English learners of Spanish. Journal of Phonetics 88. 1–25. https://doi.org/10.1016/j.wocn.2021.101094.Search in Google Scholar

Guion, Susan G., James E. Flege, Reiko Akahane-Yamada & Jesica C. Pruitt. 2000. An investigation of current models of second language speech perception: The case of Japanese adults’ perception of English consonants. Journal of the Acoustical Society of America 107(5). 2711–2724. https://doi.org/10.1121/1.428657.Search in Google Scholar

Hattori, Kota & Paul Iverson. 2009. English /r/-/l/ category assimilation by Japanese adults: Individual differences and the link to identification accuracy. Journal of the Acoustical Society of America 125(1). 469–479. https://doi.org/10.1121/1.3021295.Search in Google Scholar

Institute of International Education. 2022. International students. Institute of International Education. Available at: https://opendoorsdata.org/data/international-students/.

Kenworthy, Joanne. 1987. Teaching English pronunciation. London & New York: Longman.

Kilpatrick, Alexander J., Rikke L. Bundgaard-Nielsen & Brett J. Baker. 2019. Japanese co-occurrence restrictions influence second language perception. Applied Psycholinguistics 40(2). 585–611. https://doi.org/10.1017/S0142716418000711.

Klein, Wolfgang. 1986. Second language acquisition. Cambridge, UK: Cambridge University Press.

Kramsch, Claire. 2002. Beyond the second vs. foreign language dichotomy: The subjective dimensions of language learning. In Kristyan Spelman Miller & Paul Thompson (eds.), Unity and diversity in language use, 1–19. London, UK: Continuum.

Labrune, Laurence. 2012. The phonology of Japanese. New York, NY: Oxford University Press. https://doi.org/10.1093/acprof:oso/9780199545834.001.0001.

Ladefoged, Peter & Sandra Ferrari Disner. 2012. Vowels and consonants, 3rd edn. Malden, MA: Wiley-Blackwell.

Lenth, Russell V., Ben Bolker, Paul Buerkner, Iago Giné-Vázquez, Maxime Herve, Maarten Jung, Jonathon Love, Hannes Riebl & Henrik Singmann. 2023. Estimated marginal means, aka least-squares means. CRAN. Available at: https://github.com/rvlenth/emmeans.

Leung, Alex Ho-Cheong. 2012. Bad influence? – an investigation into the purported negative influence of foreign domestic helpers on children’s second language English acquisition. Journal of Multilingual and Multicultural Development 33(2). 133–148. https://doi.org/10.1080/01434632.2011.649038.

Leung, Alex Ho-Cheong. 2014. Input multiplicity and the robustness of phonological categories in child L2 phonology acquisition. Concordia working papers in applied linguistics (Proceedings of the International symposium on the acquisition of second language speech) 5, 401–415. Montreal, Canada: Concordia University, Department of Education.

Leung, Alex Ho-Cheong, Martha Young-Scholten, Wael Almurashi, Saleh Ghadanfari, Chloe Nash & Olivia Outhwaite. 2023. (Mis) perception of consonant clusters and short vowels in English as a foreign language. International Review of Applied Linguistics in Language Teaching 61(3). 731–764. https://doi.org/10.1515/iral-2021-0030.

Littlewood, William T. 1984. Foreign and second language learning: Language-acquisition research and its implications for the classroom. Cambridge, UK: Cambridge University Press.

Maddieson, Ian. 1984. Patterns of sounds. New York, NY: Cambridge University Press. https://doi.org/10.1017/CBO9780511753459.

Maddieson, Ian & Karen Emmorey. 1985. Cross‐linguistic issues in the relationship between semivowels and vowels. Journal of the Acoustical Society of America 77(S1). S100. https://doi.org/10.1121/1.2022123.

Mentcher, E. 1979. Teaching English to Russian students. ELT Journal 34(1). 47–52. https://doi.org/10.1093/elt/34.1.47.

Mines, M. Ardussi, Barbara F. Hanson & June E. Shoup. 1978. Frequency of occurrence of phonemes in conversational English. Language and Speech 21(3). 221–241. https://doi.org/10.1177/002383097802100302.

Nagase, Yoshiki. 1978. Two proposals for the I.P.A. Kawasaki Igakkaishi Liberal Arts & Sciences 4. 11–17. https://doi.org/10.11482/KMJ-LAS(4)11.

Nagle, Charles L. 2018. Examining the temporal structure of the perception–production link in second language acquisition: A longitudinal study. Language Learning 68(1). 234–270. https://doi.org/10.1111/lang.12275.

Nishi, Kanae, Winifred Strange, Reiko Akahane-Yamada, Rieko Kubo & Sonja A. Trent-Brown. 2008. Acoustic and perceptual similarity of Japanese and American English vowels. Journal of the Acoustical Society of America 124(1). 576–588. https://doi.org/10.1121/1.2931949.

Oh, Yoon Mi. 2015. Linguistic complexity and information: Quantitative approaches. Lyon, France: University of Lyon Doctoral dissertation. Available at: http://www.ddl.cnrs.fr/fulltext/Yoonmi/Oh_2015_1.pdf.

Oostendorp, Marc van. 1995. Vowel quality and phonological projection. Tilburg, The Netherlands: Tilburg University Doctoral dissertation.

Padgett, Jaye. 2008. Glides, vowels, and features. Lingua 118(12). 1937–1955. https://doi.org/10.1016/j.lingua.2007.10.002.

Qualtrics. 2022. Qualtrics. Provo, UT: Qualtrics. Available at: https://www.qualtrics.com.

R Core Team. 2022. RStudio: Integrated development environment for R. Available at: http://www.rstudio.org/.

Raphan, Deborah & Michael Gertner. 1990. ESL and foreign language: A teaching and learning perspective. Research and Teaching in Developmental Education 6(2). 75–84.

Saito, Kazuya & Kim van Poeteren. 2018. The perception–production link revisited: The case of Japanese learners’ English /ɹ/ performance. International Journal of Applied Linguistics 28(1). 3–17. https://doi.org/10.1111/ijal.12175.

Sebastián-Gallés, Núria & Cristina Baus. 2005. On the relationship between perception and production in L2 categories. In Anne Cutler (ed.), Twenty-first century psycholinguistics: Four cornerstones, 279–282. Mahwah, NJ: Lawrence Erlbaum Associates.

Silverman, Daniel. 2011. Schwa. In Marc van Oostendorp, Colin J. Ewen, Elizabeth Hume & Keren Rice (eds.), The Blackwell companion to phonology, 1–15. Oxford, UK: John Wiley & Sons.

Thomson, Ron I. 2022. The relationship between L2 speech perception and production. In Tracey M. Derwing, Murray J. Munro & Ron I. Thomson (eds.), The Routledge handbook of second language acquisition and speaking, 372–385. New York, NY: Routledge. https://doi.org/10.4324/9781003022497-32.

Tsujimura, Natsuko. 2014. An introduction to Japanese linguistics, 3rd edn. West Sussex, UK: John Wiley & Sons.

Tyler, Michael D. 2019. PAM-L2 and phonological category acquisition in the foreign language classroom. In Anne Mette Nyvad, Michaela Hejná, Anders Højen, Anna Bothe Jespersen & Mette Hjortshøj Sørensen (eds.), A sound approach to language matters: In honor of Ocke-Schwen Bohn, 607–630. Aarhus, Denmark: Aarhus University.

Tyler, Michael D. 2021. Perceived phonological overlap in second-language categories: The acquisition of English /r/ and /l/ by Japanese native listeners. Languages 6(4). 1–23. https://doi.org/10.3390/languages6010004.

Vance, Timothy J. 1987. An introduction to Japanese phonology. Albany, NY: State University of New York Press.

Vance, Timothy J. 2008. The sounds of Japanese. New York, NY: Cambridge University Press.

VanPatten, Bill, Trisha Dvorak & James F. Lee (eds.). 1987. Foreign language learning: A research perspective. Rowley, MA: Newbury House.

Woods, Kevin J. P., Max H. Siegel, James Traer & Josh H. McDermott. 2017. Headphone screening to facilitate web-based auditory experiments. Attention, Perception, & Psychophysics 79(7). 2064–2072. https://doi.org/10.3758/s13414-017-1361-2.

Zhou, Wei William & Mineharu Nakayama. 2023. Perception of English semivowels by Japanese L2 English listeners. In Yuichi Ono & Masaharu Shimada (eds.), Data science in collaboration 6, 48–57. Tsukuba, Japan: Media JOHO., Ltd.

Received: 2023-11-08
Accepted: 2024-07-09
Published Online: 2024-08-20

© 2024 the author(s), published by De Gruyter, Berlin/Boston

This work is licensed under the Creative Commons Attribution 4.0 International License.
