Cross-language perception of the Japanese singleton/geminate contrasts: comparison of Vietnamese speakers with and without Japanese language experience

Kimiko Tsukada; Đích Mục Đào; Trang Thi Huyen Le

doi:10.1515/phon-2025-0025

Article Open Access


Published/Copyright: November 10, 2025


From the journal Phonetica Volume 82 Issue 6

Abstract

We examined the perception of Japanese consonant length by three groups of Vietnamese speakers and a group of 10 Japanese speakers. Two of the Vietnamese groups consisted of learners of Japanese, with one group participating in Vietnam (n = 17) and the other in Japan (n = 13). The third Vietnamese group consisted of 12 participants inexperienced in Japanese. Unlike Japanese, consonant length is non-contrastive in Vietnamese. Thus, we were interested in how differing experience with Japanese may influence the perception of difficult Japanese contrasts. The overall mean discriminability in d-prime was 1.0, 1.9, 3.1 and 4.5 for the non-learner group, the learner group in Vietnam, the learner group in Japan and the native Japanese group, respectively. A clear difference between the two learner groups demonstrates that Japanese consonant length is learnable for adult learners. At the same time, the qualitative difference between the advanced learners and native Japanese speakers suggests that Japanese consonant length poses a genuine and persistent difficulty. By providing additional empirical data beyond the segmental level, this study helps us to better evaluate the extent to which current theories of second language (L2) speech learning account for the acquisition of a wide range of L2 sounds by speakers from diverse first language backgrounds.

Keywords: cross-language speech perception; Japanese; consonant length; short/singleton versus long/geminate; Vietnamese

1 Introduction

1.1 Background

Japanese is a quantity language that uses durational variation contrastively for both vowels and consonants (e.g., Fujisaki et al. 1973; Vance 2008). For example, koko ‘individual, separate’ contrasts with kooko ‘finance corporation’ on the one hand and with kokko ‘national treasury’ on the other hand. While this is essential for efficient communication, it is difficult to learn to perceive and produce the durational contrast especially for non-native speakers from diverse linguistic backgrounds including Vietnamese, which is the target group of this study (e.g., Đỗ 2012, 2015; Sugimoto 2003, 2005, 2007 for perception; Kanamura 1999; Matsuda et al. 2018; Yamakawa et al. 2022 for production). Many studies on durational contrasts in second language (L2) Japanese have been conducted (e.g., Han 1992; Hirata 2004, 2015; Hirata et al. 2007; Tsukada et al. 2018; Tsukada and Hajek 2023) confirming clear and consistent differences between native and non-native speakers. However, valuable findings from some of the previous studies focusing on Vietnamese learners of Japanese are available mostly in Japanese and have not yet reached a wider audience in the field.

Recently, the number of Vietnamese learners of Japanese has been rapidly increasing (e.g., Hashimoto 2022). As of 2021, Vietnam is ranked within the top 6 countries/regions (169,582 or 4.5 % of the entire 3,794,714 learners) of the world in terms of the number of learners of Japanese (The Japan Foundation 2021). Within Japan, Vietnam (31,643 or 14.4 % of the entire 219,808 learners) is the second largest country of origin of non-native learners of Japanese after China (67,027 or 30.5 %) as of 2022 (Agency for Cultural Affairs, Government of Japan 2022). As such, improving our current understanding of how to facilitate communication and pronunciation pedagogy for Vietnamese learners of Japanese is crucial. Given that limited studies have focused on this emergent group of Vietnamese speakers, our aims are (1) to add to the existing literature investigating the role of prior linguistic experience by comparing three groups of native Vietnamese speakers differing in Japanese experience and a control group of native Japanese speakers, and (2) to determine how they may differ in the perception of Japanese consonant length contrasts (i.e., short/singleton vs long/geminate) on account of their experience with Japanese. Below we first provide a brief description of phonetic and phonological characteristics of consonant length contrasts in Japanese. We also briefly describe characteristics of the Vietnamese sound system. Then we review the relevant literature on L2 processing of consonant length.

1.2 Length contrasts in Japanese

As mentioned above, Japanese uses durational variation contrastively for both vowels and consonants. The difference in length (e.g., kite ‘wearing’ vs kiite ‘listening’ vs kitte ‘stamp’) affects the lexical meaning as does the difference in the vowel type (e.g., kite ‘wearing’ vs kate ‘nourishment’) or consonant type (e.g., kite ‘wearing’ vs kire ‘cloth’). While segmental duration is consistently reported to be the primary (though not the only) acoustic cue to differentiate the short and long members of the length contrast in Japanese (e.g., Fujisaki et al. 1973; Kawahara 2015), non-durational cues including fundamental frequency (F0), intensity and voice quality, among others, have also been examined in various studies (e.g., Idemaru and Guion 2008; Kubozono et al. 2011). Because length contrasts occur frequently (e.g., Tamaoka and Makioka 2004) and play a critical role in determining meaning, learners of Japanese are typically exposed to them from the very beginning, both auditorily via instructors’ pronunciation and visually via the explicitly marked orthography (i.e., kana syllabaries). While the difference in vowel/consonant categories is primarily realized at the segmental level via the spectral cues, the difference in length categories is primarily realized at the suprasegmental level via the temporal cues.

Of relevance to the present study, numerous researchers (e.g., Kawahara 2015; Vance 2008) examined phonetic and phonological characteristics of Japanese singletons and geminates. In a mora-timed language such as Japanese, word duration covaries with the number of moras (e.g., Homma 1981). While a geminate sound counts as one mora, it does not form a syllable. It has been suggested that length contrasts may differ phonetically in mora-timed (e.g., Japanese) and syllable-timed (e.g., Italian) languages (Ham 2001; Idemaru and Guion 2008). Specifically, while vowels preceding geminates are known to be shorter (by up to 37 %) than vowels preceding singletons in quantity languages such as Italian (e.g., Esposito and Di Benedetto 1999; Hajek et al. 2007) among others, the exact opposite has been reported for Japanese. In other words, in Japanese, vowels tend to be phonetically longer before geminates than before singletons (e.g., Han 1992; Hussain and Shinohara 2019; Idemaru and Guion 2008; Kawahara 2015).

1.3 Characteristics of the Vietnamese sound system

The participants’ first language (L1), Vietnamese, is a tonal language with five (for the Southern dialect) or six (for the Northern dialect) lexical tones (e.g., Brunelle 2009; Đỗ 2015; Hwa-Froelich et al. 2002; Nguyễn 2009; Phạm and McLeod 2016). The robustness of voice quality in signaling lexical tones has been reported for some dialect speakers in previous research (e.g., Brunelle 2009; Brunelle and Kirby 2016; Hwa-Froelich et al. 2002; Michaud 2004; Phạm and McLeod 2019). Vietnamese is generally classified as a monosyllabic (e.g., Do-Hurinville and Dao 2015; Hwa-Froelich et al. 2002; Thompson 1988; Verdonschot et al. 2022), syllable-timed language (e.g., Kanamura 1999; Nguyễn et al. 2008; Phạm and McLeod 2019; Sugimoto 2003, 2005, 2007; Yamakawa et al. 2022; Yin and Wang 2021) and differs from Japanese, which is classified as a mora-timed language with lexical pitch accents (e.g., Goss and Tamaoka 2019; Vance 2008).

Vowel length contrasts are reported to be limited (e.g., Kirby 2011; Pajak and Levy 2014) or non-existent (Đỗ 2012, 2015) in Vietnamese. Vietnamese learners of Japanese have been shown to have difficulty perceiving and/or producing Japanese vowel length in various studies (e.g., Đỗ 2012, 2015; Sugimoto 2005; Yin and Yasuhara 2020). A recent study (Le et al. in press) focusing on the perception of Japanese vowel length reported that only advanced (but not lower proficiency) Vietnamese learners modestly resembled native Japanese speakers in their use of duration and F0 cues. Crucially, however, unlike Japanese, consonant length is not contrastive in Vietnamese either at the morpheme level or the word level (e.g., Đỗ 2015; Do-Hurinville and Dao 2015; Kanamura 1999; Pajak and Levy 2014; Phạm and McLeod 2016). Not surprisingly, perceiving and producing singleton/geminate contrasts in the context of voiceless consonants has been reported to be problematic for Vietnamese learners of Japanese (Sugimoto 2007).

We present data on the discrimination of Japanese consonant length (i.e., short/singleton vs long/geminate) by three groups of native Vietnamese speakers differing in Japanese experience as mentioned above. While previous research involving Vietnamese learners of Japanese in Hanoi (e.g., Sugimoto 2003, 2005, 2007) tends to rely on the authors’ subjective observations, the present study provides empirical data from speech perception experiments.

1.4 Review of literature

1.4.1 Positive effects of L2 experience

Positive effects of L2 experience (e.g., length of residence (LOR), length of L2 learning, amount of L2 use) have been reported in many studies for various L2s and for various types of sounds including contrastive length (e.g., Finnish: Ylinen et al. 2005; Italian: Altmann et al. 2012; Japanese: Hardison and Motohashi Saigo 2010; Hayes-Harb 2005; Hayes-Harb and Masuda 2008; Tsukada and Hajek 2023; Tsukada and Yurong 2022). Of relevance to the present study, Hayes-Harb (2005) examined the perception of Japanese consonant length by English speakers with varying degrees of experience with Japanese. Native English speakers who have studied Japanese for up to 1 year were found to be more target-like (or categorical) in identifying the length categories of the Japanese consonants (/t/-/tː/, /k/-/kː/, /s/-/sː/)^[1] than monolingual English speakers without Japanese experience. At the same time, due to lack of robust length categories, both groups of non-native speakers were more variable than the control group of native Japanese speakers.

In another study focusing on length contrasts, Ylinen et al. (2005) found that the Russian speakers with a longer (more than 5 years) LOR in Finland were more target-like (albeit still distinguishable from native speakers) than those with a shorter (less than 5 years) LOR in their categorization of vowel length contrasts in Finnish. However, there was no difference between the Russians with a shorter LOR and non-Finnish-speaking Russians for the category boundary locations. As for the steepness of the categorization functions, the group with a shorter LOR was even less target-like than the group of Russians naïve to Finnish. However, in the follow-up experiment, the two (non-Finnish-speaking, inexperienced L2 learner) Russian groups did not differ from each other in either categorization or discrimination tasks, and the native Finnish group had a steeper categorization function than the combined Russian group. As the authors suggested, the learner and non-learner groups may have differed in their use of an L1-based strategy, which resulted in the difference of categorization consistency of Finnish vowels. This is because only the learners, regardless of experience or proficiency, “were aware of the differences between the L1 and L2 phonological systems” (Ylinen et al. 2005). It may be the case that non-native learners need to be sufficiently experienced to show an advantage over non-learners for between-group differences to emerge.

In Altmann et al. (2012), the proficient (advanced) German learners of Italian who studied at the university level for more than 11 months were more sensitive to Italian consonant length than the native German speakers without Italian experience. In a speeded same-different discrimination task, the learner group perceived the contrasts consisting of tokens with the different length categories more accurately than the non-learners, but both German groups were less accurate than the Italian group. The three groups did not differ in discrimination accuracy and were highly accurate when they responded to the contrasts consisting of tokens with the same length categories. Based on the findings comparing the processing of German vowel length contrasts and Italian consonant length contrasts, the authors concluded that “consonantal length contrasts remain a challenge even after prolonged exposure to (and increased proficiency in) a language that employs such contrasts”.

1.4.2 No/limited effects of L2 experience

No/limited effects of L2 experience have also been reported in some studies that focused on length contrasts (e.g., Estonian: Leppik et al. 2018, 2019, 2020; Italian: Feng and Busà 2022; Japanese: Đỗ 2012, 2015; Lee and Mok 2018; Swedish: McAllister et al. 2002). Of direct relevance to the present study, Đỗ (2012, 2015) conducted a large-scale study involving 533 Vietnamese learners of Japanese focusing on the identification of long vowels and geminate consonants. The learners aged between 18 and 24, 98 % of whom had never stayed in Japan, consisted of 192 second-year, 183 third-year and 158 fourth-year students at 4 universities in Vietnam. The influence of Japanese learning experience was limited and the learners generally had a tendency to misidentify short sounds as long sounds, in particular, in word-final position. According to Lee and Mok (2018), Cantonese-speaking learners of Japanese at both beginner (first year) and advanced (last year) levels of proficiency were less accurate than the native Japanese speakers in producing vowel and consonant length contrasts whilst not substantially differing from each other. This is despite the advanced learners having the experience of staying in Japan for one year as exchange students. Both groups of learners were in the same BA Japanese Studies Program at the Chinese University of Hong Kong and controlled “for proficiency in terms of formal instruction input (i.e., year group)”. However, it is unclear if there were any between-group differences in pronunciation instruction in the classroom. Typically, whether pronunciation instruction is emphasized or not depends on each instructor and is difficult to quantify accurately.

In a recent study targeting Italian, Feng and Busà (2022) reported that three groups of Mandarin-speaking learners differing in their Italian experience (i.e., first-, second- and third-year university students majoring in Italian at a university in China) were all less accurate in their identification and production of Italian consonant length than the native control group whilst not differing from one another. Further, McAllister et al. (2002) reported that English and Spanish learners of Swedish who differed in their experience with Swedish did not significantly differ in their perception of Swedish vowel length contrasts which are cued by spectral as well as durational variations. Similarly, in a more recent study examining the categorization of Estonian three-way vowel length contrasts (Leppik et al. 2018), experience with Estonian (as indexed by the duration of study of Estonian and LOR in Estonia) did not positively affect Spanish learners’ performance.

Whether learners measurably benefit from their increasing L2 experience or not may depend on the tasks performed by the participants and/or the stimuli (i.e., specific speech materials) used in the studies. Intuitively, it might be expected that as learners gain greater experience in the L2, they become more proficient and target-like in various aspects of their L2. However, while higher proficiency may presuppose greater experience, greater experience may not necessarily lead to higher proficiency, as it is not straightforward to objectively quantify one’s experience in the L2 and/or to determine what kind of L2 experience is necessary to attain target-like proficiency. There is also a suggestion that learners’ levels of proficiency as opposed to experience may be more closely related to the results obtained in an experimental task (e.g., Flege and Eefting 1987). If so, it is not surprising that the results from L2 speech learning research which compared more versus less experienced learners are mixed as reviewed above.

1.5 Acquisition of durational variations within the theoretical framework of L2 phonetics and phonology

Prominent models of L2 speech learning such as an L2 version of the Perceptual Assimilation Model (PAM-L2) (Best and Tyler 2007), the Speech Learning Model (SLM) (Flege 1995) and its successor, the Revised Speech Learning Model (SLM-r) (Flege and Bohn 2021) all focus on the learning of L2 segments (i.e., vowels and consonants). Unlike segmental contrasts signaled by vocalic or consonantal variations which are found in all languages, length contrasts signaled by durational variations are found only in some, but not all, languages. For native Japanese speakers, durational variations are contrastive (phonological) and function at both segmental and suprasegmental levels. For non-native learners without phonological representations for length in their L1, one or both members of the length contrasts may be classified as “new” sounds (Flege 1995; Flege and Bohn 2021). Their perception of durational variations is likely to be unstable, i.e., initially continuous/phonetic and gradually developing to become discrete/phonological as they become experienced in Japanese. Extending theories of L2 speech learning such as the SLM-r (Flege and Bohn 2021) from segmental to durational variations, Vietnamese speakers, for whom geminates are non-existent, may be expected to have no preexisting L1 phonetic categories which interfere with the formation of new phonetic categories for L2 geminates.

However, as pointed out by Best (2019) for the perception of pitch variations by speakers whose L1s do not employ lexical tones, it is important to recognize that non-tonal language speakers “are not lacking entirely in experience with phonological information being conveyed by pitch variations”. The same logic may apply to “false” (aka fake, post-lexical, concatenated) geminates (e.g., cat tail, Lee et al. 2023) which occur in many languages even though they may not always transfer positively to L2.

Further, according to the “full access hypothesis” adopted by the Revised Speech Learning Model (SLM-r) (Flege and Bohn 2021), L2 learners can gain access to the L2 sounds which are defined, at least in part, by features not used in the learner’s L1. Due perhaps to the fact that (1) all languages make use of durational variations for linguistic or non-linguistic purposes and that (2) length categories are usually fewer in number compared to segmental categories, durational cues may be accessible and learnable to speakers of any language. Thus, given that singleton versus geminate contrasts are primarily based on durational variations which all languages make use of at least to some extent, with extensive experience, learners may ultimately be able to process them successfully. The present study aims to contribute to the current L2 speech learning literature by further broadening target sounds and learner populations.

1.6 Present study

The main objective of the present study was to evaluate the effects of Japanese language experience on native Vietnamese speakers’ perception of Japanese short/singleton versus long/geminate contrasts. As pointed out by some researchers (e.g., Altmann et al. 2012; Hallé et al. 2016), previous L2 research on consonant length contrasts mostly focused on non-native learners of Japanese from English (e.g., Hardison and Motohashi Saigo 2010; Hayes-Harb 2005; Tsukada et al. 2018), Korean (e.g., Sonu et al. 2013; Tsukada et al. 2018) or, more recently, Cantonese (e.g., Lee and Mok 2018; Lee et al. 2023) backgrounds. We aim to improve our current understanding of contrastive length perception by focusing on a learner population from a different L1 background that has been growing substantially. Furthermore, due to recruitment difficulties, research involving highly advanced L2 learners in languages other than English is still limited. We hope to fill this gap by examining if and how advanced Vietnamese learners perceive Japanese singleton/geminate contrasts.

The questions addressed in this study are as follows. Firstly, is there a difference across four (three Vietnamese and one Japanese) groups of participants in their perception of Japanese consonant length? Specifically, do the two groups of learners, one in Vietnam (less experienced/proficient) and the other in Japan (more experienced/proficient), differ from each other on the one hand and from the native Japanese speakers on the other hand? We predict that the Vietnamese groups are less accurate than the native Japanese group, because Vietnamese does not use consonant length contrastively and Vietnamese speakers have less experience with consonant length contrasts than native Japanese speakers. In addition, we predict that the advanced learners in Japan are more accurate than the intermediate learners in Vietnam, because the former have much higher proficiency in Japanese and a much longer LOR in Japan than the latter. Secondly, does the place of articulation effect that we observed with speakers from diverse quantity and non-quantity L1 backgrounds (e.g., Hamzah et al. 2023; Tsukada and Hajek 2022; Tsukada and Yurong 2023) also apply to the Vietnamese speakers? Specifically, in many cases, length discrimination was more accurate for the alveolars than for the velars. If this finding is replicated with our Vietnamese participants, it may provide further evidence that this place effect is a language-general phenomenon.

2 Methods

2.1 Speakers and speech materials

The experimental stimuli and procedures were identical to those used in our previous research (e.g., Hamzah et al. 2023; Tsukada and Hajek 2022; Tsukada and Yurong 2023). Six (3 males, 3 females) native Japanese speakers participated in the recording sessions, which lasted between 45 and 60 min. The speakers’ age ranged from late twenties to early forties. According to self-report, which was confirmed by the first author who is a native Japanese speaker originally from Tokyo, all speakers spoke standard Japanese, having been born or having spent most of their life in the Kanto region surrounding the Greater Tokyo Area. The speakers were recorded in the recording studio at the National Institute of Japanese Language and Linguistics, Tokyo.

Table 1 shows the 12 Japanese real word pairs used in this study. The two members of each word pair had the same pitch accent pattern. However, the accent pattern differed across pairs (i.e., High-Low (HL) for kato/katto, mate/matte, sate/satte, ika/ikka, and kako/kakko and Low-High (LH) for the rest of the word pairs). Only words with intervocalic stops were considered in this study. As voiced geminates are disfavoured and their occurrence is limited mostly to loanwords in Japanese (e.g., Hussain and Shinohara 2019; Kawahara 2015; Sano 2019), only voiceless stops (/t, k/) were used (see also Lee and Mok 2018). Of all the Japanese consonants, these two stops occur most frequently both as singletons and geminates (Tamaoka and Makioka 2004). Further, voiceless bilabials are limited to loanwords and onomatopoeic words in Japanese (e.g., Tsujimura 2013).

Table 1: Twelve pairs of Japanese real words used with target sounds underlined and bolded.

	Singleton (S)		Geminate (G)
/t/	he t a	‘unskilled’	he tt a	‘decreased’
	ka t o	‘transient’	ka tt o	‘cut’
	ma t e	‘wait’	ma tt e	‘waiting’
	o t o	‘sound’	o tt o	‘husband’
	sa t e	‘well, then’	sa tt e	‘leaving’
	wa t a	‘cotton’	wa tt a	‘broke’
/k/	a k e	‘open’	a kk e	‘appalled’
	ha k a	‘grave’	ha kk a	‘mint’
	i k a	‘below’	i kk a	‘lesson one’
	ka k o	‘past’	ka kk o	‘parenthesis’
	sa k a	‘slope’	sa kk a	‘author’
	shi k e	‘rough sea’	shi kk e	‘humidity’

On average, the closure durations were 96 ms and 262 ms for singletons and geminates, respectively. The geminate-to-singleton ratios were 2.7 for alveolars (/t/-/tː/) and 2.8 for velars (/k/-/kː/), respectively. These durational values are generally in good agreement with what has been reported in previous research (e.g., Han 1992; Hayes-Harb 2005) (see, however, Sano 2019 for alveolars).

2.2 Participants

Three groups of native Vietnamese speakers participated in this study. The first group of 12 (6 males, 6 females, mean age = 21.0 years, sd = 2.3) consisted of university students in Ho Chi Minh City and had no Japanese experience (VTN). Seven of them were born in Ho Chi Minh City and the rest in other locations in the South region of the country (e.g., Ben Tre, Dong Nai, Kien Giang, Tay Ninh, Tien Giang). Although this diversity is undesirable, it was not practical to tightly control the participants’ dialectal variations in this study. Consonant duration is not contrastive in any of these dialects. The next two groups, one living in Ho Chi Minh City, Vietnam (7 males, 10 females, mean age = 21.5 years, sd = 4.4) (VTI) and the other in Tokyo, Japan (5 males, 8 females, mean age = 28.6 years, sd = 5.1) (VTA) were learners of Japanese. In addition to the countries of residence, the two learner groups substantially differed in their Japanese proficiency based on the Japanese-Language Proficiency Test (JLPT), according to which the easiest level is N5 and the most difficult level is N1 (Nishizawa et al. 2022).

The 13 VTA learners had all passed the JLPT at N1 (roughly equivalent to CEFR (Common European Framework of Reference for Languages) C1) (The Japan Foundation/Japan Educational Exchanges and Services 2025) and were considered highly advanced learners of Japanese. They participated in the study at a university in Tokyo. Broadly speaking, 7 of them were from the North, 1 from the Central region and 5 from the South of Vietnam. Their mean LOR in Japan was 8.4 years (sd = 3.4) at the time of participation. As for their self-reported use of Japanese in Japan, six of them reported more frequent use of Japanese, four reported more frequent use of Vietnamese and the rest reported using both languages with comparable frequency. Three learners reported using Japanese always. They started learning Japanese at the mean age of 17.4 years (sd = 4.1). The learners’ Japanese language learning backgrounds varied somewhat. However, most of them reported learning Japanese intensively at a language school in Japan. All reported having native Japanese-speaking teachers. As mentioned above, whether pronunciation instruction is emphasized or not is typically left up to each instructor’s discretion and, therefore, difficult to quantify accurately. Nevertheless, taken together, compared to the nominally advanced learners included in previous research (e.g., Hardison and Motohashi Saigo 2010; Lee and Mok 2018), it is expected that the VTA learners in the present study were more experienced as well as more immersed in the Japanese-speaking environment.

The 17 VTI learners were students at a university in or around Ho Chi Minh City and were at the JLPT N3 level (roughly equivalent to CEFR B1) (The Japan Foundation/Japan Educational Exchanges and Services 2025), which requires “the ability to understand Japanese used in everyday situations to a certain degree”. Except for two participants, all were studying Japanese in the same program at the same university and were considered (pre-)intermediate learners. Nine of them were born in Ho Chi Minh City and the rest in the South/Central regions of the country (e.g., Ben Tre, Dong Nai, Gia Lai, Khanh Hoa, Kien Giang, Nha Trang, Quang Nam). None of them had lived overseas for an extended period of time (i.e., more than 6 months).

Unfortunately, it is difficult to objectively measure and/or control learners’ levels of proficiency in L2 speech research due to large individual variations and different curricula at different educational institutions in different countries. While we used the JLPT as an index of the Vietnamese participants’ proficiency in Japanese, this is not intended to imply that it is the optimal measure of non-native learners’ Japanese experience or capability.

The Vietnamese participants’ results were compared to those of a control group of 10 (2 males, 8 females) native Japanese (NJ) speakers (mean age = 21.0 years, sd = 0.8) who were students at a university in the US where the first author was based at that time. All NJ speakers were from geographical areas where no absence or neutralization of consonant length contrast has been reported. Their mean LOR in the US was 0.4 years (sd = 0.22) at the time of participation. None of the NJ speakers participated in the recording sessions. According to self-report, all four groups of participants had normal hearing.

All participants were tested individually in a session lasting approximately 30–40 min in a sound-attenuated laboratory or a quiet room at their own university. The experimental session was self-paced. The participants heard the stimuli at a self-selected, comfortable amplitude level over high-quality headphones connected to a computer.

2.3 Procedure

The 12 pairs of words in Table 1, each containing an intervocalic singleton (S) or geminate (G) consonant (underlined and bolded), were arranged in triads in four patterns (S-S-G, S-G-G, G-S-S, and G-G-S, 48 trials each). Each pair was presented 16 times (4 times using different combinations of tokens for each of the four patterns), resulting in a total of 192 trials. This was preceded by eight practice trials, which were not analyzed.

The participants completed a two-alternative forced-choice AXB discrimination task, in which they were asked to listen to trials arranged in a triad (A-X-B). The presentation of the stimuli and the collection of perception data were controlled by the PRAAT program (Boersma and Weenink 2016). In the AXB task, the first (A) and third (B) tokens always came from different length categories, and the participants had to decide whether the second token (X) (i.e., “word” in a language which may be unfamiliar to them) belonged to the same category as A (e.g., kato₂ – kato₁ – katto₃) or B (e.g., kako₁ – kakko₃ – kakko₂; where the subscripts indicate different speakers).

The participants listened to a total of 200 trials. The first eight trials were for practice and were not analyzed. The three tokens in all trials were spoken by three different speakers, who appeared equiprobably in each within-AXB position. Thus, X was never physically identical to either A or B. This was to ensure that the participants focused on relevant phonetic characteristics that group two tokens as members of the same length category without being distracted by audible but phonemically (or phonologically) irrelevant within-category variation (e.g., in voice quality). This was considered a reasonable measure of participants’ perceptual capabilities in real world situations (Strange and Shafer 2008).
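The trial counts described above can be checked with a short sketch. It is not the authors' stimulus script: the speaker IDs, the random sampling of speaker triples, and the function name `build_trials` are all assumptions made for illustration. It only reproduces the stated structure (12 pairs × 4 patterns × 4 token combinations = 192 test trials, with three different speakers in every triad). Random sampling does not strictly enforce that each speaker appears equally often in each AXB position, which a real design would need to balance explicitly.

```python
import itertools
import random

# The 12 singleton/geminate pairs listed in Table 1.
PAIRS = [
    "heta/hetta", "kato/katto", "mate/matte", "oto/otto", "sate/satte", "wata/watta",
    "ake/akke", "haka/hakka", "ika/ikka", "kako/kakko", "saka/sakka", "shike/shikke",
]
PATTERNS = ["S-S-G", "S-G-G", "G-S-S", "G-G-S"]
SPEAKERS = [1, 2, 3, 4, 5, 6]   # six recorded talkers; the numeric IDs are hypothetical
REPS_PER_PATTERN = 4            # 4 token combinations per pair and pattern


def build_trials(seed=0):
    """Return a shuffled list of 192 AXB trials (illustrative sketch only)."""
    rng = random.Random(seed)
    # All ordered triples of three *different* speakers (6 * 5 * 4 = 120).
    speaker_orders = list(itertools.permutations(SPEAKERS, 3))
    trials = []
    for pair in PAIRS:
        singleton, geminate = pair.split("/")
        for pattern in PATTERNS:
            # Draw 4 distinct speaker triples for this pair and pattern.
            for speakers in rng.sample(speaker_orders, REPS_PER_PATTERN):
                words = [singleton if c == "S" else geminate
                         for c in pattern.split("-")]
                trials.append({
                    "pair": pair,
                    "pattern": pattern,
                    "tokens": list(zip(words, speakers)),  # (word, speaker) for A, X, B
                })
    rng.shuffle(trials)
    return trials
```

Because X always comes from a different speaker than A and B, no token in a triad is physically identical to another, which matches the rationale given in the text.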

The participants were given two (‘A’, ‘B’) response choices on the computer screen. They were asked to select the option ‘A’ if they thought that the first two tokens in the AXB sequence were the same and to select the option ‘B’ if they thought that the last two tokens were the same. No feedback was provided during the experimental sessions. The participants could take a break after every 50 trials if they wished. The participants were required to respond to each trial, and they were told to guess if uncertain. A trial could be replayed as many times as the participants wished to reduce their anxiety, but responses could not be changed once given. The interstimulus interval in all trials was 0.5 s.
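As a minimal sketch of how responses in this AXB task map onto the four triad patterns, the functions below derive the expected answer from a pattern string and compute a raw proportion correct. The function names and the string encoding of patterns are hypothetical and are not taken from the PRAAT experiment used in the study.

```python
def correct_response(pattern):
    """Return 'A' if X matches the first token, 'B' if it matches the third.

    `pattern` is a string such as "S-S-G", where S = singleton and G = geminate.
    """
    a, x, b = pattern.split("-")
    if a == b:
        raise ValueError("A and B must come from different length categories")
    return "A" if x == a else "B"


def proportion_correct(patterns, responses):
    """Share of trials on which the response matches the expected answer."""
    if len(patterns) != len(responses):
        raise ValueError("patterns and responses must have the same length")
    hits = sum(correct_response(p) == r for p, r in zip(patterns, responses))
    return hits / len(patterns)
```

For example, `correct_response("S-G-G")` yields `"B"` because the second and third tokens share the geminate category.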

3 Results

We used R version 4.5.1 (R Core Team 2025) for statistical analyses and data visualization reported below. The packages used include ez (Lawrence 2016) and tidyverse (Wickham et al. 2019). As concern was raised that discrimination accuracy averaged across trials where the target tokens were singletons as opposed to geminates may include listeners’ response bias, the participants’ raw responses (%) were used to calculate a sensitivity (or discriminability) index (d′) by following bias correction procedures (Macmillan and Creelman 2005). The scores were based on the proportion of “hits” and the proportion of “false alarms” obtained for each length contrast. The highest d′ score was set to 4.65.
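The relation between the hit and false-alarm proportions and d′ can be sketched as follows. This is a minimal Python illustration, not the R code used for the analyses. It assumes the basic formula d′ = z(H) − z(F) and assumes that proportions of 0 and 1 are replaced by 0.01 and 0.99; these limits are not stated in the text and are chosen here only because they reproduce the ceiling of 4.65. The bias correction actually applied, following Macmillan and Creelman (2005), may differ in detail.

```python
from statistics import NormalDist


def d_prime(hit_rate, fa_rate, floor=0.01, ceiling=0.99):
    """Sensitivity index d' = z(H) - z(F), with proportions clamped.

    The clamping limits (0.01, 0.99) are an assumption made for illustration.
    """
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    h = min(max(hit_rate, floor), ceiling)
    f = min(max(fa_rate, floor), ceiling)
    return z(h) - z(f)


# A listener with perfect performance reaches the maximum score.
print(round(d_prime(1.0, 0.0), 2))  # 4.65
# Equal hit and false-alarm proportions indicate no sensitivity.
print(round(d_prime(0.5, 0.5), 2))  # 0.0
```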

3.1 Overall

The overall mean d′ scores were 1.0, 1.9, 3.1 and 4.5 for the VTN group, the two groups of learners (VTI, VTA) and the control NJ group, respectively. Figure 1 shows overall d′ scores for the four groups of participants. The NJ group was at ceiling with little individual variation. The two groups of learners clearly outperformed the non-learners in perceiving Japanese consonant length contrasts. Regardless of Japanese experience, all Vietnamese groups were much more variable than the NJ group. It is noteworthy that two non-learners achieved d′ scores of 2.1 and 2.6, respectively. These scores exceed the mean score of the VTI group and are well within the range (1.52–4.65) set by the VTA group. While none of the 12 non-learners reached the range of d′ scores (3.77–4.65) set by the NJ group, 5 Vietnamese learners (4 in VTA, 1 in VTI) reached the NJ range.

Figure 1:

Overall discrimination index (d′) by four groups of participants. The horizontal line and the red circle in each box indicate the median and mean, respectively. The bottom and top of the box indicate the first and third quartiles. The horizontal position of each empty dot represents randomly added variation to reduce overplotting.

Levene’s test for homogeneity of variance was significant [F(3, 48) = 4.8, p < 0.05] when all four groups were included, but non-significant [F(2, 39) = 0.86, p = 0.43] when the NJ group was excluded, suggesting that homogeneity of variance can only be assumed for the three Vietnamese groups. This qualitative as well as quantitative difference is clearly seen in Figure 1. Based on the results of Levene’s test, between-group comparisons were conducted non-parametrically when the NJ group was included and parametrically, via repeated-measures analysis of variance (ANOVA), when the NJ group was excluded.
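The degrees of freedom reported for Levene’s test follow directly from the group sizes: k − 1 for the numerator and N − k for the denominator. The short Python check below uses the group sizes given in the abstract (12 non-learners, 17 learners in Vietnam, 13 learners in Japan, 10 native speakers). The assignment of the labels VTI and VTA to the two learner groups is inferred from the order in which the groups are reported, and the helper name is hypothetical.

```python
GROUP_SIZES = {"VTN": 12, "VTI": 17, "VTA": 13, "NJ": 10}


def between_groups_df(sizes):
    """Return (numerator df, denominator df) for a one-way comparison."""
    k = len(sizes)            # number of groups
    n = sum(sizes.values())   # total number of participants
    return k - 1, n - k


all_groups = between_groups_df(GROUP_SIZES)
vietnamese_only = between_groups_df(
    {g: n for g, n in GROUP_SIZES.items() if g != "NJ"}
)
print(all_groups)       # (3, 48)
print(vietnamese_only)  # (2, 39)
```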

First, a non-parametric Kruskal-Wallis test with Group (VTN, VTI, VTA, NJ) reached significance [χ²(3) = 34.4, p < 0.001, η² = 0.65]. Dunn’s test for multiple comparisons (Bonferroni adjusted) showed that the NJ group outperformed the VTN and VTI groups [p < 0.001], but did not significantly differ from the VTA group.
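The reported effect size can be reproduced from the test statistic with one commonly used formula for the Kruskal-Wallis test, η² = (H − k + 1)/(n − k), where H is the test statistic, k the number of groups and n the total number of participants. The Python sketch below is an illustrative check only. It assumes that this formula underlies the reported value and takes n = 52 as the sum of the group sizes given in the abstract.

```python
def kruskal_eta_squared(h_statistic, n_groups, n_total):
    """Eta squared based on the Kruskal-Wallis H statistic."""
    return (h_statistic - n_groups + 1) / (n_total - n_groups)


eta_sq = kruskal_eta_squared(h_statistic=34.4, n_groups=4, n_total=52)
print(round(eta_sq, 2))  # 0.65
```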

Next, we ran a one-way repeated-measures ANOVA excluding the NJ group to determine if the three Vietnamese groups differed in their Japanese consonant length discrimination. The results reached statistical significance [F(2, 39) = 15.6, p < 0.001, η²G = 0.44]. According to the post-hoc t-tests, all between-group differences were statistically significant [t(22.3–26.6) = −5.8 to −2.8, p < 0.05].

3.2 Comparison of length contrast discrimination at alveolar (/t/-/tː/) and velar (/k/-/kː/) places of articulation

Next, we examined if the overall pattern of results remained the same for different places of articulation. Table 2 and Figure 2 show d′ scores for the four groups of participants when the place of articulation (Alveolar, Velar) of the target token (i.e., X in the AXB sequence) was taken into consideration. On average, all four groups performed better when the alveolar rather than velar stops occurred in the target position (VTN: 1.0 vs 0.9, VTI: 2.1 vs 1.8, VTA: 3.5 vs 2.9, NJ: 4.65 vs 4.3 for Alveolar and Velar, respectively). The NJ group never misperceived the length categories when the target stop was alveolar. Thus, this group was excluded from the following analysis.

Table 2:

Mean discrimination index (d′) by four groups of participants for trials differing in the place of articulation (Alveolar, Velar) of the target token. Standard deviations are in parentheses.

Group	Alveolar	Velar
VTN	1.0 (0.8)	0.9 (0.8)
VTI	2.1 (1.3)	1.8 (1.0)
VTA	3.5 (1.1)	2.9 (1.1)
NJ	4.7 (0)	4.3 (0.5)

Figure 2:

Discrimination index (d′) by four groups of participants for trials differing in the place of articulation (Velar, Alveolar) of the target token. The horizontal line and the red circle in each box indicate the median and mean, respectively. The bottom and top of the box indicate the first and third quartiles. The light lines connect individual participants’ scores.

Two-way repeated-measures ANOVA with Group (VTN, VTI, VTA) and Place (Alveolar, Velar) reached significance for the main effects of Group [F(2, 39) = 15.7, p < 0.001, η²G = 0.42] and Place [F(1, 39) = 10.2, p < 0.01, η²G = 0.03] only. The overall pattern of between-group differences seen in Figure 1 was retained in Figure 2. According to the post-hoc t-tests, all between-group differences were significant except for the comparison between the VTN and VTI groups for the target velar. Although all groups performed slightly better for the alveolar than for the velar contrasts on average, the difference in the place of articulation did not reach statistical significance perhaps due to large variability especially for the VTA and VTI groups.

4 Discussion

This study examined the perception of short/singleton versus long/geminate consonants in Japanese by three groups of native Vietnamese speakers differing in Japanese experience and a control group of NJ speakers. We reasoned that the cross-linguistic phonetic characteristics summarized in the Introduction section would make it difficult for Vietnamese speakers to perceive Japanese singleton/geminate contrasts at least initially.

Our main findings were: (1) clear and expected between-group difference based on Japanese experience and (2) different influence of the place of articulation on the discrimination of the singleton/geminate length contrasts. Despite the lack of consonant length at both the word and morpheme levels^[2] (e.g., Do-Hurinville and Dao 2015; Phạm and McLeod 2016) in their L1 Vietnamese, the VTA learners were more successful in their perception of Japanese singleton/geminate contrasts and resembled the NJ speakers to a greater extent than the Vietnamese speakers who were less experienced in Japanese. At the same time, the qualitative difference shown in homogeneity of variance between the learners including the advanced ones and the NJ speakers suggests genuine and persistent difficulty of Japanese consonant length.

4.1 Between-group difference: advanced learners’ performance

Whilst not completely target-like, the VTA learners’ performance was closer to that of the NJ speakers than to that of the VTI or VTN speakers. Undoubtedly, the VTA learners in this study benefitted from L2 Japanese learning unlike the learners in some previous research (e.g., Đỗ 2012, 2015; Lee and Mok 2018).

We speculate that the VTA learners’ superior performance may be due to various reasons. One possibility is that, in addition to cross-language speech perception skills, the high frequency of occurrence of the target sounds (i.e., /k/ and /t/) in the Japanese lexicon (Tamaoka and Makioka 2004) and/or an orthography (i.e., kana syllabaries) that explicitly marks length contrasts, i.e., auditory and/or visual reinforcement in classrooms, may have helped the VTA learners to a greater extent than the VTI or VTN speakers (see Alarifi and Tucker 2024 for the role of orthography in non-native vowel length learning). This possibility needs to be verified by examining how participants from quantity L1 backgrounds who are familiar with consonant length in their L1 (e.g., Arabic, Finnish, Italian), but unfamiliar with Japanese, might perform in the same task.

The VTA learners’ success may also be related to how Japanese geminates are related to the Vietnamese sound system. Presumably, geminate sounds are non-existent and “new” to native Vietnamese speakers. If so, they should be free from “equivalence classification” (e.g., Flege 1987). Thanks, perhaps, to the absence of preexisting L1 phonetic categories which might interfere with the formation of new phonetic categories for L2 geminates as proposed in the SLM-r (Flege and Bohn 2021), the VTA learners with sufficient exposure to Japanese may have been able to learn to perceive Japanese consonant length successfully.^[3]

Alternatively, the VTA learners’ advantage may have more to do with a combination of their extensive experience in Japanese and an extended stay in Japan, i.e., quality and/or quantity of L2 Japanese input. Indeed, the VTA learners in this study had a much longer LOR of 8.4 years in Japan compared to the advanced learners (LOR of 1 year) in Lee and Mok (2018). However, factors such as L2 experience and LOR may be ambiguous and unreliable as predictors of L2 pronunciation. Thus, it would be necessary to search for measures other than LOR that can accurately characterize learners’ capacity. For example, in the SLM-r, Flege and Bohn (2021) used full-time equivalent (FTE) years of L2 input, obtained by multiplying years of residence in the target country by self-reported proportion of target language use, as possibly “a somewhat better estimate of quantity of L2 input than LOR alone” (p. 7, also p. 32). However, as they point out themselves, “it says nothing regarding the quality of L2 input” (p. 7). Given that “the quality of L2 input may matter more than the age of first exposure to an L2” (Flege and Bohn 2021, p. 8), securing the quality as well as quantity of L2 input may remain a pedagogical challenge especially for those learning Japanese outside Japan.
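The FTE measure mentioned above is a simple product, as the following Python sketch shows. The function name is hypothetical, and the proportion of Japanese use in the example (0.5) is an invented value for illustration only, since self-reported language use was not part of the data reported here.

```python
def fte_years(years_of_residence, l2_use_proportion):
    """Full-time equivalent years of L2 input: residence x proportion of L2 use."""
    if not 0.0 <= l2_use_proportion <= 1.0:
        raise ValueError("l2_use_proportion must lie between 0 and 1")
    return years_of_residence * l2_use_proportion


# An LOR of 8.4 years combined with a hypothetical 50 % use of Japanese
# corresponds to 4.2 FTE years of input.
print(fte_years(8.4, 0.5))
```

Two learners with the same LOR can therefore differ considerably in FTE years, which is why the measure may estimate quantity of input better than LOR alone while still leaving quality of input unaddressed.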

4.2 Influence of the place of articulation on consonant length discrimination

Regardless of Japanese experience, all groups discriminated consonant length slightly better for the alveolar than for the velar contrasts on average. We frequently observed the same pattern of results with listeners from diverse quantity and non-quantity L1 backgrounds in our previous studies (Hamzah et al. 2023; Tsukada and Hajek 2022; Tsukada and Yurong 2023). However, in the present study, the difference in the place of articulation did not reach statistical significance perhaps due to large within-group variability especially for the VTA and VTI groups (Figure 2). This differs from previous research (Sugimoto 2007), in which Vietnamese learners in Hanoi at the beginner/intermediate level were reported to have misperceived the alveolar geminates more frequently than the velar geminates. The results obtained may be related to Vietnamese phonology. Velars in coda position are reported to be phonetically and orthographically more variable than alveolars in L1 Vietnamese (Phạm and McLeod 2016). If so, this may negatively affect how velars are cognitively represented in L2 Japanese, especially for the VTA (and VTI) learners who are expected to have actively competing L1 versus L2 systems.

Following one of the reviewers’ suggestions, we revisited the data to see if some word pairs were discriminated more or less accurately. Although we did not find a consistent pattern according to the word pair, for the VTA and VTI groups, the most poorly discriminated pair was kako ‘past’ versus kakko ‘parenthesis’ with kakko in the target (X) position. It is possible that the learners have been exposed to and/or are familiar with these words. However, as we did not collect information about their lexical knowledge or mastery of writing systems, it is unclear if and how these words are internalized by the learners. Given that learners’ L2 vocabulary size positively affects their L2 speech processing (Bundgaard-Nielsen et al. 2011), gaining information regarding learners’ L2 Japanese vocabulary size as well as a tighter control over lexical frequency would be useful in future work.

Based on the finding that consonant length was generally more discriminable (albeit non-significantly) when the target token was the alveolar rather than the velar, a potential pedagogical intervention would be to conduct targeted listening training with alveolar singletons/geminates first and delay the introduction of velar singletons/geminates until learners gain sufficient experience with consonant length. As alveolar contrasts such as -te versus -tte or -ta versus -tta occur very frequently in verb conjugations in particular, gaining a solid grasp of these contrasts is useful in grammar as well as pronunciation learning. However, as mentioned above, we need to bear in mind that the opposite pattern (i.e., more misperception for the alveolar than for the velar geminates) has been reported previously (Sugimoto 2007) and that more data are needed to determine the generalizability of our results.

4.3 Individual differences

Regarding individual differences, we observed that two of the VTN speakers achieved d′ scores that were above the mean score of the VTI group and well within the range (1.52–4.65) set by the VTA group. In addition, the majority (9 out of 12 [75 %]) of the VTN speakers performed just as well as the VTI participants who had some experience learning Japanese. Based on the background information provided, one of the two high-performing VTN speakers was majoring in cultural studies and the other in German linguistics and reported having studied music in junior high school. The former started learning English at the age of 12 and the latter at 8. Neither reported extended stays in foreign countries. While it is possible that these individuals had a keen interest in language and/or culture studies and possessed an exceptional aptitude for processing unfamiliar speech sounds or some other endogenous factors (e.g., superior auditory acuity, early-stage (precategorical) auditory processing, and working auditory memory) proposed in the SLM-r (Flege and Bohn 2021, p. 66), unfortunately, we do not have further details that account for their excellent performance.

Individual learners’ profiles characterized by factors such as prior language learning experience or cognitive abilities including their “L1 category precision” (Flege and Bohn 2021, p. 36, also on p. 65) or “L1 category compactness” (Kogan and Mora 2022) may play an important role in information processing in general and L2 speech processing in particular. For example, regardless of L2 experience or places of residence, the Vietnamese speakers with relatively precise L1 categories may have discriminated Japanese consonant length contrasts more accurately than those with relatively imprecise L1 categories. It is also possible that, due to the partial use (i.e., informativeness) of L1 vowel length as briefly presented in Section 1.3, the Vietnamese speakers with a good control of L1 category precision may be capable of transferring or repurposing their existing (albeit limited) L1 experience with vowel length in the context of processing consonant length in unfamiliar languages (e.g., Pajak and Levy 2014; Tsukada et al. 2021; Tsukada et al. 2024; Tsukada and Yurong 2022).

Finally, globally and especially in many Asian countries, Japanese popular culture such as manga and anime is hugely popular among young people. Anecdotally, some Japanese instructors noted a non-negligible influence of the popular culture on individuals’ L2 Japanese speech processing skills. Strong desire to impersonate characters in manga/anime may drive learners to seek massive auditory as well as visual input and to practice repetitively, which is likely to improve listening/speaking skills. Consumption of popular culture is another area where one individual is likely to vary widely from another.

5 Limitations and future directions

While we observed clear between-group differences in the perception of Japanese consonant length, some limitations regarding characteristics of the learners need to be acknowledged. Specifically, extensive L2 experience and advanced proficiency were confounded in this study. In other words, the VTA learners had a greater advantage over the VTI learners both by living in Japan for an extended period of time and by being at the advanced proficiency level. Thus, their chance of success was expected to be high. To better understand if and how LOR and proficiency may explain the pattern of results, we are currently examining Vietnamese learners in Japan and Vietnam who are both at JLPT N3.

Inclusion of a greater range of consonants such as fricatives, affricates and nasals would be desirable (e.g., Hardison and Motohashi Saigo 2010) as the present study only examined stops at two (alveolar, velar) places of articulation. Given that consonants in coda position are largely limited to stops and nasals in Vietnamese (e.g., Do-Hurinville and Dao 2015; Hwa-Froelich et al. 2002; Nguyễn 2009; Phạm and McLeod 2016, 2019), examining the perception of length contrasts for other manners of articulation such as fricatives, affricates and nasals by Vietnamese speakers would be very interesting. In fact, it has been reported that nasal geminates did not pose much difficulty to Vietnamese learners of Japanese except for some cases of nasal insertion in perception and production (Sugimoto 2003, 2007).

Another procedural limitation was that we employed the discrimination task only, because we included participants who were naïve to Japanese, for whom length categories did not exist. However, identification of vowel and consonant length is a necessary skill for learners who need to learn to categorize length contrasts to efficiently communicate in Japanese. To gain an insight into how Vietnamese speakers process and acquire this essential skill, their perceptual assimilation patterns of Japanese to Vietnamese sounds need to be empirically established via identification as well as discrimination tasks.

In future, it would be valuable to conduct longitudinal as well as cross-sectional investigations, controlling for the country of residence, to gain an insight into the time course of learners’ L2 speech acquisition process. We are keen to determine how long it might take for VTN speakers to learn to perceive consonant length as efficiently as the VTI or VTA learners of the present study. It will be of practical importance for language learners and teachers to find out if and to what extent it is possible to achieve this skill without a lengthy stay in Japan. One study failed to show a positive effect of a 3-month study abroad program on the reduction of L2 learners’ foreign accent (Avello 2018). While immersion may be beneficial if a massive quantity of input can be constantly secured in the target language community, deep engagement with the target language (i.e., quality of input) on a regular basis may be necessary to attain robust spoken language skills. It is important to carefully assess the relative effects of formal instruction in pronunciation and short study abroad programs.

Finally, as we only investigated the cross-linguistic perception of Vietnamese speakers, it is unclear if our findings may generalize to their production skills of Japanese consonant length. In a recent study which examined the effect of intensive training on the acquisition of Finnish vowel length contrasts (Saloranta and Heikkola 2023), L2 learners from a wide range of L1 backgrounds showed different development patterns for the perception and production skills. To gain a better understanding of how and to what extent L2 learning may affect individuals’ linguistic performance, it is necessary to consider speech production as well as perception.

6 Conclusions

A clear difference between the VTI and VTA groups who shared the L1, i.e., Vietnamese, but differed in their experience with Japanese demonstrates learnability of Japanese consonant length for grownups. At the same time, a non-negligible difference between the VTA learners and NJ speakers confirms genuine and persistent difficulty of Japanese consonant length, which was demonstrated in numerous previous studies. By providing additional empirical data beyond the segmental level, this study brings us a step closer to determining the extent to which current theories of L2 speech learning account for the acquisition of a wide range of L2 sounds by speakers from diverse L1 backgrounds.

Corresponding author: Kimiko Tsukada, The University of Melbourne, Melbourne, Australia, E-mail: kimiko.tsukada@gmail.com

Acknowledgments

Portions of this work were presented at the 2nd International Conference on Tone and Intonation (TAI 2023) held in Singapore and the 5th International Symposium on Applied Phonetics (ISAPh 2024) held in Tartu, Estonia. We thank the editors and two anonymous reviewers for their thorough reading and insightful comments.

Author contributions: Kimiko Tsukada was responsible for the conception and design of the work, the acquisition, analysis and interpretation of data for the work and writing of the paper. Đích Đào and Trang Le were responsible for the acquisition of data for the work. We agree to be accountable for all aspects of the work.
Conflict of interest: The authors have no conflicts of interest to declare.
Ethics statement: We received approval from the University of Oregon Institutional Review Board and the Macquarie University Research Office (human research ethics) to conduct this research. All participants gave informed consent before taking part.

References

Agency for Cultural Affairs, Government of Japan. 2022. Nihongo Kyooiku Jittai Choosa Hookokusho – Kokunai no Nihongo Kyooiku no Gaiyoo [Survey on Japanese language education in Japan]. Available at: https://www.bunka.go.jp/tokei_hakusho_shuppan/tokeichosa/nihongokyoiku_jittai/r04/.

Alarifi, Abdulaziz & Benjamin V. Tucker. 2024. Orthographic influence in the distributional learning of non-native speech sounds. Second Language Research 40(4). 833–863. https://doi.org/10.1177/02676583231191611.

Altmann, Heidi, Irena Berger & Bettina Braun. 2012. Asymmetries in the perception of non-native consonantal and vocalic length contrasts. Second Language Research 28(4). 387–413. https://doi.org/10.1177/0267658312456544.

Avello, Pilar. 2018. Assessing learners’ changes in foreign accent during study abroad. In Carmen Pérez Vidal, Sonia López-Serrano, Jennifer Ament & Dakota J. Thomas-Wilhelm (eds.), Learning context effects: Study abroad, formal instruction and international immersion classrooms, 131–154. Berlin: Language Science Press.

Best, Catherine T. 2019. The diversity of tone languages and the roles of pitch variation in non-tone languages: Considerations for tone perception research. Frontiers in Psychology 10. 364. https://doi.org/10.3389/fpsyg.2019.00364.

Best, Catherine T. & Michael D. Tyler. 2007. Nonnative and second-language speech perception. In Ocke-Schwen Bohn & Murray J. Munro (eds.), Language experience in second language speech learning: In honor of James Emil Flege, 13–34. Amsterdam: John Benjamins Publishing Company. https://doi.org/10.1075/lllt.17.07bes.

Boersma, Paul & David Weenink. 2016. Praat: Doing phonetics by computer [version 6.0.19]. Available at: http://www.praat.org (accessed 13 June 2016).

Brunelle, Marc. 2009. Tone perception in Northern and Southern Vietnamese. Journal of Phonetics 37(1). 79–96. https://doi.org/10.1016/j.wocn.2008.09.003.

Brunelle, Marc & James Kirby. 2016. Tone and phonation in Southeast Asian languages. Language and Linguistics Compass 10(4). 191–207. https://doi.org/10.1111/lnc3.12182.

Bundgaard-Nielsen, Rikke L., Catherine T. Best & Michael D. Tyler. 2011. Vocabulary size is associated with second-language vowel perception performance in adult learners. Studies in Second Language Acquisition 33(3). 433–461. https://doi.org/10.1017/s0272263111000040.

Common European Framework of Reference for Languages. Available at: https://www.coe.int/en/web/common-european-framework-reference-languages/.

Đỗ, Hoang Ngan. 2012. Choo-on/Soku-on no Kikitori ni Ataeru Choo-on no Ichi/Akusento no Eekyoo – Nihongo o Senkoo to suru Betonamujin Gakusee o Taishoo ni [The influence of position and accent on the perception of long vowels and geminate consonants: Vietnamese learners majoring in Japanese]. Journal of Science, Foreign Languages 28. 242–254.

Đỗ, Hoang Ngan. 2015. Betonamujin Gakushuusha no Nihongo ni okeru Choo-on/Soku-on no Chikaku ni Kansuru Mondai [Issues regarding Vietnamese learners’ perception of long vowels and geminate consonants in Japanese]. Journal of Science, Foreign Languages 31. 31–38.

Do-Hurinville, Danh Thành & Huy Linh Dao. 2015. Vietnamese. In Nick J. Enfield & Bernard Comrie (eds.), Languages of mainland Southeast Asia: The state of the art, 385–431. Berlin: De Gruyter Mouton.

Esposito, Anna & Maria Gabriella Di Benedetto. 1999. Acoustical and perceptual study of gemination in Italian stops. Journal of the Acoustical Society of America 106(4). 2051–2062. https://doi.org/10.1121/1.428056.

Feng, Qiang & Maria Grazia Busà. 2022. Mandarin Chinese-speaking learners’ acquisition of Italian consonant length contrast. System 111. 102938. https://doi.org/10.1016/j.system.2022.102938.

Flege, James E. 1987. The production of “new” and “similar” phones in a foreign language: Evidence for the effect of equivalence classification. Journal of Phonetics 15(1). 47–65. https://doi.org/10.1016/S0095-4470(19)30537-6.

Flege, James E. 1995. Second-language speech learning: Theory, findings, and problems. In Winifred Strange (ed.), Speech perception and linguistic experience: Issues in cross-language research, 233–277. Timonium, MD: York Press.

Flege, James E. & Ocke-Schwen Bohn. 2021. The revised speech learning model (SLM-r). In Ratree Wayland (ed.), Second language speech learning: Theoretical and empirical progress, 3–83. Cambridge, UK: Cambridge University Press. https://doi.org/10.1017/9781108886901.002.

Flege, James E. & Wieke Eefting. 1987. Cross-language switching in stop consonant perception and production by Dutch speakers of English. Speech Communication 6(3). 185–202. https://doi.org/10.1016/0167-6393(87)90025-2.

Fujisaki, Hiroya, Kimie Nakamura & Toshiaki Imoto. 1973. Auditory perception of duration of speech and nonspeech stimuli. Annual Bulletin, Research Institute of Logopedics and Phoniatrics 7. 45–64.

Goss, Seth J. & Katsuo Tamaoka. 2019. Lexical accent perception in highly-proficient L2 Japanese learners: The roles of language-specific experience and domain-general resources. Second Language Research 35(3). 351–376. https://doi.org/10.1177/0267658318775143.

Hajek, John, Mary Stevens & Georgia Webster. 2007. Vowel duration, compression and lengthening in stressed syllables in Italian. Proceedings of the 16th International Congress of Phonetic Sciences (ICPhS 2007), Saarbrücken, Germany. 1057–1060.

Hallé, Pierre A., Rachid Ridouane & Catherine T. Best. 2016. Differential difficulties in perception of Tashlhiyt Berber consonant quantity contrasts by native Tashlhiyt listeners vs. Berber-naïve French listeners. Frontiers in Psychology 7. 209. https://doi.org/10.3389/fpsyg.2016.00209.

Ham, William H. 2001. Phonetic and phonological aspects of geminate timing. New York, NY: Routledge.

Hamzah, Hilmi M., Kimiko Tsukada & John Hajek. 2023. Perception of the Japanese word-medial singleton/geminate contrast by Kelantan Malay speakers. Proceedings of the 20th International Congress of Phonetic Sciences (ICPhS 2023), Prague, Czech Republic. 162–166.

Han, Mieko S. 1992. The timing control of geminate and single stop consonants in Japanese: A challenge for nonnative speakers. Phonetica 49(2). 102–127. https://doi.org/10.1159/000261906.

Hardison, Debra M. & Miki Motohashi Saigo. 2010. Development of perception of second language Japanese geminates: Role of duration, sonority, and segmentation strategy. Applied Psycholinguistics 31(1). 81–99. https://doi.org/10.1017/s0142716409990178.

Hashimoto, Kayoko. 2022. Why are you learning Japanese? Vietnamese university students’ perspectives on work and life between Vietnam and Japan. Asian Studies Review 46(4). 631–649. https://doi.org/10.1080/10357823.2022.2025577.

Hayes-Harb, Rachel. 2005. Optimal L2 speech perception: Native speakers of English and Japanese consonant length contrasts. Journal of Language and Linguistics 4. 1–29.

Hayes-Harb, Rachel & Kyoko Masuda. 2008. Development of the ability to lexically encode novel second language phonemic contrasts. Second Language Research 24(1). 5–33. https://doi.org/10.1177/0267658307082980.

Hirata, Yukari. 2004. Training native English speakers to perceive Japanese length contrasts in word versus sentence contexts. Journal of the Acoustical Society of America 116(4). 2384–2394. https://doi.org/10.1121/1.1783351.

Hirata, Yukari. 2015. L2 phonetics and phonology. In Haruo Kubozono (ed.), Handbook of Japanese phonetics and phonology, 719–762. Berlin: De Gruyter Mouton. https://doi.org/10.1515/9781614511984.719.

Hirata, Yukari, Elizabeth Whitehurst & Emily Cullings. 2007. Training native English speakers to identify Japanese vowel length contrast with sentences at varied speaking rates. Journal of the Acoustical Society of America 121(6). 3837–3845. https://doi.org/10.1121/1.2734401.

Homma, Yayoi. 1981. Durational relationship between Japanese stops and vowel. Journal of Phonetics 9(3). 273–281. https://doi.org/10.1016/s0095-4470(19)30971-4.

Hussain, Qandeel & Shigeko Shinohara. 2019. Partial devoicing of voiced geminate stops in Tokyo Japanese. Journal of the Acoustical Society of America 145(1). 149–163. https://doi.org/10.1121/1.5078605.

Hwa-Froelich, Deborah, Barbara W. Hodson & Harold T. Edwards. 2002. Characteristics of Vietnamese phonology. American Journal of Speech-Language Pathology 11(3). 264–273. https://doi.org/10.1044/1058-0360(2002/031).

Idemaru, Kaori & Susan G. Guion. 2008. Acoustic covariants of length contrast in Japanese stops. Journal of the International Phonetic Association 38(2). 167–186. https://doi.org/10.1017/s0025100308003459.

Japanese-Language Proficiency Test. Available at: https://www.jlpt.jp/e/.

Japanese-Language Proficiency Test. Available at: https://en.wikipedia.org/wiki/Japanese-Language_Proficiency_Test.

Kanamura, Kumi. 1999. Betonamu Bogowasha ni yoru Nihongo no Hatsuon no Onchoojoo no Tokuchoo [Prosodic features in Japanese speech by native Vietnamese speakers]. Studia Linguistica 12. 73–91.

Kawahara, Shigeto. 2015. The phonetics of sokuon, or geminate obstruents. In Haruo Kubozono (ed.), Handbook of Japanese phonetics and phonology, 43–78. Berlin: De Gruyter Mouton. https://doi.org/10.1515/9781614511984.43.

Kirby, James P. 2011. Illustrations of the IPA: Vietnamese (Hanoi Vietnamese). Journal of the International Phonetic Association 41(3). 381–392. https://doi.org/10.1017/s0025100311000181.

Kogan, Vita V. & Joan C. Mora. 2022. The effects of individual differences in native perception on discrimination of a novel non-native contrast. Laboratory Phonology 13(1). https://doi.org/10.16995/labphon.6431.

Kubozono, Haruo, Hajime Takeyasu, Mikio Giriko & Manami Hirayama. 2011. Pitch cues to the perception of consonant length in Japanese. Proceedings of the 17th International Congress of Phonetic Sciences (ICPhS 2011), Hong Kong, China. 1150–1153.

Lawrence, Michael A. 2016. ez: Easy Analysis and Visualization of Factorial Experiments. R package version 4.4-0. Available at: https://CRAN.R-project.org/package=ez.

Le, Trang Thi Huyen, Mariko Kondo & Kimiko Tsukada. In press. Cross-linguistic transfer of acoustic cues: Perception of Japanese vowel length by learners from Vietnamese-speaking background. Second Language Research. https://doi.org/10.1177/02676583251355203.

Lee, Albert, Xiaolin Li & Peggy Mok. 2023. False geminates as an effective transitional strategy for Cantonese learners of Japanese. Second Language Research 39(4). 1219–1234. https://doi.org/10.1177/02676583221128530.

Lee, Albert & Peggy Mok. 2018. Acquisition of Japanese quantity contrasts by L1 Cantonese speakers. Second Language Research 34(4). 419–448. https://doi.org/10.1177/0267658317739056.

Leppik, Katrin, Pärtel Lippus & Eva Liina Asu. 2018. The perception of Estonian quantity degrees by Spanish listeners. Proceedings of the 9th International Conference on Speech Prosody (Speech Prosody 2018), Poznań, Poland. 478–482. https://doi.org/10.21437/speechprosody.2018-97.

Leppik, Katrin, Pärtel Lippus & Eva Liina Asu. 2019. The production of Estonian vowels in three quantity degrees by Spanish L1 speakers. Proceedings of the 19th International Congress of Phonetic Sciences (ICPhS 2019), Melbourne, Australia. 1154–1158.

Leppik, Katrin, Pärtel Lippus & Eva Liina Asu. 2020. The production of Estonian quantity degrees by Spanish L1 speakers. Proceedings of the 10th International Conference on Speech Prosody (Speech Prosody 2020), Tokyo, Japan. 881–885. https://doi.org/10.21437/speechprosody.2020-180.

Macmillan, Neil A. & C. Douglas Creelman. 2005. Detection theory: A user’s guide, 2nd edn. Mahwah, NJ: Lawrence Erlbaum Associates.

Matsuda, Makiko, Natsuya Yoshida & Kumi Kanamura. 2018. Betonamujin Gakushuusha no Nihongo Hatsuwa Rizumu – Nihongo Bogowasha to no Hikaku [Japanese utterance rhythm by Vietnamese learners of Japanese: Comparison with native Japanese speakers]. Proceedings of the 32nd General Meeting of the Phonetic Society of Japan. 267–272.

McAllister, Robert, James E. Flege & Thorsten Piske. 2002. The influence of L1 on the acquisition of Swedish quantity by native speakers of Spanish, English and Estonian. Journal of Phonetics 30(3). 229–258. https://doi.org/10.1006/jpho.2002.0174.

Michaud, Alexis. 2004. Final consonants and glottalization: New perspectives from Hanoi Vietnamese. Phonetica 61(2-3). 119–146. https://doi.org/10.1159/000082560.

Nguyễn, Ðình-Hoà. 2009. Vietnamese. In Bernard Comrie (ed.), The world’s major languages, 677–692. London: Routledge.

Nguyễn, T. Anh-Thư, John C. L. Ingram & Rob J. Pensalfini. 2008. Prosodic transfer in Vietnamese acquisition of English contrastive stress patterns. Journal of Phonetics 36(1). 158–190. https://doi.org/10.1016/j.wocn.2007.09.001.

Nishizawa, Hitoshi, Daniel R. Isbell & Yuichi Suzuki. 2022. Review of the Japanese-Language proficiency test. Language Testing 39(3). 494–503. https://doi.org/10.1177/02655322221080898.

Pajak, Bozena & Roger Levy. 2014. The role of abstraction in non-native speech perception. Journal of Phonetics 46. 147–160. https://doi.org/10.1016/j.wocn.2014.07.001.

Phạm, Ben & Sharynne McLeod. 2016. Consonants, vowels and tones across Vietnamese dialects. International Journal of Speech-Language Pathology 18(2). 122–134. https://doi.org/10.3109/17549507.2015.1101162.

Phạm, Ben & Sharynne McLeod. 2019. Vietnamese-speaking children’s acquisition of consonants, semivowels, vowels, and tones in Northern Viet Nam. Journal of Speech, Language, and Hearing Research 62(8). 2645–2670. https://doi.org/10.1044/2019_jslhr-s-17-0405.

R Core Team. 2025. R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing. Available at: https://www.R-project.org/.

Saloranta, Antti & Leena M. Heikkola. 2023. Acquisition of non-native vowel duration contrasts through classroom education: Perception and production affected differently. Journal of Second Language Pronunciation 9(2). 208–233. https://doi.org/10.1075/jslp.20040.sal.

Sano, Shinichiro. 2019. The distribution of singleton/geminate consonants in spoken Japanese and its relation to preceding/following vowels. Proceedings of the 19th International Congress of Phonetic Sciences (ICPhS 2019), Melbourne, Australia. 1833–1837.

Sonu, Mee, Hiroaki Kato, Keiichi Tajima, Reiko Akahane-Yamada & Yoshinori Sagisaka. 2013. Non-native perception and learning of the phonemic length contrast in spoken Japanese: Training Korean listeners using words with geminate and singleton phonemes. Journal of East Asian Linguistics 22(4). 373–398. https://doi.org/10.1007/s10831-013-9107-1.

Strange, Winifred & Valerie L. Shafer. 2008. Speech perception in second language learners: The re-education of selective perception. In Jette G. Hansen Edwards & Mary L. Zampini (eds.), Phonology and second language acquisition, 153–191. Amsterdam: John Benjamins. https://doi.org/10.1075/sibil.36.09str.

Sugimoto, Taeko. 2003. Betonamugoken Nihongo Gakushuusha no Hatsuon ni Kakawaru Goyoo nitsuite I [Error analysis of Japanese pronunciation by Vietnamese learners I]. Bulletin of the College of Humanities, Ibaraki University. Studies in Communication 14. 19–45.

Sugimoto, Taeko. 2005. Betonamugoken Nihongo Gakushuusha no Hatsuon ni Kakawaru Goyoo nitsuite II – Onsee Kikitori Choosa to Hatsuon Choosa ni okeru Choo-on-ka/Tan-on-ka no Goyoo no Hikaku to Koosatsu [Error analysis of Japanese pronunciation by Vietnamese learners II: Distinctions between long-vowels and short-vowels]. Bulletin of the College of Humanities, Ibaraki University. Studies in Communication 17. 73–93.

Sugimoto, Taeko. 2007. Betonamugoken Nihongo Gakushuusha no Hatsuon ni Kakawaru Goyoo nitsuite III – Soku-on to Hatsu-on ni okeru Goyoo no Hikaku to Koosatsu [Error analysis of Japanese pronunciation by Vietnamese learners III: On the geminate consonant and the syllabic nasal]. Departmental Bulletin of College of Humanities, Ibaraki University 2. 149–164.

Tamaoka, Katsuo & Shogo Makioka. 2004. Frequency of occurrence for units of phonemes, morae, and syllables appearing in a lexical corpus of a Japanese newspaper. Behavior Research Methods, Instruments, & Computers 36(3). 531–547. https://doi.org/10.3758/bf03195600.

The Japan Foundation. 2021. Survey report on Japanese-language education abroad 2021. Available at: https://www.jpf.go.jp/e/project/japanese/survey/result/survey21.html.

The Japan Foundation/Japan Educational Exchanges and Services. 2025. Indication of the CEFR level for reference. Available at: https://www.jlpt.jp/e/about/cefr_reference.html.

Thompson, Laurence C. 1988. A Vietnamese reference grammar. Honolulu: University of Hawaii Press.

Tsujimura, Natsuko. 2013. An introduction to Japanese linguistics. Oxford: Wiley Blackwell.

Tsukada, Kimiko, Felicity Cox, John Hajek & Yukari Hirata. 2018. Non-native Japanese learners’ perception of consonant length in Japanese and Italian. Second Language Research 34(2). 179–200. https://doi.org/10.1177/0267658317719494.

Tsukada, Kimiko & John Hajek. 2022. Adaptation to L3 phonology? Perception of the Japanese consonant length contrast by learners of Italian. The 18th Australasian International Conference on Speech Science and Technology (SST 2022), Canberra, Australia. 171–175.

Tsukada, Kimiko & John Hajek. 2023. Cross-language perception of Japanese consonant length by speakers from Italian- and Mandarin-speaking backgrounds. Second Language Research 39(3). 925–938. https://doi.org/10.1177/02676583221108269.

Tsukada, Kimiko, Jeong-Im Han, Yurong, Pierre Hallé & Rachid Ridouane. 2024. Perception of Tashlhiyt singleton-geminate contrasts by speakers from Korean, Mandarin, and Mongolian backgrounds. Proceedings of the 5th International Symposium on Applied Phonetics (ISAPh 2024), Tartu, Estonia. 110–114. https://doi.org/10.21437/ISAPh.2024-21.

Tsukada, Kimiko & Yurong. 2022. Non-native perception of Japanese singleton/geminate contrasts: Comparison of Mandarin and Mongolian speakers differing in Japanese experience. Proceedings of the 23rd Annual Conference of the International Speech Communication Association (Interspeech 2022), Incheon, Korea. 3068–3072. https://doi.org/10.21437/Interspeech.2022-397.

Tsukada, Kimiko & Yurong. 2023. Perception of Japanese consonant length by advanced learners from Mandarin- and Mongolian-speaking backgrounds. Proceedings of the 20th International Congress of Phonetic Sciences (ICPhS 2023), Prague, Czech Republic. 2393–2397.

Tsukada, Kimiko, Yurong, Joo-Yeon Kim, Jeong-Im Han & John Hajek. 2021. Cross-linguistic perception of the Japanese singleton/geminate contrast: Korean, Mandarin and Mongolian compared. Proceedings of the 22nd Annual Conference of the International Speech Communication Association (Interspeech 2021), Brno, Czechia. 3910–3914. https://doi.org/10.21437/Interspeech.2021-21.

Vance, Timothy J. 2008. The sounds of Japanese. New York, NY: Cambridge University Press.

Verdonschot, Rinus G., Hoàng Thị Lan Phương & Katsuo Tamaoka. 2022. Phonological encoding in Vietnamese: An experimental investigation. Quarterly Journal of Experimental Psychology 75(7). 1355–1366. https://doi.org/10.1177/17470218211053244.

Wickham, Hadley, Mara Averick, Jennifer Bryan, Winston Chang, Lucy D’Agostino McGowan, Romain François, Garrett Grolemund, Alex Hayes, Lionel Henry, Jim Hester, Max Kuhn, Thomas Lin Pedersen, Evan Miller, Stephan Milton Bache, Kirill Müller, Jeroen Ooms, David Robinson, Dana Paige Seidel, Vitalie Spinu, Kohske Takahashi, Davis Vaughan, Claus Wilke, Kara Woo & Hiroaki Yutani. 2019. Welcome to the tidyverse. Journal of Open Source Software 4(43). 1686. https://doi.org/10.21105/joss.01686.

Yamakawa, Kimiko, Shigeaki Amano & Mariko Kondo. 2022. Vietnamese speakers’ mispronunciation of Japanese singleton and geminate stops. Acoustical Science and Technology 43(5). 241–249. https://doi.org/10.1250/ast.43.241.

Yin, Shuai & Xin Wang. 2021. Betonamujin Nihongo Gakushuusha no Nihongo Hatsuwa Rizumu Tokuchoo – Pairwise Variability Index o Riyooshita Bunseki [The Japanese speaking rhythm of Vietnamese learners of Japanese: An analysis using pairwise variability index]. Bulletin of International Pacific University 19. 77–80.

Yin, Shuai & Rin Yasuhara. 2020. Betonamujin Nihongo Gakushuusha no Nihongo no Sanshutsu ni okeru Choonka Genshoo [Sound prolongation in the Japanese of Vietnamese learners of Japanese]. Bulletin of International Pacific University 15. 43–49.

Ylinen, Sari, Anna Shestakova, Paavo Alku & Minna Huotilainen. 2005. The perception of phonological quantity based on durational cues by native speakers, second-language users and nonspeakers of Finnish. Language and Speech 48(3). 313–338. https://doi.org/10.1177/00238309050480030401.

Received: 2025-04-09

Accepted: 2025-10-16

Published Online: 2025-11-10

Published in Print: 2025-12-17

This work is licensed under the Creative Commons Attribution 4.0 International License.

https://doi.org/10.1515/phon-2025-0025
