Article Open Access

Investigating the role of musical experience in lexical tone perception: non-musicians and amateur musicians’ perception of Mandarin tones

Published/Copyright: September 18, 2025

Abstract

Previous studies have found that musicians typically discriminate Mandarin tones better than non-musicians. However, the relationship between musical experience and tone perception is unclear. In the current study, 39 monolingual native English speakers with no previous experience of tone languages and a range of musical backgrounds (non-musicians and amateur musicians) completed 6 tasks, including lexical tone identification, working memory, tests of first language (L1) and second language (L2) segmental perception, and the Goldsmiths Musical Sophistication Index, which measures musical ability and experience. Results indicated that tone identification was significantly correlated with music training, musical ability, and pitch discrimination. However, a path analysis showed that pitch discrimination and musical ability, but not music training, directly influenced tone identification. Music training had a positive direct influence on pitch discrimination and musical ability, and indirectly influenced tone identification via these mediators. Follow-up multivariate multiple regression showed that different tones are affected differently: pitch discrimination ability mainly influenced identification of Tones 3 and 4, while musical ability significantly influenced Tones 1 and 4. Overall, naïve, non-tone language speakers do not need music training or a musical background to be able to identify Mandarin tones with a high degree of accuracy.

1 Introduction

Different languages use pitch differently. All languages use pitch variation to convey different kinds of information, including linguistic information (Ladefoged and Johnson 2014: 264–266). In tonal languages like Mandarin, pitch variation (F0) is used to distinguish lexical meanings through the use of contrastive tones (Yip 2002: 229). For example, Mandarin’s four tones – high-level (Tone1), rising (Tone2), low-dipping (Tone3), and falling (Tone4) – can be used to assign distinct meanings to otherwise identical syllables (Ladefoged and Johnson 2014: 264–266; Pfordresher and Brown 2009). In contrast, non-tonal languages like English utilize pitch prosodically (i.e., at the utterance-level) to signal pragmatic rather than lexical meaning (Cruttenden 1997; Wennerstrom 2001). Consequently, speakers of non-tonal languages often struggle to perceive lexical tones (Krishnan et al. 2005).

Pitch also plays an important role in music (e.g., McDermott and Oxenham 2008), and extensive evidence from both behavioural and neurophysiological research suggests that non-tone language native speakers with music training may be able to transfer their ability with pitch in music to perceive lexical tone in a second language (L2; Marie et al. 2011). Neuroimaging studies have shown that there are both structural (Hyde et al. 2009) and functional (Lappe et al. 2008) brain differences between musicians and non-musicians. For example, musicians are better at tracking linguistic pitch changes than non-musicians; musicians’ brainstem responses show more faithful representation of F0 contours and more robust neural phase-locking for Tone2 and Tone3, the most acoustically complex tones (Wong et al. 2007). Both professional and amateur musicians consistently outperform non-musicians in lexical tone discrimination, even without prior tone language experience (Alexander et al. 2005; Marie et al. 2011; Wong and Perrachione 2007). Likewise, musicians who have been playing for longer have been shown to be better at identifying lexical tone than those with fewer years’ training (Chua and Brunt 2014).

One explanation for why musicians are better than non-musicians at perceiving tone languages is that, through musical practice, musicians develop enhanced pitch sensitivity. Evidence suggests that musical pitch perception shares neural and cognitive mechanisms with linguistic pitch processing (Bidelman et al. 2011; Krishnan et al. 2005), and that musical training thus enhances pitch discrimination across domains (Moreno et al. 2009; Schön et al. 2004). On this account, domain-general pitch acuity is enhanced by formal music training, and musicians’ increased ability to track pitch direction (i.e., onset-to-offset contour comparisons) facilitates tone perception.

Another potential explanation is that in addition to pitch processing, music training enhances general linguistic and cognitive abilities and that these in turn, support L2 lexical tone perception. For example, musicians have been shown to outperform non-musicians in phonological categorisation (Chobert and Besson 2013; Kraus and Chandrasekaran 2010) and music training has been shown to enhance segmental (Degé and Schwarzer 2011) and suprasegmental (Moreno et al. 2009) phonological processing. However, as far as we know, no study has investigated whether the influence of music training on L2 lexical tone perception is mediated by general segmental speech processing ability. Previous research has found that English listeners treat tones and consonants as perceptually separable (Lin and Francis 2014), and that a listener’s first language (L1) can play a vital role in attention distribution when processing a tonal second language (Liu and Ning 2021). As segmental and tonal information are presented simultaneously in a Mandarin word, enhanced segmental processing ability may allow a listener to attribute more attention to processing tonal information, in turn leading to better tone identification.

Music training may also lead to better cognitive processing as measured in terms of verbal working memory (e.g., Talamini et al. 2017). Both music performance (requiring memorisation of phrases during execution) and L2 tone perception (requiring retention of unfamiliar segmental and suprasegmental features) rely on working memory (e.g., Bidelman et al. 2013), and so music training may support L2 tone perception through enhancing verbal or melodic working memory. For example, Jaschke et al. (2018) found that children who were musically trained showed significant gains in verbal working memory. Additionally, Perrachione et al. (2011) tested the working memory capacity and Mandarin-like tone perception ability of native English speakers with no prior exposure to tone languages. Both studies found that learners with better working memory were faster and more accurate in tone categorisation, likely because they can better maintain and manipulate tonal information in memory.

However, whilst an explanation based on musical training might be appealing, individuals without formal music training can also exhibit highly developed pitch sensitivity (e.g., Mankel and Bidelman 2018). This suggests that advanced pitch discrimination may not require musical expertise or extensive music training. Individuals with no musical training but high baseline pitch aptitude (tested via the Montreal Battery of Evaluation of Amusia, MBEA) matched musicians in identifying L2 lexical tones, indicating that innate pitch sensitivity can offset lack of formal training (Delogu et al. 2010). This indicates that music training is not necessary to enhance perception of pitch in speech (Bidelman and Alain 2015).

Similarly, while music training enhances rhythm perception accuracy and melodic memory for unfamiliar songs (Chen et al. 2020; Matthews et al. 2016), untrained individuals can also exhibit exceptional musical perceptual abilities (Correia et al. 2022). Non-musicians with strong melodic contour identification skills (e.g., detecting rising/falling pitch patterns) performed comparably to musicians in discriminating Mandarin tones (Tone2 vs. Tone3), and neural evidence showed that these individuals had enhanced brainstem encoding of pitch contours (FFR) (Giuliano et al. 2011). Such evidence indicates that innate musical ability, rather than music training itself, may facilitate non-native lexical tone perception.

Notably, the direct or indirect influence of music training on Mandarin tone perception appears tone-specific (e.g., Delogu et al. 2010). For instance, Wong et al. (2007) measured frequency-following responses (FFRs) in non-tone-language-speaking musicians and non-musicians as they passively listened to Mandarin /mi/ synthesized with Tones 1, 2, and 3. While musicians exhibited more faithful neural tracking of F0 contours overall, this advantage was significant for Tone3 and Tone2. Moreover, enhanced neural phase-locking in musicians was observed solely for Tone3, with no group differences in phase-locking for the other tones. Although the exclusion of Tone4 arguably precludes conclusions about the effect of music training on the full Mandarin tone inventory, these findings suggest that musicians’ superior tone perception may not generalise uniformly across all Mandarin tones.

To sum up, music training enhances a broad range of auditory and cognitive abilities, including pitch discrimination, musical ability, phonological processing and working memory, all of which could collectively support L2 tone perception. However, existing research seldom disentangles the contributions of music training, musical ability, and general pitch processing in Mandarin tone perception. Given that individuals without formal music training can exhibit strong musical ability and perform well in tone perception tasks, it remains unclear whether observed performance advantages in musicians reflect training-induced plasticity or pre-existing auditory processing abilities that may exist independently of training (Mankel and Bidelman 2018; Wong and Perrachione 2007). To fill this gap, this study investigated how non-professional music training and related cognitive abilities influence Mandarin tone identification in native, monolingual English speakers with no prior experience of Mandarin or any tone language. Participants completed a battery of tasks assessing pitch discrimination, verbal/melodic working memory, and L1/L2 phonological processing. Musical experience was measured using the Gold-MSI questionnaire (Müllensiefen et al. 2014). We hypothesised that music training would improve tone identification, mediated by the abilities that it can potentially enhance, that Tone3 (dipping contour) would be the most challenging to identify and that this tone would show the strongest dependence on musical ability, consistent with its acoustic complexity.

2 Methods

2.1 Participants

Thirty-nine (female, N = 25) monolingual native English-speaking participants (age 18–35 years, M = 24.51 years, SD = 4.44) were recruited from the UCL Psychology subject pool. An a priori power analysis for the path analysis (see Section 3) indicated that a minimum of 37 participants was required to achieve a good fit of the model (GFI = 0.90, alpha = 0.05, power = 0.80). Participants had no history of speech, hearing or language impairment, and no experience with tonal languages.

As the current study aimed to give a more holistic view of the effects of musical experience on lexical tone perception, participants were recruited to have a range of musical experience. Professional musicians were excluded and instead we targeted amateur musicians of various levels and non-musicians. Nineteen participants had had formal music training, though the number of years of training varied (years of music training = 2–22 years, M = 9.21 years, SD = 5.59); more than half of them (N = 11) still practised regularly. They had been trained either with a melodic musical instrument (N = 11) or in singing (N = 1), or both (N = 7); 5 had been trained with two or more melodic musical instruments. The remaining 20 participants had no formal music training experience. However, some had taken part in regular musical activities (e.g., teaching themselves a musical instrument, N = 2) or singing in an amateur choir (N = 4). Only two of them were still practising when they took part in the experiment.

2.2 Materials and procedure

All testing took place in the Speech Sciences Laboratory, Chandler House, UCL. All participants provided informed consent after reviewing an information sheet. They then completed a battery of tasks on a PC in a single testing session, in the following order: reading span, melodic working memory, L1 consonant categorisation, L2 consonant discrimination, tone identification, pitch discrimination, and the Goldsmiths Musical Sophistication Index (Gold-MSI). Audio stimuli were presented over headphones (Sennheiser HD 25-1 II) at a comfortable volume. All tasks were self-paced and the testing session lasted around 80 min, including breaks.

2.2.1 Verbal working memory: reading span task

The reading span task developed by Van den Noort et al. (2008) was used to measure participants’ auditory verbal working memory (WM). The task comprises 105 sentences, of which 5 sentences are used for practice. The 100 experimental sentences are divided into five series of five blocks; each block consists of 2–6 sentences (Van den Noort et al. 2008). The semantic relationships between words are controlled to minimise any potential semantic influences on verbal working memory (e.g., Kittler et al. 2004).

Written sentences were presented via a PC running PsychoPy 2 (version 1.84.1; Peirce 2007). Participants saw a sentence on the screen and were asked to read it aloud. Sentences were presented automatically, with the next sentence appearing within 6.5 s. Sentence blocks (i.e., 2–6 sentences) were presented in a pseudo-randomised order, with sentences within each block presented in a fixed order. After a block of sentences had been presented, participants saw the word “recall” on the screen and had 5–7 s to list aloud the final words of the sentences they had just read. Responses were recorded manually by the researcher, who sat quietly behind the participant during the task. Neither the order in which participants listed the sentence-final words nor the inclusion of incorrect words affected scoring. The final score was the percentage of correctly recalled words.
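The scoring rule above (order-insensitive, intrusions ignored) can be sketched in a few lines. This is an illustrative helper, not the authors' code; `reading_span_score` is a hypothetical name:

```python
def reading_span_score(target_words, recalled_words):
    """Percentage of sentence-final words correctly recalled.

    Recall order does not matter, and intrusions (words that were not
    sentence-final words) are not penalised, matching the scoring above.
    """
    targets = {w.lower() for w in target_words}
    recalled = {w.lower() for w in recalled_words}
    return 100.0 * len(targets & recalled) / len(targets)

# 3 of 4 final words recalled, in a different order, plus one intrusion
print(reading_span_score(["table", "river", "stone", "cloud"],
                         ["stone", "banana", "table", "river"]))  # → 75.0
```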

2.2.2 Melodic working memory task

Participants’ melodic working memory was measured using an implementation of the adaptive melodic discrimination test (MDT) developed by Harrison and Müllensiefen (2018). The task uses a three-alternative forced-choice (3-AFC) paradigm with synthesized melodies. New melodies were generated from a set of base melodies using a computational model (see Harrison et al. 2017 for further details). Melodies were between 3 and 16 notes in length and were synthesized with identical piano timbre and a tempo of 120 beats per minute (Collins and Laney 2017). Melodies used in the task were therefore novel and unfamiliar to listeners, which ensured that the results were not affected by participants’ previous musical experience.

Before the experiment, participants completed an example and two practice trials. The example used the melody of the last two phrases of “Happy Birthday”, chosen for its high recognisability. In the practice, two novel and unfamiliar melodies with 6 and 19 notes were used. Neither was used in the experiment. Participants were permitted to repeat the practice as many times as necessary until they felt confident in their understanding of the task.

In the current study, the test was used with its default settings. Participants completed 20 trials. Each trial consisted of three versions of a melody played in different keys, presented with an ISI of one second, with one version containing an altered note. The first version of the melody always took the key of D major, and successive versions were transposed a semitone higher in pitch. Participants were instructed to pick out the version with the altered note by clicking the corresponding button on the computer screen. They could answer only after they had heard all versions of the melody. The position of the correct answers varied equally and randomly across the trials. The test used Urry’s rule for item selection (Magis and Raiche 2012); it presented harder items to higher-ability participants and easier items to lower-ability participants. Harder items were longer (i.e., more complex) and had fewer contour and tonality variations, thus increasing similarity and making them harder to distinguish (see Harrison et al. 2017 for further details).

Participants’ melodic working memory score was estimated using Item Response Theory with weighted-likelihood estimation, ranging approximately from −4 to +4. A score of 0 reflected chance-level performance; the higher the score, the better the participant’s melodic working memory (Harrison et al. 2017). For a detailed description of the scoring algorithm, see Harrison and Müllensiefen (2018) and Harrison et al. (2017).

2.2.3 L1 consonant categorisation

This task provided a measure of participants’ L1 speech perception ability. The task was the same as that used in previous studies (e.g., McCarthy et al. 2014), and consisted of two synthetic continua varying in Voice Onset Time (VOT): bee-pea (Hazan et al. 2009) and coat-goat (Ramus et al. 2003). The task used existing stimuli which were generated using the cascade branch of the Klatt (1980) synthesizer based on utterances produced by a single female British English native speaker.

As described in McCarthy et al. (2014), the total syllable duration for bee-pea was 390 ms. For bee, F1 was set at 390 Hz and reached 185 Hz at the end of the syllable. The F2 and F3 transitions increased from 1,400 and 2,500 Hz, respectively, to 2,540 and 2,970 Hz. F2 and F3 then increased to reach 2,760 and 3,377 Hz at the end of the syllable. F4 was set at 3,950 Hz. The total syllable duration for goat-coat was 459 ms. For goat, the F1 transition increased from 477 to 640 Hz, and F1 then decreased from 640 to 306 Hz by the end of the syllable. F2, F3, and F4 began at 2,080, 2,900, and 4,380 Hz, respectively, and reached 1,645, 2,800, and 4,130 Hz at the end of the syllable. VOT was delayed while simultaneously increasing the duration of aspiration. All continua varied VOT in 1 ms steps, ranging from 0 ms for bee (/biː/) to 60 ms for pea (/pʰiː/), and from 20 ms for goat (/ɡəʊt/) to 70 ms for coat (/kʰəʊt/). For the velar continuum, which used a vowel with a relatively high F1, the F1 onset frequency co-varied with VOT, increasing with increasing VOT, as it does in natural speech. For the bilabial continuum, the low F1 of the /i/ vowel precludes any significant transition, so F1 onset frequency varied little, again as would be the case naturally.

Participants completed a forced-choice identification task, in which they heard the synthetic continua varying in 50 equal steps. Participants were instructed to identify which word they had heard by clicking the on-screen picture of the target word (e.g., a coat or a goat). After responding, the next trial was presented automatically. Feedback was only given for catch trials (continuum endpoints) and the practice trials.

To start the experiment, participants were required to score 100 % correct in a four-trial practice identifying the continuum endpoints (two trials per endpoint). The difficulty of subsequent trials was determined using a modified Levitt (1971) adaptive procedure. Two independent, randomly interleaved adaptive tracks were used, starting at the two endpoints of the continuum (e.g., one at a clear /p/ and another at a clear /b/). The procedure estimated the points on the continuum where the stimuli would be labelled as one word of the pair (e.g., goat) 29 % and 71 % of the time. As described in McCarthy et al. (2014), when the participant labelled two stimuli in a row as the category in which the track started, the subsequent trial moved closer to the phoneme boundary, making it harder to categorise. In contrast, when the participant identified a sound as coming from the other category, the next stimulus moved towards that category’s endpoint, making it more likely that the participant would again identify it as that category. The initial step size was 10 ms, reducing linearly to 4 ms over the first three changes in direction of the track (i.e., reversals). To track attention and maintain stable phoneme boundaries, continuum endpoints (catch trials) were randomly interspersed on 20 % of trials. The task ended after seven reversals or a maximum of 40 trials.
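The logic of one adaptive track can be illustrated with a simplified sketch (single track, deterministic listener). The step sizes, reversal count and trial limit follow the description above, but this is not the original implementation:

```python
def run_track(respond, start_vot, direction, max_trials=40, max_reversals=7):
    """One adaptive track of a modified Levitt procedure (simplified).

    `respond(vot)` returns True when the listener labels the stimulus
    with the category the track started in; `direction` is +1 for a
    track starting at the voiced endpoint, -1 for the voiceless one.
    """
    steps = [10, 8, 6, 4]       # step size shrinks over the first 3 reversals
    vot, reversals, run, last_move = start_vot, 0, 0, None
    history = []
    for _ in range(max_trials):
        history.append(vot)
        step = steps[min(reversals, 3)]
        if respond(vot):
            run += 1
            if run < 2:          # need two same-category labels in a row
                continue
            run, move = 0, +1    # two in a row: move toward the boundary
        else:
            run, move = 0, -1    # other category: move back toward endpoint
        if last_move is not None and move != last_move:
            reversals += 1       # change of direction = a reversal
            if reversals >= max_reversals:
                break
        last_move = move
        vot += move * direction * step
    return history, reversals

# deterministic listener whose phoneme boundary lies at 30 ms VOT
history, reversals = run_track(lambda v: v < 30, start_vot=0, direction=+1)
```

With a noiseless listener the track converges on the 30 ms boundary and terminates on the reversal criterion well before the 40-trial cap.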

For this task, responses to both the pea-bee and coat-goat continua were aggregated. Logistic regression was used to obtain a best-fit sigmoid function and calculate the categorisation slope, which indexes the listener’s sensitivity to variation along the continuum. An overall slope score was calculated by averaging over the two continua and was used as the measure of L1 categorisation: a higher slope score represents sharper, better categorisation.
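The slope estimate can be illustrated by fitting a logistic psychometric function to labelling data. This sketch uses SciPy's `curve_fit` on response proportions with made-up values (the study fitted logistic regression to the raw responses):

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic(x, x0, k):
    """P('pea' response) as a function of VOT: boundary x0, slope k."""
    return 1.0 / (1.0 + np.exp(-k * (x - x0)))

# hypothetical proportions of 'pea' responses along a 0-60 ms VOT continuum
vot = np.linspace(0, 60, 13)
p_pea = logistic(vot, 30.0, 0.4)    # simulated listener: boundary at 30 ms

(x0_hat, k_hat), _ = curve_fit(logistic, vot, p_pea, p0=[25.0, 0.1])
# k_hat is the categorisation slope for this continuum; averaging the
# slopes from the two continua gives the overall L1 categorisation score
```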

2.2.4 L2 consonant discrimination

Ability with unfamiliar L2 contrasts was tested using a 3-way forced-choice oddity task in which participants discriminated unfamiliar voiceless uvular /q/ and palatal /c/ plosives from their native voiceless velar plosive /k/. Plosives were presented in 3 different VCV contexts, /i/-C-/i/, /ɑ/-C-/ɑ/ or /u/-C-/u/, consisting of four minimal pairs, /ɑkɑ/-/ɑqɑ/, /uku/-/uqu/, /iki/-/ici/ and /ɑkɑ/-/ɑcɑ/, giving seven VCV words in total, with /ɑkɑ/ used in two pairs. These contrasts were chosen for two reasons. First, Mandarin consonants were avoided because future study plans included testing participants with Mandarin learning experience. Second, native English speakers find these consonants, neither of which is in the English phoneme inventory, difficult to perceive, as English listeners typically assimilate both uvular and palatal plosives to their native velar category.

The audio stimuli were recorded using Audacity (version 2.1.2) in a sound attenuated booth in the Division of Psychology and Language Science, UCL, and processed in Praat (Boersma and Weenink 2016). The stimuli were produced by 3 experienced female phoneticians from UCL. All stimuli were produced with a falling intonation contour and the stress on the second syllable which was initiated by the consonant. Recordings were made in stereo using a RØDE NT1-A Condenser Microphone and a Focusrite Scarlett 2i4 USB Computer Audio Interface preamplifier plugged into the sound card input of a Dell PC at 44.1 kHz, 16-bit resolution. Each speaker recorded each word three times, and the best version (i.e., good voice quality, no hesitation) was saved to an individual wav file for use in the task. Recordings were then converted to mono and equalised for duration to 1.31 s (average word duration for all speakers) using the Praat Vocal Toolkit (Corretge 2012) before being bandpass-filtered at 60–20,000 Hz with a smoothing factor of 10. Finally, intensity was scaled to 70 dB.

The procedure was the same as that used in Iverson et al. (2012). On each trial, participants heard three VCV words spoken by three different speakers. Two words were identical and one was different: participants were required to select the odd one out. Each stimulus was played only once, with an ISI of one second, and no feedback was given. Each of the four pairs of words (i.e., /ɑkɑ/-/ɑqɑ/, /uku/-/uqu/, /iki/-/ici/, /ɑkɑ/-/ɑcɑ/) was played 18 times, nine times with the first word and nine times with the second word of the pair as the odd one out, with the odd stimulus played in first, second, or third position. This gave 72 trials in total. Trials were presented in a randomised order using Praat (Boersma and Weenink 2016).

2.2.5 Tone identification (ToneID) task

Twenty Mandarin monosyllabic real words were recorded, five for each of the four Mandarin tones. To avoid the potential influence of unfamiliar phonemes on the perception of Mandarin tones, words containing Mandarin phonemes that could not easily be mapped to English were excluded (cf. Wong and Perrachione 2007). The recordings and offline processing in this task were the same as for the L2 consonant discrimination test, except that the duration of the stimuli was not equalised, to ensure ecological validity. The words were produced by two native Mandarin speakers (1 male, 1 female).

For practice, four words different from those used in the experiment were recorded by two additional qualified Mandarin native speakers in the same way as for the 20 experimental stimuli. Each Mandarin tone occurred once in the practice session.

Given that none of the participants had previous experience with any tonal language, the experimenter (the first author) briefly explained the notion of lexical tones and the Mandarin tones orally to participants. As a native Mandarin speaker, she produced the syllable /ma/ with the four Mandarin tones while showing participants flashcards with the corresponding tone marks, which would be used as response options. Once participants confirmed their understanding of the task, they started the experiment. They first completed a 4-trial practice to familiarise themselves with the task before completing 40 experimental trials. Each word was presented only once, in a randomised order. Participants gave their response by clicking on the corresponding tone mark on the screen. There was no time limit and they received no feedback.

2.2.6 Pitch discrimination task

This task measured participants’ ability to identify the direction of a change in pitch. The stimuli were designed based on those used by Stevens et al. (2011). Pure tones were generated with Audacity (version 2.1.2) as sine waves sampled at 44.1 kHz, 16-bit resolution and the default amplitude (i.e., 80 % of the maximum possible volume, to avoid clipping), with a duration of 400 ms. The pure tones ranged in frequency from 392 Hz to 416 Hz, corresponding approximately to the musical notes G4 and G#4 respectively, in steps of 2 Hz, giving a total of 13 tones. Each tone higher than 392 Hz was paired with the lowest tone (i.e., 392 Hz) in both a rising and a falling combination, with an inter-stimulus interval (ISI) of 100 ms, giving 24 pairs in total.
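The arithmetic of this stimulus set can be checked with a short sketch (illustrative only):

```python
# 13 pure-tone frequencies from 392 Hz (G4) to 416 Hz in 2 Hz steps
freqs = list(range(392, 417, 2))
assert len(freqs) == 13

base = freqs[0]                 # 392 Hz, the lowest tone
pairs = []
for f in freqs[1:]:             # each of the 12 tones above 392 Hz ...
    pairs.append((base, f))     # ... in a rising pair
    pairs.append((f, base))     # ... and in a falling pair
# 12 higher tones x 2 orders = 24 pairs in total
print(len(pairs))  # → 24
```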

Among the stimuli, four pairs of pure tones were pseudo-randomly selected for the practice session. All four pairs contained a pure tone at 392 Hz. Two pairs were rising pairs (i.e., the second pure tone had a higher frequency than the first), and the other two were falling pairs (i.e., the second pure tone had a lower frequency than the first). The 392 Hz pure tone was paired with 402 Hz and 408 Hz in the rising pairs, and with 394 Hz and 416 Hz in the falling pairs. The pitch differences of the pure-tone pairs were thus 2 Hz, 10 Hz, 16 Hz, and 24 Hz, so the practice included both the smallest (2 Hz) and the largest (24 Hz) frequency differences used in the task.

On each trial, participants heard 2 tones and responded whether the second tone was higher or lower than the first by clicking on the corresponding word (i.e., “higher” or “lower”) on the screen. Participants completed 72 experimental trials (3 repetitions of the 24 pure tone pairs) presented in a randomized order. Before the experiment, all participants completed a four-trial practice to familiarise them with the task. They were given feedback (percentage correct score) after they had completed all practice trials. To be included in the study, participants had to provide a correct response for the pure-tone pair with a 24 Hz frequency difference (i.e., the easiest pure-tone pair). All participants provided the correct response for this pair.

2.2.7 Goldsmiths Musical Sophistication Index (Gold-MSI questionnaire)

Participants’ music training level and musical perceptual ability were measured using the Goldsmiths Musical Sophistication Index (Gold-MSI) v1.0 questionnaire, a standardized self-assessment questionnaire developed by Müllensiefen et al. (2014). We chose the Gold-MSI because it measures expertise developed through a range of musical activities, not just learning to play a musical instrument or studying singing. Musical skill and expertise vary greatly across individuals, and those who have no theoretical or technical knowledge of music (i.e., those who would describe themselves as non-musicians) may also develop musical expertise through engagement with music in other ways. The Gold-MSI thus enabled us to obtain a more holistic measure of musical sophistication that is not limited to expertise developed through formal training.

Participants completed all parts of the questionnaire; however, only the Music Training and Musical Ability (perceptual abilities) subscale scores were used in this study. The Music Training subscale score is calculated from participants’ answers to seven 7-point Likert scale questions that combine questions about the time spent learning and practising musical instruments or singing, formally and informally (e.g., “I engaged in regular, daily practice of a musical instrument, including voice, for ___ years”, “I have had formal training in music theory for ___ years”), with questions that elicit the degree of self-assessed musicianship (e.g., “I would not consider myself a musician”). The Musical Ability subscale comprises nine 7-point Likert scale questions in which participants self-assess their musical abilities; most of these relate to music listening skills and long-term melodic memory and are independent of years of music training, e.g., confidence in judging others’ singing ability, recognising familiar and novel tunes, identifying the genre of a piece of music, and rating one’s own tonal perception.

Participants completed the questionnaire after the six experimental tasks described above. They were instructed to answer the questions as accurately and truthfully as they could and were informed that there were no good or bad answers. Answers were scored using the Gold-MSI v1.0 template. Music Training and Musical Ability raw scores ranged from 7 to 49 and from 9 to 63 points, respectively. To enable comparison with scores from the experimental tasks, raw scores were converted to percentages. Higher scores represented more music training or better musical ability.
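The conversion to percentages can be sketched as follows. Note that min-max scaling over the subscale range is an assumption here; the exact formula used is not stated above:

```python
def to_percent(raw, lo, hi):
    """Convert a subscale score to 0-100 (assumed min-max scaling)."""
    return 100.0 * (raw - lo) / (hi - lo)

print(to_percent(28, 7, 49))   # Music Training raw score of 28 → 50.0
print(to_percent(36, 9, 63))   # Musical Ability raw score of 36 → 50.0
```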

3 Results

Participants’ performance in all 6 tasks and the Gold-MSI questionnaire is plotted in Figure 1.

Figure 1: 
Boxplots of all observed variables: Tone Identification (ToneID), reading span (Reading span), melodic memory (Melodic memory), L1 consonant categorisation (L1), L2 consonant discrimination (L2), Pitch Discrimination (PD), Musical Ability (Musical ability), and Music Training (Music training). Melodic memory and L1 scores were Min–Max normalised to match the scales of the other measures for visual comparison. The scores of the other tasks are percentage correct.

To investigate whether performance on the different tasks predicted tone identification performance, path analysis was used. To determine the factors to include in the path analysis, a correlation analysis was carried out first. Given that the scores for reading span, Pitch Discrimination, Musical Ability, and Music Training were not normally distributed, Spearman correlations were used to investigate potential correlations between ToneID and all other tasks, i.e., the predictor variables. The Benjamini-Hochberg procedure was applied to adjust for multiple comparisons. Results showed that ToneID was significantly correlated with Pitch Discrimination (mean = 83.44, SD = 15.21; ρ = 0.56, p < 0.05), Musical Ability (mean = 46.85, SD = 8.06; ρ = 0.41, p < 0.05), and Music Training (mean = 21.56, SD = 10.73; ρ = 0.46, p < 0.05), but not with the other measures (ρ = 0.04 to 0.36, p > 0.05; see Figure 2 for a summary of results). These three variables were therefore used in the path analysis to further investigate their relationship with tone identification. Significant correlations are illustrated in the scatterplots shown in Figure 3.

Figure 2:

Results of the Spearman correlations between all 8 variables – Tone Identification (ToneID), Reading span, Melodic memory, L1 consonant categorisation (L1), L2 consonant discrimination (L2), Pitch Discrimination (PD), Gold-MSI music training score (Music training) and Gold-MSI musical ability score (Musical ability). Correlation coefficients are given in each box and non-significant correlations are marked with a cross.

Figure 3:

Scatterplots for all significant correlations. Each datapoint (i.e., a participant) is represented by a black dot and the linear regression line is shown in red.

A path model (Figure 4) was fitted with Pitch Discrimination, Musical Ability, and Music Training as continuous variables to predict performance on the ToneID task, i.e., tone identification ability. The multivariate normality assumption was assessed using the Henze-Zirkler test, given the moderately small but sufficient sample size. Because the multivariate normality assumption was violated, maximum likelihood estimation with robust standard errors and the Satorra-Bentler corrected test statistic were used (Barbeau et al. 2019; Savalei 2019). The model exhibited a good fit (χ2(1) = 0.01, p = 0.92; Standardized Root Mean Square Residual [SRMR] = 0.004; Robust Root Mean Square Error of Approximation [RMSEA] = 0.00; Robust Comparative Fit Index [CFI] = 1.00). Although the RMSEA and CFI suggested that the model might overfit the data given its minimal complexity (df = 1), the non-significant chi-square value, an SRMR below 0.08 indicating good fit of the model, and the strong theoretical justification supported the model's reliability and generalizability.
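To make the mediation logic of such a path model concrete, the sketch below illustrates, on synthetic standardized data, the standard OLS decomposition of a total effect into a direct effect plus indirect effects via two mediators, mirroring the MsT → PD/MsA → TID structure. This is a conceptual illustration with simulated scores, not the robust maximum likelihood estimation used in the study:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 39  # matches the study's sample size; the data here are synthetic

# Simulated scores: Music Training feeds the two mediators and the outcome
mst = rng.standard_normal(n)
pd_ = 0.5 * mst + rng.standard_normal(n)                      # Pitch Discrimination
msa = 0.5 * mst + rng.standard_normal(n)                      # Musical Ability
tid = 0.4 * pd_ + 0.3 * msa + 0.1 * mst + rng.standard_normal(n)  # Tone ID

def std(x):
    return (x - x.mean()) / x.std()

mst, pd_, msa, tid = map(std, (mst, pd_, msa, tid))

def ols(y, *xs):
    """OLS coefficients of y on the predictors xs (intercept included, then dropped)."""
    X = np.column_stack([np.ones_like(y)] + list(xs))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1:]

a1, = ols(pd_, mst)                          # path MsT -> PD
a2, = ols(msa, mst)                          # path MsT -> MsA
b1, b2, c_direct = ols(tid, pd_, msa, mst)   # paths PD, MsA, MsT -> TID
c_total, = ols(tid, mst)                     # total effect of MsT on TID

indirect = a1 * b1 + a2 * b2
# Exact OLS identity: total effect = direct effect + indirect effects via mediators
assert abs(c_total - (c_direct + indirect)) < 1e-8
```

The final assertion holds exactly for linear OLS regardless of the simulated values, which is why an indirect effect can be reported even when the direct path is non-significant.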

Figure 4:

Path model using scores from the ToneID task (TID), Pitch Discrimination (PD) and Musical Ability (MsA) as measured using the Gold-MSI as endogenous variables (i.e., variables whose variance depends on at least one other variable in the model). The Gold-MSI Music Training score (MsT) is the exogenous variable, meaning that its variance does not depend on any other variable in the model. The paths in the model are directional. The significance level of the path coefficients is shown as follows: p < 0.0001 ‘***’, p < 0.001 ‘**’, p < 0.01 ‘*’, p < 0.05.

The path coefficients showed that Pitch Discrimination (β = 0.37, p < 0.05) and Musical Ability (β = 0.27, p < 0.05), but not Music Training (β = 0.17, p = 0.28), had a significant direct effect on Tone Identification. However, Music Training appeared to significantly influence Musical Ability (β = 0.46, p < 0.05) and Pitch Discrimination (β = 0.46, p < 0.05), and had a significant indirect influence on Tone Identification, with Pitch Discrimination (β = 0.17, p < 0.05) and Musical Ability (β = 0.12, p < 0.05) both acting as mediating variables.

As displayed in Figure 5, participants were able to identify the tones at better than chance level. Although there were some differences in how they performed with individual tones, these differences were relatively small. Tone1 was identified best (mean = 53.33, SD = 28.50), slightly more accurately than Tones 2 and 3, both of which were identified with a similar level of accuracy (Tone2, mean = 48.72, SD = 24.62; Tone3, mean = 50.00, SD = 25.24). Tone1 was most commonly confused with Tone2 (mean = 33.33, SD = 24.85), though Tone2 itself was rarely confused with Tone1 (mean = 12.05, SD = 14.54) and was instead most commonly confused with Tone3 (mean = 25.38, SD = 17.30). Tone3 was most commonly confused with Tone4 (mean = 18.97, SD = 14.65). The least well-identified tone was Tone4 (mean = 42.82, SD = 29.29), which was misidentified most frequently as Tone1 (mean = 33.08, SD = 22.38) or Tone2 (mean = 16.41, SD = 17.24). A one-way repeated measures ANOVA with tone type as the independent variable and the percentage of correct identifications of each tone as the dependent variable was used to investigate whether there were any significant differences in the identification of individual tones. Results showed no significant differences in identification of the four tones [F(3,152) = 1.03, p > 0.05].
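The row-normalised confusion percentages reported above can be computed from trial-level (stimulus, response) pairs as in the following sketch; the trial data shown are hypothetical, not the study data:

```python
from collections import Counter

# Hypothetical trial-level data: (stimulus tone, response tone) pairs
trials = [(1, 1), (1, 2), (1, 1), (2, 3), (2, 2), (2, 2),
          (3, 3), (3, 4), (4, 1), (4, 4), (4, 1), (4, 4)]

counts = Counter(trials)
tones = [1, 2, 3, 4]

# Row-normalised confusion matrix: percentage of responses per stimulus tone
matrix = {}
for stim in tones:
    total = sum(counts[(stim, resp)] for resp in tones)
    matrix[stim] = {resp: 100 * counts[(stim, resp)] / total if total else 0.0
                    for resp in tones}

print(matrix[1])  # Tone1 trials: percentage identified as each response tone
```

Diagonal cells give identification accuracy for each tone; off-diagonal cells give the confusion percentages, and comparing cell (i, j) with cell (j, i) reveals the kind of asymmetry discussed below.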

Figure 5:

Confusion matrix of the stimulus tone (i.e., Stimulus) and participants’ responses (i.e., Response) in the tone identification task for each tone (percentage correct).

To further examine the effects of Pitch Discrimination and Musical Ability on the accuracy of identification of each tone, a multivariate multiple linear regression was conducted. Results showed that Pitch Discrimination had a significant positive overall effect on the combined outcome [F(4, 33) = 3.31, p < 0.05, Pillai’s trace = 0.29], but that Musical Ability did not [F(4, 33) = 1.48, p > 0.05, Pillai’s trace = 0.15].

Separate linear regression models for each tone showed that Pitch Discrimination significantly predicted accuracy for Tone3 (β = 0.61, t(36) = 2.32, p < 0.05) and Tone4 (β = 0.84, t(36) = 3.12, p < 0.05), while Musical Ability predicted performance for Tone1 (β = 0.66, t(36) = 2.15, p < 0.05) and Tone4 (β = 0.58, t(36) = 2.04, p < 0.05). The models for Tone4 (R2 = 0.29, F(2, 36) = 8.80, p < 0.05) and Tone1 (R2 = 0.13, F(2, 36) = 3.95, p < 0.05) were significant, indicating a good fit. The model for Tone3 was marginally significant (R2 = 0.10, F(2, 36) = 3.08, p = 0.06), while the model for Tone2 was not significant (R2 = 0.04, F(2, 36) = 1.87, p > 0.05).
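Each per-tone model is an ordinary two-predictor linear regression of tone accuracy on Pitch Discrimination and Musical Ability. The sketch below shows this form and the R² computation; the variable names and scores are hypothetical, not the study data:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 39
pitch_disc = rng.normal(83, 15, n)       # hypothetical pitch-discrimination scores
musical_ability = rng.normal(47, 8, n)   # hypothetical Gold-MSI musical-ability scores

def fit_tone_model(y, x1, x2):
    """OLS of per-tone accuracy on two predictors; returns (coefficients, R^2)."""
    X = np.column_stack([np.ones(len(y)), x1, x2])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    tss = (y - y.mean()) @ (y - y.mean())
    r2 = 1 - (resid @ resid) / tss
    return beta, r2

# Sanity check: a perfectly linear outcome recovers R^2 = 1
y_exact = 10 + 0.5 * pitch_disc + 0.2 * musical_ability
_, r2 = fit_tone_model(y_exact, pitch_disc, musical_ability)
```

In the study, four such models were fitted (one per tone), with the reported F-tests assessing each model's overall fit.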

4 Discussion

This study examined how musical expertise and associated cognitive skills (pitch discrimination, verbal and melodic memory, phonological processing) affect the ability of English speakers with no previous experience of tone languages to identify Mandarin tones. Individuals can develop musical expertise without formal training (Zendel and Alexander 2020): we were interested in whether music training per se (i.e., formal or self-guided training in playing an instrument or singing) or the cognitive abilities that may have been enhanced by music training confer an advantage in perceiving, and potentially learning, lexical tone. We recruited participants with a range of musical experience: amateur musicians with formal training, those with informal musical engagement (e.g., instrument playing or choir singing), and those with no musical training at all. In contrast to previous studies (e.g., Alexander et al. 2005; Bidelman et al. 2013; Musacchia et al. 2007), we assessed music training holistically using the Gold-MSI questionnaire and treated music training as a continuous variable to avoid grouping participants based on duration of music training.

Results showed that lexical tone identification was significantly correlated with Music Training, Pitch Discrimination, and Musical Ability, but not with working memory or general segmental speech perception abilities. Path analysis revealed that Music Training indirectly influenced Tone Identification, with Pitch Discrimination and Musical Ability as mediators. Although there were no significant differences in identification of individual tones, there was an asymmetrical confusion pattern: Tone1 was most commonly misidentified as Tone2, Tone2 as Tone3, Tone4 as Tone1, and Tone3 was confused equally with Tone2 and Tone4. Pitch Discrimination and Musical Ability predicted the accuracy of identification of different tones, with Pitch Discrimination being the better predictor of the two.

It is generally accepted that musical training facilitates identification and discrimination of lexical tone for non-tone language speakers (e.g., Chua and Brunt 2014; Delogu et al. 2010). Our results present a more nuanced picture. Only Pitch Discrimination and Musical Ability (listening skills not limited to formal training) had a significant direct effect on Tone Identification. In contrast, Music Training did not have a direct effect on Tone Identification. Instead, Music Training significantly influenced Musical Ability and Pitch Discrimination, and therefore had an indirect influence on Tone Identification via these two mediators. This supports the idea that pitch perception skills transfer between music and L2 tone (e.g., Marie et al. 2011), and further indicates that this transfer does not require music training, in particular the large amounts of music training typical of professional musicians.

As illustrated in the path analysis, both Pitch Discrimination and Musical Ability are important mediators between Music Training and Tone Identification, but they appear to have differing effects on tone identification. Multivariate multiple regression results demonstrated that performance on our Pitch Discrimination task explained more variance in the identification of each tone than did Musical Ability. While Musical Ability was not a statistically significant predictor in the overall multivariate model, it did significantly influence the mean identification score across all four tones, as shown in the path analysis. Results from the individual linear regression models for each tone indicated that these abilities modulate tone processing selectively, with stronger facilitatory effects on some tones than others. Specifically, Pitch Discrimination performance primarily predicted the identification of Tone3 and Tone4, whereas Musical Ability predicted the identification of Tone1 and Tone4. Neither Pitch Discrimination nor Musical Ability was a good predictor of Tone2 identification. This suggests that individuals with strong pitch discrimination and musical ability, regardless of the amount of music training they have had, may perform better in Mandarin tone identification, primarily due to enhanced accuracy in perceiving Tones 1, 3, and 4.

It is widely accepted that some tones are inherently more difficult to identify than others, so it was surprising that we found no significant differences in identification across tones. Among the four Mandarin tones, Tone1 and Tone4 are commonly considered easier to identify for listeners with no prior experience of a tonal language, while Tone3 is thought to be the hardest (e.g., Kirkham et al. 2011). There was a trend in this direction: participants performed best with Tone1, but worst with Tone4 rather than Tone3. In contrast to previous studies (e.g., So and Best 2010), we used natural stimuli. One possibility is that variation in the natural speech recordings masked difficulty with individual tones (cf. Wong et al. 2007), especially as our study used fewer trials per tone than previous studies that have shown this hierarchy (cf. So and Best 2010; Wang et al. 1999). A smaller number of trials may also have reduced our ability to detect subtle differences in performance. Still another explanation is that, unlike our study, which used identification, much of the research showing differences in performance between Mandarin tones has relied on discrimination tasks. For example, discrimination of Tone2-Tone3 has been shown to be harder than Tone1-Tone4 or other possible tone pairs (e.g., Hao 2012, 2018; Tsukada and Kondo 2019). Discrimination assesses perceptual sensitivity to differences between tones, while identification requires categorisation. In our study, this meant mapping individual tones to the pitch contours represented by the tone marks used as response options. This task requires listeners to track pitch changes and match them to linguistic pitch categories, which may equalise difficulty across tones through greater cognitive engagement. Thus, while Tone2-Tone3 may be perceptually confusable in a discrimination task, identification tasks may mitigate these differences through categorical knowledge.

Although there was no significant difference in identification of individual tones, the confusion matrix shows that confusions among the four tones were asymmetrical, i.e., non-reciprocal. While Tone1 was mostly confused with Tone2, Tone2 was mostly confused with Tone3. Interestingly, Tone4 was mostly confused with Tone1, even though it is typically thought to be easy to distinguish from Tone1 (e.g., Hao 2018). This could be because Tone4 resembles the falling intonation at the end of declarative statements in non-tonal languages such as English (So and Best 2008, 2014); English speakers may therefore perceive Tone4 as neutral, with no special pitch change, and so match it to Tone1.

Contrary to our expectations, we did not observe a relationship between verbal or melodic working memory and tone identification performance. This was surprising given that our participants had no previous experience with Mandarin and were only familiarised with each tone at the beginning of the experiment. In our task, participants heard a single word and identified which of the four tones they had heard. To perform the task successfully, they would have needed to access stored representations held in memory from earlier in the experiment. On the other hand, they only heard a single word on each trial, and responses were given by clicking on tone marks depicting tone direction. These marks may have acted as a prompt and thus reduced working memory load, leading to the absence of correlations with verbal or melodic working memory.

It was also surprising that while Music Training correlated with L1 phonological processing ability, it showed no such association with L2 phonological processing. This dissociation may stem from differences in the linguistic features used in each task. The L1 task assessed sensitivity to voice onset time (VOT), whereas the L2 task focused on place of articulation. Given that VOT and musical rhythm both involve temporal processing, it is possible that music training facilitated performance in our L1 perception task. In contrast, place of articulation is specific to the speech domain (Sadakata et al. 2010), which may explain the absence of a correlation with L2 performance. Critically, neither VOT nor place of articulation sensitivity directly relates to pitch perception, which is essential to Mandarin tone identification. Thus, although we initially hypothesised that better segmental processing (in both L1 and L2) might free attentional resources for tone identification (Liu and Ning 2021), our results did not support this. One possible explanation is that our tone identification task used Mandarin syllables with segments shared between English and Mandarin, minimising segmental difficulty for naïve English listeners. Consequently, individual differences in segmental processing may not have significantly influenced tone identification performance. Future studies could test this by comparing tone identification performance using stimuli with shared versus Mandarin-unique segments.

In sum, the present study extends prior work that has demonstrated systematic differences in Mandarin tone identification between amateur musicians and non-musicians. It investigated the relationship between music training, L2 Mandarin tone identification, and the cognitive abilities that music training can enhance. Our results highlight that musical training influences Mandarin tone perception specifically through enhanced pitch discrimination and musical ability, rather than through general linguistic-phonological processing or working memory capacity. Together, the results imply that music and speech share perceptual resources, but that the direct influence of pitch discrimination and musical ability on tone identification is tone-specific rather than uniform across all lexical tones. These findings demonstrate that music training can support, but is not necessary for, successful L2 lexical tone perception, and as such may confer an initial advantage for a non-tone language speaking learner of Mandarin. Future studies should consider carefully the design and type of the experimental tasks used, for example, including both identification and discrimination tasks, employing more sensitive measures of pitch discrimination and longer speech stimuli, investigating domain-transfer effects between music and individual tones, and extending these findings to natural sentence contexts.


Corresponding author: Xiao Fu, Department of Speech, Hearing and Phonetic Sciences, University College London, London, UK, E-mail:

Award Identifier / Grant number: 201808060312

Acknowledgments

This work was supported by a China Scholarship Council PhD studentship to the first author. Portions of this work appeared in the Proceedings of the 19th International Congress of Phonetic Sciences.

  1. Research ethics: This study has been approved by the UCL Research Ethics Committee: SHaPS-2018-BE-026.

  2. Author contributions: Xiao Fu designed and performed the experiments, derived the models and analysed the data, wrote the manuscript. Bronwen G. Evans supervised all the progress of the project and contributed towards the writing of the manuscript.

  3. Conflict of interest: There were no conflicts of interest.

References

Alexander, J. A., P. C. M. Wong & A. R. Bradlow. 2005. Lexical tone perception in musicians and non-musicians. Paper presented at the 9th European Conference on Speech Communication and Technology, 397–400. Lisbon, Portugal. https://doi.org/10.21437/Interspeech.2005-271.

Barbeau, K., K. Boileau, F. Sarr & K. Smith. 2019. Path analysis in Mplus: A tutorial using a conceptual model of psychological and behavioral antecedents of bulimic symptoms in young adults. The Quantitative Methods for Psychology 15(1). 38–53. https://doi.org/10.20982/tqmp.15.1.p038.

Bidelman, G. M. & C. Alain. 2015. Hierarchical neurocomputations underlying concurrent sound segregation: Connecting periphery to percept. Neuropsychologia 68. 38–50. https://doi.org/10.1016/j.neuropsychologia.2014.12.020.

Bidelman, G. M., J. T. Gandour & A. Krishnan. 2011. Musicians and tone-language speakers share enhanced brainstem encoding but not perceptual benefits for musical pitch. Brain and Cognition 77(1). 1–10. https://doi.org/10.1016/j.bandc.2011.07.006.

Bidelman, G. M., S. Hutka & S. Moreno. 2013. Tone language speakers and musicians share enhanced perceptual and cognitive abilities for musical pitch: Evidence for bidirectionality between the domains of language and music. PLoS One 8(4). e60676. https://doi.org/10.1371/journal.pone.0060676.

Boersma, P. & D. Weenink. 2016. Praat: Doing phonetics by computer (computer program). http://www.praat.org/.

Chen, S., Y. Zhu, R. Wayland & Y. Yang. 2020. How musical experience affects tone perception efficiency by musicians of tonal and non-tonal speakers? PLoS One 15(5). e0232514. https://doi.org/10.1371/journal.pone.0232514.

Chobert, J. & M. Besson. 2013. Musical expertise and second language learning. Brain Sciences 3(4). 923. https://doi.org/10.3390/brainsci3020923.

Chua, A. J. & J. Brunt. 2014. Effect of musical experience on tonal language perception. Journal of the Acoustical Society of America 21. 060009. https://doi.org/10.1121/2.0000088.

Collins, T. & R. Laney. 2017. Computer-generated stylistic compositions with long-term repetitive and phrasal structure. Journal of Creative Music Systems 1. https://doi.org/10.5920/JCMS.2017.02.

Correia, A. I., M. Vincenzi, P. Vanzella, A. P. Pinheiro, E. G. Schellenberg & C. F. Lima. 2022. Individual differences in musical ability among adults with no music training. Quarterly Journal of Experimental Psychology 76(7). 1585–1598. https://doi.org/10.1177/17470218221128557.

Corretge, R. 2012–2025. Praat vocal toolkit. https://www.praatvocaltoolkit.com.

Cruttenden, A. 1997. Intonation (Cambridge Textbooks in Linguistics), 68–127. Cambridge: Cambridge University Press.

Degé, F. & G. Schwarzer. 2011. The effect of a music program on phonological awareness in preschoolers. Frontiers in Psychology 2. 124. https://doi.org/10.3389/fpsyg.2011.00124.

Delogu, F., G. Lampis & M. O. Belardinelli. 2010. From melody to lexical tone: Musical ability enhances specific aspects of foreign language perception. European Journal of Cognitive Psychology 22(1). 46–61. https://doi.org/10.1080/09541440802708136.

Giuliano, R. J., P. Q. Pfordresher, E. M. Stanley, S. Narayana & N. Y. Wicha. 2011. Native experience with a tone language enhances pitch discrimination and the timing of neural responses to pitch change. Frontiers in Psychology 2. 146. https://doi.org/10.3389/fpsyg.2011.00146.

Hao, Y. C. 2012. Second language acquisition of Mandarin Chinese tones by tonal and non-tonal language speakers. Journal of Phonetics 40. 269–279. https://doi.org/10.1016/j.wocn.2011.11.001.

Hao, Y. C. 2018. Second language perception of Mandarin vowels and tones. Language and Speech 61. 135–152. https://doi.org/10.1177/0023830917717759.

Harrison, P. M. C. & D. Müllensiefen. 2018. Melodic Discrimination Test (MDT): psychTestR implementation. https://doi.org/10.5281/zenodo.1300950.

Harrison, P. M., D. Müllensiefen & T. Collins. 2017. Applying modern psychometric techniques to melodic discrimination testing: Item response theory, computerised adaptive testing, and automatic item generation. Scientific Reports 7(1). https://doi.org/10.1038/s41598-017-03586-z.

Hazan, V., S. Messaoud-Galusi, S. Rosen, S. Nouwens & B. Shakespeare. 2009. Speech perception abilities of adults with dyslexia: Is there any evidence for a true deficit? Journal of Speech, Language, and Hearing Research 52. 1510–1529. https://doi.org/10.1044/1092-4388(2009/08-0220).

Hyde, K. L., J. Lerch, A. Norton, M. Forgeard, E. Winner, A. C. Evans & G. Schlaug. 2009. The effects of musical training on structural brain development: A longitudinal study. Annals of the New York Academy of Sciences 1169. 182–186. https://doi.org/10.1111/j.1749-6632.2009.04852.x.

Iverson, P., M. Pinet & B. Evans. 2012. Auditory training for experienced and inexperienced second-language learners: Native French speakers learning English vowels. Applied Psycholinguistics 33(1). 145–160. https://doi.org/10.1017/S0142716411000300.

Jaschke, A. C., H. Honing & E. J. A. Scherder. 2018. Longitudinal analysis of music education on executive functions in primary school children. Frontiers in Neuroscience 12. 103. https://doi.org/10.3389/fnins.2018.00103.

Kirkham, J., S. Lu, R. Wayland & E. Kaan. 2011. Comparison of vocalists and instrumentalists on lexical tone perception and production tasks. Paper presented at the 17th International Congress of Phonetic Sciences, 1098–1101. August 17–21, Hong Kong, China.

Kittler, P., S. J. Krinsky-McHale & D. A. Devenny. 2004. Semantic and phonological loop effects on verbal working memory in middle-age adults with mental retardation. American Journal on Mental Retardation 109. 467–480. https://doi.org/10.1352/0895-8017(2004)109<467:SAPLEO>2.0.CO;2.

Klatt, D. 1980. Software for a cascade/parallel formant synthesizer. Journal of the Acoustical Society of America 67(3). 971–995. https://doi.org/10.1121/1.383940.

Kraus, N. & B. Chandrasekaran. 2010. Music training for the development of auditory skills. Nature Reviews Neuroscience 11(8). 599–605. https://doi.org/10.1038/nrn2882.

Krishnan, A., Y. Xu, J. Gandour & P. Cariani. 2005. Encoding of pitch in the human brainstem is sensitive to language experience. Cognitive Brain Research 25(1). 161–168. https://doi.org/10.1016/j.cogbrainres.2005.05.004.

Ladefoged, P. & K. Johnson. 2014. A course in phonetics, 7th edn., 264–266. Boston, MA: Cengage Learning.

Lappe, C., S. C. Herholz, L. J. Trainor & C. Pantev. 2008. Cortical plasticity induced by short-term unimodal and multimodal musical training. Journal of Neuroscience 28(39). 9632–9639. https://doi.org/10.1523/JNEUROSCI.2254-08.2008.

Levitt, H. 1971. Transformed up-down methods in psychoacoustics. Journal of the Acoustical Society of America 49(2, Pt. 2). 467–477. https://doi.org/10.1121/1.1912375.

Lin, M. & A. L. Francis. 2014. Effects of language experience and expectations on attention to consonants and tones in English and Mandarin Chinese. Journal of the Acoustical Society of America 136(5). 2827–2838. https://doi.org/10.1121/1.4898047.

Liu, Y. & J. Ning. 2021. The effect of language dominance on the selective attention of segments and tones in Urdu-Cantonese speakers. Frontiers in Psychology 12. 710713. https://doi.org/10.3389/fpsyg.2021.710713.

Magis, D. & G. Raiche. 2012. On the relationships between Jeffreys modal and weighted likelihood estimation of ability under logistic IRT models. Psychometrika 77(1). 163–169. https://doi.org/10.1007/s11336-011-9233-5.

Mankel, K. & G. Bidelman. 2018. Inherent auditory skills rather than formal music training shape the neural encoding of speech. Proceedings of the National Academy of Sciences of the United States of America 115(51). 13129–13134. https://doi.org/10.1073/pnas.1811793115.

Marie, C., F. Delogu, G. Lampis, M. O. Belardinelli & M. Besson. 2011. Influence of musical expertise on segmental and tonal processing in Mandarin Chinese. Journal of Cognitive Neuroscience 23(10). 2701–2715. https://doi.org/10.1162/jocn.2010.21585.

Matthews, T. E., J. N. L. Thibodeau, B. P. Gunther & V. B. Penhune. 2016. The impact of instrument-specific musical training on rhythm perception and production. Frontiers in Psychology 7. 69. https://doi.org/10.3389/fpsyg.2016.00069.

McCarthy, K. M., M. Mahon, S. Rosen & B. G. Evans. 2014. Speech perception and production by sequential bilingual children: A longitudinal study of voice onset time acquisition. Child Development 85. 1965–1980. https://doi.org/10.1111/cdev.12275.

McDermott, J. H. & A. J. Oxenham. 2008. Music perception, pitch, and the auditory system. Current Opinion in Neurobiology 18(4). 452–463. https://doi.org/10.1016/j.conb.2008.09.005.

Moreno, S., C. Marques, A. Santos, M. Santos, S. L. Castro & M. Besson. 2009. Musical training influences linguistic abilities in 8-year-old children: More evidence for brain plasticity. Cerebral Cortex 19(3). 712. https://doi.org/10.1093/cercor/bhn120.

Müllensiefen, D., B. Gingras, J. Musil & L. Stewart. 2014. The musicality of non-musicians: An index for assessing musical sophistication in the general population. PLoS One 9. 1–23. https://doi.org/10.1371/journal.pone.0089642.

Musacchia, G., M. Sams, E. Skoe & N. Kraus. 2007. Musicians have enhanced subcortical auditory and audiovisual processing of speech and music. Proceedings of the National Academy of Sciences 104(40). 15894–15898. https://doi.org/10.1073/pnas.0701498104.

Peirce, J. W. 2007. PsychoPy – psychophysics software in Python. Journal of Neuroscience Methods 162(1–2). 8–13. https://doi.org/10.1016/j.jneumeth.2006.11.017.

Perrachione, T. K., J. Lee, L. Y. Y. Ha & P. C. M. Wong. 2011. Learning a novel phonological contrast depends on interactions between individual differences and training paradigm design. Journal of the Acoustical Society of America 130(1). 461–472. https://doi.org/10.1121/1.3593366.

Pfordresher, P. Q. & S. Brown. 2009. Enhanced production and perception of musical pitch in tone language speakers. Attention, Perception, & Psychophysics 71(6). 1385–1398. https://doi.org/10.3758/APP.71.6.1385.

Ramus, F., S. Rosen, S. Dakin, B. L. Day, J. M. Castellote, S. White & U. Frith. 2003. Theories of developmental dyslexia: Insights from a multiple case study of dyslexic adults. Brain 126. 841–865. https://doi.org/10.1093/brain/awg076.

Sadakata, M., L. v. d. Zanden & K. Sekiyama. 2010. Influence of musical training on perception of L2 speech. Paper presented at the 11th Annual Conference of the International Speech Communication Association, 118–121. Makuhari, Japan. https://doi.org/10.21437/Interspeech.2010-64.

Savalei, V. 2019. A comparison of several approaches for controlling measurement error in small samples. Psychological Methods 24(3). 352–370. https://doi.org/10.1037/met0000181.

Schön, D., C. Magne & M. Besson. 2004. The music of speech: Music training facilitates pitch processing in both music and language. Psychophysiology 41(3). 341–349. https://doi.org/10.1111/1469-8986.00172.x.

So, C. K. & C. T. Best. 2008. Do English speakers assimilate Mandarin tones to English prosodic categories? Paper presented at INTERSPEECH 2008, 9th Annual Conference of the International Speech Communication Association, Brisbane, Australia, September 22–26.

So, C. K. & C. T. Best. 2010. Cross-language perception of non-native tonal contrasts: Effects of native phonological and phonetic influences. Language and Speech 53. 273–293. https://doi.org/10.1177/0023830909357156.

So, C. K. & C. T. Best. 2014. Phonetic influences on English and French listeners’ assimilation of Mandarin tones to native prosodic categories. Studies in Second Language Acquisition 36. 195–221. https://doi.org/10.1017/S0272263114000047.

Stevens, C. J., P. E. Keller & M. D. Tyler. 2011. Tonal language background and detecting pitch contour in spoken and musical items. Psychology of Music 41(1). 59–74. https://doi.org/10.1177/0305735611415749.

Talamini, F., G. Altoè, B. Carretti & M. Grassi. 2017. Musicians have better memory than nonmusicians: A meta-analysis. PLoS One 12(10). 1–21. https://doi.org/10.1371/journal.pone.0186773.

Tsukada, K. & M. Kondo. 2019. The perception of Mandarin lexical tones by native speakers of Burmese. Language and Speech 62(4). 625–640. https://doi.org/10.1177/0023830918806550.

Van Den Noort, M., P. Bosch, M. Haverkort & K. Hugdahl. 2008. A standard computerized version of the reading span test in different languages. European Journal of Psychological Assessment 24(1). 35–42. https://doi.org/10.1027/1015-5759.24.1.35.

Wang, Y., M. M. Spence, A. Jongman & J. A. Sereno. 1999. Training American listeners to perceive Mandarin tones. Journal of the Acoustical Society of America 106(6). 3649–3658. https://doi.org/10.1121/1.428217.

Wennerstrom, A. 2001. Intonation and evaluation in oral narratives. Journal of Pragmatics 33(8). 1183–1206. https://doi.org/10.1016/S0378-2166(00)00061-8.

Wong, P. C. M. & T. K. Perrachione. 2007. Learning pitch patterns in lexical identification by native English-speaking adults. Applied Psycholinguistics 28(4). 565–585. https://doi.org/10.1017/S0142716407070312.

Wong, P. C. M., E. Skoe, N. M. Russo, T. Dees & N. Kraus. 2007. Musical experience shapes human brainstem encoding of linguistic pitch patterns. Nature Neuroscience 10(4). 420–422. https://doi.org/10.1038/nn1872.

Yip, M. 2002. Tone. Cambridge: Cambridge University Press.

Zendel, B. R. & E. J. Alexander. 2020. Autodidacticism and music: Do self-taught musicians exhibit the same auditory processing advantages as formally trained musicians? Frontiers in Neuroscience 14. https://doi.org/10.3389/fnins.2020.00752.

Received: 2024-12-17
Accepted: 2025-08-21
Published Online: 2025-09-18
Published in Print: 2025-10-27

© 2025 the author(s), published by De Gruyter, Berlin/Boston

This work is licensed under the Creative Commons Attribution 4.0 International License.
