Abstract
John Ohala claimed that the source of sound change may lie in misperceptions which can be replicated in the laboratory. We tested this claim for a historical change of /t/ to /k/ in the coda in the Southern Min dialect of Chaoshan. We conducted a forced-choice segment identification task with CVC syllables in which the final C varied across the segments [p t k ʔ] in addition to a number of further variables, including the V, which ranged across [i u a]. The results from three groups of participants whose native languages have the coda systems /p t k ʔ/ (Zhangquan), /p k ʔ/ (Chaoshan) and /p t k/ (Dutch) indicate that [t] is the least stably perceived segment overall. It is particularly disfavoured when it follows [a], where there is a bias towards [k]. We argue that this finding supports a perceptual account of the historically documented scenario whereby a change from /at/ to /ak/ preceded and triggered a more general merger of /t/ with /k/ in the coda of Chaoshan. While we grant that perceptual sound changes are not the only or even the most common type of sound change, the fact that the perception results are essentially the same across the three language groups lends credibility to Ohala’s perceptually motivated sound changes.
1 Introduction
Sound changes frequently result from gradual articulatory adaptations of a conservative towards an innovative realization of some phonological element. For instance, as late Middle English /iː/ of time developed an increasingly wide upward tongue glide, it was at some point no longer interpreted and phonologically represented as a long/tense high monophthong /iː/, but as a diphthong, /əɪ/ or /aɪ/. While in the initial phase of the diphthongization younger generations may have differed from their older contemporaries in the details of the phonetic implementation of /iː/, after the reinterpretation of the vowel as a diphthong, along with related changes participating in the Great Vowel Shift, a new vowel system had arisen. This Neogrammarian scenario of gradually changing pronunciations followed by a phonological reinterpretation enjoys wide recognition (Blevins 2010; Hyman 1976; Lightfoot 2010, 2013; Recasens 2020; Yu 2013). There have, however, been proposals for a radically different type of sound change. A number of linguists, most notably Ohala (1981), but also Lindblom et al. (1995), Hume and Johnson (2001) and Blevins (2004), have drawn attention to the fact that phonological reinterpretations may take place abruptly, without intermediate phonetic precursors. Specifically, Ohala (1981, 1985, 1986, 1992) argued that listeners may create phonological innovations by misparsing the speech signal, while pointing out that these misparsings are typically replicable in the laboratory (Ohala 1989, 1993a). Two types of misparsing he discusses are hypo-correction, as when a fronting of [u] due to a palatal environment is interpreted as an inherent feature of the vowel, causing a perception as [y], and hyper-correction, as when the frontness of [y] in a palatal context is interpreted as a coarticulation effect on [u].
Our interest here is to provide experimental data to support a perceptual interpretation of a sound change /t/→/k/ in coda position after /a/ in the Chaoshan variety of Southern Min spoken in Guangdong Province, China. In this case, it is difficult to attribute the assumed misperception to either hypo-correction or hyper-correction, because the effects of [k] and [t] on formant transitions of preceding vowels have not been sufficiently charted, which also makes any predictions about the directionality of the change speculative. We therefore interpret the change from [at] to [ak] as an example of the more general acoustic ambiguity in speech recognized by Ohala (1981), which Recasens refers to as the ‘acoustic equivalence hypothesis’ (2020: 7).
The subsequent generalization of the change /t/→/k/ in syllables with other vowels than /a/ does not figure in our research question. This phonological change is due to analogy, similar to the regularization of exceptional forms, word boundary shifts of the type occurring in English (an) adder, earlier nǣddre, and generalizations of phonological rules (Kiparsky 1982).
As far as we are aware, this is the first targeted experiment that seeks to support the perceptual basis of an attested sound change. The sound change is similar to the more widely reported changes of /t/→/k/ in all environments in Austronesian languages spoken in the Pacific and Southeast Asia (Blust 2004), although those languages, unlike Chaoshan, had no /k/ before the change. In Chaoshan, a change of this type ultimately merged /t/ in coda position with /k/, restricting /t/ to onset position. This change does not follow the tendency noted by Chen (1973) for coda mergers to remove labials first, but is unambiguously attested in historical sources. Table 1 compares the historical forms taken from the Da Song Chongxiu Guangyun, a Middle Chinese rime dictionary dating back to AD 1008 and generally known as Guǎngyùn (see the rows in the table), with the present-day pronunciations as reported by Lin and Chen (1996). The absence of a row for /ʔ/ and the disappearance of occurrences in the column for /t/ indicate that an older coda system of /p t k/ developed into /p k ʔ/. More specifically, Table 1 shows that the majority of the 208 words that ended in /p k/ in AD 1008 still did so in 1996. This is true for 69 % of the words originally ending in /p/ and for 64 % of the words originally ending in /k/. The percentages of occurrences of /p t k/ that developed into /ʔ/ are about the same, at about one-third of cases (29 %, 33 %, and 32 %, respectively). This suggests that around the same time that /t/ developed into /k/, approximately one third of instances of /p t k/ developed into /ʔ/. This independent change has no direct bearing on the /t/→/k/ change and has been explained as the replacement of /p t k/ by /ʔ/ in diminutive forms in which the diminutive suffix /-ʔ/ was used (Tsao and Chen 2012).
That is, there are no predictions deriving from the forms with /ʔ/-codas that are relevant to our hypotheses of a perceptually induced sound change as formulated at the close of this section.
Frequencies of present-day reflexes based on Lin and Chen (1996) of historical Chaoshan coda /p t k/ based on the Guǎngyùn rime dictionary (Chen and Qiu 1008). The bold cell gives the percentage of /t/→/k/ cases in coda position, while the column headed /-ʔ/ shows the percentages of glottal stops in words originally ending in /p t k/, which presumably go back to an original suffixal glottal stop.
Historical forms AD 1008 | Present-day /-p/ | Present-day /-t/ | Present-day /-k/ | Present-day /-ʔ/ | Present-day deleted |
---|---|---|---|---|---|
/-p/ (49) | 34 (69 %) | 0 | 1 (2 %) | 14 (29 %) | 0 |
/-t/ (96) | 0 | 0 | **62 (65 %)** | 32 (33 %) | 2 (2 %) |
/-k/ (159) | 0 | 0 | 101 (64 %) | 52 (32 %) | 6 (4 %) |
Sound changes, whether of the Neogrammarian type or the Ohalian perceptual type, often begin in a specific context, only to become more general later. For example, the diphthongization of Inland Northern American English /æ/ to [eə] in cities like Chicago, Buffalo, and Syracuse began before coda nasals, but was later extended to voiced oral consonants (Labov et al. 2006). Our database for considering the inception of the Chaoshan /t/→/k/ change, to use the term suggested by Lass (1976: 59), consists of 62 monosyllabic words in which /t/ merged with /k/. Significantly, Goddard’s (1883) dictionary of the Chaozhou variety of Chaoshan and Fielde’s (1883) dictionary of the closely related Shantou variety give different data for /t/ after /a/. In Chaozhou, /at/ was still unchanged, but in Shantou /t/ after /a/ had categorically been replaced with /k/. That is, towards the end of the 19th century, the Shantou variety had forms with /ak/ corresponding to Chaozhou /at/, while those with /it/ and /ut/ were still the same in the two varieties, as shown in Table 2. This reveals the crucial intermediate stage between the occurrence of coda /t/ in the 1883 Chaozhou dialect and its wholesale replacement with /k/ in the present-day variety (Lin and Chen 1996). It suggests that the sound change /t/→/k/ began when the stop was preceded by /a/, as indicated in the shaded area, and must have generalized subsequently to words with other vowels. No words with coda /t/ appear in Lin and Chen (1996).
Representative examples of present-day words ending in /k/ that go back to /t/ in the Chaozhou and Shantou varieties of Chaoshan. Data from Goddard (1883), Fielde (1883) and Lin and Chen (1996). The grey cells refer to the words with an early change from [t] to [k].
Character | Gloss | Chaozhou 1883 | Shantou 1883 | Chaozhou/Shantou 1996 |
---|---|---|---|---|
筆 | ‘pen’ | pit | pit | pik |
吉 | ‘auspicious’ | kit | kit | kik |
匹 | ‘to be equal to’ | phit | phit | phik |
達 | ‘to reach’ | tat | tak | tak |
察 | ‘to investigate’ | tʃʰat | tʃʰak | tʃʰak |
結 | ‘to end’ | kat | kak | kak |
突 | ‘suddenly’ | tut | tut | tuk |
骨 | ‘bone’ | kut | kut | kuk |
屈 | ‘to bend’ | khut | khut | khuk |
Given the historical development outlined above, a perception experiment that aims to present data to support it should address three predictions about general listener behaviour. Hypothesis I focuses on the unidirectional confusion of [t] with [k] after [a], while Hypothesis II singles out [at] as more confusable with [ak] than with either [ap] or [aʔ]. Hypothesis III concerns the negative effect of [a] compared to [i u] on the recognition of the coronal and velar articulation places of the coda.
Hypothesis I: [at] is more frequently misperceived as [ak] than [ak] is as [at].
Hypothesis II: [at] is more frequently misperceived as [ak] than as either [ap] or [aʔ].
Hypothesis III: [at] is more frequently misperceived as [ak] than [it] and [ut] are as [ik] and [uk], respectively.
2 Methods
We tested the hypotheses in a phoneme identification task with natural stimuli obtained from a speaker of Taiwanese Southern Min, in which all required syllable rhymes are available with identical onset and tone conditions and coda plosives are unreleased. The task was performed by three groups of participants with different language backgrounds. Section 2.1 describes the construction of the stimuli, while Section 2.2 describes the recruitment and characteristics of the participant groups as well as the administration and structure of the experiments.
2.1 Stimuli
Taiwanese Southern Min allows all four experimental consonants /p t k ʔ/ in its coda and possesses the three vowels [i a u]. We decided to use syllables with an aspirated [ph th kh] or unaspirated voiceless onset [p t k], potentially 6 (onsets) * 3 (vowels) * 4 (codas), or 72 words. Since we wanted to avoid effects of voice quality in the source stimuli, we excluded low-toned words, which often have creak. By selecting only high-toned words, we ensured that our source stimuli all had modal voice. The words were recorded by a 48-year-old male speaker of Taiwanese Southern Min, who used a condenser microphone and CoolEdit Pro 2.0 (44.1 kHz sampling rate and 16 bit quantization level). He read each word three times, with ample pauses between them, from a list in Chinese characters, where necessary disambiguated by the addition of a representation in bei-wei-zi, an IPA-like transcription system used to spell Southern Min, as shown in the Appendix. We only recorded real words, after finding that our speaker could not pronounce nonce words naturally as words. We took four to be the minimum number of recorded words per rhyme for including that rhyme in the data, which led to the exclusion of [up, uk, ik], for which only two out of the potential six words were available. Table 3 shows the numbers of words for each included rhyme, amounting to 48 words in total.
The number of high-toned words with each of the twelve possible rimes in Taiwanese Southern Min that were included in the experiment. The rimes [ik], [up] and [uk] occur in fewer than three words and were not included in the experiment.
Vowel | Coda p | Coda k | Coda ʔ | Coda t |
---|---|---|---|---|
i | 4 | 0 | 5 | 6 |
a | 4 | 6 | 5 | 6 |
u | 0 | 0 | 6 | 6 |
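The word counts behind the stimulus set can be checked with a short script. The sketch below only re-enters the numbers from Table 3 and the onset, vowel and coda inventories named in the text. The variable names are illustrative and not taken from the study's materials.

```python
# Word counts per rhyme, copied from Table 3 (vowel -> coda -> number of words).
TABLE_3 = {
    "i": {"p": 4, "k": 0, "ʔ": 5, "t": 6},
    "a": {"p": 4, "k": 6, "ʔ": 5, "t": 6},
    "u": {"p": 0, "k": 0, "ʔ": 6, "t": 6},
}
ONSETS = ["p", "t", "k", "ph", "th", "kh"]  # unaspirated and aspirated onsets
CODAS = ["p", "k", "ʔ", "t"]

# Full crossing of onsets, vowels and codas, before real-word availability is considered.
potential_words = len(ONSETS) * len(TABLE_3) * len(CODAS)

# Rhymes with a non-zero cell were included; rhymes with a zero cell were excluded.
included_rhymes = [v + c for v, row in TABLE_3.items() for c, n in row.items() if n > 0]
excluded_rhymes = [v + c for v, row in TABLE_3.items() for c, n in row.items() if n == 0]

# Total number of source words across all included rhymes.
total_words = sum(n for row in TABLE_3.values() for n in row.values())

print(potential_words, len(included_rhymes), sorted(excluded_rhymes), total_words)
```

The nine included rhymes are the rhyme types analysed in Section 3.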
The best of the three tokens of each of the 48 words was selected as a source file for the generation of the stimuli. The 48 source files were curated in three ways, using Praat (Boersma and Weenink 1992–2016). First, instead of normalizing intensity levels by equalizing these throughout each source file, we largely preserved the original intensity profiles by normalizing the root mean square amplitude (RMS) of a 20 ms window over the highest original intensity in each source file to 60 dB and adjusting the RMS of the remainder proportionately. Second, since Southern Min has a binary tone contrast in syllables closed by a stop (‘checked’ syllables), we included a set with 164 Hz and one with 123 Hz, each with declining slopes of approximately 20 ms, in order to see if pitch (and if so presumably also tone) makes a difference. The high and low Hz values were based on the mean f0 of 14 high-toned stimuli and the mean f0 of a segmentally equivalent group of 14 low-toned words recorded by the same speaker. Third, although the effect of added noise might be small in VC stimuli (Wang and Bilger 1973), we included a noise variable, creating three sets of stimuli with varying levels of white noise superimposed on the signal to enhance our chances of obtaining adequate levels of misperceptions. Accordingly, two levels of white noise were superimposed on copies of the 96 manipulated speech files so as to create two additional sets of stimuli with harmonic-to-noise ratios 10 and 20 dB below average RMS for the vowel in the original stimulus. For instance, with a maximal intensity level of 60 dB, the noise masking of −10 dB and −20 dB corresponded to harmonic-to-noise ratios of 50 dB and 40 dB, respectively. Figure 1 shows spectrograms and waveforms of the three noise conditions for the syllable [tap].
![Figure 1](/document/doi/10.1515/phon-2023-2003/asset/graphic/j_phon-2023-2003_fig_001.jpg)
Figure 1: Spectrograms and waveforms of three noise conditions for the syllable [tap] as manipulated with high f0 with no superimposed noise (panel a), −10 dB (panel b) and −20 dB (panel c).
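The noise manipulation can be illustrated with a minimal sketch. It is not the Praat procedure used for the stimuli. It assumes that "10 dB below" means the RMS of the added white noise lies 10 dB under the RMS of the vowel, and it uses a synthetic 164 Hz sine as a hypothetical stand-in for a recorded vowel. All function names are illustrative.

```python
import math
import random

def rms(samples):
    """Root mean square amplitude of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def add_white_noise(signal, level_db, seed=0):
    """Return (mixed, noise), with Gaussian white noise whose RMS is level_db below the signal RMS."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    target_rms = rms(signal) * 10 ** (-level_db / 20.0)
    scale = target_rms / rms(noise)  # rescale so the noise RMS hits the target exactly
    noise = [n * scale for n in noise]
    mixed = [s + n for s, n in zip(signal, noise)]
    return mixed, noise

SAMPLE_RATE = 44100  # Hz, as in the recordings
# Hypothetical stand-in for a vowel: 100 ms of a 164 Hz sine (assumption, not a real stimulus).
vowel = [0.5 * math.sin(2 * math.pi * 164 * t / SAMPLE_RATE) for t in range(SAMPLE_RATE // 10)]

gaps = {}
for level in (10, 20):
    _, noise = add_white_noise(vowel, level)
    # Measured distance between vowel RMS and noise RMS, in dB.
    gaps[level] = 20.0 * math.log10(rms(vowel) / rms(noise))
    print(f"requested {level} dB below the vowel, measured {gaps[level]:.2f} dB")
```

Rescaling the noise to an exact target RMS, rather than relying on its nominal variance, keeps the requested level independent of the random draw.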
2.2 Procedure
For the purposes of the phoneme identification task, we selected three languages with coda stop systems that varied across the four places of articulation of interest, henceforth PoAs, as shown in Table 4. None of these was the language in which the stimuli were spoken. We first chose a variety of Southern Min that has undergone the historical /t/→/k/ change, Chaoshan Southern Min, with its /p k ʔ/ coda system (Goddard 1883). It is spoken by 10 million speakers in the east of Guangdong Province in China. As a representative of a language with a complete /p t k ʔ/ system, we chose Zhangquan Southern Min, spoken by 11.7 million speakers in Fu-Jian province[1] and widely regarded as standard Southern Min (Zhou 2006). Both varieties have unreleased coda plosives and around 37 segmentally distinct rhymes, in addition to tone (Lin and Chen 1996). All participants in the Chaoshan and Zhangquan groups are bilingual in Standard Mandarin and their native dialect. In addition, we included a language without lexical tone and with released coda plosives. Dutch, spoken by around 25 million speakers, has 16 vowels as well as 17 consonants, 11 of which can appear in the coda, including /p t k/. The segment [ʔ] variably occurs as a predictable segment before vowel-initial syllables, but does not appear in the coda (Booij 1995; Gussenhoven 1992). All three languages have vowels like [i a u] that can appear before all their voiceless stops. The addition of Dutch broadens the typological range of our participants’ language backgrounds, thus increasing the generalizability of our findings.
Phonological and phonetic properties of the native language of the speaker of the stimuli and of those of the three groups of participants.
Status | Language | Family | Coda stops | Coda gaps | Released | Tone |
---|---|---|---|---|---|---|
Production | Taiwanese | Southern Min | p t k ʔ | - | No | Yes |
Reception | Chaoshan | Southern Min | p - k ʔ | t | No | Yes |
Reception | Zhangquan | Southern Min | p t k ʔ | - | No | Yes |
Reception | Dutch | Germanic | p t k - | ʔ | Yes | No |
We measured the performance of 128 Chaoshan and 146 Zhangquan participants with the help of a pre-test, for which we first familiarized participants with the phonetic values of the symbols <p>, <k>, <t> and <h>, the last of which stood for the glottal stop. This familiarization involved a randomized presentation of 12 computer screens with examples, each with a large symbol and the simultaneous playback of a stimulus containing the corresponding stop in its coda. Each stop was represented three times in these stimuli ([pakH, puʔL, tipH, tuhL, kapH, phiʔL, thakH, khakL, khipH, putH, thitL, katL]). The Chaoshan participants remained uncomfortable with the interpretation of <t>, evidently due to its absence from the coda of their language, for which reason we excluded this coda from their pre-test. For the Chaoshan participants, the pre-test therefore consisted of ten trials, whereby the last three in the above list were replaced with [khuʔL]. In a per-subject randomized order, each stimulus was presented twice over headphones with a 500 ms interval, preceded by a 100 ms warning signal that ended 100 ms before the start of the first presentation. Participants made their choice by pressing one of the three or four keyboard buttons, one assigned to each coda. There was no time limit. Fifty-four Chaoshan participants (42 %), 39 of whom were female, passed the predetermined threshold of 6 correct scores, while 37 Zhangquan participants (26 %), 31 of whom were female, passed the equivalent threshold of 6 correct scores out of 12. Both thresholds are high enough to ensure that participants are able both to perceive the different stops and to distinguish the graphemes.
Although all participants spoke either Chaoshan or Zhangquan at home and used Mandarin outside the home, a potentially complicating factor was that Mandarin was more dominant for the Zhangquan than the Chaoshan participants, as reflected in its more frequent use in conversations with friends (58.0 %) compared with the Chaoshan group (34.4 %). Because Mandarin has no coda stops, we checked for any bias in the ratios of frequent Mandarin users in the accepted versus rejected groups, which turned out to be very similar (0.8 for Chaoshan and 0.9 for Zhangquan), indicating that the frequent Mandarin users were somewhat more frequent in both groups of accepted participants.
The full set of 288 stimuli, including the three noise levels, was presented in the same manner as the pre-test to the Chaoshan and Zhangquan participants, with a break halfway through. For the Chaoshan participants, we used <other> instead of <t>, in order to avoid their earlier confusion over that symbol. The time taken by each participant varied between 45 and 70 min, due to the self-paced nature of the task. We additionally recruited 12 female and 4 male speakers of standard Netherlandic Dutch who were raised in the Netherlands outside the tonal area in the south-east. They were presented with the familiarization procedure, but did not do a pre-test, since they knew three of the four symbols on their answer screen from their writing system, <p>, <t>, and <k>. We used <other> to stand for [ʔ], which was explained to them as the sound appearing after schwa and before vowels word-internally, as in [xə.ˈʔɑxt] geacht ‘honoured’. They were recruited from student populations; all reported being able to speak English, and eight also German. In order to run the test comfortably within the time period covered by the standard hourly fee, the Dutch participants were presented with 192 stimuli with a two-level noise condition (0 dB and −10 dB) in a single session. Ten filler trials preceded the experimental stimuli and were intended to familiarize participants tacitly with the task. The manner of presentation was otherwise identical to the pre-test administered to the Chaoshan and Zhangquan groups.
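The stimulus totals follow from crossing the 48 source words with the two f0 levels and the noise levels. The sketch below enumerates that crossing. The labels for the noise conditions and the helper name `n_stimuli` are illustrative assumptions.

```python
from itertools import product

N_WORDS = 48                                  # source words (Table 3)
F0_LEVELS = (164, 123)                        # Hz, the two imposed pitch levels
NOISE_ALL = ("none", "-10 dB", "-20 dB")      # conditions for Chaoshan and Zhangquan
NOISE_DUTCH = NOISE_ALL[:2]                   # the Dutch group did not receive -20 dB

def n_stimuli(noise_levels):
    """Count the stimuli in the full crossing of word x f0 x noise."""
    return len(list(product(range(N_WORDS), F0_LEVELS, noise_levels)))

full_set = n_stimuli(NOISE_ALL)
dutch_set = n_stimuli(NOISE_DUTCH)
print(full_set, dutch_set)
```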
3 Results
Overall, the three participant groups produced considerable numbers of misperceptions, 39 % (Chaoshan), 43 % (Zhangquan), and 36 % (Dutch). In 3.1, we report the results of mixed-effects analyses of the accuracy scores for the three listener groups separately for all nine rhyme types. In 3.2 we investigate the response patterns that are relevant to the three hypotheses.
3.1 Results: simple effects
For our mixed-effects analyses, we used glmer, package lme4, available in R, with the specification of family as binomial and participant as a random factor. The dependent variable is the accuracy of the coda consonant, coded as ‘1’ for correct and ‘0’ for incorrect. Since a single word represented each combination of the experimental conditions, we could not include Word as a random effect.[2] We included six experimental manipulations as fixed variables (Vowel, Place of Articulation of the coda [CodaPoA or PoA], Noise, Tone, OnsetAspiration, OnsetPoA). Because of the absence of three rhyme type combinations (cf. Table 3), interactions between vowel and CodaPoA could not be investigated.
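One way to write down the model described above is given here as a sketch. It assumes a by-participant random intercept only and treatment (dummy) coding with the first level of each predictor as reference. The symbols are introduced for illustration and do not come from the analysis scripts.

```latex
\operatorname{logit}\Pr(y_{ij}=1)
  = \beta_0
  + \beta^{\mathrm{Vowel}}_{v(i)}
  + \beta^{\mathrm{CodaPoA}}_{c(i)}
  + \beta^{\mathrm{Noise}}_{n(i)}
  + \beta^{\mathrm{Tone}}_{t(i)}
  + \beta^{\mathrm{Asp}}_{a(i)}
  + \beta^{\mathrm{OnsetPoA}}_{o(i)}
  + u_j,
\qquad u_j \sim \mathcal{N}(0, \sigma_u^2)
```

Here $y_{ij}=1$ if participant $j$ identifies the coda of stimulus $i$ correctly, $v(i), c(i), \dots$ pick out the level of each predictor for stimulus $i$, the coefficient of every reference level is fixed at 0, and $u_j$ is the participant intercept.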
We tested the six simple effects for the three language groups separately, after observing many interaction effects between language group and the six predictors. Table 5 gives the results in terms of the regression weights of the six simple effects for the separate language groups, with standard errors in parentheses. A positive value means that this level has a positive effect on the identification of the coda consonant compared to that of the reference level (the first level), while a negative outcome similarly indicates a negative effect. The larger their absolute value, the stronger the effect. Two predictors were significant across all three participant groups, Vowel and CodaPoA. These also had the strongest effects, all in the same direction, as shown by the regression coefficients in Table 5. The regression weights and the significance levels of the core effects for Vowel and CodaPoA were robust against the removal of the effects of the four other variables from the analysis. These four predictors failed to reach significance in one or more language groups or for one or more levels of the predictor. Where they did reach significance, effect sizes varied across language groups. Tone is never significant, while noise has a negative effect only at the harmonic-to-noise level of −20 dB. (This level was not included in the experiment with the Dutch language group.) Aspiration positively affected the Chaoshan scores only, for which we have no explanation, given that Zhangquan has initial aspirated plosives, like Chaoshan and unlike Dutch. Finally, the only effect for OnsetPoA was that for [t], which negatively affected identification of CodaPoA in the Zhangquan and Dutch language groups, with small effects. The overall conclusion is that the four secondary variables are not, or are only weakly, related to the identification of the PoA of the coda.
Regression coefficients of the six simple effects for the three language groups with standard errors in parentheses, their size indicating the strength of their negative or positive effects. The reference category is the first effect category, which standardly has a value of 0.000.
Predictor | Level | Chaoshan | Zhangquan | Dutch |
---|---|---|---|---|
Vowel | a (reference) | | | |
Vowel | i | 0.331 (0.046)*** | 0.157 (0.052)** | 0.453 (0.102)*** |
Vowel | u | 0.777 (0.050)*** | 0.537 (0.057)*** | 1.589 (0.124)*** |
CodaPoA | p (reference) | | | |
CodaPoA | k | −0.494 (0.067)*** | −1.262 (0.083)*** | −1.220 (0.165)*** |
CodaPoA | ʔ | −1.307 (0.055)*** | −1.409 (0.066)*** | −0.942 (0.139)*** |
CodaPoA | t | −1.831 (0.056)*** | −1.380 (0.066)*** | −1.454 (0.138)*** |
Noise | 0 dB (reference) | | | |
Noise | −10 dB | −0.067 (0.043) | 0.012 (0.050) | −0.158 (0.083) |
Noise | −20 dB | −0.143 (0.043)*** | −0.156 (0.050)** | n.a. |
Tone | Low (reference) | | | |
Tone | High | 0.014 (0.035) | 0.023 (0.041) | −0.090 (0.083) |
Aspiration of the onset | No (reference) | | | |
Aspiration of the onset | Yes | 0.143 (0.035)*** | 0.052 (0.041) | −0.087 (0.084) |
OnsetPoA | p (reference) | | | |
OnsetPoA | t | 0.067 (0.047) | −0.185 (0.054)*** | −0.234 (0.110)* |
OnsetPoA | k | −0.034 (0.047) | −0.075 (0.054) | −0.119 (0.111) |
Standard errors are given in parentheses. Significant effects: *p < 0.05, **p < 0.01, ***p < 0.001; n.a., not applicable.
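The relation between a coefficient, its standard error and the significance stars can be approximated with a Wald z test. The sketch below assumes that the reported significance levels derive from such a test, which the text does not state explicitly. Because the published values are rounded, the recomputed p-values are approximate. The function names are illustrative.

```python
import math

def wald_p(coef, se):
    """Two-sided p-value of the Wald statistic z = coef / se under a standard normal."""
    z = coef / se
    return math.erfc(abs(z) / math.sqrt(2.0))

def stars(p):
    """Map a p-value onto the star notation used in the table note."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""

# A few (coefficient, standard error) pairs copied from Table 5.
ROWS = {
    "Noise -20 dB, Chaoshan": (-0.143, 0.043),
    "Noise -20 dB, Zhangquan": (-0.156, 0.050),
    "Noise -10 dB, Chaoshan": (-0.067, 0.043),
    "OnsetPoA t, Dutch": (-0.234, 0.110),
}

flags = {label: stars(wald_p(coef, se)) for label, (coef, se) in ROWS.items()}
for label, flag in flags.items():
    print(f"{label}: {flag or 'n.s.'}")
```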
The most detrimental effect of the vowel on the identification of CodaPoA is obtained for [a], with [i] coming second and [u] third, in all three language groups, with the largest effect sizes for the Dutch group. Both [i] and [u] significantly increase the identification of the coda consonant as compared to [a]. This is visualized in Figure 2, where the regression coefficients for the vowels are plotted with their confidence intervals. The confidence intervals for [i] and [u] do not overlap with those for [a]. In addition, the positive effect of [u] is stronger in all three language groups than that of [i], since there is no overlap between their confidence intervals, meaning that the accuracy scores are the highest with a preceding vowel [u]. The coefficient gaps between the vowels vary, with larger confidence intervals for Dutch because of the lower number of participants.
![Figure 2](/document/doi/10.1515/phon-2023-2003/asset/graphic/j_phon-2023-2003_fig_002.jpg)
Figure 2: Regression coefficients of the three vowels for the three languages separately, with [a] as the baseline (0.0) and their confidence intervals (95 %) (see also Table 5).
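The non-overlap of the confidence intervals can be checked from the coefficients and standard errors in Table 5. The sketch below assumes Wald intervals (coefficient ± 1.96 × standard error). The method used for the plotted intervals is not stated, so this is an approximation.

```python
# (coefficient, standard error) for [i] and [u] relative to [a], copied from Table 5.
VOWEL_COEFS = {
    "Chaoshan": {"i": (0.331, 0.046), "u": (0.777, 0.050)},
    "Zhangquan": {"i": (0.157, 0.052), "u": (0.537, 0.057)},
    "Dutch": {"i": (0.453, 0.102), "u": (1.589, 0.124)},
}
Z95 = 1.96  # normal quantile for a two-sided 95 % interval

def wald_ci(coef, se, z=Z95):
    """Return the (lower, upper) bounds of a Wald confidence interval."""
    return (coef - z * se, coef + z * se)

separated = {}
for group, coefs in VOWEL_COEFS.items():
    ci_i = wald_ci(*coefs["i"])
    ci_u = wald_ci(*coefs["u"])
    # [i] lies above the [a] baseline (0.0), and [u] lies entirely above [i].
    separated[group] = ci_i[0] > 0.0 and ci_u[0] > ci_i[1]
    print(group, [round(x, 3) for x in ci_i], [round(x, 3) for x in ci_u], separated[group])
```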
The second row in Table 5 shows that in all three language groups, [p] is more often recognized correctly than the three other coda consonants [t k ʔ], as seen in Figure 3, confirming earlier experimental results (Wright 2001). The Chaoshan group distinguishes itself in recognizing [k] better and [t] worse than the Zhangquan and Dutch groups.
![Figure 3](/document/doi/10.1515/phon-2023-2003/asset/graphic/j_phon-2023-2003_fig_003.jpg)
Figure 3: Regression coefficients of the four codas for the three languages separately, with [p] as the baseline (0.0) and their confidence intervals (95 %) (see also Table 5).
3.2 Results in relation to the three hypotheses
Hypotheses I, II and III, given at the close of Section 1, predict that the historical change from [at] to [ak] should be reflected in three biases in the misidentification rates of the four rhymes [ap, at, ak, aʔ] across the three language groups. We tested the three hypotheses statistically by applying glmer per language, specifying family as binomial and including Participant as a random factor. Our analyses to test the three hypotheses were done on the relevant subsets of the data. We left out the four secondary variables (see Section 3.1), after having established that they did not have any impact on the evaluation of these hypotheses. Random slopes again gave convergence problems, as in the analyses reported in 3.1, for which reason we did not include them.
Importantly, the biases predicted by the hypotheses should be comparable in the three language groups, on the assumption that the perception results are language-independent (cf. Ohala 1993a, 1993b).
Hypothesis I states that [ak] will less often be misperceived as [at] than the other way around. Table 6 presents the frequencies and perceptual misidentification rates for the three languages for this pair only, not taking into account confusions with the other two codas. Asymmetry, as predicted by hypothesis I, indicates a large difference between the percentages of misperceptions, while symmetry implies that misperceptions occur equally frequently in both directions. As shown, the percentages for [at]→[ak] are higher than those for its counterpart [ak]→[at].
Frequencies and percentages of misperception in the pairs [at]→[ak] and [ak]→[at] in the three languages.
Misperceptions | Chaoshan | Zhangquan | Dutch |
---|---|---|---|
[at]→[ak] | 962/1133 (84.9 %) | 418/826 (50.6 %) | 102/294 (34.7 %) |
[ak]→[at] | 121/1039 (11.6 %) | 204/691 (29.5 %) | 44/234 (18.8 %) |
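The percentages follow directly from the counts. The sketch below recomputes them and checks the direction of the asymmetry in each group. The dictionary layout and key names are illustrative.

```python
# (misperceptions, total) per direction, copied from the counts in Table 6.
COUNTS = {
    "Chaoshan": {"at->ak": (962, 1133), "ak->at": (121, 1039)},
    "Zhangquan": {"at->ak": (418, 826), "ak->at": (204, 691)},
    "Dutch": {"at->ak": (102, 294), "ak->at": (44, 234)},
}

def rate(hits, total):
    """Misperception rate as a percentage, rounded to one decimal."""
    return round(100.0 * hits / total, 1)

rates = {
    group: {pair: rate(*cell) for pair, cell in pairs.items()}
    for group, pairs in COUNTS.items()
}
# Hypothesis I expects more [at]->[ak] than [ak]->[at] confusions in every group.
asymmetric = {group: r["at->ak"] > r["ak->at"] for group, r in rates.items()}

for group in COUNTS:
    print(group, rates[group], asymmetric[group])
```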
The confusion rates between these two error pairs were compared in the glmer analysis. The results are given in Table 7. The negative and significant regression coefficients indicate a dominant bias towards choosing a coda [k] over [t] after the vowel [a].
Regression coefficients of the asymmetry between coda [t] and [k] after [a] for the three languages.
Effect | Chaoshan | Zhangquan | Dutch |
---|---|---|---|
Asymmetry | −3.754** (0.127) | −0.916** (0.110) | −0.899** (0.214) |
Standard errors are given in parentheses. A negative coefficient indicates a lower number of misidentifications in the second pair [ak]→[at] than in the first pair [at]→[ak]. Significant effects: *p < 0.05, **p < 0.01.
All coefficients were significant (p < 0.01). The Chaoshan group has the strongest asymmetry, a boost which may be an effect of the structural absence of [t] from the coda, as a result of which these participants are less familiar with the pattern of formant transitions than the Zhangquan and Dutch participants, as suggested by a reviewer. This boost is, however, specific to misperceptions as [k]. As will be seen in the next paragraph, it fails to show up in misperceptions of [t] as either [p] or [ʔ], where the Chaoshan group is less inclined to misidentify [t]. These regression coefficients can be interpreted as odds ratios after exponentiation. For instance, Zhangquan’s −0.916 means that the odds of [ak]→[at] are 0.40 times those of [at]→[ak] (applying the antilog), or, putting it the other way around, that the odds of [ak]→[at] are 1/0.40 = 2.50 times lower than those of [at]→[ak] (cf. Winter 2019: 202–204).
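The conversion from a log-odds coefficient to an odds ratio is a single exponentiation. The sketch below applies it to the three asymmetry coefficients of Table 7. The variable names are illustrative.

```python
import math

# Asymmetry coefficients (log-odds scale) copied from Table 7.
ASYMMETRY = {"Chaoshan": -3.754, "Zhangquan": -0.916, "Dutch": -0.899}

# exp(coefficient) gives the odds of [ak]->[at] relative to the odds of [at]->[ak].
odds_ratios = {group: math.exp(beta) for group, beta in ASYMMETRY.items()}
# The reciprocal expresses the same contrast in the opposite direction.
inverse = {group: 1.0 / ratio for group, ratio in odds_ratios.items()}

for group in ASYMMETRY:
    print(f"{group}: odds ratio {odds_ratios[group]:.3f}, inverse {inverse[group]:.2f}")
```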
Hypothesis II states that [at] is more likely to be misperceived as [ak] than as either [ap] or [aʔ]. Table 8 gives the frequencies and percentages of misidentified codas for [at]. In all languages the percentages of a misperceived coda [t] decrease going from [k], to [p], to [ʔ]. These outcomes are confirmed by the regression coefficients obtained from a regression analysis given in Table 9. As predicted by Hypothesis II, all are negative and significant (p < 0.01), confirming the special status of [k]. This pattern is visualized in Figure 4, where all three language groups had higher misidentifications of [at] as [ak] than as either [ap] or [aʔ].
Frequencies and percentages of all misperceptions of the coda [t] after [a] in the three languages.
Misperceptions coda [t] | Chaoshan | Zhangquan | Dutch |
---|---|---|---|
[k] | 962 (54.4 %) | 418 (45.2 %) | 102 (53.1 %) |
[p] | 442 (25.0 %) | 299 (32.4 %) | 68 (35.4 %) |
[ʔ] | 363 (20.5 %) | 207 (22.4 %) | 22 (11.5 %) |
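The shares in Table 8 are proportions of all misperceptions of [at] within a group, so each column sums to 100 %. A brief sketch that recomputes them and checks the ordering [k] > [p] > [ʔ] is given below. The names are illustrative.

```python
# Number of times [at] was misperceived as each other coda, copied from Table 8.
ERRORS = {
    "Chaoshan": {"k": 962, "p": 442, "ʔ": 363},
    "Zhangquan": {"k": 418, "p": 299, "ʔ": 207},
    "Dutch": {"k": 102, "p": 68, "ʔ": 22},
}

shares = {}
ordered = {}
for group, cells in ERRORS.items():
    total = sum(cells.values())  # all misperceptions of [at] in this group
    shares[group] = {coda: round(100.0 * n / total, 1) for coda, n in cells.items()}
    # Hypothesis II expects [k] to be the most frequent wrong answer.
    ordered[group] = cells["k"] > cells["p"] > cells["ʔ"]
    print(group, total, shares[group], ordered[group])
```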
Regression coefficients of misperceiving [at] as [ak] in relation to misperceiving [at] as [ap] or [aʔ] in the three languages.
[at] perceived as | Chaoshan [p] | Chaoshan [ʔ] | Zhangquan [p] | Zhangquan [ʔ] | Dutch [p] | Dutch [ʔ] |
---|---|---|---|---|---|---|
Misidentification compared to [k] | −0.335** (0.076) | −0.703** (0.085) | −0.778** (0.057) | −0.975** (0.062) | −0.405** (0.157) | −1.534** (0.235) |
Standard errors are given in parentheses. A negative coefficient indicates a lower number of misidentifications as [p] or [ʔ] compared to the number of misidentification as [k] (reference point). Significant effects: *p < 0.05, **p < 0.01.
![Figure 4:
Regression coefficients of the three misperceptions of [t] for the three languages separately, with [ak] as the baseline (0.0) and their confidence intervals (95 %) (see also Table 9).](/document/doi/10.1515/phon-2023-2003/asset/graphic/j_phon-2023-2003_fig_004.jpg)
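As a rough cross-check, the counts in Table 8 can be turned into log ratios with [k] as the reference category. The sketch below rests on an assumption that is ours and not stated in this section: that each coefficient behaves like the log of a simple count ratio, with a standard error of sqrt(1/n_other + 1/n_k), as in a basic log-linear model without random effects. The helper `log_ratio_vs_k` is hypothetical, and the resulting values can be set beside those in Table 9 and Figure 4.

```python
import math

# Misperception counts for coda [t] after [a], copied from Table 8.
counts = {
    "Chaoshan": {"k": 962, "p": 442, "ʔ": 363},
    "Zhangquan": {"k": 418, "p": 299, "ʔ": 207},
    "Dutch": {"k": 102, "p": 68, "ʔ": 22},
}

def log_ratio_vs_k(n_other: int, n_k: int) -> tuple[float, float]:
    """Log count ratio relative to [k] and its approximate standard error.

    Assumption for illustration only: a plain log-linear (Poisson-style)
    contrast, not necessarily the model used by the authors.
    """
    coef = math.log(n_other / n_k)
    se = math.sqrt(1 / n_other + 1 / n_k)
    return coef, se

results = {}
for group, c in counts.items():
    for coda in ("p", "ʔ"):
        coef, se = log_ratio_vs_k(c[coda], c["k"])
        results[(group, coda)] = (coef, se)
        # Approximate 95 % confidence interval, as plotted in Figure 4.
        low, high = coef - 1.96 * se, coef + 1.96 * se
        print(f"{group:10s} [{coda}] vs [k]: {coef:.3f} (SE {se:.3f}), "
              f"95% CI [{low:.3f}, {high:.3f}]")
```

Every ratio comes out negative, which is the pattern Hypothesis II predicts: within each group, [k] attracts more misperceptions of [at] than [p] or [ʔ] does.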
Hypothesis III singles out [at] as being more frequently misidentified as [ak] than [it] and [ut] are as [ik] and [uk], respectively. The frequencies and percentages are given in Table 10. The statistical outcomes of the regression analysis for testing Hypothesis III are given in Table 11. All regression coefficients are negative and significant (p < 0.01), in line with the hypothesis. They are plotted in Figure 5. These results show that all three language groups made significant distinctions between [a] on the one hand and [i u] on the other.
Table 10: Frequencies and percentages of misperceptions of coda [t] as [k] after three vowels in the three languages.
Misperceptions of [t] as [k] | Chaoshan | Zhangquan | Dutch |
---|---|---|---|
[a] | 962/1133 (84.9 %) | 418/826 (50.6 %) | 102/294 (34.7 %) |
[i] | 303/658 (46.0 %) | 165/719 (22.9 %) | 37/185 (20.0 %) |
[u] | 289/1302 (22.2 %) | 235/842 (27.9 %) | 16/349 (4.6 %) |
Table 11: Regression coefficients of misidentifications of [t] as [k] after [a i u] in the three languages, with [a] as the reference.
[t]→[k]: vowel compared to [a] | Chaoshan [i] | Chaoshan [u] | Zhangquan [i] | Zhangquan [u] | Dutch [i] | Dutch [u] |
---|---|---|---|---|---|---|
Misidentification | −1.822** (0.113) | −2.982** (0.106) | −1.235** (0.113) | −0.973** (0.104) | −0.754** (0.221) | −2.403** (0.284) |

Standard errors are given in parentheses. A negative coefficient indicates a lower number of misidentifications. Significant effects: *p < 0.05, **p < 0.01.
![Figure 5:
Regression coefficients of the misidentifications of [t] as [k] after [a i u] in the three languages, with [a] as the baseline (0.0) and their confidence intervals (95 %) (see also Table 11).](/document/doi/10.1515/phon-2023-2003/asset/graphic/j_phon-2023-2003_fig_005.jpg)
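The vowel contrasts can be approximated in the same spirit from the fractions in Table 10. The following sketch assumes, for illustration only, that each coefficient is a difference in log odds between a vowel and the reference vowel [a], with the usual standard error for a log odds ratio. The denominators are used exactly as printed in Table 10, without any claim about what they count, and the helper names are hypothetical. Because the authors' exact model specification is not restated here, the output need not coincide with Table 11 in every cell.

```python
import math

# (misperceived as [k], denominator) per vowel, as printed in Table 10.
table10 = {
    "Chaoshan": {"a": (962, 1133), "i": (303, 658), "u": (289, 1302)},
    "Zhangquan": {"a": (418, 826), "i": (165, 719), "u": (235, 842)},
    "Dutch": {"a": (102, 294), "i": (37, 185), "u": (16, 349)},
}

def log_odds(k: int, n: int) -> float:
    """Log odds of a [k] response given k such responses out of n."""
    return math.log(k / (n - k))

def contrast_with_a(group: str, vowel: str) -> tuple[float, float]:
    """Log odds ratio of `vowel` against [a], with its standard error.

    Illustrative assumption: a plain 2x2 log odds ratio, which may differ
    from the regression model actually fitted by the authors.
    """
    k_a, n_a = table10[group]["a"]
    k_v, n_v = table10[group][vowel]
    coef = log_odds(k_v, n_v) - log_odds(k_a, n_a)
    se = math.sqrt(1 / k_a + 1 / (n_a - k_a) + 1 / k_v + 1 / (n_v - k_v))
    return coef, se

for group in table10:
    for vowel in ("i", "u"):
        coef, se = contrast_with_a(group, vowel)
        print(f"{group:10s} [{vowel}] vs [a]: {coef:.3f} (SE {se:.3f})")
```

All six contrasts are negative, in line with Hypothesis III: the [t]→[k] misperception is most likely after [a] in every group.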
The results for each of the three language groups are thus fully consistent with our three hypotheses in Section 1.
4 Discussion
We would like to raise four questions, partly in response to suggestions by our reviewers. The first concerns the extent to which phonetic data in the literature account for our perceptual findings, the second asks the same question with reference to the acoustic nature of our stimuli, the third focuses on the low identification scores for [t] and the related directionality of the misperception of coronal to velar, while the fourth concerns an alternative explanation of the Chaoshan sound change.
4.1 Potential formant transition cues
On the assumption that VC formant transitions are mirror images of CV transitions, we should expect the articulation place of unreleased coda stops in our stimuli to be cued by the corresponding mirrored formant transitions from the vowel to the stop closure, with the transition of F2 being most informative (Anderson 1997; Dorman et al. 1979; Halle et al. 1957; Liberman et al. 1967). For CV formant transitions, we turn to the classic CV data in Delattre et al. (1955), where the F2 transitions for [b] are downward and those for [g] upward, across both front and back vowels, while those for [d] are upward before back vowels and downward before mid and high front vowels, due to the constant ‘locus’ for [d] at around 1,800 Hz. Leaving the placeless [ʔ] aside, this would predict that the distinctiveness of the F2 transitions is largest for velars adjacent to front vowels and for labials adjacent to back vowels. That is, confusability should be highest for the pairs [ak]∼[at] and [ip]∼[it]. Both of these predictions are confirmed in our data. The first formed part of our hypothesis. The second did not, but since we also collected misperceptions for [ip], we are able to report that [it] was frequently misperceived as [ip], as shown by the odds ratios, all of which are above 1 (3.05 for Chaoshan, 1.13 for Zhangquan, 3.55 for Dutch).
To some extent, these acoustically motivated predictions can be related to place of articulation on the assumption that larger transitions from vowel targets to a following articulation place provide more reliable cues (cf. Delattre et al. 1955). On this basis, Recasens et al. (1997) point out that smaller transitions should be expected towards palatals and palatoalveolars, which fully engage the tongue dorsum for their articulation. For alveolars, which have a raised tongue blade, transitions will depend on the simultaneous presence of a raised tongue back, which will cause F2 to be below 1,800 Hz and thus produce upward transitions from low and back vowels, rather than downward ones for mid and high front vowels, much as found in Delattre et al. (1955). Adjustments of the articulation place of velars to the target of the preceding vowel will provide [k] with two or more loci, depending on the degree of the adjustment. Both loci reported in Delattre et al. (1955) have the same upward direction, the higher locus being for front vowels. The articulatory considerations identify [p] as potentially having the most robust place of articulation, since the tongue is free to take up any position during its labial closure stage, while the closure itself will have a depressing effect on F2, as for a rounded back vowel like [u]. Glottal sounds, whose articulation does not involve any supraglottal constriction in Taiwanese Southern Min, are not predictably marked by any transitions, in principle leaving the tongue free to move the vowel target in any direction. This freedom may conceivably be exploited to create language-specific distinctive F2 transitions relative to oral stops. Overall, these acoustic and articulatory considerations weakly identify [t] and [k] as being less easily identifiable than [p], which is confirmed in our data.
The above viewpoint of symmetry for CV and VC transitions is however not fully supported by research data. For one thing, Wang and Bilger (1973) found that the effect of formant transitions on the perception of articulation place is weaker in VC-structures than in CV-structures, while Blumstein and Stevens (1979) reported that spectral template samples at the VC boundary contained considerably less information than samples taken at release and burst locations in CV structures and at the release stage in VC structures. Similarly, Recasens et al. (1997) found that vowel-dependent anticipatory effects on articulations of consonants (CV-structures) are weaker than carryover effects (VC-structures) in VCV-sequences produced by speakers of Spanish and Catalan. This asymmetry may be related to the sonority profile of the syllable, which Clements (1990) characterized as maximizing the sonority difference at the beginning of the rhyme, discouraging coarticulation in CV, and minimizing it towards the end of the rhyme, promoting it in VC. Against this discriminability bias towards CV transitions, van Wieringen and Pols (1995) found that varying formant-like falling and rising transitions in artificial stimuli yielded a larger perceptual difference limen before a steady state portion than after it. They cautioned however that the perception of symmetrical synthetic signals cannot simply be related to that of natural CV and VC stimuli, which are inherently asymmetrical. Their study of difference limens as a function of variation in the position, direction, duration, and frequency bandwidth of signals with single and multiple formants generally showed that difference limens were smallest for variation in end points as well as for longer formant transitions. 
Another study that found greater place of articulation identification for VC than CV structures is Shaft and Hemeyer (1972), who offered a range of naturally spoken stimuli with the vowel [ə] and a range of voiced and voiceless plosives and fricatives. However, they curated the stimuli by removing onset noise bursts, thereby creating an advantage for VC stimuli, whose final boundaries were left intact.
The relevance of the findings and considerations in the first two paragraphs of this section of course depends on the extent to which our stimuli shared the acoustic features identified in the literature. The next section therefore presents formant values in our stimuli.
4.2 Potential cues in our stimuli
Our stimuli may reveal phonetic features that provide predictions of our perceptual results and which may form the basis of hypotheses for future experimentation. Because vowel duration may also have played a role, rhyme durations are given in Figure 6. Before [ʔ], vowels are generally some 50 ms longer than before the other three consonants, something that may well have had an effect on accuracy scores.

Figure 6: Boxplots of the rhyme durations for all stimuli.
Formant transitions should appear in approximately the last 25 ms of the vowel (e.g. Turner et al. 1997), but in line with the discussion in 4.1 on the asymmetry of CV and VC transitions, we present formant frequency data for the final 75 % of the rhyme durations, in case there are allophonic differences in vowel quality that span larger time domains. Figures 7–9 present F1, F2 and F3 formant frequency trajectories respectively for all VC combinations, averaged over at least four of the six possible combinations with aspirated and unaspirated onset consonants. The trajectories and their confidence intervals were estimated with the option gam in the R package ggplot2. Like the duration data, the data for F1 (Figure 7) suggest high discriminability of [ʔ] from the other three consonants. In the case of [a], a rising transition contrasts with falling trajectories for the other three consonants setting in around the vowel midpoint, with those for [t] and [k] being particularly close together. In the case of [i] and [u], lower or falling F1 trajectories suggest allophones before [ʔ], contrasting with the higher trajectories before [t] and [p] after [i] and [t] after [u]. While singling out [ʔ] as highly discriminable, these F1 data thus indicate low discriminability of both [ak] ∼ [at] and [ip] ∼ [it]. This finding agrees with that reported in Section 4.1.
![Figure 7:
F1 trajectories shown over the final 75 % of the duration of each of the vowels [a i u] by articulation place of the coda stop.](/document/doi/10.1515/phon-2023-2003/asset/graphic/j_phon-2023-2003_fig_007.jpg)
![Figure 8:
F2 trajectories shown over the final 75 % of the duration of each of the vowels [a i u] by articulation place of the coda stop.](/document/doi/10.1515/phon-2023-2003/asset/graphic/j_phon-2023-2003_fig_008.jpg)
![Figure 9:
F3 trajectories shown over the final 75 % of the duration of each of the vowels [a i u] by articulation place of the coda stop.](/document/doi/10.1515/phon-2023-2003/asset/graphic/j_phon-2023-2003_fig_009.jpg)
Moving on to the F2 trajectories in Figure 8, we note the uniquely falling transitions for [ap] and [ip], in line with the discussion in 4.1. In the case of [ap], the falling transition contrasts with weakly rising ones for [aʔ] and [at] and a flat trajectory for [ak]. An early fronting or raising occurs in [a] before [t], but if this is a specific feature of Taiwanese Southern Min, its effect on the accuracy scores may have been limited. Confirming the interpretations of F1 trajectories, a raised allophone of [i] appears before [ʔ], suggesting greater discriminability from [t] and [p]. The trajectories of [u] before [t] and [ʔ] suggest a robust discriminability between these consonants. An auditory inspection revealed a sudden change from [u] to a centralized [ɵ] at a variable point halfway through the vowel in the six tokens, which may have boosted correct perceptions of [ut] for participants who could associate the allophonic shift with the [t] coda context. The data in Figure 5 suggest that this was the case for the Chaoshan and Dutch groups, but not for the Zhangquan group. In line with the discussion in 4.1, moderate to poor discriminability is suggested for [ip] ∼ [it] and, unless the allophonic raising of [a] is recognized as anticipating [t], also for [at] ∼ [ak].
Moving on to Figure 9, we note that the F3 trajectories in many ways repeat the patterns we saw for F2. After [a], the four coda types are paired as [t] and [ʔ] against [p] and [k], strengthening the discriminability of [t] and [k], again depending on whether participants associate the allophonic effect with [t]. After [i], the discriminability of [ʔ] from [t k] on the basis of F2 trajectories is strengthened by those for F3. For [ut] versus [u], F3 has no contribution to make.
Our stimuli came from a language that was not represented in any participant group in order to prevent a language bias in the results. The apparent cases of vowel allophony as a function of the coda context, like raised and fronted [a] before [t], are most probably language-specific ways of enhancing contrasts with other coda places. If so, our methodological decision inevitably biases the results for any participant group whose language happens to share one or more of these allophonic implementation rules with the language of the stimuli. It is not clear to us how this problem can be avoided in a crosslinguistic experiment that seeks to obtain results on the basis of natural speech. Attempts to control for such effects would require a phonetic analysis of VC rhymes of the stimuli, followed by the selection of a set of languages each of which agrees in a unique allophony rule with the language of the stimuli. This would allow the researcher to assess the effects of each case of vocalic allophony. Because of the vulnerability of formant transition cues in the absence of release bursts, this problem may be more relevant to unreleased VC rhymes than to released VC rhymes and CV structures. Additionally, the ideal sonority profile of syllables, with a rapid prominence surge followed by a protracted prominence decline (Clements 1990), may locate the exploitation of the vocalic phase for consonantal place cues towards the syllable end. Our acoustic walk-through of the stimuli suggests that the study of the perceptual strategies in any specific language may yield new and insightful data.
4.3 Asymmetrical confusions
Confusions between [ip] and [it] as well as those between [ak] and [at] reflect an overall poorer identification of coronals than of velars and labials in earlier data as well as in ours (Halle et al. 1957; Shaft and Hemeyer 1972). A possible explanation of this bias was suggested by Stemberger (1991) in connection with the more frequent substitution of velar and labial plosives for coronal plosives than of coronal plosives for either velar or labial ones in speech errors. Along with a bias for single consonants to be replaced with consonant clusters (Stemberger and Treiman 1986), Stemberger (1991) used this finding to explore the hypothesis that these errors have their source in the addition of phonological elements as a general perception policy. Stemberger (1991) argued that the notion of addition is also applicable to coronal consonants on the assumption that these are phonologically underspecified relative to labials and velars. This conjecture was also supported by the coronal-to-palatal errors earlier reported by Shattuck-Hufnagel and Klatt (1979). In the case of speech perception, this could be interpreted in terms of either the acoustic patterns in the input signal, in which case cues that are absent or weakly present in the input are interpreted as if they were robustly present, or in terms of phonological features that are unspecified in the input segments and are supplied in the interpreted segment.
The acoustic information in Figure 8 fails to provide evidence that [k] has a more conspicuous version of any F2 cue than [t], so that an acoustic interpretation of the ‘addition’ hypothesis cannot be based on our data. Neither does a phonological interpretation provide any supporting evidence. The most prominent underspecification theory in speech perception, the Featurally Underspecified Lexicon (FUL) model (Lahiri and Reetz 2010), is based on the agreement of features detected in the signal with those present in the phonological representations of word candidates. Coronal place is taken to be universally unspecified, while labiality and dorsality are specified. As a result, perceived coronality provides a mismatch with labial and dorsal feature specifications, causing [t] to be an unacceptable match for either [p] or [k]. By contrast, the detection of either labial or dorsal features in the input allows a parsing as underspecified coronality, there being no mismatching features. These predictions have been confirmed by language acquisition data as well as by assimilation rules reported for a wide range of languages. However, they do not agree with the confusability directions for [p] ∼ [t] or [k] ∼ [t]. Perceived coronality cannot be interpreted as either labiality in [p] or dorsality in [k], while perceived labiality can be interpreted as [t] or [p] and perceived dorsality as [k] or [t]. In fact, it is questionable whether underspecification is an appropriate concept in the explanation of either speech errors or misperceptions. The FUL model assumes correct parsings of inputs, whereas perceptions result from competition between different parsings of similar acoustic cues. Stemberger’s hypothetical ‘addition’ scenario therefore provides no explanation for the bias in misperceptions of [t] in favour of [p] after [i] and in favour of [k] after [a]. An additional complication here is that there may be biases in speech error reporting.
Realizations of [k] and [t] with simultaneous coronal and velar constrictions are classified as [k] in the case of intended [k]’s, but intended [t]’s are ambiguous between [t] and [k], potentially boosting [k] perceptions (Marin et al. 2010).
4.4 An alternative explanation of the sound change
A possible alternative to our acoustic explanation of the Chaoshan sound change is an articulatory one. Blevins (2004: 123) discusses a plausible scenario for /t/-to-/k/ shifts generally, pointing out that stop systems in most languages in which that change is attested were small, typically being confined to [p t (ʔ)], whereby [t] is the only lingual sound. This characterization is supported by the survey in Blust (2004), where the change predominantly occurs in stop systems lacking [k]. In a system with only a single lingual stop, the place of articulation of (what might be described as) a ‘dental’ or ‘alveolar’ stop may well cover a wider sagittal space over the palate, creating a front-plus-back contact area over the palatal zone covering the [t]-to-[k] range. A lingual articulation of this type was described for the realization of coda /n/ in Shiwilu (Valenzuela and Gussenhoven 2013), a single, though variable area of contact before onset nasals and word finally, as in [ˈkɘn̪͡ŋ.maʔ] ‘indigenous person’. This alternative articulatory scenario would predict that, after [a], speakers developed an articulation of this type for /t/, which was subsequently reported as /k/. There are two reasons why this scenario is less probable for Chaoshan. First, there were two contrasting lingual stops, /t/ and /k/, which does not correspond to Blevins’ typical scenario. Second, the large-contact articulation of coronals is not supported by descriptions for languages in the Chaoshan area, as far as we are aware.
5 Summary and conclusion
The goal of our experiment was to present experimental evidence for the perceptual origin of a sound change in Chaoshan Southern Min in the final decades of the nineteenth century. Specifically, syllable rhymes with /at/ changed to /ak/, after which coda /t/ merged with /k/ after the other vowels of the language, /i/ and /u/. The Chaoshan [t]→[k] change is unlike the change that Blust (2004) claims as the majority type, in which the change occurs when [k] is absent. Rather, coda /t/ merged with coda /k/ in Chaoshan. A pull-chain scenario whereby a change of [at] to [ak] was triggered by an earlier change of [ak] to [aʔ] is independently excluded by the morphological explanation of coda /ʔ/ by Tsao and Chen (2012), who pointed out that the glottal stop was introduced by a diminutive suffix /-ʔ/ which replaced /p t k/ in virtually equal proportions. The absence of a wider shift of /ak/ to /aʔ/ is confirmed by the unchanged presence of two-thirds of the 19th-century rhymes containing /ak/ in modern dictionaries (Fielde 1883; Goddard 1996).
Because the change from [at] to [ak] would not obviously appear to have a source in the simplification of articulatory routines, our assumption was that it had a perceptual origin. Unlike phonological interpretations of articulation-based phonetic sound changes, perceptual sound changes have no intermediate phonetic stages between the original sound and its reflex, and arise from universal perceptual biases (Ohala 1993a, 1993b). In addition to an expectation that, like similar articulation-based sound changes, similar perception-based sound changes may arise independently in different languages, perception research should be able to mimic these sound changes. Accordingly, we recreated the relevant conditions in an experimental setting, using minimally curated CVC stimuli as spoken by a speaker of Taiwanese Southern Min for a four-way coda stop identification task. Among other things, the stimuli crucially varied in the identity of the V ([i a u], vowels that figure in the phonologies of most languages) and the final C, [p t k ʔ]. The existence of these four consonants in the syllable coda varies across the languages of the three listener groups we tested this hypothesis on: Zhangquan allows all four in its coda, Dutch allows [p t k] and Chaoshan [p k ʔ]. A mixed-effects analysis showed that not only the type of coda consonant, but also the type of vowel had strong effects on the identification of the coda consonant. Other factors, like the identity of the onset C and the signal-to-noise ratio, had weak and inconsistent effects, while the pitch of the vowel never influenced the identifications of the coda C. Importantly, all three language groups misperceived (i) [at] more frequently as [ak] than the other way around, (ii) [at] more frequently as [ak] than as [aʔ] or [ap], and (iii) [at] more frequently as [ak] than they misperceived [it] as [ik] or [ut] as [uk]. This pattern is consistent with the historical change of /at/ to /ak/ in Chaoshan.
Our decision to study the identification of coda place as a function of vowel quality on the basis of an actual sound change has led to a demonstration that what happens in language evolution can be linked to perception data, which represents a significant contribution to the literature on coda place identification. The consistent results of our experiment across the three language groups provide a strong motivation for more targeted research into the perceptual cues of coda place recognition. Potentially predictive acoustic information in our stimuli to a certain extent agrees with our results. In combination with the mixed results from earlier research on VC-transitions, they justify a renewed research effort to explore the cues in unreleased VC-structures. Gating experiments with brief pared off sections at the end of the stimulus may help to locate the best cues, while stimuli with systematically manipulated formant transitions may increase our understanding of what those cues are.
Acknowledgements
We dedicate this article to the memory of John Ohala (1941–2020). The historical hypothesis and a preliminary analysis of the Chinese data were part of the PhD dissertation by the first author (National Tsing-Hua University, Chu 2009), supervised by Feng-fu Tsao and H. Samuel Wang. The first author collected the Dutch data while she was a guest researcher at Radboud University in the Netherlands. Carlos Gussenhoven is the main author of the article. Roeland van Hout is responsible for the statistical analyses. We thank Peng, Jyn-Ying for developing VowelWav_en.ext, which we used to run the experiment. We thank our Tainan speaker and the participants in our investigation, who were recruited from the Shantou Linbaixin Science and Technology Secondary Vocational School (汕頭市林百欣科學技術中等專業學校), Hanshan Normal University (潮州市韓山師範學院), Minnan Normal University (漳州市漳州師範學院, renamed 閩南師範大學 in 2013), Liming Vocational University (泉州市黎明職業大學) and the subject pools of the Max Planck Institute for Psycholinguistics and Radboud University, Nijmegen. We thank Paula Fikkert for discussion of Section 4.3 and are most grateful for the extensive comments and corrections made by our reviewers, which have greatly improved the quality of this article.
The corpus of 48 recorded words on which the investigation was based. Words with blanks in the last column lack an unambiguous character.
IPA/bei-wei-zi | Chinese characters | IPA/be-wei-zi | Chinese characters |
---|---|---|---|
pat | 別 | phat | |
pak | 縛 | phak | 曝 |
pit | 鼻 | phiʔ | |
puʔ | | phit | |
put | 佛 | phuʔ | 薄 |
taʔ | | phut | |
tap | 沓 | thaʔ | 疊 |
tat | | thap | |
tak | 獨 | that | |
tiʔ | | thak | 讀 |
tip | | thiʔ | |
tit | 直 | thip | |
tuʔ | 突 | thit | |
tut | | thuʔ | 咄 |
kaʔ | 呱 | thut | 脫 |
kap | 詥 | khaʔ | 及 |
kat | | khap | 磕 |
kak | | khat | 竭 |
kiʔ | | khak | |
kip | 及 | khiʔ | |
kit | | khip | |
kuʔ | | khit | |
kut | 滑 | khuʔ | |
phaʔ | | khut | 咄 |
References
Anderson, Victoria B. 1997. The perception of coronals in Western Arrernte. In Fifth European Conference on Speech Communication and Technology. Eurospeech, 389–392. Greece, Rhodes.10.21437/Eurospeech.1997-146Search in Google Scholar
Blevins, Juliette. 2004. Evolutionary phonology. The emergence of sound patterns. Cambridge: Cambridge University Press.10.1017/CBO9780511486357Search in Google Scholar
Blevins, Juliette. 2010. Phonetically-based sound patterns: Typological tendencies or phonological universals? In C. Fougeron, B. Kühnert, M. D’Imperio & N. Vallée (eds.), Laboratory Phonology X: Variation, phonetic detail and phonological modeling, 201–224. Berlin: Mouton de Gruyter.10.1515/9783110224917.2.201Search in Google Scholar
Blumstein, Sheila E. & Kenneth N. Stevens. 1979. Acoustic invariance in speech production: Evidence from measurements of the spectral characteristics of stop consonants. The Journal of the Acoustical Society of America 66(4). 1001–1017. https://doi.org/10.1121/1.383319.Search in Google Scholar
Blust, Robert. 2004. *t to k: An Austronesian sound change revisited. Oceanic Linguistics 43. 365–410. https://doi.org/10.1353/ol.2005.0001.Search in Google Scholar
Boersma, Paul & David Weenink. 2016 [1992]. Praat: Doing phonetics by computer. [Computer software]. Available at: http://www.praat.org.Search in Google Scholar
Booij, Geert. 1995. The phonology of Dutch. Clarendon: Oxford University Press.Search in Google Scholar
Chen, Matthew. 1973. Cross-dialectal comparisons: A case study and some theoretical considerations. Journal of Chinese Linguistics 1(1). 38–63.Search in Google Scholar
Chen, Peng-Nian (陳彭年) & Yong Qiu (丘雍) (eds.). ca. 1088. 大宋重修廣韻. Da Song Chongxiu Guangyun [The revised dictionary of rhymes of the Song Dynasty in the Chinese language].Search in Google Scholar
Clements, George. 1990. The role of the sonority cycle in core syllabification. Laboratory Phonology I: Between the grammar and physics of speech, 283–333. Cambridge: Cambridge University Press.10.1017/CBO9780511627736.017Search in Google Scholar
Delattre, Pierre C., Alvin M. Liberman & Franklin S. Cooper. 1955. Acoustic loci and transitional cues for consonants. The Journal of the Acoustical Society of America 27. 769–773. https://doi.org/10.1121/1.1908024.Search in Google Scholar
Dorman, Michael F., Lawrence J. Raphael & Alvin M. Liberman. 1979. Some experiments on the sound of silence in phonetic perception. The Journal of the Acoustical Society of America 65. 1518–1532. https://doi.org/10.1121/1.382916.Search in Google Scholar
Fielde, Adele M. 1883. A pronouncing and defining diction of the Swatow dialect: Arranged according to syllables and tones. Shanghai: American Presbyterian Mission Press.Search in Google Scholar
Goddard, Josiah. 1883. A Chinese and English vocabulary in the Tiu-Chiu dialect. Shanghai: American Presbyterian Mission Press (Original work published 1847).Search in Google Scholar
Gussenhoven, Carlos. 1992. Dutch. Illustrations of the IPA. Journal of the International Phonetic Association 22. 45–47. https://doi.org/10.1017/s002510030000459x.Search in Google Scholar
Halle, Morris, George W. Hughes & Jean‐Pierre A. Radley. 1957. Acoustic properties of stop consonants. The Journal of the Acoustical Society of America 29. 107–116. https://doi.org/10.1121/1.1908634.Search in Google Scholar
Hume, Elizabeth & Keith Johnson. 2001. The role of speech perception in phonology. New York: Academic Press.10.1163/9789004454095Search in Google Scholar
Hyman, Larry. 1976. Phonologization. In Alphonse Juilland, Andrew M. Devine & Laurence D. Stephens (eds.), Linguistic studies offered to Joseph Greenberg on the Occasion of his Sixtieth Birthday, 4, 407–418. Saratoga: Anima Libri.Search in Google Scholar
Kiparsky, Paul. 1982. From cyclic phonology to lexical phonology. In Harry van der Hulst & Norval Smith (eds.), The structure of phonological representations, Part I, 131–175. Dordrecht: Foris.10.1515/9783112328088-008Search in Google Scholar
Labov, William, Sharon Ash & Charles Boberg. 2006. The Atlas of North American English phonetics, phonology and sound change. A multimedia reference tool. Berlin, New York: Mouton de Gruyter.10.1515/9783110167467Search in Google Scholar
Lahiri, Aditi & Henning Reetz. 2010. Distinctive features: Phonological underspecification in representation and processing. Journal of Phonetics 38. 44–59. https://doi.org/10.1016/j.wocn.2010.01.002.Search in Google Scholar
Lass, Roger. 1976. English phonology and phonological theory: Synchronic and diachronic studies. Cambridge: Cambridge University Press.Search in Google Scholar
Liberman, Alvin M., Franklin S. Cooper, Donald P. Shankweiler & Michael Studdert-Kennedy. 1967. Perception of the speech code. Psychological Review 74(6). 431–461. https://doi.org/10.1037/h0020279.Search in Google Scholar
Lightfoot, David. 2010. Language acquisition and language change. Wiley Interdisciplinary Reviews: Cognitive Science 1(5). 677–684. https://doi.org/10.1002/wcs.39.Search in Google Scholar
Lightfoot, David. 2013. Types of explanation in history. Language 89(4). e18–e38.10.1353/lan.2013.0056Search in Google Scholar
Lin, Lun-Lun & Xiao-Feng Chen (林倫倫&陳小楓). 1996. 廣東閩方言語音研究 GuangDong MinFangYan YunYi YanJio [The phonetic analysis of GuangDong Min dialect in the Chinese language]. GuangDong: Shantou University Press.Search in Google Scholar
Lindblom, Björn, Susan Guion, Susan Hura, Seung-Jae Moon & Raquel Willerman. 1995. Is sound change adaptive? Rivista di Linguistica 7. 5–37.Search in Google Scholar
Marin, Stefania, Marianne Pouplier & Jonathan. Harrington. 2010. Acoustic consequences of articulatory variability during productions of [t] and [k] and its implications for speech error research. The Journal of the Acoustical Society of America 127. 445–461. https://doi.org/10.1121/1.3268600.Search in Google Scholar
Ohala, John Jerome. 1981. The listener as a source of sound change. In Carrie S. Masek, Roberta A. Hendrick & Mary Frances Miller (eds.), Parasession on language and behavior, 178–203. Chicago: Chicago Linguistic Society.Search in Google Scholar
Ohala, John Jerome. 1985. Linguistics and automatic processing of speech. In Renato De Mori & Ching Y. Suen (eds.), New systems and architectures for automatic speech recognition and synthesis. NATO ASI Series [Series F: Computer and Systems Sciences Vol 16], 447–475. Berlin, Heidelberg: Springer. https://doi.org/10.1007/978-3-642-82447-0_18.
Ohala, John Jerome. 1986. Consumer’s guide to evidence in phonology. Phonology 3. 3–26. https://doi.org/10.1017/s0952675700000555.
Ohala, John Jerome. 1989. Sound change is drawn from a pool of synchronic variation. In Leiv Egil Breivik & Ernst Håkon Jahr (eds.), Language change: Contributions to the study of its causes [Series: Trends in Linguistics, Studies and Monographs No. 43], 173–198. Berlin: Mouton de Gruyter. https://doi.org/10.1515/9783110853063.173.
Ohala, John Jerome. 1992. The segment: Primitive or derived? In Gerard J. Docherty & D. Robert Ladd (eds.), Laboratory phonology II: Gesture, segment, prosody, 166–183. Cambridge: Cambridge University Press. https://doi.org/10.1017/CBO9780511519918.008.
Ohala, John Jerome. 1993a. Sound change as nature’s speech perception experiment. Speech Communication 13. 155–161. https://doi.org/10.1016/0167-6393(93)90067-u.
Ohala, John Jerome. 1993b. The phonetics of sound change. In Charles Jones (ed.), Historical linguistics: Problems and perspectives, 237–278. London: Longman.
Recasens, Daniel, Maria Dolors Pallarès & Jordi Fontdevila. 1997. A model of lingual coarticulation based on articulatory constraints. The Journal of the Acoustical Society of America 102. 544–561. https://doi.org/10.1121/1.419727.
Recasens, Daniel. 2020. Phonetic causes of sound change: The palatalization and assibilation of obstruents. Oxford: Oxford University Press. https://doi.org/10.1093/oso/9780198845010.001.0001.
Shaft, Donald J. & Thomas Hemeyer. 1972. Identification of place of consonant articulation from vowel formant transitions. Journal of the Acoustical Society of America 51. 652–658. https://doi.org/10.1121/1.1912890.
Shattuck-Hufnagel, Stefanie & Dennis Klatt. 1979. The limited use of distinctive features and markedness in speech production: Evidence from speech errors. Journal of Verbal Learning and Verbal Behavior 18. 41–55. https://doi.org/10.1016/s0022-5371(79)90554-1.
Stemberger, Joseph P. 1991. Apparent anti-frequency effects in language production: The addition bias and phonological underspecification. Journal of Memory and Language 30. 161–185. https://doi.org/10.1016/0749-596x(91)90002-2.
Stemberger, Joseph P. & Rebecca Treiman. 1986. The internal structure of word-initial consonant clusters. Journal of Memory and Language 25. 163–180. https://doi.org/10.1016/0749-596x(86)90027-6.
Tsao, Feng-Fu & Yen-Ling Chen. 2012. Diminutive-induced Sound Changes in the Hui Yin Miao Wu. Language and Linguistics 13(2). 221.
Turner, Christopher W., Sarah J. Smith, Patricia L. Aldridge & Suzanne L. Stewart. 1997. Formant transition duration and speech recognition in normal and hearing-impaired listeners. Journal of the Acoustical Society of America 101(5 Pt 1). 2822–2825. https://doi.org/10.1121/1.418566.
Valenzuela, Pilar M. & Carlos Gussenhoven. 2013. Shiwilu (Jebero). Journal of the International Phonetic Association 43(1). 97–106. https://doi.org/10.1017/s0025100312000370.
van Wieringen, Astrid & Louis C.W. Pols. 1995. Discrimination of single and complex consonant-vowel- and vowel-consonant formant transitions. Journal of the Acoustical Society of America 98(3). 1304–1312. https://doi.org/10.1121/1.413467.
Wang, Marilyn D. & Robert C. Bilger. 1973. Consonant confusions in noise: A study of perceptual features. The Journal of the Acoustical Society of America 54. 1248–1266. https://doi.org/10.1121/1.1914417.
Winter, Bodo. 2019. Statistics for linguists: An introduction using R. New York: Routledge. https://doi.org/10.4324/9781315165547.
Wright, Richard. 2001. Perceptual cues in contrast maintenance. In Elizabeth Hume & Keith Johnson (eds.), The role of speech perception in phonology, 251–277. London: Academic Press Limited. https://doi.org/10.1163/9789004454095_014.
Yu, Alan C.J. (ed.). 2013. Sound change in progress. Oxford University Press.
Zhou, Chang-Ji (周長楫). 2006. 閩南方言大詞典 Minnan FangYan DaCiDian [Southern Min Dialect Dictionary]. Fu-Jian: Fu-Jian People Press.
© 2023 the author(s), published by De Gruyter, Berlin/Boston
This work is licensed under the Creative Commons Attribution 4.0 International License.
Articles in the same Issue
- Frontmatter
- Research Articles
- Phonetic phenomena in New Flamenco. The linguistic stylisation of flamenco over time: a corpus study
- Acoustic correlates of Burmese voiced and voiceless sonorants
- A perception-induced /t/-to-/k/ sound change: evidence from a cross-linguistic study
- Book Notice
- Jonathan Barnes and Stefanie Shattuck-Hufnagel: Prosodic Theory and Practice
- Books available for review