Abstract
Experimental data on final devoicing in languages such as German and Russian usually show that speakers produce incompletely neutralized acoustic differences between words ending in phonologically voiced versus voiceless obstruents (e.g., /kod/ ‘code’ vs. /kot/ ‘cat’ in Russian) and that listeners can use these differences to identify the underlying specification of final consonants at an above-chance level. The current study examines how the seemingly successful perceptual identification of voicing varies across stimulus items recorded in reading vs. non-reading procedures and with and without full minimal pairs present in the experimental list. Results of a series of identification tasks reveal that Russian listeners’ identification responses are more in line with underlying voicing for the stimuli recorded during word-reading and with minimal pairs included among the experimental items. This shows that voicing judgments are strongly influenced by the acoustic differences produced when speakers encounter orthographic forms or lexical competition. At the same time, perceptual neutralization is also not complete for the items recorded without such exposure, which indicates that listeners’ ability to recover underlying voicing is not limited to the production contexts involving written forms or minimal pairs.
1 Introduction
Traditional accounts of word-final obstruent devoicing assume that the phonological voicing contrast is not preserved at the phonetic level, with word pairs such as Rad ‘wheel’ and Rat ‘advice’ in German and код /kod/ ‘code’ vs. кот /kot/ ‘cat’ in Russian being indistinguishable in both production and perception (Kiparsky 1976, among others). At the same time, experimental studies have repeatedly shown that speakers of neutralizing languages maintain acoustic differences between phonologically voiced versus voiceless final obstruents and that listeners can often identify the voicing setting of such segments at an above-chance level (e.g., Port and O’Dell 1985; Warner et al. 2004). The current study investigates how such identification responses vary depending on whether the perceptual stimuli were recorded using reading vs. non-reading procedures and with vs. without full minimal pairs in the experimental list. The goal is to determine how differences in production context affect the extent to which the phonological contrast is neutralized in perception.
Production data from languages with final devoicing, including German, Dutch, Afrikaans, Polish, and Russian, often show reliable differences between the surface forms of phonologically voiced versus voiceless obstruents (e.g., Chen 1970; Charles-Luce 1985; Port and O’Dell 1985; Slowiaczek and Dinnsen 1985; Tieszen 1997; van Rooy et al. 2003; Piroth and Janker 2004; Smith et al. 2009; Dmitrieva et al. 2010; Röttger et al. 2011; Kharlamov 2014; Röttger et al. 2014). This phenomenon is traditionally referred to as ‘incomplete neutralization’, and it tends to affect the acoustic parameters of consonantal duration, glottal pulsing, and preceding vowel duration. Phonologically voiced consonants usually show significantly shorter durations, more glottal pulsing, and longer preceding vowels compared to their voiceless counterparts (e.g., all three parameters demonstrate significant differences in Port and O’Dell 1985; van Rooy et al. 2003).
Incomplete neutralization has also been attested in the domain of perception, where the rates of voiced responses vary depending on the phonological voicing setting of the final consonant despite its phonetic devoicing (Port and O’Dell 1985; Port and Crawford 1989; Slowiaczek and Szymanska 1989; Warner et al. 2004; Röttger et al. 2011). For example, during forced-choice two-alternative identification tasks, speakers of German have shown above-chance-level performance on voicing, with the majority of phonologically voiced but phonetically devoiced segments being attributed to the voiced category (e.g., Port and O’Dell 1985; Port and Crawford 1989; Röttger et al. 2011). Comparable results have also been reported for other devoicing languages, including Dutch (e.g., Warner et al. 2004) and Polish (e.g., Slowiaczek and Szymanska 1989), as well as for other types of perceptual tasks, such as discrimination experiments (e.g., Matsui 2011) and rating tasks (e.g., Ernestus et al. 2007a). This suggests that listeners perceive the partial cues to voicing and interpret them as evidence of the target segment being [+voiced].
When voicing effects are observed in production data, they are usually attributed to speakers’ producing incompletely neutralized consonants because they are aware of these segments being voiced at the phonological level or because they know that the same consonant is articulated as voiced when non-final in morphologically-related forms (among others, Port 1996; Port and Leary 2005; Ernestus et al. 2007a, Ernestus et al. 2007b). However, incompletely neutralized differences are also known to be motivated in production by speakers’ exposure to orthographic representations during word-reading tasks and inclusion of minimal pairs among the stimuli (among others, Fourakis and Iverson 1984; Mascaró 1987; Jassem and Richter 1989; Manaster Ramer 1996a, Manaster Ramer 1996b; Warner et al. 2004; Warner et al. 2006; Iverson and Salmons 2011; Kharlamov 2014). For example, Fourakis and Iverson (1984) found incompletely neutralized differences in German in the tokens elicited during a word-reading task but not in the items recorded during an oral task. Kharlamov (2014) reported that robust differences in glottal pulsing between phonologically voiced versus voiceless stops and fricatives in Russian were found during word-reading but not picture-naming/word-guessing, and when the list of test items contained minimally-contrasting word pairs (e.g., /kot/ ‘cat’ ~ /kod/ ‘code’) but not when such pairs were excluded. This shows that both orthography and lexical competition can affect the production of voicing in obstruents, likely by increasing speakers’ awareness of the situational importance of voicing and encouraging the production of prominent acoustic differences between voiced and voiceless obstruents in the parameter or parameters relevant for a given language (e.g., glottal pulsing for Russian speakers, segmental duration for German speakers).
Considering that the previous perception findings have been based on (i) the stimuli recorded in word-reading tasks with voicing-based minimal pairs included in the experimental list (e.g., Port and O’Dell 1985; Port and Crawford 1989; Slowiaczek and Szymanska 1989; Warner et al. 2004), (ii) word-reading tasks without minimal pairs among the stimuli (e.g., Ernestus et al. 2007a), or (iii) oral tasks with minimal pairs among the stimuli (e.g., Jassem and Richter 1989; Matsui 2011; Röttger et al. 2011), listeners’ apparent ability to correctly identify intended voicing may then be due to their reliance on those hyper-articulated differences that are present in the stimuli as a result of speakers’ exposure to either written forms or lexical competition or both. Moreover, many previous perception studies limited their scope to monosyllabic words ending in coronal stops (e.g., Port and Crawford 1989; van Rooy et al. 2003; Warner et al. 2004) and relied on small samples of speakers or stimulus items, such as testing 10 or fewer participants and/or stimulus items (e.g., Port and Crawford 1989; Kopkalli 1993; Matsui 2011; Röttger et al. 2011). However, the patterns that exist at the level of individual speakers or specific words may not necessarily be representative of group-level behaviors and the entire lexicon, and factors such as word length and consonantal place of articulation are also well-known to play a role in the production of the voicing contrast in obstruents (among others, Gamkrelidze 1975; Ohala 1983; Warner and Tucker 2011). As such, the previously reported cases of the preservation of the voicing contrast in perception do not necessarily indicate that perceptual neutralization is incomplete in the language in general. Instead, it may be incomplete only for a subset of lexical items, specific obstruent types, and/or only the stimuli from those production contexts that involve speakers’ exposure to orthographic forms and lexical competition.
The study presented below investigates the exact role of differences in production context during the identification of voicing in Russian, a language that has so far received only limited attention in the experimental literature on the perception of incompletely neutralized contrasts. The study examines participants’ performance on the stimulus items recorded during word-reading vs. picture-naming/word-guessing tasks and in the presence vs. absence of minimal pairs in the experimental list. To ensure that the results are representative of (the local variety of) the Russian language in general, the findings are based on a large set of obstruents and large samples of participants and stimulus items.
2 Method
2.1 Subjects
Two hundred sixteen native speakers of Russian took part in the study (84 male, 132 female; 18–37 years old; mean age of 22). They were undergraduate and graduate students and postdoctoral researchers at a university in Perm, Russia. None majored in linguistics or a related field. None took part in the production study from which the stimuli originated. All participants were recruited and tested on campus and spoke the same local variety of standard (Northern) Russian. The majority (n = 190) self-identified as monolingual. The remainder (n = 26) were raised in bilingual households (Russian and Komi, Tatar, or Udmurt), but they received all their schooling in Russian and did not use languages other than Russian in their daily lives. All participants reported having low to average proficiency in at least one foreign language (English, French, German, Spanish). None indicated having any language or hearing-related physiological disorders.
2.2 Procedures
2.2.1 Stimuli
Perceptual test items (n = 12,528) were created on the basis of 150 stimulus words that were elicited from 78 Russian speakers. The words were distributed equally across 5 different lexical types, including (i) plosive-final monosyllabic minimal pairs (e.g., /kot/ ‘cat’, /kod/ ‘code’), (ii) plosive-final monosyllabic non-minimal pairs (e.g., /zlak/ ‘grass’, /flaɡ/ ‘flag’; */zlaɡ/ and */flak/ are not existing words of Russian), (iii) plosive-final disyllabic non-minimal pairs (e.g., /pirat/ ‘pirate’, /parad/ ‘parade’; cf. */pirad/, */parat/), (iv) fricative-final monosyllabic non-minimal pairs (e.g., /trus/ ‘coward’, /ɡruz/ ‘load’; cf. */truz/, */ɡrus/), and (v) fricative-final disyllabic non-minimal pairs (e.g., /tarif/ ‘tariff’, /zaliv/ ‘bay’; cf. */tariv/, */zalif/). The words from different lexical types were matched on frequency, length in syllables, length in graphemes, grammatical category, and inflectional form. Within each type, the items were recorded using four different production contexts: (i) an oral task with no minimal pairs present among the stimuli (the Orth−MP− context), (ii) an oral task with minimal pairs included among the stimuli (Orth−MP+), (iii) a reading task with no minimal pairs in the stimulus list (Orth+MP−), and (iv) a reading task with full minimal pairs included among the stimuli (Orth+MP+).
Two token types were used in the study: full words and final rhymes. Full words were used for plosive-final monosyllabic minimal pair items only (e.g., /kot/, /kod/). Final rhymes were used for all 5 lexical types (e.g., /ot/ from /kot/, /aɡ/ from /flaɡ/ ‘flag’, /if/ from /tarif/ ‘tariff’). Rhyme-only tokens always started with the steady-state part of the vowel that was determined on the basis of visual inspection of F1 and F2 trajectories in the spectrogram. The use of rhyme-only tokens ensured that participants were not asked to judge the voicing setting of the final consonant in items such as /flaɡ/ that do not have minimal pair counterparts in Russian. All tokens were normalized for volume using an automated script for the PRAAT software (Boersma and Weenink 2009).
A general summary of the acoustic characteristics of the items used to create the perceptual stimuli is provided in Table 1. The table lists the differences in (i) consonantal duration (closure and release duration for stops, frication duration for fricatives), (ii) glottal pulsing, and (iii) preceding vowel duration. These particular parameters were chosen because they have repeatedly shown statistically reliable effects for both production and perception of incompletely neutralized differences in other devoicing languages (e.g., Afrikaans; van Rooy et al. 2003). Following Fourakis and Iverson (1984), glottal pulsing is reported in cycles, and other measurements are given in milliseconds. All segmental durations were determined on the basis of spectrograms and oscillograms. For vowels, the initial boundary was placed at the end of the release or frication noise of the preceding consonant and the beginning of F2 and F3 associated with the vowel or, for vowels preceded by sonorants, the point of an abrupt change in the waveform and the spectral pattern. Consonantal closures were measured from the point of interruption in F2 and F3 associated with the preceding vowel. Releases started at the point of significant increase in waveform amplitude as well as appearance of high-frequency noise in the spectrogram, and they included both the burst and the aspiration noise, if any. Frication duration was measured from the end of the preceding vowel to the end of high-frequency noise associated with the fricative. [1] Significance of production differences was tested in a series of by-subject and by-item RM ANOVAs that examined the effect of underlying voicing, consonantal place of articulation, and production context. Main effects and interactions were resolved using pairwise comparisons (with Bonferroni correction).
Table 1: Summary of the acoustic characteristics of the perceptual stimuli (comparing phonologically voiced to voiceless obstruents).
Lexical type (final C) | Consonantal duration | Glottal pulsing | Preceding vowel duration |
Minimal pair monosyllables (plosives) | All production contexts: 7 ms shorter closures**; 6 ms shorter releases** | Orth−MP−: no diff.; Orth−MP+: 1.3 cycles**; Orth+MP−: 3.0 cycles*; Orth+MP+: 3.8 cycles*** | All production contexts: no diff. |
Non-minimal pair monosyllables (plosives) | All production contexts: 5 ms shorter releases** | Orth−MP−: no diff.; Orth−MP+: no diff.; Orth+MP−: 1.2 cycles**; Orth+MP+: 2.1 cycles*** | All production contexts: no diff. |
Non-minimal pair disyllables (plosives) | All production contexts: no diff. | Orth−MP−: no diff.; Orth−MP+: 0.6 cycles**; Orth+MP−: 1.0 cycles***; Orth+MP+: 2.1 cycles*** | All production contexts: no diff. |
Non-minimal pair monosyllables (fricatives) | All production contexts: 13 ms shorter*** | Orth−MP−: no diff.; Orth−MP+: 0.9 cycles**; Orth+MP−: 1.9 cycles**; Orth+MP+: 3.1 cycles** | All production contexts: no diff. |
Non-minimal pair disyllables (fricatives) | All production contexts: 8 ms shorter*** | Orth−MP−: no diff.; Orth−MP+: 0.8 cycles**; Orth+MP−: 1.9 cycles**; Orth+MP+: 2.5 cycles** | All production contexts: no diff. |
As can be seen in Table 1, items ending in phonologically voiced versus voiceless consonants differed in consonantal duration and glottal pulsing but not preceding vowel duration. For consonantal duration, closures and releases were shorter for minimal pair monosyllables ending in voiced plosives. Releases were also shorter for voiced final stops of non-minimal pair monosyllables. Periods of frication noise were shorter for voiced fricatives in both monosyllables and disyllables. All these differences were found regardless of speakers’ exposure to orthography and minimal pairs. Plosive-final disyllables did not show any significant differences in consonantal duration. For glottal pulsing, all five lexical types showed more cycles for phonologically voiced obstruents. However, the differences were significant only during word-reading or when speakers encountered minimal pairs among the stimuli. Preceding vowel duration did not show any effects of underlying voicing. None of the observed differences were affected by consonantal place of articulation. Thus, the perceptual stimuli contained significant production differences in consonantal duration (that were independent of orthography or lexical competition) as well as significant differences in glottal pulsing (that were found only for the stimuli elicited during word-reading and/or when minimal pairs were included among the stimuli).
2.2.2 Experimental sessions and groups
Participants performed a forced-choice two-alternative identification task. The DMDX software (Forster and Forster 2003) was used to present the auditory stimuli and to record participants’ responses. Stimulus items were delivered binaurally via over-ear Sennheiser headphones. One token was presented at a time, with two alternative orthographic representations of the final consonant displayed concurrently on a computer screen in front of the listener. The two choices differed minimally with respect to the phonemic/orthographic voicing of the consonant. Upon hearing each token, listeners indicated whether the auditory stimulus ended in a voiced or a voiceless consonant by pressing one of two buttons (arrow keys) on a computer keyboard. Subsequent tokens were presented 1 second after a response was registered or after a 3-second time-out if no response was given.
To accommodate the large number of perceptual tokens, listeners were divided at random into four experimental groups and, within each group, into three sub-groups (on the basis of consonantal place of articulation), with each subgroup listening to the same set of stimuli. Group 1 (n = 54; 18 per place of articulation) listened to full words representing plosive-final monosyllabic minimal pairs (e.g., /kot/, /kod/). Group 2 (n = 54; 18 per place of articulation) listened to the same words as Group 1 but heard final rhymes only (e.g., /ot/ from /kot/, /od/ from /kod/). Group 3 (n = 54; 18 per place of articulation) listened to the final rhymes of plosive-final non-minimal pairs (e.g., /ak/ from /zlak/, /ad/ from /parad/). Group 4 (n = 54; 18 per place of articulation) heard the final rhymes of fricative-final items (e.g., /us/ from /trus/, /iv/ from /zaliv/). For the latter two groups, each participant heard half of all rhymes from monosyllabic and disyllabic wordforms, with the selection of tokens counter-balanced across the listeners. The order of presentation was randomized for each participant. Perceptual stimuli were presented in blocks (n = 5) and were grouped by the minimal pair or the rhyme, such that the preceding vowel and the place of articulation of the final consonant were always kept constant within a given block. The order of blocks varied at random across participants. Experimental sessions lasted approximately one hour, with participants taking short self-paced breaks after each block.
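The group structure described above (4 groups × 3 place-of-articulation subgroups × 18 listeners = 216) can be sketched as a simple random assignment. This is only an illustration of the design arithmetic, not the procedure actually used in the study: the function name, the group labels, the listener IDs, and the fixed seed are all hypothetical.

```python
import random

# Labels are illustrative shorthand for Groups 1-4 in the text.
GROUPS = [
    "full-word minimal pairs",
    "rhyme-only minimal pairs",
    "non-minimal pair plosive rhymes",
    "fricative rhymes",
]
PLACES = ["labial", "coronal", "dorsal"]
PER_CELL = 18  # listeners per group-by-place subgroup, as reported


def assign_listeners(listener_ids, seed=0):
    """Shuffle listeners and deal them into equal group-by-place cells."""
    cells = [(group, place) for group in GROUPS for place in PLACES]
    listener_ids = list(listener_ids)
    if len(listener_ids) != len(cells) * PER_CELL:
        raise ValueError("expected exactly %d listeners" % (len(cells) * PER_CELL))
    random.Random(seed).shuffle(listener_ids)  # seed is an assumption, for repeatability
    return {
        listener: cells[index // PER_CELL]
        for index, listener in enumerate(listener_ids)
    }


# Hypothetical listener IDs 1..216.
assignment = assign_listeners(range(1, 217))
```

Each of the 12 cells receives exactly 18 listeners, so every group has 54 listeners, matching the reported group sizes.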
2.2.3 Statistical analyses
Statistical analyses were conducted using the IBM SPSS Statistics software (Version 19; IBM, Inc.). The analyses were performed only on the responses given after the offset of the acoustic signal and within 3 standard deviations of the participant’s mean. The proportions of voiced responses were entered as the dependent variable into a series of by-subject (F1) and by-item (F2) RM ANOVAs. Predictor variables included (i) underlying voicing (voiced, voiceless), (ii) consonantal place of articulation (labial, coronal, dorsal), and (iii) production context (Orth−MP−, Orth−MP+, Orth+MP−, Orth+MP+). Separate ANOVAs were conducted for each lexical type. Only those differences that met the F1 × F2 Criterion (Clark 1973; Raaijmakers et al. 1999) were taken to be statistically significant. Main effects and interactions were resolved using pairwise comparisons, with Bonferroni-corrected values reported in the Results section. The relationship between participants’ perceptual judgments and the acoustic cues present in the stimuli was examined in a series of stepwise regressions (separate regressions for each type of perceptual stimuli) that aimed to predict the mean rates of voiced responses from production differences in (i) closure/frication duration, (ii) release duration (for plosives), (iii) glottal pulsing, and (iv) preceding vowel duration that are described in Section 2.2.1 above.
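The data preparation described above can be sketched as follows. The actual analyses were run in SPSS, so this is only a minimal illustration under stated assumptions: that the 3-SD criterion applies to each participant's response times, that response times are measured from stimulus onset, and that the F1 × F2 criterion requires both the by-subject and the by-item test to reach significance. The field names (`rt_ms`, `stimulus_ms`, and so on) and the alpha level of 0.05 are hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev


def trim_responses(trials, n_sd=3.0):
    """Keep trials answered after stimulus offset and within n_sd SDs
    of that listener's own mean response time (assumed criterion)."""
    by_listener = defaultdict(list)
    for trial in trials:
        by_listener[trial["listener"]].append(trial)

    kept = []
    for listener_trials in by_listener.values():
        rts = [t["rt_ms"] for t in listener_trials]
        centre = mean(rts)
        spread = stdev(rts) if len(rts) > 1 else 0.0
        for t in listener_trials:
            after_offset = t["rt_ms"] >= t["stimulus_ms"]
            within_range = abs(t["rt_ms"] - centre) <= n_sd * spread
            if after_offset and within_range:
                kept.append(t)
    return kept


def proportion_voiced(trials):
    """Proportion of 'voiced' responses per (listener, voicing, context) cell."""
    counts = defaultdict(lambda: [0, 0])  # cell -> [voiced responses, all responses]
    for t in trials:
        cell = (t["listener"], t["underlying"], t["context"])
        counts[cell][0] += t["response"] == "voiced"
        counts[cell][1] += 1
    return {cell: voiced / total for cell, (voiced, total) in counts.items()}


def meets_f1_f2_criterion(p_by_subject, p_by_item, alpha=0.05):
    """Treat an effect as significant only if both analyses reach alpha."""
    return p_by_subject < alpha and p_by_item < alpha
```

The per-cell proportions returned by `proportion_voiced` correspond to the dependent variable of the by-subject analysis; a by-item version would key the cells on the stimulus item instead of the listener.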
3 Results
3.1 Rates of voiced responses
As can be seen in Table 2, 42.4–44.0% of phonologically voiced obstruents and 26.4–30.3% of voiceless obstruents were classified as voiced. Response rates were significantly different for all obstruent, lexical, and token types, with a mean difference of 16 percentage points for plosives and 12.5 percentage points for fricatives (full word minimal pair tokens: F1(1,51) = 660.09, p < 0.001; F2(1,24) = 173.68, p < 0.001; rhyme-only minimal pair tokens: F1(1,51) = 465.23, p < 0.001; F2(1,24) = 197.63, p < 0.001; plosive-final monosyllabic non-minimal pairs: F1(1,51) = 393.60, p < 0.001; F2(1,24) = 174.08, p < 0.001; plosive-final disyllabic non-minimal pairs: F1(1,51) = 253.18, p < 0.001; F2(1,24) = 94.95, p < 0.001; fricative-final monosyllables: F1(1,51) = 241.23, p < 0.001; F2(1,24) = 38.60, p < 0.001; fricative-final disyllables: F1(1,51) = 232.28, p < 0.001; F2(1,24) = 32.51, p < 0.001).
Table 2: Mean rates of voiced responses (%) to tokens ending in phonologically voiced vs. voiceless obstruents.
Final C | Lexical type | Token type | Examples | Voiced | Voiceless | Difference |
Plosives | minimal pair monosyllables | full word | /kot/, /kod/ | 44.0 | 27.7 | 16.3*** |
Plosives | minimal pair monosyllables | final rhyme | /(k)ot/, /(k)od/ | 43.0 | 27.1 | 15.9*** |
Plosives | non-minimal pair monosyllables | final rhyme | /(zl)ak/, /(fl)aɡ/ | 43.4 | 26.4 | 17.0*** |
Plosives | non-minimal pair disyllables | final rhyme | /(pir)at/, /(par)ad/ | 42.5 | 27.7 | 14.8*** |
Plosives | Mean | | | 43.2 | 27.2 | 16.0 |
Fricatives | non-minimal pair monosyllables | final rhyme | /(tr)us/, /(ɡr)uz/ | 42.7 | 30.3 | 12.4*** |
Fricatives | non-minimal pair disyllables | final rhyme | /(tar)if/, /(zal)iv/ | 42.4 | 29.8 | 12.6*** |
Fricatives | Mean | | | 42.6 | 30.1 | 12.5 |
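The summary rows of Table 2 can be re-derived from the individual rows. The sketch below simply averages the reported percentages; the variable and function names are illustrative, and the means are unweighted, which is an assumption about how the published means were computed.

```python
# (final C, lexical/token type, % voiced responses to voiced, to voiceless), from Table 2
ROWS = [
    ("plosive", "minimal pair monosyllables, full word", 44.0, 27.7),
    ("plosive", "minimal pair monosyllables, final rhyme", 43.0, 27.1),
    ("plosive", "non-minimal pair monosyllables, final rhyme", 43.4, 26.4),
    ("plosive", "non-minimal pair disyllables, final rhyme", 42.5, 27.7),
    ("fricative", "non-minimal pair monosyllables, final rhyme", 42.7, 30.3),
    ("fricative", "non-minimal pair disyllables, final rhyme", 42.4, 29.8),
]


def summarize(rows, final_c):
    """Unweighted mean voiced rate, voiceless rate, and their difference."""
    selected = [r for r in rows if r[0] == final_c]
    voiced = sum(r[2] for r in selected) / len(selected)
    voiceless = sum(r[3] for r in selected) / len(selected)
    return voiced, voiceless, voiced - voiceless


for final_c in ("plosive", "fricative"):
    voiced, voiceless, difference = summarize(ROWS, final_c)
    print(f"{final_c}: voiced {voiced:.2f}, voiceless {voiceless:.2f}, "
          f"difference {difference:.2f} percentage points")
```

Under this assumption the recomputed values agree with the Mean rows to within rounding, and the differences are expressed in percentage points rather than as relative changes.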
Importantly, all stimuli also demonstrated a significant interaction between underlying voicing and production context (full word minimal pairs (1A): F1(1,51) = 66.13, p < 0.001; F2(1,24) = 13.23, p < 0.001; final rhymes of minimal pairs (1B): F1(1,51) = 43.40, p < 0.001; F2(1,24) = 13.98, p < 0.001; plosive-final non-minimal pair monosyllables (1C): F1(1,51) = 18.28, p < 0.001; F2(1,24) = 9.57, p < 0.001; plosive-final non-minimal pair disyllables (1D): F1(1,51) = 25.40, p < 0.001; F2(1,24) = 12.12, p < 0.001; fricative-final monosyllables (1E): F1(1,51) = 29.53, p < 0.001; F2(1,24) = 10.45, p < 0.001; fricative-final disyllables (1F): F1(1,51) = 24.64, p < 0.001; F2(1,24) = 10.30, p < 0.001). [2] The nature of the interaction can be seen in Figure 1.
Figure 1: Mean rates of voiced responses to phonologically voiceless (light gray) versus voiced (dark gray) obstruents in plosive-final monosyllabic minimal pairs (full words (a), final rhymes (b)), plosive-final monosyllabic non-minimal pairs (c), plosive-final disyllables (d), fricative-final monosyllables (e), and fricative-final disyllables (f). The x-axis represents the production context: (i) an oral task with no minimal pairs (Orth−MP−), (ii) an oral task with minimal pairs (Orth−MP+), (iii) a reading task with no minimal pairs (Orth+MP−), and (iv) a reading task with minimal pairs (Orth+MP+). A value of 100 on the y-axis corresponds to 100% of tokens being identified as voiced. Significance of pairwise comparisons is marked as ‘***’ (0.001), ‘**’ (0.01), and ‘*’ (0.05).
As reflected in the figure, pairwise comparisons between voiced and voiceless categories were always significant within a given production context (all ps < 0.001). In addition, voiced and voiceless plosives showed a number of significant differences across the different contexts. For full-word minimal pair tokens (Figure 1a), voiceless segments (e.g., the /t/ in /kot/) demonstrated 3–4% fewer voiced responses for the items recorded during word-reading than during the non-reading tasks (all ps < 0.01). Voiced plosives (e.g., the /d/ in /kod/) showed up to 13% higher rates of voiced responses for word-reading than non-reading and, within word-reading, a 7% higher rate of voiced responses in the presence than in the absence of minimal pairs in the word list (all ps < 0.05). For rhyme-only plosive-final tokens (Figure 1b), voiceless stops (e.g., the /t/ in /(k)ot/) demonstrated up to 4% fewer voiced responses when the items came either from word-reading or from a non-reading task with minimal pairs among the stimuli, compared to the items that were recorded during non-reading and with minimal pairs excluded from the word list (all ps < 0.01). For voiced stops (e.g., the /d/ in /(k)od/), the rates of voiced responses were 11% higher for word-reading than non-reading and, within word-reading, 8% higher in the presence than in the absence of minimal pairs in the word list (all ps < 0.05).
For plosive-final tokens excised from non-minimal pairs (e.g., /at/ from /brat/, /ad/ from /parad/), monosyllabic items (Figure 1c) had up to 5% fewer voiced responses to voiceless sounds from the Orth−MP− context compared to all other production contexts (all ps < 0.01). Voiced plosives from the Orth+MP+ context showed up to 10% more voiced responses compared to the rest of the contexts (all ps < 0.01). Disyllabic items (Figure 1d) demonstrated 3% higher rates of voiced responses to voiceless segments from the Orth−MP− context compared to the Orth+MP+ context (p < 0.01) as well as up to 12% higher rates of voiced responses to voiced segments across all pairwise comparisons (all ps < 0.05) except the Orth−MP− and Orth−MP+ pair (p > 0.1).
For perceptual tokens excised from fricative-final items (e.g., /af/ from /ɡraf/, /uz/ from /arbuz/), final rhymes of monosyllables (Figure 1e) showed up to 6% fewer voiced responses to voiceless fricatives from the Orth+MP+ context compared to all other contexts (all ps < 0.05) and up to 9% higher rates of voiced responses to voiced segments in all pairwise comparisons (all ps < 0.05) except the Orth+MP− and Orth+MP+ pair (p > 0.1). For disyllable-based items (Figure 1f), rates of voiced responses to voiceless final fricatives did not differ between the Orth−MP− and Orth+MP− contexts (p > 0.1) but differed in all other pairwise comparisons (all ps < 0.05). Rates of voiced responses to phonologically voiced final fricatives also differed significantly in all pairwise comparisons (all ps < 0.05) other than the Orth−MP− and Orth−MP+ pair (p > 0.1).
Furthermore, as shown in Figure 2, plosive-final items also showed a significant interaction between underlying voicing and place of articulation (full word minimal pairs: F1(2,51) = 45.16, p < 0.001; F2(2,24) = 12.12, p = 0.005; final rhymes of minimal pairs: F1(2,51) = 37.82, p < 0.001; F2(2,24) = 13.98, p < 0.001; non-minimal pair monosyllables: F1(2,51) = 27.61, p < 0.001; F2(2,24) = 11.95, p = 0.005; non-minimal pair disyllables: F1(2,51) = 23.15, p < 0.001; F2(2,24) = 8.28, p = 0.002). [3] Pairwise comparisons revealed that differences in responses to voiced versus voiceless obstruents were significant within each place of articulation (all ps < 0.001). At the same time, listeners heard the voicing distinction more clearly in velar plosives. Voiceless velars received significantly fewer voiced responses than voiceless labials or coronals (all ps < 0.001), and voiced velars showed higher rates of voiced responses than voiced labials (all ps < 0.01).
Figure 2: Mean rates of voiced responses to phonologically voiceless (light gray) versus voiced (dark gray) plosives in monosyllabic minimal pairs (full words (a), final rhymes (b)), monosyllabic non-minimal pairs (c), and disyllabic non-minimal pairs (d). The x-axis represents consonantal place of articulation (labial, coronal, velar). A value of 100 on the y-axis corresponds to 100% of tokens being identified as ending in a voiced plosive. Significance of pairwise comparisons is marked as ‘***’ (0.001), ‘**’ (0.01), and ‘*’ (0.05).
Unlike plosives, final fricatives did not show a significant interaction between underlying voicing and consonantal place (Fs < 1, ps > 0.1). An interaction between place and production context was observed only for fricative-final disyllables (F1(2,51) = 7.20, p < 0.001; F2(2,24) = 3.52, p = 0.007), which was due to the fact that coronal-final tokens produced in the non-reading tasks received fewer voiced responses regardless of underlying voicing (all ps < 0.01). The rest of the interactions were not significant (Fs < 1, ps > 0.05).
3.2 Regression modeling
Results of regression analyses revealed that the acoustic parameters of glottal pulsing, release duration, and preceding vowel duration could account for up to 56.5% of variance in the rates of voiced responses. For plosive-final minimal pair tokens, glottal pulsing alone could explain 47.2% of variance for full-word items (β = 0.687, t(118) = 10.28, p < 0.001; R² = 0.472, F(1,118) = 105.65, p < 0.001) and 47.8% of variance for rhyme-only tokens (β = 0.692, t(118) = 10.40, p < 0.001; R² = 0.478, F(1,118) = 108.19, p < 0.001). With release duration added as the second predictor, the model improved by 7.6% (to 54.8%) for full-word items (β₁ = 0.628, t₁(117) = 9.87, p < 0.001, β₂ = −0.281, t₂(117) = −4.42, p < 0.001; R² = 0.548, F(2,117) = 70.86, p < 0.001) and by 6.8% (to 54.6%) for rhyme-only tokens (β₁ = 0.635, t₁(117) = 9.96, p < 0.001, β₂ = −0.266, t₂(117) = −4.17, p < 0.001; R² = 0.546, F(2,117) = 70.27, p < 0.001). Introduction of preceding vowel duration as the third predictor did not affect the model for rhyme-only tokens (|t| < 1.5, p > 0.1) but led to an additional increase of 1.7% (to 56.5%) for full-word items (β₁ = 0.631, t₁(116) = 10.08, p < 0.001, β₂ = −0.282, t₂(116) = −4.5, p < 0.001, β₃ = 0.133, t₃(116) = 2.17, p < 0.05; R² = 0.565, F(3,116) = 50.31, p < 0.001). Closure duration did not contribute to the model for either full or truncated tokens (all |t|s < 1.5, all ps > 0.1). Thus, for full-word minimal pair tokens, the observed variance in the rates of voiced responses could be predicted from the production differences in glottal pulsing, release duration, and preceding vowel duration, with vocal fold vibration being the single most prominent contributor. For rhyme-only minimal pair items, identification responses could be accounted for on the basis of glottal pulsing and, to a limited extent, release duration.
For plosive-final non-minimal pair tokens, a model with glottal pulsing as the only predictor explained 37.7% of variance for monosyllables (β = 0.614, t(118) = 8.44, p < 0.001; R² = 0.377, F(1,118) = 71.26, p < 0.001) and 36.3% of variance for disyllables (β = 0.603, t(118) = 8.20, p < 0.001; R² = 0.363, F(1,118) = 67.29, p < 0.001). Addition of release duration as the second predictor enhanced the model by 5.8% (to 43.5%) for monosyllables (β₁ = 0.588, t₁(117) = 8.42, p < 0.001, β₂ = −0.243, t₂(117) = −3.48, p = 0.001; R² = 0.435, F(2,117) = 45.01, p < 0.001), but it did not lead to a significant model improvement for disyllables (|t| < 1.5, p > 0.1). Neither preceding vowel duration nor closure duration contributed to the model for either monosyllables or disyllables (all |t|s < 1.5, all ps > 0.1). As such, for monosyllabic non-minimal pair items, production differences in glottal pulsing and, to a more limited extent, release duration could both account for the variance in identification responses. For disyllable-based tokens, glottal pulsing was the only significant contributor.
For tokens ending in fricatives, glottal pulsing explained 13.8% of variance for monosyllables (β = 0.372, t(118) = 4.35, p < 0.001; R² = 0.138, F(1,118) = 18.95, p < 0.001) and 14.2% of variance for disyllables (β = 0.377, t(118) = 4.43, p < 0.001; R² = 0.142, F(1,118) = 19.60, p < 0.001). None of the remaining acoustic parameters enhanced the regression model for either monosyllabic or disyllabic items (all |t|s < 1.5, all ps > 0.1).
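For the single-predictor models, the reported statistics can be cross-checked against two standard identities of simple regression: the squared standardized coefficient equals R², and the squared t value equals F. The sketch below applies these to the fricative-final values quoted above. It assumes that β is a standardized coefficient (the text does not say so explicitly), and the helper name `consistent` and its tolerances are illustrative choices rather than part of the original analysis.

```python
# Values as quoted in the text for fricative-final tokens.
reported = {
    "monosyllables": {"beta": 0.372, "t": 4.35, "r2": 0.138, "f": 18.95},
    "disyllables": {"beta": 0.377, "t": 4.43, "r2": 0.142, "f": 19.60},
}


def consistent(beta: float, t: float, r2: float, f: float,
               tol_r2: float = 0.002, tol_f: float = 0.2) -> bool:
    """Check beta^2 ~ R^2 and t^2 ~ F for a one-predictor model.

    Tolerances are loose because the inputs are rounded in the text.
    Assumes beta is a standardized coefficient.
    """
    return abs(beta ** 2 - r2) <= tol_r2 and abs(t ** 2 - f) <= tol_f


for label, stats in reported.items():
    print(label, "consistent:", consistent(**stats))
```

Both sets of values pass the check within rounding, which fits the reading that glottal pulsing alone accounts for roughly 14% of variance in the fricative-final data.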
4 Discussion
The current study examined how identification of underlying voicing in incompletely neutralized obstruents varies across stimulus items recorded during reading vs. non-reading procedures when minimal pairs are either present or absent among the stimuli. Results from Russian revealed that, overall, underlying voicing was perceptible only with limited accuracy. On average, 42–44% of voiced obstruents and 26–30% of voiceless obstruents were attributed to the voiced category. This shows a general bias for voiceless responses, which parallels the findings of the Polish study in Jassem and Richter (1989) and the Afrikaans investigation in van Rooy et al. (2003), both of which also found high proportions of voiceless responses to all final obstruents regardless of their phonological specification.
At the same time, identification responses also differed depending on which production context the stimuli came from. When perceptual tokens were originally recorded during word-reading and with minimal pairs included in the experimental list, responses to the two voicing categories differed by 21–25%. When the same items were created during a non-reading procedure and in the absence of minimal pairs among the stimuli, response rates showed differences of 6–10% only. Thus, identification responses were less consistent with phonological specifications for the tokens elicited in non-reading tasks and without minimal pairs among the stimuli. This shows an effect of differences in production context and suggests that identification responses were affected by the acoustic cue of glottal pulsing, which is the parameter that showed robust production differences during word-reading and in the presence of minimal pairs but either limited or no differences during non-reading tasks and in the absence of minimally-contrasting word pairs. The primary role of glottal pulsing is also corroborated by the findings of regression modeling, with vocal fold vibration being the most prominent predictor for plosive-final monosyllables and the only significant predictor for plosive-final disyllables and both monosyllabic and disyllabic fricative-final tokens.
However, even though the tokens recorded during non-reading tasks and without minimal pairs among the stimuli showed a strong voiceless bias, rates of voiced responses were still 6–10% higher for voiced than voiceless obstruents. This demonstrates that listeners are able to use not only the cues motivated by orthography or lexical competition but also the differences present in production even without speakers’ exposure to written forms or minimal pairs. This is further confirmed by the result of regression analyses in which release duration and preceding vowel duration (parameters that did not show any enhancement of production differences as a result of speakers’ exposure to orthography or lexical competition) were identified as limited but nevertheless significant predictors that accounted for 1.7–7.6% of variance. Note, however, that since only the steady-state part of the vowel was included in the rhyme-only perceptual stimuli, the exact nature of the interaction between voicing and preceding vowel duration remains to be investigated.
Taken together, these findings suggest that the apparently successful identification of underlying voicing in incompletely neutralized final obstruents can be attributed only in part to listeners’ taking advantage of the differences produced when speakers are exposed to orthographic representations and minimal pairs. When such cues are unavailable, listeners show a stronger bias for voiceless responses, yet the rates of responses to voiced versus voiceless obstruents are never identical. Thus, perceptual identification of voicing appears to involve both the prominent incompletely neutralized differences that are the result of speakers’ exposure to written forms and minimal pairs as well as the subtle incompletely neutralized differences that are not directly motivated by orthography or lexical competition.
For plosive-final tokens, perception results also revealed an effect of consonantal place of articulation, with velar stops showing greater consistency with underlying voicing. This parallels the findings of Port and O’Dell (1985), who also observed more accurate identification in the case of velar obstruents. As noted in the description of the stimuli, place of articulation did not have a significant effect in production on the degree of preservation of voicing. This means that listeners appear to be more sensitive to the same amount of glottal pulsing in velars than in labials or coronals. Although the exact causes of this effect would need to be investigated independently in future research, one likely explanation comes from the aerodynamics of speech. Glottal pulsing is physiologically disfavored in velar stops. Velars are articulated with a smaller space between the glottis and the oral constriction, which results in a smaller volume of air being able to pass through the vocal folds during the closure stage of the obstruent (Gamkrelidze 1975; Ohala 1983; Ohala 1997). As voicing is less expected for velars in production, it may be more perceptually salient in velar plosives, and this may explain why Russian listeners performed better on velars than non-velars.
Finally, the findings presented above also have implications for the phonological accounts of final devoicing and incomplete neutralization. The data show that neutralization is not fully complete in perception for the stimuli recorded without speakers’ exposure to orthographic forms and minimal pairs, yet the observed differences only result in a slight weakening of the voiceless bias rather than a categorical shift in voicing judgments. As such, it seems unlikely that speakers produce incompletely neutralized differences because they intend to provide the listeners with cues to phonological voicing. If speakers’ goal were to disambiguate the potentially homophonous forms, voicing cues would be expected to have a major effect on the outcome of the categorization process across all production contexts and not only for the stimuli elicited during word-reading and with minimal pairs included in the experimental list. This suggests that such differences are not the result of a specific articulatory goal but more likely a by-product of various automated processes taking place during lexical access and speech planning, such as automatic co-activation of morphologically related forms (Ernestus and Baayen 2006; Goldrick and Blumstein 2006; Winter and Röttger 2011; Röttger et al. 2014). This seems especially likely considering the generally small magnitude of production differences in incomplete neutralization studies, such as the vowel duration differences of 3.5 ms in Warner et al. (2004) or the consonantal duration differences of 5 ms in Kharlamov (2014). During perception of non-laboratory speech, listeners also normally have access to non-acoustic (e.g., syntactic, semantic) cues to the voicing setting of the consonant, which further diminishes the importance of partially neutralized differences.
5 Conclusions
The study presented above examined the perception of incompletely neutralized voicing cues in word-final stops and fricatives in Russian. The goal of the investigation was to determine how identification of underlying voicing varied across stimulus items recorded using reading vs. non-reading procedures and with vs. without speakers’ exposure to minimal pairs. Results of a series of identification tasks revealed that listeners’ responses were more in line with underlying voicing for the stimuli elicited during word-reading and with minimal pairs included among the stimuli. However, significant differences in response rates were also found for the stimulus items from other production contexts. This shows that voicing judgments are influenced by the production context, yet perceptual neutralization can be incomplete even when the stimuli are recorded without speakers’ exposure to orthography and lexical competition.
References
Boersma, Paul & David Weenink. 2009. Praat: Doing phonetics by computer (Version 5.1.05) [Computer program]. http://www.praat.org/ (accessed 1 May 2009).
Charles-Luce, Jan. 1985. Word-final devoicing in German: Effects of phonetic and sentential contexts. Journal of Phonetics 13. 309–324. doi:10.1016/S0095-4470(19)30762-4
Chen, Matthew. 1970. Vowel length variation as a function of the voicing of the consonant environment. Phonetica 22. 129–159. doi:10.1159/000259312
Clark, Herbert. 1973. The language-as-fixed-effect fallacy: A critique of language statistics in psychological research. Journal of Verbal Learning & Verbal Behavior 12. 335–359. doi:10.1016/S0022-5371(73)80014-3
Dmitrieva, Olga, Allard Jongman & Joan Sereno. 2010. Phonological neutralization by native and non-native speakers: The case of Russian final devoicing. Journal of Phonetics 38. 483–492. doi:10.1016/j.wocn.2010.06.001
Ernestus, Mirjam & Harold Baayen. 2006. The functionality of incomplete neutralization in Dutch: The case of past-tense formation. In Louis Goldstein, Douglas Whalen & Catherine Best (eds.), Laboratory phonology 8: Varieties of phonological competence, 27–49. Berlin: Mouton de Gruyter.
Ernestus, Mirjam & Harold Baayen. 2007a. Intraparadigmatic effects on the perception of voice. In Eric Jan van der Torre & Jeroen van de Weijer (eds.), Voicing in Dutch, 153–173. Amsterdam: Benjamins. doi:10.1075/cilt.286.07ern
Ernestus, Mirjam & Harold Baayen. 2007b. Paradigmatic effects in auditory word recognition: The case of alternating voice in Dutch. Language & Cognitive Processes 22. 1–24. doi:10.1080/01690960500268303
Forster, Kenneth & Jonathan Forster. 2003. DMDX: A windows display program with millisecond accuracy. Behavior Research Methods, Instruments & Computers 35. 116–124.
Fourakis, Marios & Gregory Iverson. 1984. On the ‘incomplete neutralization’ of German final obstruents. Phonetica 41. 140–149. doi:10.1159/000261720
Gamkrelidze, Thomas. 1975. On the correlation of stops and fricatives in a phonological system. Lingua 35. 231–361. doi:10.1016/0024-3841(75)90060-1
Goldrick, Matthew & Sheila Blumstein. 2006. Cascading activation from phonological planning to articulatory processes: Evidence from tongue twisters. Language and Cognitive Processes 21. 649–683. doi:10.1080/01690960500181332
Iverson, Gregory & Joseph Salmons. 2011. Final devoicing and final laryngeal neutralization. In Marc van Oostendorp, Colin Ewen, Elizabeth Hume & Keren Rice (eds.), The Blackwell companion to phonology, 1622–1643. Malden, MA: Wiley-Blackwell. doi:10.1002/9781444335262.wbctp0069
Jassem, Wiktor & Lutoslawa Richter. 1989. Neutralization of voicing in Polish obstruents. Journal of Phonetics 17. 317–325. doi:10.1016/S0095-4470(19)30447-4
Kharlamov, Viktor. 2012. Incomplete neutralization and task effects in experimentally-elicited speech: Evidence from the production and perception of word-final devoicing in Russian. Ottawa: University of Ottawa Ph.D. dissertation.
Kharlamov, Viktor. 2014. Incomplete neutralization of the voicing contrast in word-final obstruents in Russian: Phonological, lexical, and methodological influences. Journal of Phonetics 43. 47–53.
Kiparsky, Paul. 1976. Abstractness, opacity, and global rules. In Andreas Koutsoudas (ed.), The application and ordering of grammatical rules, 160–186. The Hague: Mouton.
Kopkalli, Handan. 1993. A phonetic and phonological analysis of final devoicing in Turkish. Ann Arbor, MI: University of Michigan Ph.D. dissertation.
Manaster Ramer, Alexis. 1996a. A letter from an incompletely neutral phonologist. Journal of Phonetics 24. 477–489. doi:10.1006/jpho.1996.0026
Manaster Ramer, Alexis. 1996b. Report on Alexis’ dreams—bad as well as good. Journal of Phonetics 24. 513–519. doi:10.1006/jpho.1996.0028
Mascaró, Joan. 1987. Underlying voicing recoverability of finally devoiced obstruents in Catalan. Journal of Phonetics 15. 183–186. doi:10.1016/S0095-4470(19)30557-1
Matsui, Mayuki. 2011. The identifiability and discriminability between incompletely neutralized sounds: Evidence from Russian. In Wai-Sum Lee & Eric Zee (eds.), Proceedings of the 17th International Congress of Phonetic Sciences, 1342–1345. Hong Kong: City University of Hong Kong.
Ohala, John. 1983. The origin of sound patterns in vocal tract constraints. In Peter MacNeilage (ed.), The production of speech, 189–216. Berlin: Springer. doi:10.1007/978-1-4613-8202-7_9
Ohala, John. 1997. Phonetics in phonology. In Proceedings of the 4th Seoul International Conference on Linguistics, 45–50. Seoul: Linguistic Society of Korea.
Piroth, Hans & Peter Janker. 2004. Speaker-dependent differences in voicing and devoicing of German obstruents. Journal of Phonetics 32. 81–109. doi:10.1016/S0095-4470(03)00008-1
Port, Robert. 1996. Phonetic discreteness and formal linguistics: Reply to A. Manaster-Ramer. Journal of Phonetics 24. 491–511. doi:10.1006/jpho.1996.0027
Port, Robert & Penny Crawford. 1989. Incomplete neutralization and pragmatics in German. Journal of Phonetics 17. 257–282. doi:10.1016/S0095-4470(19)30444-9
Port, Robert & Adam Leary. 2005. Against formal phonology. Language 81. 927–964. doi:10.1353/lan.2005.0195
Port, Robert & Michael O’Dell. 1985. Neutralization of syllable-final voicing in German. Journal of Phonetics 13. 455–471. doi:10.1016/S0095-4470(19)30797-1
Raaijmakers, Jeroen, Joseph Schrijnemakers & Frans Gremmen. 1999. How to deal with the ‘language-as-fixed-effect fallacy’: Common misconceptions and alternative solutions. Journal of Memory & Language 41. 416–426. doi:10.1006/jmla.1999.2650
Röttger, Timo, Bodo Winter & Sven Grawunder. 2011. The robustness of incomplete neutralization in German. In Wai-Sum Lee & Eric Zee (eds.), Proceedings of the 17th International Congress of Phonetic Sciences, 1342–1345. Hong Kong: City University of Hong Kong.
Röttger, Timo, Bodo Winter, Sven Grawunder, James Kirby & Martine Grice. 2014. Assessing incomplete neutralization of final devoicing in German. Journal of Phonetics 43. 11–25. doi:10.1016/j.wocn.2014.01.002
Slowiaczek, Louisa & Daniel Dinnsen. 1985. On the neutralizing status of Polish word-final devoicing. Journal of Phonetics 13. 325–341. doi:10.1016/S0095-4470(19)30763-6
Slowiaczek, Louisa & Helena Szymanska. 1989. Perception of word-final devoicing in Polish. Journal of Phonetics 17. 205–212. doi:10.1016/S0095-4470(19)30430-9
Smith, Bruce, Rachel Hayes-Harb, Michael Bruss & Amy Harker. 2009. Production and perception of voicing and devoicing in similar German and English word pairs by native speakers of German. Journal of Phonetics 37. 257–275. doi:10.1016/j.wocn.2009.03.001
Tieszen, Bozena. 1997. Final stop devoicing in Polish: An acoustic and historical account for incomplete neutralization. Madison, WI: University of Wisconsin Ph.D. dissertation.
van Rooy, Bertus, Daan Wissing & Dwayne Paschall. 2003. Demystifying incomplete neutralisation during final devoicing. Southern African Linguistics and Applied Language Studies 21. 49–66. doi:10.2989/16073610309486328
Warner, Natasha, Erin Good, Allard Jongman & Joan Sereno. 2006. Orthographic vs. morphological incomplete neutralization effects. Journal of Phonetics 34. 285–293. doi:10.1016/j.wocn.2004.11.003
Warner, Natasha, Allard Jongman, Joan Sereno & Rachel Kemps. 2004. Incomplete neutralization and other sub-phonemic durational differences in production and perception: Evidence from Dutch. Journal of Phonetics 32. 251–276. doi:10.1016/S0095-4470(03)00032-9
Warner, Natasha & Ben Tucker. 2011. Phonetic variability of stops and flaps in spontaneous and careful speech. Journal of the Acoustical Society of America 130. 1606–1617. doi:10.1121/1.3621306
Winter, Bodo & Timo Röttger. 2011. The nature of incomplete neutralization in German: Implications for laboratory phonology. Grazer Linguistische Studien 76. 55–74.
©2015 by De Gruyter Mouton