Abstract
The present article describes a modified and extended replication of a corpus study by Brewer (2008. Phonetic reflexes of orthographic characteristics in lexical representation. Tucson, AZ: University of Arizona PhD thesis) which reports differences in the acoustic duration of homophonous but heterographic sounds. The original findings point to a quantity effect of spelling on acoustic duration, i.e., the more letters are used to spell a sound, the longer the sound’s duration. Such a finding would have extensive theoretical implications and necessitate more research on how exactly spelling would come to influence speech production. However, the effects found by Brewer (2008) did not consistently reach statistical significance and the analysis did not include many of the covariates which are known by now to influence segment duration, rendering the robustness of the results at least questionable. Employing a more nuanced operationalization of graphemic units and a more advanced statistical analysis, the current replication fails to find the reported effect of letter quantity. Instead, we find an effect of graphemic complexity. Speakers realize consonants that do not have a visible graphemic correlate with shorter durations: the /s/ in tux is shorter than the /s/ in fuss. The effect presumably resembles orthographic visibility effects found in perception. In addition, our results highlight the need for a more rigorous approach to replicability in linguistics.
1 Introduction
Contemporary communication relies heavily on the written mode with an ever-growing share of our daily interactions happening via or at least involving written messages. Producing and consuming short- and long-form texts is of increasing importance for both our private and professional lives. With this burgeoning reliance on written language, it is no wonder there is an increasing interest in research on writing and its intricate interactions with speech. Although the language system of a literate individual comprises both a phonological system and a spelling system, traditionally, research has conceptualized a supportive role for written language. In this view, written words piggyback on the primacy of spoken words, and spelling functions as a kind of meta-linguistic knowledge that exerts only task-specific influence (cf. Damian and Bowers 2009; Ehri and Wilce 1980; Linell 2005; Mitterer and Reinisch 2015). However, a steadily growing number of studies provide support for the view that the interaction between spoken and written language might be much more reciprocal and interactive than has been previously assumed.
Over the last few decades, a growing body of evidence has accumulated that the spelling of words has an influence on the perception and processing of spoken words. Perhaps the most compelling evidence for spelling effects on speech perception comes from studies on what has been termed the consistency effect. This effect arises both from pronunciations that can be spelled in various ways and from spellings that can be pronounced in various ways. It has been demonstrated, for example, that reaction times in lexical decision tasks are slowed down by inconsistent spellings (e.g., Ziegler and Ferrand 1998), or that an overlap in orthographic form has a facilitative effect on rhyming judgments (Seidenberg and Tanenhaus 1979). Beyond this consistency effect, it has also been suggested that the frequency of a given spelling has an effect on its processing. In a phoneme monitoring task, Dijkstra et al. (1995) report that phonemes are recognized faster if they appear in their most frequent spelling.
Further, it has been repeatedly suggested that the conceptualization of sounds is also influenced by their written form. In a phonetic segmentation task, Ehri and Wilce (1980) show that literate speakers tend to identify one more sound in words like pitch with a tch spelling compared to rich with a ch spelling. This is similarly suggested by Bürki et al. (2012) who find that, in a naming experiment, the letter “e” in the spelling of French pseudowords increases the probability of a schwa vowel in production even if the speakers were previously exposed to auditory forms of the same pseudowords that did not contain a corresponding schwa phoneme. Moreover, pseudowords that contain an orthographic correlate of the schwa sound are produced slower in comparison to words that do not contain such correlates. The slowed response times are taken to indicate a complex competition between orthographic and phonological representation. While the former is generated by learning the new spelling, the latter has already been established by repeated auditory exposure to the reduced variant (Bürki et al. 2012: 462).
It has also been suggested that orthographic and phonological similarity interact intricately. Grainger et al. (2005) report that the directionality of phonological neighborhood effects (e.g., Chen et al. 2016; Yates 2005; Yates et al. 2004) is modulated by the density of the corresponding orthographic neighborhood. Increasing phonological neighborhood size results in inhibitory effects for words with sparse orthographic neighborhoods but has facilitative effects for words with dense orthographic neighborhoods. The findings receive further support from Carrasco-Ortiz et al. (2017), who investigate the influence of neighborhood density interaction on event-related potentials. In line with the findings of Grainger et al. (2005), the authors report larger N400 amplitudes for words with a similar number of both phonological and orthographic neighbors. In the discussion of their results, both Grainger et al. (2005) and Bürki et al. (2012) argue for a cross-code consistency account to explain these patterns of orthographic and phonological neighborhood effects.
At this point, three things are important to note. First, the reported findings for perception appear to be robust across languages and paradigms. Second, the majority of studies on spelling effects report them in the auditory domain, which suggests that – even if the lexicon is accessed via the phonological code – information on spelling must also be active in some form. Third, spelling effects have been observed for both pronunciations that can be spelled in various ways and for spellings that can be pronounced in various ways. This is especially relevant if we look at the majority of established models of speech perception and production which generally assume that activation flows from orthography to phonology but not the other way round (e.g., Levelt et al. 1999; Luce and Pisoni 1998; McClelland and Elman 1986). Although most models of writing and spelling conceptualize a dual route – i.e., phonological mediation and additionally the possibility of autonomous access to orthographic information in production – there is still the prevalent idea of a unidirectional influence of spoken language on written language (e.g., Caramazza 1997; Logan and Crump 2011; Miceli and Capasso 2006; Rapp et al. 1997). The effects we just mentioned, however, suggest that we are looking at both feed-forward as well as feedback effects. In other words, these findings present a serious challenge to the prevalent views and necessitate more research on how exactly spelling would come to influence speech perception and also production.
So far, three competing explanations have been discussed in the literature as to how spelling would come to influence speech perception and production. According to the first, orthographic information exerts only task-specific influence, offering the speaker strategic advantages, for example, to discriminate between words (e.g., Damian and Bowers 2009; Mitterer and Reinisch 2015). The second explanation hypothesizes a bidirectional link between the spoken and written language system that is co-activated online, as has been suggested for processing by, for example, Grainger and Ferrand (1996) or Ziegler and Ferrand (1998). Thirdly, it has been hypothesized that learning how to read and write could permanently alter the previously acquired phonological representations, resulting in ‘phonographic’ representations (Pattamadilok et al. 2014: 104). In other words, offline restructuring integrates spelling into lexical representations (see, e.g., Perre et al. 2009; Pattamadilok et al. 2014 for more detailed overviews). To date, none of the existing studies can convincingly argue for one of these accounts or rule out one of the others definitely.
In order to shed further light on the question of how exactly an individual’s experience with written language comes to influence their speech capacities, it is necessary to also look at spelling effects in speech production. Unfortunately, this area of research has received little attention to date. Table 1 gives a non-exhaustive overview of the existing studies that have either looked at homophonous words, at homophonous syllables and segments with heterographic spellings, or at novel words with spelling-to-sound or sound-to-spelling inconsistencies. The majority of these studies has been carried out on alphabetic languages. Researchers have predominantly focused on either reaction times or acoustic duration as response variables.
Table 1: Overview of spelling effects in speech found in previous studies. Where “yes” is put in brackets, this indicates either inconsistencies across conditions or results just below the level of significance.
| Phenomenon | Response variable | Source | Language | Spelling effect? |
|---|---|---|---|---|
| homophones | acoustic duration | Warner et al. (2004); Warner et al. (2006) | Dutch | yes |
| homophones | acoustic duration | Gahl (2008) | English | no |
| homophones | acoustic duration | Hellwig and Indefrey (2017) | German | (yes) |
| homophones | acoustic duration | Seyfarth et al. (2018) | English | no |
| homophones | acoustic duration | Grippando (2021) | Japanese | yes |
| homophones | response times | Wheeldon and Monsell (1992) | English | (yes) |
| homophonous segments | acoustic duration | Brewer (2008) | English | (yes) |
| homophonous segments | acoustic duration | Plag et al. (2020) | English | no |
| homophonous segments | response times (priming) | Damian and Bowers (2003) | English | yes |
| homophonous segments | response times (priming) | Roelofs (2006) | Dutch | no |
| homophonous segments | response times (priming) | Alario et al. (2007) | French | no |
| homophonous syllables | response times (priming) | Alario et al. (2007) | French | no |
| novel words | response times (object naming) | Rastle et al. (2011) | English | yes |
| novel words | response times (object naming) | Bürki et al. (2012) | French | yes |
| novel words | response times (object naming) | Zhang and Damian (2012) | Chinese | (yes) |
| novel words | response times (object naming) | Han and Choi (2016) | Korean | yes |
As can be seen from this overview, the results are far from consistent. The reported effects are often not only of a small magnitude but also inconsistent across speakers or various conditions within an experiment, sometimes in unpredictable ways. The most coherent results are obtained in studies on novel words or pseudowords, which suggests that novel word learning may potentially strengthen the influence of spelling on speech production. This view is additionally supported by evidence from second language acquisition (e.g., Bürki et al. 2019; Young-Scholten 2002). What might make the acquisition of novel words more prone to an influence of spelling is the fact that learning the phonological and the orthographic code is much more interconnected in these cases. In first language learning, speakers are exposed to orthographic forms long after they have already acquired the phonological forms. In comparison, in novel word learning and second language acquisition, learning the orthographic form often coincides with learning the phonological form (cf. Ulicheva et al. 2021a, 2021b on orthographic effects in L1 and L2). However, looking exclusively at pseudowords or second language learning also increases the risk of overlooking potentially confounding factors, such as age of acquisition or mode of learning (cf. Sadoski 2005 on the dual coding theory).
1.1 Why letters might matter
In order to better understand the effects of spelling in speech production, it seems promising to take a closer look at the acoustic properties of linguistic forms with different spellings. Specifically, a spelling effect on segment durations would lend support to those views that argue that the acquired orthographic information either permanently alters the lexical entries, or that this information is co-activated online during speech production. The existing studies on spelling and acoustic duration can be split up into two groups: those that look at the word level and those that look at sublexical units below the word level. As examples of the latter group, Warner et al. (2004) and Warner et al. (2006) look at potential effects of spelling on acoustic duration in the case of incomplete neutralization. The researchers report durational differences for consonants and vowels in Dutch homophone pairs such as heten ‘to be called’ versus heetten ‘were called’. The main result reported by Warner and colleagues is that consonants are produced significantly longer for words spelled with a double consonant letter, which the authors attribute directly to the differences in spelling. Similarly, in a study on frequency effects in German homophones, Hellwig and Indefrey (2017) find durational differences only for heterographic homophone pairs such as Seite ‘page’ versus Saite ‘string’, while no effect of frequency can be found in homographic pairs such as Melone ‘melon’ versus Melone ‘bowler hat’. Most recently, and for a non-alphabetic language, Grippando (2021) finds that Japanese homophones also exhibit comparable durational differences related to the number of characters in their orthographic form. In short, Grippando (2021) reports that words represented by two characters were produced significantly longer than those words that are represented by just one character. 
In contrast, there are also a number of studies on differences in acoustic duration that do not find an effect of spelling, specifically not one that is related to the number of letters or signs. Gahl (2008), Plag et al. (2020), and Seyfarth et al. (2018), for example, all report null results for an influence of spelling on acoustic duration. Below the word level, a seminal study by Brewer (2008) reports systematic durational differences for homophonous sounds that allow for heterographic representation. The main effect reported in this study is that segment duration is positively correlated with length in letters. For instance, the word-final /k/ in clique, which is spelled with three letters, appears to be significantly longer than /k/ in click (two letters), or /k/ in tic spelled with just one letter.
To summarize at this point, a growing number of results suggest a direct link between the orthographic representation of a word or phoneme and its acoustic properties – a link that is not explicable by most established models of speech production such as Levelt et al. (1999), as the forms investigated in the pertaining studies are traditionally considered to be homophonous despite their heterography. These models conceptualize homophonous forms as having identical underlying representations and phonetically identical realizations (see also Roelofs and Ferreira 2019 for an updated version of a feed-forward architecture model). Recent findings on the interaction of morphological structure with phonetic detail have increasingly called such strictly categorical approaches to homophony into question (e.g., Drager 2011; Gahl 2008; Plag et al. 2017, 2020; Schmitz et al. 2021; Seyfarth et al. 2018). It has to be noted in this context, though, that, with the exception of Brewer (2008) and Grippando (2021), the studies mentioned were not designed to specifically test the influence of orthographic information on speech production. Rather, spelling emerged as a potential post-hoc explanation of findings once some of the pertinent covariates were controlled for statistically. It is therefore not surprising that these studies did not find convincing evidence for an effect of spelling, where a study with a clear focus on spelling may have yielded different results.
As was just mentioned, Brewer (2008) presents one of two exceptions here. The series of experiments and the corpus study described in this unpublished doctoral thesis were among the first to specifically examine a direct correlation of acoustic duration and graphemic length in number of letters. The main result reported is that there is a positive correlation between the number of letters used in the representation of a word-final obstruent and the duration of this obstruent as well as the duration of the whole word. Brewer (2008) interprets her findings as an indication that an increased orthographic length increases the acoustic duration of sounds. Brewer also discusses other spelling characteristics, such as the number of possible variants and the frequency of each variant, that potentially impact segment duration and interact intricately with orthographic length. These findings are theoretically very relevant for a number of reasons, as was explained above. The effects reported by Brewer support the notion that there is a link between the orthographic form and the phonetic realization of phonemes. As is evident from the discussion above, such a link has immediate consequences for theoretical models of speech production, and it is therefore not surprising that Brewer’s (2008) doctoral thesis, even though unpublished, has been cited by a number of later publications, such as Grippando (2021), Schmitz et al. (2021), Alarifi and Tucker (2023), or Hammond (2020).
However, as will be described below, the analysis in Brewer (2008) may not stand up to close scrutiny. There appear to be considerable weaknesses in some of the methodological decisions as well as in the quantitative analysis, and consequently, the effects reported in Brewer (2008) are far less robust than one may desire given the potential theoretical impact. Hence, we decided to carry out a modified replication of Brewer’s corpus study. There are at least three strong arguments why a replication is valuable at this point: Firstly, as has been mentioned, the evidence presented by Brewer (2008) has been picked up by a number of influential studies. Yet, it appears to be far from robust. We aimed to overcome the methodological and technical flaws so that our analysis produces clear evidence either in favor of or against the described direct influence of spelling on articulation. Secondly, we consider Brewer’s design an ideal test case for investigating such an influence. The hypothesis under examination is very straightforward, namely that more letters equal longer durations. The fact that the study looks at segment durations at the end of words is highly relevant because spelling effects have mostly been shown using reaction times as a response variable, as was discussed above. Consequently, it appears plausible to interpret them as related to lexical access and retrieval. In contrast, examining durational differences at the end of words may reveal an effect that is located in the articulation or production stages instead. If we want to further our understanding of the locus of spelling effects, we need studies that help disentangle the preparation and execution phases (cf. Brewer 2008: 25). Moreover, the present study uses a corpus of conversational speech, which has the advantage of limiting task-specific orthography effects that have been discussed to potentially arise from an interference of shallow visual processing in experimental tasks (cf. Mitterer and Reinisch 2015).
The third argument for a principled replication and expansion of Brewer (2008) is a methodological one: by now, there is a considerable amount of research that investigated the variables affecting segment durations. By incorporating these variables, our analysis can considerably reduce the risk of unknown confounding factors, which in effect will substantiate any effect that remains visible once the confounding factors are accounted for.
For these reasons, we decided to re-examine Brewer’s quantity hypothesis with a broader empirical base, employing state-of-the-art statistical analysis in order to substantiate the evidence for spelling effects on speech duration. We suggest that if systematic durational differences can be found for heterographic but seemingly homophonous segments in corpus data, these spelling effects can be assumed to be robust rather than a by-product of confounding morphological, lexical or contextual factors. This would in turn call for a revision of models on how orthographic knowledge is integrated in spoken word production. Ultimately, this adds to our general understanding of how lexical representations are created and maintained.
As a last remark, a thorough re-examination of a theoretically relevant finding also ties in with the general discussion around the replicability of, specifically, small effects in speech production research in linguistics. Given the small magnitude of some of the effects we are often referencing and the ever-growing body of speech data that is available for analysis, it seems crucial to also re-examine and verify previous findings documented in the literature. Replications are meant to test the validity of results. This becomes especially relevant if the results stem from a single study, which is constrained to its specific research methodology and environment as well as the state of the art of this time (cf., e.g., Roettger and Baer-Henney 2019; Strycharczuk 2019). Confirming or rejecting existing results can pave the way forward, either with substantiated findings or with new insights on why results can also vary.
2 Background: a closer look at Brewer (2008)
In this section, we take a closer look at Brewer’s (2008) corpus study and critically evaluate some methodological choices that may need modification in our replication.
In her dissertation on “Phonetic Reflexes of Orthographic Characteristics in Lexical Representation”, Brewer (2008) reports systematic durational differences for heterographic but presumably homophonous sounds. Based on existing findings from perception and first tentative evidence from production, the author hypothesizes that the more letters are used to spell a sound, the longer the sound’s duration. This quantity hypothesis is tested with three experiments and a corpus study. The studies examine the acoustic duration of six voiceless obstruents [p t k f s tʃ] in word-final position of monosyllabic English words and pseudowords that are spelled with either one, two or three letters, e.g., /k/ in tic versus /k/ in click versus /k/ in clique. For the corpus study, the main result reported is that there is a positive correlation between the number of letters used in the representation of a word-final obstruent and the duration of this obstruent as well as the duration of the whole word. Additionally, Brewer (2008) looks at the frequency of given spellings and their overall variability, i.e., the number of different realizations for one and the same sound. She finds that, all else being equal, both more frequent spellings and more variable spellings tend to be realized with shorter durations. Other effects reported include a difference in effect size for more frequent words, function versus content words, and simple versus complex words.
The data analyzed in Brewer (2008) was extracted from the Buckeye Corpus of Conversational Speech (Pitt et al. 2007). In addition to the raw speech files, this corpus offers time-aligned written and phonetic transcriptions of 40 interview-style recordings that could be used to extract both phonetic and orthographic information. Brewer (2008: 85–87) extracts all monosyllabic words that contain [p t k f s tʃ] in the coda. Depending on the actual position of the consonant, the words are coded either as non-final (e.g., /s/ in risk) or final (e.g., /k/ in pick). The final data set is reported to contain roughly 38,000 tokens of approximately 950 types, although the token counts given in an overview table and in the model output tables are lower, at around 33,000.
Table 2 gives an overview of the variables that are included in Brewer’s analysis. Some of the variables have already been investigated in previous studies, while others are novel to Brewer (2008). The primary and secondary response variables in her study are Sound Duration and Word Duration, respectively. The main variables of interest are Consonant Letters, Word Letters, Realization, and Variation, while the remaining variables are included as covariates to account for potential additional influences on the response variables. As addressed in the following, there is a range of problems with several of these variables, which call for a change of the methodology as reported by Brewer (2008) in our replication. In line with the modeling conducted by Brewer (2008), we have divided the variables into response variables, variables of interest, i.e., those variables that the researchers were mainly interested in, and covariates, i.e., variables that were introduced to subtract out effects that do not fall within the scope of the present analysis but are known or suspected to influence the response variables.
Table 2: Overview of the variables of the original study, divided into response variables, variables of interest, and covariates.
| Variable name | Explanation | Status |
|---|---|---|
| sound duration | Sound duration calculated from the phonetic alignments provided in the Buckeye Corpus | Primary response variable |
| word duration | Word duration calculated from the phonetic alignments provided in the Buckeye Corpus | Secondary response variable |
| consonant letters | Number of letters used to represent the sound in question: 1, 2 or 3 | Variable of interest |
| word letters | Orthographic length in number of letters used to represent word in question | Variable of interest |
| realization | Raw frequency of a given spelling of sound in question as found in the Buckeye Corpus | Variable of interest |
| variation | The overall number of different spellings of a given sound in English | Variable of interest |
| individual sound | Categorical variable that contains target obstruents themselves | Covariate |
| word phones | Phonological length of a word as the number of phones in its citation form | Covariate |
| finality | Binary contrast between final and non-final members of a consonant cluster in the data set (e.g., /s/ and /k/ in risk) | Covariate |
| function status | Binary contrast between function words and content words | Covariate |
| frequency | Absolute frequency of a given word in the corpus | Covariate |
| morpheme | Binary contrast between monomorphemic and bimorphemic | Covariate |
We see a range of problems with this specific set of variables, which we will briefly address in the following, mainly to motivate the changes we have made to the original design of the study.
The number of letters used to realize the obstruent is at the heart of the quantity hypothesis and is considered the main variable of interest (Brewer 2008: 39): Consonant letters is supposed to be positively correlated with sound duration. In other words, the more letters are used to spell a sound, the longer that sound’s duration, all else being equal. The hypothesis is later extended to include word length by analogy: The more letters are used to spell a sound, the longer the duration of the word this sound occurs in, all else being equal. The variable consonant letters comprises three levels: The original data set contained 29,392 items with consonant letters = 1, 3,320 with consonant letters = 2, and 655 with consonant letters = 3.
As can be seen from Table 3, there is a great imbalance within and across the categories. The only three-letter obstruents included in the corpus data are the letter combinations ‘ght’ and ‘tch’ as in height or itch, respectively. Further, it is unclear how exactly the sounds were aligned to the letters and letter combinations. For instance, two-letter combinations such as ‘ce’ or ‘te’ are included as realizations of voiceless /s/ or /t/ while for the one-letter realization ‘x’ as in six or mix it is unclear whether it was coded as a realization of word-final /s/ or /k/, or both. The documentation does not contain detailed information on the alignment procedure. An additional issue concerns the treatment of ‘ght’, which is considered a three-letter string realizing /t/ in words such as weight or height. Contrary to, for example, Berg (2019: 98), who analyzes ‘ght’ as two graphemes <gh> and <t>, Brewer justifies her alignment decision with results from an “informal pilot”, in which most subjects considered ‘gh’ to realize the consonant instead of the vowel (2008: 39, footnote 2). It is not discussed, however, whether a similar type of pre-screening was applied to other letter strings like ‘ke’ or ‘ce’ in words like take and nice, respectively. By analogy, if the data selection was strictly phonetically driven, letter strings like ‘bt’ in debt or ‘lf’ in words like half should have been included in the data set as they also fit the description of monosyllabic words ending in [p t k f s tʃ] (Brewer 2008: 88). They were, however, not included. Additionally, this operationalization treats letter combinations such as ‘ce’, ‘se’, ‘te’ and ‘ke’ all as identical in realizing voiceless /s/, /t/ or /k/ in words such as since, house, hate or like and, by implication, equates them with combinations like ‘ss’ or ‘ck’ in words like fuss or sick. Such a procedure might not be able to detect more fine-grained differences within the categories.
In sum, the particular selection and alignment procedure thus remains unclear and seems neither historically nor methodologically well justified. We will address these issues with a modified data-selection and an additional variable (see Section 3.1).
Table 3: Overview of categories and instances taken from Brewer (2008: 86). Cells give the number of word-final sound tokens by number of letters used to spell the sound.
| Sound | 1 letter | 2 letters | 3 letters |
|---|---|---|---|
| /p/ | 1,732 | 1 | |
| /t/ | 10,357 | 5 | 550 |
| /k/ | 7,349 | 856 | |
| /f/ | 1,546 | 717 | |
| /s/ | 8,408 | 487 | |
| /tʃ/ | | 1,254 | 105 |
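The imbalance can be made explicit by tallying the cell counts of Table 3. The following minimal sketch is our own illustration and not part of Brewer's (2008) analysis. It assumes that the two /tʃ/ counts belong to the two-letter and three-letter columns (‘ch’ and ‘tch’), which is the only assignment that reproduces the level totals of 29,392, 3,320 and 655 reported in the text. The names `counts`, `totals` and `shares` are hypothetical.

```python
# Token counts per sound and spelling length, transcribed from Table 3.
# Assumption: the /tʃ/ counts are two-letter ('ch') and three-letter ('tch') spellings.
counts = {
    "p": {1: 1732, 2: 1},
    "t": {1: 10357, 2: 5, 3: 550},
    "k": {1: 7349, 2: 856},
    "f": {1: 1546, 2: 717},
    "s": {1: 8408, 2: 487},
    "tʃ": {2: 1254, 3: 105},
}

# Sum the tokens for each level of the variable consonant letters.
totals = {}
for by_length in counts.values():
    for n_letters, n_tokens in by_length.items():
        totals[n_letters] = totals.get(n_letters, 0) + n_tokens

grand_total = sum(totals.values())
shares = {n: t / grand_total for n, t in totals.items()}

for n_letters in sorted(totals):
    print(f"{n_letters} letter(s): {totals[n_letters]:>6} tokens "
          f"({shares[n_letters]:.1%} of {grand_total})")
```

Under this reading, one-letter spellings make up close to nine tenths of all tokens, while three-letter spellings account for about two percent.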
A statistical issue emerges from the inclusion of both frequency and realization in the analysis. According to the documentation, frequency is the raw lemma frequency of a word as extracted from the Buckeye Corpus and realization is indicated as “the raw number of ways that sound was spelled a particular way in the corpus” (Brewer 2008: 97). Operationalized like this, realization is likely to be correlated with frequency: spellings will be more frequent in more frequent words. However, the inclusion of highly correlated variables in a single linear model can lead to undesired collinearity effects, which may make the influence of the involved variables hard to interpret (see, e.g., Tomaschek et al. 2018 for a discussion of collinearity issues). Similarly, the variables word phones and word letters may also be expected to be collinear: longer words will consist of both more phones and more letters. If both are present in the same analysis without appropriate remedial measures, the results may be uninterpretable. For this reason, we excluded both realization and word letters from the analysis.
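To make the collinearity concern concrete, the sketch below computes the Pearson correlation between two predictors and the resulting variance inflation factor (VIF). It is an illustration only: the eight value pairs are invented and are not taken from the Buckeye Corpus, and the function names `pearson_r` and `vif_two_predictors` are hypothetical. The sketch relies on the fact that, in a model with exactly two predictors, the VIF of either predictor equals 1 / (1 − r²).

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def vif_two_predictors(r):
    """Variance inflation factor in a model with exactly two predictors."""
    return 1.0 / (1.0 - r ** 2)

# Invented values for eight hypothetical words (NOT corpus data).
word_letters = [3, 4, 4, 5, 5, 6, 6, 7]
word_phones = [3, 3, 4, 4, 4, 5, 5, 6]

r = pearson_r(word_letters, word_phones)
vif = vif_two_predictors(r)
print(f"r = {r:.3f}, VIF = {vif:.1f}")
```

For these toy values the correlation is roughly 0.95, which corresponds to a VIF of about 10, a level that common rules of thumb treat as problematic. Whether the actual corpus variables are this strongly correlated would have to be checked on the data.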
The variable finality is included to capture the difference between final and non-final members of a consonant cluster. It is unclear whether a variable like this is sufficient to offset the variation that is introduced to the data set by including both final and non-final members of consonant clusters and there is no discussion of this issue in Brewer (2008) either. We decided to eliminate the non-final consonants from our data set to focus on one stable context, namely word-final segments.
The variable morpheme is binary, distinguishing monomorphemic from bimorphemic words, and is meant to capture morphological structure. However, this morphological classification is strongly confounded with the type of consonant. As all items are monosyllabic, there are only two types of morphologically complex words that may be included in the data, namely words in which the target consonant is /t/ (if the word contains the past-tense suffix -ed) or /s/ (if the word contains either a plural suffix, a 3rd-Sg suffix, or a genitive marker). As no other type of bimorphemic monosyllabic word can occur in the data, the inclusion of this binary variable conflates the consonant with the structure of the word in a rather unfortunate way. In order to address this, we re-coded and merged the respective variables.
As the preceding discussion shows, there are several issues with the variables that enter the analysis in Brewer (2008). In addition to that, there is a range of factors that are known by now from other studies to influence segment durations which are not covered by the above selection. Among them are effects of the segmental context both preceding and following the segment in question (e.g., Berkovits 1993; Klatt 1976; Oller 1973; Umeda 1977), effects of the syntactic context, such as phrase-final lengthening (e.g., Byrd et al. 2006; Cooper and Danly 1981; Klatt 1976), and effects of informativity and resulting predictability (e.g., Bell et al. 2009; Jurafsky et al. 2001; Pluymaekers et al. 2005; Tang and Shaw 2021; Torreira and Ernestus 2009; Zee et al. 2021).
With regard to the statistical analysis, Brewer (2008) employed various Ordinary Least Squares (OLS) regression models using either sound duration or word duration as dependent variables. Since the study was published, there has been steady methodological progress in the type of statistical models that are commonly used in linguistic analyses. Any critical re-analysis or replication can only benefit from employing state-of-the-art models, especially since the analysis in Brewer (2008) faces two serious issues that OLS regression cannot address well (e.g., Baayen et al. 2008; Barr 2013; Gries 2013). First, the data set is highly unbalanced. Not all combinations of values of categorical independent variables are represented in the data with equal frequency and some combinations do not occur at all. This is largely due to the nature of the English lexicon and its spelling system. Some of the examined letter combinations simply cannot be found at the end of a word and there is an overall scarcity of three-letter combinations realizing a consonant (Berg 2019: 455–457). While this imbalance cannot be changed, it can be counteracted with statistical tools that are better able to handle such unbalanced distributions. The second problem lies in the nature of the data examined and the effect sought after. Corpus data contain a lot of noise that stems from speaker- or item-specific variation. At the same time, we are looking at effects with a potentially very small magnitude. To be able to identify any real effects, even small ones, through the noise, it is crucial that we employ techniques with a maximized control of random variation. Luckily, both these concerns – an imbalanced data set and a high degree of speaker- and item-specific variation – can be addressed by using mixed-effects regression models (cf. Bates et al. 2015).
As the preceding discussion has shown, there are several issues in the corpus analysis from Brewer (2008) due to which the published results are less reliable than one may desire: the handling of several variables is somewhat dubious and ignores intercorrelations, several variables that are by now known to affect segment durations are not considered and the statistical model may not be powerful enough to distinguish a random variation from a meaningful one. Hence, while still largely following the procedure described by Brewer (2008), we adjusted and advanced the procedure in several crucial areas for the current replication. In the following section, we will explain the methods of the present study, starting with what has been modified in comparison to the original.
3 Methods
3.1 Data set and pre-processing
Unfortunately, the original data set was no longer available, and could not be completely reconstructed on the basis of the documentation in Brewer (2008).[1] In order to be able to reanalyze the data from Brewer (2008) we wrote a Python script that followed the steps outlined in the original study as closely as possible except for two slight tweaks. First, we did not include non-final segments in the data set to increase the comparability of the phonological and orthographic environment in question. In other words, for words like risk, we excluded the target fricative that occurred in non-final position. Second, we added two more voiceless obstruents that were not examined in the original study: [ʃ] and [θ].
As was explained above, the data selection in Brewer (2008) was strictly phonetically driven. This means, every token was included that was phonetically realized with a target obstruent in a matching environment, regardless of the underlying phonological form. For this, the PERL script used the time-aligned written and phonetic transcription provided in the Buckeye corpus. We adopted this strategy for the present study. Our script retrieved each instance of a monosyllabic English word with [p, t, k, f, s, tʃ, ʃ, θ] in word-final position. The resulting data set was complemented using data from the English Lexicon Project (ELP, Balota et al. 2007) to obtain additional information on orthographic variables, additional frequency measures, biphone frequencies and biphone conditional probability. Items with missing values, i.e. those tokens from the Buckeye corpus that are not listed in the ELP, were removed from the data set. We excluded tokens with exceptionally long word durations (i.e. longer than 0.6 s) and target segment durations (i.e. longer than 0.3 s), or an implausibly slow speech rate (i.e. a value below 1 at this preprocessing stage, which corresponds to less than 1 segment per second). Additionally, we manually removed the item prompts from the data set, as it was the only item with more than six segments and was thus classified as an outlier (see OSF repository at https://osf.io/pt65r/).
The final data-set contains 21,679 monosyllabic tokens (700 types) ending in one of eight voiceless obstruents [p, t, k, f, s, tʃ, ʃ, θ]. Table 4 gives an overview of the extracted obstruents and their respective frequencies.
Table 4: Overview of sounds and their frequencies in the data set.
| Sound | p | t | k | f | s | tʃ | ʃ | θ |
|---|---|---|---|---|---|---|---|---|
| Freq | 1,281 | 6,812 | 6,041 | 1,747 | 3,542 | 1,028 | 58 | 1,170 |
As was described above, there were some issues with how the word-final sounds are aligned with letters or letter combinations. In order to increase the systematicity of the procedure, we aligned all letters following the representation of the vowel in writing with the consonant sound, starting with the first consonant letter. We wanted to keep the alignment as straightforward as possible and associated all letters that followed the vowel representation with the unit to the right, even if there might not be a 1:1 correspondence of sounds and letters. These units comprise one to three letters and can contain both consonant and vowel letters. This way, we ensure that the procedure remains phonetically driven and avoid any a priori restrictions on which letter combinations should be included in the analysis that could prove theoretically hard to justify, as we have discussed above.
This approach is of course complicated by units that combine vowel and consonant letters or by units that contain silent letters. Silent letters have been discussed as both modifying vowel phonemes (cf. Kessler and Treiman 2001; Venezky 1967, 1999) and being part of the consonantal realization (cf. Treiman et al. 1995). We account for this complication by adding a categorical variable called graphemic complexity, which distinguishes five different types of letter combinations: portmanteau, simple, compound, silent, and discontinuous. We use the term portmanteau to classify a specific case with only one exemplar: the single letter unit ‘x’ in words like six or flex realizes two sounds at the same time. Letter strings with a straightforward correspondence of sound and letter are classified as simple. For [p, t, k, f, s] each there is a consonant letter that realizes these sounds word-finally. Compound strings are defined as combinations of two or three consonant letters, such as ‘ck’ in sick or ‘tch’ in stitch. This category also includes the past-tense ‘ed’ that is a combination of a vowel and a consonant letter. It was classified with the remaining compound strings because there is consensus in the literature to assign these letter strings with the unit to the right, in our case, the word final consonant phoneme (cf. Kessler and Treiman 2001: 598). Combinations of two or three postvocalic consonant letters, which have been discussed as both modifying the phoneme to the left and to the right, were classified as silent (cf. Kessler and Treiman 2001: 598). Examples would be ‘bt’ in debt or ‘lf’ in words like half. Lastly, discontinuous letter strings contain postvocalic consonant letters followed by one or two vowel letters that can also be aligned with both vowel and consonant phoneme, such as ‘se’ in house, ‘ke’ in like or ‘que’ in clique. Table 5 summarizes the classification scheme with examples from the corpus.
Table 5: Overview of target consonants, final letters, and graphemic complexity (5 levels).
| Sound | portmanteau | Example | simple | Example | compound | Example | silent | Example | discontinuous | Example |
|---|---|---|---|---|---|---|---|---|---|---|
| /p/ | | | p | lap | | | | | pe | shape |
| /t/ | | | t | cat | tt | butt | bt | debt | te | hate |
| | | | | | ed | walked | ght | light | | |
| /k/ | | | k | oak | ck | suck | lk | walk | ke | like |
| | | | | | ch | tech | | | que | clique |
| /f/ | | | f | if | ff | fluff | lf | half | fe | wife |
| | | | | | gh | tough | | | | |
| /s/ | x | tux | s | bus | ss | fuss | | | se | house |
| | | | | | | | | | ce | nice |
| /tʃ/ | | | | | ch | each | | | che | niche |
| | | | | | tch | itch | | | | |
| /ʃ/ | | | | | sh | fish | | | | |
| /θ/ | | | | | th | with | | | | |
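The typology can be written down as a small rule set. The following Python sketch is purely illustrative: the function name `classify_unit`, the rule order, and the set `SILENT_UNITS` are assumptions derived from the definitions and examples given above, not the script used for the study.

```python
VOWEL_LETTERS = set("aeiou")

# Assumption: the letter strings treated as "silent" are exactly the
# examples named in the text and in Table 5.
SILENT_UNITS = {"bt", "ght", "lk", "lf"}


def classify_unit(letters: str) -> str:
    """Map a postvocalic letter string to one of the five complexity levels."""
    unit = letters.lower()
    if unit == "x":                      # one letter realizing two sounds
        return "portmanteau"
    if unit == "ed":                     # past-tense spelling, grouped with compounds
        return "compound"
    if unit in SILENT_UNITS:
        return "silent"
    if len(unit) == 1:                   # single consonant letter
        return "simple"
    if unit[-1] in VOWEL_LETTERS:        # consonant letter(s) followed by vowel letter(s)
        return "discontinuous"
    return "compound"                    # two or three consonant letters


for example in ["x", "s", "ss", "tch", "ed", "bt", "se", "que"]:
    print(example, classify_unit(example))
```

A rule set of this kind only covers the letter strings attested in the data set; any other string would need to be checked by hand.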
This new typology offers two major advantages. First, it allows us to include all graphemic variants that are encountered in the data set as a realization of the consonant sounds under investigation without restricting ourselves to a specific set of variants. This increases the data set’s balance and allows the data selection to remain phonetically driven. Second, we can now distinguish systematically between the different letter-sound correspondences we find in the data. We presume that a more fine-grained distinction between types of letter-sound correspondences will allow us to probe whether sequences with silent letters or consonant letter doubling can really all be treated the same as was done in the original analysis.
Done this way, the final data set contains 21,679 monosyllabic words (700 types) ending in one of eight different voiceless obstruents [p, t, k, f, s, tʃ, ʃ, θ], realized by the 29 different word-final letters and letter combinations shown in Table 5.
3.2 Variables and covariates
The primary response variable FinalSegDur was obtained from the time-aligned written and phonetic transcriptions available for the Buckeye Corpus (Pitt et al. 2007) using the abovementioned Python script (see OSF repository). The Buckeye corpus text grids are based on a partially automatic phonetic annotation. Pitt et al. (2007) state that measurements for stops are made on the basis of spectrogram and wave-form analysis from the beginning of closure to the end of the burst, and for unreleased stops to the onset of the next segment. For fricatives, the measurements were made from the onset to the end of frication. The distribution of the measured durations of each target consonant is given in Figure 1. The violin shapes represent the density function of the respective observations: wider areas indicate data ranges with relatively many observations, while narrow areas indicate a relatively sparsely populated data range.

Figure 1: Distribution of durations by target sounds (in seconds).
As can be seen from Figure 1, the individual sounds show roughly the expected inherent length differences: stops are generally shorter on average than fricatives (cf. Umeda 1977, among others). It can also be seen from the plots that for some of the obstruents, the distribution is more scattered than for others and non-normally distributed. This was addressed in the modeling procedure (see Section 3.3).
Let us now look at the variables of interest and covariates that were included in the present analysis. One of the more fundamental modifications of our analysis in comparison to Brewer (2008) concerns the choice of covariates that are included in the analysis. As noted above, previous studies on the acoustic duration of individual sounds have identified a wide range of important acoustic and non-acoustic factors, including effects of the segmental context, syntactic structure, speech rate, or informativity, not all of which were included in the original analysis. Other changes in the variable selection concerned factors that were considered in Brewer (2008) but which may introduce potentially harmful collinearity effects. Table 6 gives an overview of all variables considered for our analysis. In analogy to Table 2 above, each variable is classified either as a response variable, a variable of interest, or a covariate. The table also indicates whether the variable has been modified or added in comparison to the original analysis. The unmodified variables were gathered in the same way as described in the original analysis (see Table 2). We will briefly discuss the modified variables following the table.
Table 6: Overview of all variables. Names given in parentheses are the names used for the variable in the original study.
| Name | Explanation | Status | Modified |
|---|---|---|---|
| FinalSegDur (Sound duration) | Sound duration calculated from the phonetic alignments provided in the Buckeye corpus | Primary response variable | No |
| NumLetSeg (Consonant letters) | Number of letters used to represent the sound in question: 1, 2 or 3 | Variable of interest | No |
| NumLetWord (Word letters) | Orthographic length in number of letters of word in question | Variable of interest | No |
| GraphemicComplexity | 5 Levels: portmanteau, simple, compound, discontinuous, and silent | Variable of interest | Yes |
| GraphVariantFreq (Realization) | Raw frequency of a given spelling of sound in question as found in the Buckeye Corpus | Variable of interest | No |
| NumGraphVariants (Variation) | The overall number of different realizations of a given sound in English | Variable of interest | No |
| OLD20 | Numeric predictor that indicates the number of deletions, insertions, and substitutions needed to generate one orthographic string from another (Yarkoni et al. 2008, p. 972) | Variable of interest | Yes |
| AgeOfAcquisition | Age when respective word was acquired | Variable of interest | Yes |
| FinalSeg (Individual Sound) | Categorical variable that depicts the inherent acoustic differences | Covariate | No |
| IsMorph (Morpheme) | Binary variable that distinguishes morphemic and non-morphemic segments | Covariate | Yes |
| SpeechRate | Calculated as words per second in a given frame extending 7 words prior to and following the target word excluding pauses | Covariate | Yes |
| NumSegConsClus | Combinatory variable that indicates number of segments/word and whether or not the word contains a consonant cluster | Covariate | Yes |
| UtterancePos | Utterance-based approximation of the syntactic environment that depicts whether another word, a pause, a turn-taking or a type of verbal interruption follows | Covariate | Yes |
| FreqCOCA (Frequency) | Frequency of the word the target sound occurs in from the spoken part of the Corpus of Contemporary American English (COCA, Davies 2008–) | Covariate | Yes |
| BiphoneCondProb | Frequency of combinations of final and preceding segment | Covariate | Yes |
GraphemicComplexity This categorical variable with the five different levels portmanteau, simple, compound, discontinuous and silent has already been described in detail in Section 3.1. It is a typology of the different letters and letter strings that may be aligned with a target segment in the data set. Together with NumLetSeg and NumLetWord (the number of letters aligned with the target segment and the number of letters in the target word, respectively), this variable concludes the set of spelling-related predictors.
NumGraphVariants This numeric variable depicts the variability in the orthographic form of a sound. We counted the number of variants that were found to realize a given sound in the data. As summarized in Table 5 above, in total, the eight different target obstruents are realized by 29 letters and letter strings. The values range from one (1) variant (/θ/) to six (6) variants (/t, k/); the remaining consonants fall between (two (2) variants for /p, ʃ, tʃ/; five (5) variants for /f, s/). The idea that the degree of variability in sound-to-spelling mapping might have an effect on speech stems from research on perception, where (in)consistency effects are well-documented (e.g., Cassar and Treiman 1997; Ehri and Wilce 1980; Nayernia et al. 2019; Pattamadilok et al. 2007; Taft 2006).
GraphVariantFreq This is another numeric predictor that depicts the raw frequency of each of the graphemic variants, i.e. how often a given variant occurs in the corpus as a realization of the aligned target consonant. This, too, is a predictor that was inherited from Brewer (2008). It presumably taps into similar facilitative effects as variability, which has also been discussed in the context of speech perception. It has been suggested that more frequent variants facilitate lexical access, for example, in phoneme monitoring tasks (Dijkstra et al. 1995), which in turn may influence the acoustic duration of the target consonant.
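Both counts can be derived from the same token list. The sketch below assumes a hypothetical input format of one `(sound, variant)` pair per corpus token; the function name `variant_statistics` and the toy data are illustrative and not taken from the study's script.

```python
from collections import Counter, defaultdict


def variant_statistics(tokens):
    """Return (GraphVariantFreq, NumGraphVariants) for (sound, variant) tokens."""
    # Raw frequency of each graphemic variant as a realization of its sound
    variant_freq = Counter(tokens)
    # Set of distinct variants observed for each sound
    variants_per_sound = defaultdict(set)
    for sound, variant in variant_freq:
        variants_per_sound[sound].add(variant)
    num_variants = {sound: len(v) for sound, v in variants_per_sound.items()}
    return variant_freq, num_variants


# Toy data (hypothetical): four /s/ tokens and one /θ/ token
toy_tokens = [("s", "s"), ("s", "s"), ("s", "ss"), ("s", "x"), ("θ", "th")]
freq, n_variants = variant_statistics(toy_tokens)
print(freq[("s", "s")], n_variants["s"], n_variants["θ"])
```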
FinalSeg This categorical variable indicates the target consonant of the associated observation (i.e. /p, t, k, f, θ, s, ʃ, tʃ/) and captures any expected inherent differences as a random effect in the analysis.
IsMorph This is a binary variable that distinguishes between the cases in which segments have morphemic status, namely {past-ed} in taxed or {plural-s} in months, and cases where segments are non-morphemic. It has been repeatedly shown that the morphemic status of a segment can influence its duration (e.g., Plag et al. 2017; Schmitz et al. 2021) and the contrast of morphemic versus non-morphemic segment was already a prevalent variable in the original analysis (Brewer 2008: 113–115).
NumSegConsClus This categorical variable encodes the number of segments that the speaker used to realize the word containing the target segment and the presence or absence of a consonant cluster. It is based on the actual pronunciation of the word as transcribed in the Buckeye Corpus and not on the canonical phonological form. In other words, segment deletions and reductions are taken into account by this variable. However, the number of segments in monosyllabic words is strongly correlated with the presence or absence of a consonant cluster in the coda, i.e. with the presence/absence of other consonants immediately preceding the target consonant. This is due to the phonotactic structure of English syllables: since the maximum number of segments in the onset of a syllable is restricted to three and the nucleus is an obligatory part of the syllable, higher numbers of segments in a word become increasingly likely if the coda contains a consonant cluster (Giegerich 1992: Ch. 6). In order to account for this, we decided to combine the number of segments and the presence of a consonant cluster in the coda into the categorical variable NumSegConsClus with the eight levels 2n, 3n, 3y, 4n, 4y, 5n, 5y and 6y (‘n’ indicates that the target consonant is the only coda consonant, ‘y’ indicates that there is at least one additional consonant in the coda, forming a cluster).
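The combined coding can be illustrated with a short Python sketch. The function name `num_seg_cons_clus` and its two arguments are hypothetical; how the segment count and the cluster flag are read from the transcription is left open here.

```python
# The eight levels attested in the data set, as listed in the text
ATTESTED_LEVELS = {"2n", "3n", "3y", "4n", "4y", "5n", "5y", "6y"}


def num_seg_cons_clus(n_segments: int, coda_cluster: bool) -> str:
    """Combine segment count and coda-cluster status into one level label."""
    level = f"{n_segments}{'y' if coda_cluster else 'n'}"
    if level not in ATTESTED_LEVELS:
        raise ValueError(f"level {level!r} is not among the attested levels")
    return level


print(num_seg_cons_clus(3, False))  # three segments, target is the only coda consonant
print(num_seg_cons_clus(4, True))   # four segments, target is part of a coda cluster
```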
UtterancePos There is rich evidence that the presence of a prosodic boundary has a lengthening effect on the preceding linguistic material (cf. Turk and Shattuck-Hufnagel 2007; White et al. 2014). In order to account for this potential prosodic effect, we included the categorical variable UtterancePos with the values ‘final’ for a word at the end of a turn, ‘prepause’ and ‘prevoc’ if it occurred before a pause or a vocalization such as laughter, respectively, or ‘internal’ if the target occurs within a turn. As the Buckeye Corpus does not contain syntactic parsing, and given the often interrupted nature of naturally spoken utterances in general, no further attempt was made to identify syntactic boundaries beyond this classification.
FreqCOCA This variable was included to incorporate the well-documented effect that word durations correlate negatively with their frequency, so that, all other things being equal, more frequent words are produced shorter on average than less frequent words. We obtained the frequency of each word containing a target consonant from the Corpus of Contemporary American English (COCA, Davies 2008–) using the corpus query tool Coquery (Kunter 2017) and the DVD version of COCA spanning years 1998–2012. In order to match the spoken nature of the Buckeye Corpus, the frequency count was restricted to the spoken component of COCA.
OLD20 The Orthographic Levenshtein Distance (OLD, Levenshtein 1966) was extracted from the ELP (Balota et al. 2007). OLD is a numeric predictor that indicates the number of deletions, insertions, and substitutions needed to generate one orthographic string from another. In other words, it is a measure of relatedness. For example, the Levenshtein Distance from BUS to FUSS is 2, reflecting one insertion and one substitution. For the present analysis, we used the OLD20 measure, which is defined as the mean OLD from a word to its 20 closest orthographic neighbors (Yarkoni et al. 2008: 972). Neighborhood effects are well documented in lexical processing and there is growing evidence that letter-based similarity intricately interacts with phonological and semantic relatedness (e.g., Grainger et al. 2005).
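For illustration, the distance and the neighborhood mean can be computed as in the following Python sketch. The values used in the study were taken from the ELP, not computed this way; the function names `levenshtein` and `old_n` and the toy lexicon are assumptions made for this example.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of deletions, insertions and substitutions turning a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


def old_n(word: str, lexicon: list, n: int = 20) -> float:
    """Mean Levenshtein distance from word to its n closest neighbors in lexicon."""
    distances = sorted(levenshtein(word, other) for other in lexicon if other != word)
    closest = distances[:n]
    return sum(closest) / len(closest)


print(levenshtein("BUS", "FUSS"))  # 2

# Toy lexicon (hypothetical); a real OLD20 uses the 20 closest words of a full lexicon
print(old_n("bus", ["bus", "fuss", "bug", "but"], n=2))
```

With `n=20` and a full lexicon, `old_n` corresponds to the OLD20 definition cited above; the function fails on a lexicon that contains no word other than the target.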
AgeOfAcquisition As was discussed above, research on second language learning suggests that spelling effects might be more pronounced for learners of language because the learning process usually involves increased exposure to written language, whereas first languages are usually learned on the basis of phonology (Bürki et al. 2012: 451). We included this predictor as a first proxy of whether a later acquisition – and hence a potentially closer connection of phonological form and orthographic information – would show any effects (see also Afonso et al. 2018; Treiman et al. 2022; Qu and Damian 2019, for a discussion of L1–L2 differences and experience as a potential amplifier for spelling effects). Like OLD20, the average age of acquisition was also obtained from the ELP (Balota et al. 2007).
We considered two further variables that addressed the fact that the distribution of the individual variants may be somewhat skewed in that for some consonants, one or more variants occur much more frequently than other variants. The first variable GraphVariantProb expresses the probabilities that a given consonant is realized by the associated graphemic variants by dividing the raw frequencies of each graphemic variant by the overall frequency of the respective sound. The second variable GraphVariantEntropy assesses the informativity of the set of graphemic variants for a target segment. For example, when comparing a consonant for which all graphemic variants have similar probabilities to a consonant that has a small number of graphemic variants that occur much more frequently than the other possible variants, the latter will have a much lower entropy (and hence, will be less informative) than the former. As low informativity appears to correlate with short acoustic durations (e.g., Seyfarth 2014 for word informativity), a similar pattern may be expected for the graphemic variant entropy as well.
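The two measures can be sketched in Python as follows. The example assumes Shannon entropy with a base-2 logarithm, which the text does not specify; the function names and the frequency counts are hypothetical.

```python
import math


def variant_probabilities(freqs: dict) -> dict:
    """Divide each variant's raw frequency by the overall frequency of the sound."""
    total = sum(freqs.values())
    return {variant: f / total for variant, f in freqs.items()}


def entropy_bits(probabilities) -> float:
    """Shannon entropy (in bits) of a set of variant probabilities."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


# Hypothetical counts: one sound with balanced variants, one with a dominant variant
balanced = variant_probabilities({"a": 50, "b": 50})
skewed = variant_probabilities({"a": 95, "b": 5})

print(entropy_bits(balanced.values()))  # 1.0
print(entropy_bits(skewed.values()))    # lower than for the balanced set
```

The skewed set yields the lower entropy, matching the comparison described above.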
Despite the reasonable motivations for these variables, we decided against incorporating these two informativity-related variables into our model after an inspection of the potential predictor variables. The correlation matrix of the numeric predictors using the cor() function from the corrplot package (Wei and Simko 2021) revealed high correlations between LogGraphVariantFreq and GraphVariantProb (r = 0.65) and in particular between GraphVariantProb and GraphVariantEntropy (r = −0.69). Of course, the correlation between probabilities, entropy and frequency is no surprise: if one spelling variant is much more frequent, it will also be much more probable than the others and in turn much more expected, but this increased probability will decrease the entropy for the set of grapheme variants as a whole. However, as discussed in Tomaschek et al. (2018), including variables that contribute very similar information to the model can cause harmful side effects such as suppression (the apparent insignificance of a variable that should by all expectations emerge as significant) and enhancement (an exaggerated contribution of a variable that may be expected to have a much smaller effect). In other words, including highly correlated variables as predictors may severely impair the interpretability of the resulting model. Consequently, of the three highly correlated variables, we decided to retain only GraphVariantFreq. It may be reasonably argued based on the correlational structure that much of the information contributed by the other variables is already accounted for by this variable.
Table 7 summarizes all variables included in our analysis.
Table 7: Overview of all variables included in the initial model.
| Response variable | M | SD | Med | Min | Max |
|---|---|---|---|---|---|
| FinalSegDur (s) | 0.075 | 0.042 | 0.066 | 0.005 | 0.297 |
| Numerical predictors | |||||
| GraphVariantFreq | 3,422 | 2,778 | 2,217 | 1 | 7,635 |
| OLD20 | 1.097 | 0.305 | 1.550 | 1 | 3.4 |
| AgeOfAcquisition (years) | 4.694 | 1.164 | 4.581 | 2.311 | 14.060 |
| LogFreqCOCA | 11.95 | 1.894 | 12.80 | 2.079 | 14.793 |
| LogBiphoneCondProb | −5.121 | 1.362 | −5.121 | −10.884 | −2.318 |
| SpeechRate (words per second) | 4.726 | 1.302 | 4.611 | 1.015 | 10 |
| Categorical predictors | N | % | |||
| GraphemicComplexity | |||||
| portmanteau | 172 | 0.8 | |||
| simple | 12,238 | 56.4 | |||
| compound | 3,641 | 16.8 | |||
| silent | 490 | 2.3 | |||
| discontinuous | 5,138 | 23.7 | |||
| FinalSeg | |||||
| p | 1,281 | 5.9 | |||
| t-morph | 403 | 1.9 | |||
| t-nonmorph | 6,409 | 29.6 | |||
| k | 6,041 | 27.9 | |||
| f | 1,747 | 8.1 | |||
| s-morph | 907 | 4.2 | |||
| s-nonmorph | 2,635 | 12.2 | |||
| ch | 1,028 | 4.8 | |||
| sh | 58 | 0.3 | |||
| th | 1,170 | 5.4 | |||
| IsMorph | |||||
| morphemic | 1,393 | 6.4 | |||
| non-morphemic | 20,286 | 93.6 | |||
| NumSegConsClus | |||||
| 2n | 3,792 | 17.5 | |||
| 3n | 11,289 | 52.1 | |||
| 3y | 733 | 3.4 | |||
| 4n | 968 | 4.5 | |||
| 4y | 4,274 | 19.7 | |||
| 5n | 54 | 0.2 | |||
| 5y | 507 | 2.3 | |||
| 6y | 62 | 0.3 | |||
| UtterancePos | |||||
| final | 427 | 2 | |||
| internal | 18,254 | 84.2 | |||
| prepause | 1,606 | 7.4 | |||
| prevoc | 1,392 | 6.4 | |||
3.3 Statistical analysis
Perhaps the most obvious change in the statistical procedure is to replace the OLS regression analyses that were conducted by Brewer (2008) with mixed-effects linear regression analyses, as implemented in the packages lme4 (Bates et al. 2015) and lmerTest (Kuznetsova et al. 2017) for R (R Core Team 2022; RStudio Team 2022). Mixed-effects models offer at least two advantages over OLS models (Bates et al. 2015). First, they distinguish between fixed effects (typically variables which may be expected to contribute systematically to the variance of the dependent variable) and random effects (sometimes termed noise variables that introduce unsystematic variation). A mixed-effects model may account, for example, for speaker-specific or item-specific variation that is introduced to the data because the speakers or items were randomly sampled. Second, mixed effects models are generally better at dealing with unbalanced data sets, i.e., data in which some categories or category combinations are much more frequent than others. These properties are highly useful for the present analysis for two reasons. First, we are looking at fine phonetic detail in speech data, which means the effects will have a potentially small magnitude (Strycharczuk 2019: 8–10). This calls for a maximized control of random variation to be able to identify any real effects. At the same time, due to the nature of the English lexicon and spelling system, not all combinations of values of independent variables are represented in the data with equal frequency (e.g., Berg 2019 on the peculiarities of the English orthographic system). This skewness is best addressed with a mixed-effects model.
The main hypothesis by Brewer (2008) states that spelling properties of a sound will affect the sound’s duration. In order to directly test this hypothesis, we fitted two separate models. The QUANT model closely replicates Brewer’s (2008) analysis in that it tests the influence of the mere number of letters NumLetSeg on the sound’s duration. The QUAL model is the modified version of the analysis which includes our newly added variable GraphemicComplexity that captures not only information on the quantity of letters in the spelling of the word-final sound but also the quality, i.e., the specific make up (see Section 3.1 for a detailed explanation). As such, the QUAL model may be considered a superset model that may detect the influence of different types of orthographic representations beyond merely the number of letters. Apart from the diverging main variable of interest, the remaining structures of the models were exactly the same. Following the principle of maximal models (cf. Barr et al. 2013), we included all explanatory variables and covariates in the initial models that were explained in Table 7. In order to reduce the strong skew of the two frequency measures GraphVariantFreq and FreqCOCA, these variables were log-transformed prior to their inclusion in the model. Similarly, we log-transformed the conditional probability BiphoneCondProb to eliminate the problem that probabilities are intrinsically bound between 0.0 and 1.0, which is a property that linear models cannot fully account for. The numerical variables (including the log-transformed ones) were normalized prior to their inclusion. 
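The two preprocessing steps can be sketched in Python as follows. This is an illustration under two assumptions that the text does not spell out: the logarithm is the natural logarithm, and "normalized" means z-scoring (centering on the mean and dividing by the sample standard deviation). The function names and input values are hypothetical.

```python
import math
import statistics


def log_transform(values):
    """Natural-log transform of strictly positive values (frequencies, probabilities)."""
    return [math.log(v) for v in values]


def z_normalize(values):
    """Center on the mean and scale by the sample standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


# Hypothetical raw frequencies with a strong right skew
raw_freq = [1, 10, 100, 1000, 7635]
log_freq = log_transform(raw_freq)
print(z_normalize(log_freq))

# A probability in (0, 1] maps onto (-inf, 0] after the log transform
print(log_transform([0.098]))
```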
In addition to the predictor variables, we also included random intercepts for the variables Filename (i.e., the name of the Buckeye recording, which captures speaker-specific variation), Word (which captures word-specific variation), and FollowingSeg (which is derived from the context in the Buckeye Corpus, since the phonological environment may be considered to be another source of variation). We decided to also add a random intercept for FinalSeg (which captures the intrinsic variation of the individual consonant that is represented by the grapheme).
The specifications of our initial QUANT and QUAL models are shown in (A) and (B), respectively, using the R formula notation.
(A) QUANT
FinalSegDur ∼ NumLetSeg + GraphVariantFreq + NumGraphVariants + OLD20 + AgeOfAcquisition + SpeechRate + LogFreqCOCA + LogBiphoneCondProb + IsMorph + NumSegConsClus + UtterancePos + (1|Filename) + (1|Word) + (1|FollowingSeg) + (1|FinalSeg)
(B) QUAL
FinalSegDur ∼ GraphemicComplexity + NumLetSeg + GraphVariantFreq + NumGraphVariants + OLD20 + AgeOfAcquisition + SpeechRate + LogFreqCOCA + LogBiphoneCondProb + IsMorph + NumSegConsClus + UtterancePos + (1|Filename) + (1|Word) + (1|FollowingSeg) + (1|FinalSeg)
The decision to treat FinalSeg as a random effect in these models requires some explanation. It is a well-known fact that different speech segments can have notably different segment-intrinsic durations (cf., e.g., Klatt 1976 for a discussion). For American English, Umeda (1977) finds that fricatives have generally longer durations than plosives, with the longest fricatives being /s/, and the shortest plosives being /t/ (but see, e.g., Crystal and House (1988) for differing evidence with regard to segment durations in connected speech). It seems obvious to incorporate this information in a statistical analysis. Indeed, the various models in Brewer (2008) show a significant effect of the target consonant that matches closely the average durations reported in Umeda (1977). However, including a predictor FinalSeg that expresses the identity of a consonant may conflict with other variables in the model. As can be seen in Table 5, there are consonants with only one orthographic variant (/ʃ/ and /θ/). For these specific consonants, the variable that identifies the target consonant reduplicates the information provided by the variable that identifies the orthographic variant. However, a linear model with standard dummy coding will estimate a separate parameter for each additional level of a categorical variable. If two parameters from different categorical variables account for the same variation in the response variable, this will result in a rank-deficient model, in which one of the reduplicated parameters will be discarded from the model. When Brewer (2008: 100) describes that “(…) ’k’ is dropped from the model, mathematically necessary to uniquely identify the added variable VARIATON”, she is probably referring to exactly such a rank-deficient model. We can avoid rank-deficiency if FinalSeg is treated not as a fixed effect but as a random effect instead. 
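The duplication problem can be made concrete with a toy example. In the Python sketch below, the data and the helper `dummy_column` are hypothetical; the point is only that, for a consonant with a single spelling, the dummy column for the consonant and the dummy column for its spelling are identical, so a fixed-effects design matrix containing both is rank-deficient.

```python
def dummy_column(values, level):
    """Dummy-code one level of a categorical variable as a 0/1 column."""
    return [1 if v == level else 0 for v in values]


# Toy data (hypothetical): /θ/ has only one spelling, /s/ has several
final_seg = ["s", "s", "s", "θ", "θ"]
variant = ["s", "ss", "x", "th", "th"]

theta_column = dummy_column(final_seg, "θ")
th_column = dummy_column(variant, "th")
print(theta_column == th_column)  # True: both columns carry the same information

s_column = dummy_column(final_seg, "s")
ss_column = dummy_column(variant, "ss")
print(s_column == ss_column)      # False: /s/ is spelled in more than one way
```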
Fixed effects estimate one parameter for each variable level in addition to the reference level, while a random effect in a mixed-effects model is not considered a parameter regardless of the number of categorical levels, and random effects are calculated separately from the fixed effects (cf. Baayen et al. 2008). However, the mathematical properties of random effects and in particular, the fact that the adjustments are unbiased and hence have an average of zero mean that the resulting random intercepts for each level of FinalSeg cannot be interpreted as representing the average duration of the corresponding consonant. While we expect to find that plosives have lower random intercepts than fricatives to reflect the shorter intrinsic duration of plosives, the actual numerical values will not correspond directly to the average intrinsic durations. We will return to this point when we look at the results of our final models.
After deciding on the set of predictors and the random effects structure, we applied the following modeling strategy for both the QUANT and the QUAL model: First, we fitted a linear mixed effects model according to the model specification in (A) and (B), respectively, using the lmer() function from the lme4 package (Bates et al. 2015). An inspection of the QQ plot of the residuals of both models revealed severe deviations of the residuals from a normal distribution. We addressed this potential violation of the normality assumption by applying power transformations to the response variable in each model (Box and Cox 1964; Venables and Ripley 2002; λ QUANT = 0.075, λ QUAL = 0.070). Refitting the initial models with the transformed variables greatly reduced the initial deviations from normality of the residuals. Following the suggestions from Baayen and Milin (2010) for mixed-effects modeling, data points with residuals larger than 2.5 standard deviations were removed from the resulting models (257 and 256 out of 21,679 data points for the QUANT and QUAL models, respectively, i.e., ∼1.2 percent of the total number of observations). Both models were evaluated for the presence of potentially harmful collinearity using the collin.diag() function from the misty package (Yanagida 2023). The VIFs suggested no serious risk of collinearity for either the QUANT or the QUAL model. Finally, the fully specified models based on the formulas in (A) and (B) were subjected to a stepwise elimination procedure using the step() function with the Satterthwaite method implemented in the lmerTest package (Kuznetsova et al. 2017). This ensured that all non-significant predictors were removed one after another. The results from the final QUANT and QUAL models are described in the next section.
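Two of the steps in this strategy, the power transformation of the response and the removal of data points with large residuals, can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the R code used for the analysis: the function names, the example durations, and the example residuals are hypothetical, and the trimming rule is read here as dropping points whose residual lies more than 2.5 sample standard deviations from the mean residual. Only the value λ = 0.075 is taken from the text.

```python
import math
import statistics


def box_cox(y, lam):
    """Box-Cox power transformation of a strictly positive value."""
    if y <= 0:
        raise ValueError("Box-Cox requires strictly positive values")
    return math.log(y) if lam == 0 else (y ** lam - 1.0) / lam


def inv_box_cox(z, lam):
    """Back-transform a Box-Cox transformed value to the original scale."""
    return math.exp(z) if lam == 0 else (lam * z + 1.0) ** (1.0 / lam)


def trim_by_residual(residuals, threshold=2.5):
    """Return indices of data points whose residual lies within
    `threshold` sample standard deviations of the mean residual."""
    mean = statistics.fmean(residuals)
    sd = statistics.stdev(residuals)
    return [i for i, r in enumerate(residuals)
            if abs(r - mean) <= threshold * sd]


lam_quant = 0.075                # lambda reported for the QUANT model
durations = [0.05, 0.08, 0.12]   # hypothetical segment durations in seconds
transformed = [box_cox(d, lam_quant) for d in durations]
restored = [inv_box_cox(z, lam_quant) for z in transformed]

# Hypothetical residuals: twenty small ones and a single extreme one.
residuals = [-0.01, 0.01] * 10 + [0.2]
kept = trim_by_residual(residuals)
print(len(residuals) - len(kept), "data point(s) removed")
```

In a workflow like the one described above, the model would be refitted on the retained data points, and the inverse transformation would be used to report estimates in seconds.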
4 Results
Before we describe which predictors emerged as significant and insignificant in the QUANT and QUAL models, and before we dive into the central differences, let us return to the random effects structure of the models. As noted above, our models included random intercepts for four categorical variables. While the random intercepts for Filename and Word accounted for speaker-specific and word-specific random variation, the random intercepts for FollowingSeg and FinalSeg accounted for random variation in the word durations that were induced by the phonological correlate of the graphemic units under investigation. The role of these variables for QUAL and QUANT model is documented in Table 8. σ 2 corresponds to the variance of the residuals, τ 00 Word, τ 00 Filename, τ 00 FollowingSeg, and τ 00 FinalSeg are the variance of the four random intercepts, and N Word, N Filename, N FollowingSeg, and N FinalSeg are the number of distinct grouping levels used in the random effects. Conditional R 2 expresses the total explained variance of each model including fixed and random effects, while Marginal R 2 is the variance that is explained by fixed effects only (cf. Nakagawa and Schielzeth 2013). The values were computed using the tab_model() function implemented in the sjPlot package (Lüdecke 2022).
Random effects and marginal and conditional R 2 in the QUANT and QUAL model.
| Random effects | QUANT | QUAL |
|---|---|---|
| σ 2 | 0.00055 | 0.00049 |
| τ 00 Word | 0.00004 | 0.00003 |
| τ 00 Filename | 0.00004 | 0.00003 |
| τ 00 FollowingSeg | 0.00007 | 0.00007 |
| τ 00 FinalSeg | 0.00013 | 0.00010 |
| N Filename | 238 | 238 |
| N Word | 700 | 699 |
| N FollowingSeg | 63 | 63 |
| N FinalSeg | 8 | 8 |
| Observations | 21,422 | 21,423 |
| Marginal R 2/Conditional R 2 | 0.128/0.413 | 0.133/0.405 |
What becomes evident from the table is that both models explain just over 40 percent of the variance observed in the acoustic durations in the data sets, which means that several other variables that are not captured here are presumably at play. The bulk of the explained variance can be attributed to the random effects, as indicated by the much larger conditional R 2 in comparison to the marginal R 2.
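The relation between the variance components in Table 8 and the two R 2 values can be made explicit with a small worked example. The sketch below follows the general definition in Nakagawa and Schielzeth (2013), in which marginal R 2 is the fixed-effects variance divided by the total variance and conditional R 2 adds the random-intercept variances to the numerator. Table 8 does not report the fixed-effects variance, so it is backed out of the reported marginal R 2 here; that step, like the function names, is an assumption made for illustration, and the rounded table values mean the result can only approximate the reported figures.

```python
def r2_marginal_conditional(var_fixed, var_random, var_resid):
    """Marginal and conditional R2 from variance components."""
    total = var_fixed + sum(var_random) + var_resid
    marginal = var_fixed / total
    conditional = (var_fixed + sum(var_random)) / total
    return marginal, conditional


def implied_fixed_variance(r2_marginal, var_random, var_resid):
    """Solve R2m = f / (f + rest) for the fixed-effects variance f."""
    rest = sum(var_random) + var_resid
    return r2_marginal * rest / (1.0 - r2_marginal)


# QUANT model, rounded values as printed in Table 8:
# tau00 for Word, Filename, FollowingSeg, FinalSeg, and residual sigma^2.
quant_random = [0.00004, 0.00004, 0.00007, 0.00013]
quant_resid = 0.00055

var_fixed = implied_fixed_variance(0.128, quant_random, quant_resid)
r2m, r2c = r2_marginal_conditional(var_fixed, quant_random, quant_resid)
print(f"marginal R2 = {r2m:.3f}, conditional R2 = {r2c:.3f}")
```

Under these assumptions the conditional value comes out at roughly 0.42, close to but not identical with the reported 0.413, which is expected given that the table entries are rounded to five decimal places.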
Among the random effects, FinalSeg was the most important one, as indicated by a τ 00 that is larger than that of the other random effects. Given the known property of consonants to differ in their intrinsic average durations, this observation is hardly surprising. Figure 2, which shows the random intercepts for each consonant type in the models, illustrates that the random intercepts agree well with the pattern described by Umeda (1977) for the average duration of American English consonants: other things being equal, fricatives tend to have longer durations than plosives, which is reflected in the generally longer random intercepts for fricative consonants.

Random intercepts for FINALSEG in the QUANT and QUAL models.
Table 9 shows the analysis of variance for the significant fixed effects in the final QUANT and QUAL models. Notably, in the QUANT model, the primary spelling-related variable of interest, NumLetSeg, is not retained as a significant predictor. Apparently, the model fit does not benefit from the information of how many letters are used in the orthographic representation of the consonant. In other words, we could not replicate the finding by Brewer (2008) that the acoustic duration of a consonant increases with increasing numbers of letters in the corresponding graphemic realization. As the significant effect of GraphemicComplexity in the QUAL model shows, the type of spelling representation does influence the durations, but not in a way that is captured well by simply accounting for the number of letters used to represent a sound.
Analysis of variance (type III with Satterthwaite’s method) for final QUANT and QUAL models.
| QUANT model | Sum Sq | Mean Sq | NumDF | DenDF | F | p | |
|---|---|---|---|---|---|---|---|
| GraphVariantFreq | 0.011 | 0.011 | 1 | 508.5 | 20.7 | <0.001 | *** |
| OLD20 | 0.003 | 0.003 | 1 | 470.9 | 5.7 | 0.017 | * |
| SpeechRate | 0.154 | 0.154 | 1 | 21,160.1 | 279.3 | <0.001 | *** |
| FreqCOCA | 0.018 | 0.018 | 1 | 457.6 | 32.4 | <0.001 | *** |
| BiphoneCondProb | 0.003 | 0.003 | 1 | 1,736 | 5.3 | 0.022 | * |
| IsMorph | 0.004 | 0.004 | 1 | 486.2 | 6.4 | 0.012 | * |
| NumSegConsClus | 0.032 | 0.005 | 7 | 450.7 | 8.2 | <0.001 | *** |
| UtterancePos | 0.019 | 0.006 | 3 | 42.3 | 11.7 | <0.001 | *** |

| QUAL model | Sum Sq | Mean Sq | NumDF | DenDF | F | p | |
|---|---|---|---|---|---|---|---|
| GraphemicComplexity | 0.011 | 0.003 | 4 | 372.0 | 5.7 | <0.001 | *** |
| GraphVariantFreq | 0.004 | 0.004 | 1 | 421.8 | 7.7 | 0.006 | ** |
| OLD20 | 0.003 | 0.003 | 1 | 492.7 | 5.3 | 0.022 | * |
| SpeechRate | 0.139 | 0.139 | 1 | 21,155.2 | 281.8 | <0.001 | *** |
| FreqCOCA | 0.015 | 0.015 | 1 | 466.8 | 29.6 | <0.001 | *** |
| IsMorph | 0.008 | 0.008 | 1 | 460.2 | 16.4 | <0.001 | *** |
| NumSegConsClus | 0.024 | 0.003 | 7 | 453.4 | 6.9 | <0.001 | *** |
| UtterancePos | 0.017 | 0.006 | 3 | 42.2 | 11.7 | <0.001 | *** |
In both models, the covariates OLD20, SpeechRate, FreqCOCA, IsMorph, NumSegConsClus, and UtterancePos emerge as significant predictors, while BiphoneCondProb is significant only in the QUANT model. The covariates NumGraphVariants and AgeOfAcquisition are not significant in either model.
Despite these differences, the conditional R 2 for the two models in Table 8 above is very similar (QUANT: 0.413; QUAL: 0.405), and the regression coefficients of the predictors that are included in both models are also highly similar (see OSF repository). This suggests that both models are almost equally successful in explaining the observed variation and that they result in very similar estimates for the effects of the shared predictors. As QUAL is the model that retains its spelling-related variable of interest, GraphemicComplexity, we will focus on this model for the rest of this section and refer to QUANT only where it differs from QUAL.
The partial effects of the QUAL model are illustrated in Figure 3. For the vertical axis, the Box–Cox transformed response variable was back-transformed to show the estimated grapheme duration in seconds while the horizontal axis was scaled for numerical variables. For categorical variables, all pairwise contrasts were subjected to post-hoc Tukey tests with Holm-corrected p-values.

Partial effects plot of significant predictors in the final QUAL model (numeric predictors are scaled).
With regard to the primary variable of interest GraphemicComplexity (partial effect plot in the upper-left panel), there are two significant pairwise contrasts: compound strings are significantly longer than portmanteau strings (i.e., single letter units that realize two sounds at the same time, z = 3.449, p = 0.004) and compound strings are also significantly longer than strings containing a silent letter (z = 3.567, p = 0.003). No other pairwise contrast of GraphemicComplexity is significant.
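The Holm correction applied to the pairwise contrasts is a step-down procedure: the raw p-values are sorted in ascending order, the smallest is multiplied by the number of tests, the next by one less, and so on, with each adjusted value forced to be at least as large as the one before it. The sketch below illustrates this with hypothetical raw p-values; these are not the p-values of the contrasts reported here, and the function name is chosen for illustration only.

```python
def holm_adjust(p_values):
    """Holm step-down adjustment; returns adjusted p-values in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, etc.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity: adjusted p-values never decrease.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


raw_p = [0.01, 0.04, 0.03, 0.005]   # hypothetical raw p-values
adjusted_p = holm_adjust(raw_p)
significant = [p < 0.05 for p in adjusted_p]
print([round(p, 3) for p in adjusted_p], significant)
```

In this toy example, two of the four contrasts remain below 0.05 after adjustment, while the raw p-values of 0.03 and 0.04 are both adjusted to about 0.06.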
Figure 3 also visualizes the other significant main effects in the QUAL model. To summarize, starting from the top right, all else being equal,
the duration of the target consonant decreases with an increasing frequency of the graphemic variant used to realize the target consonant (GraphVariantFreq)
segments tend to be longer in words with sparser orthographic neighborhoods, i.e., for which the OLD20 is relatively high (cf. Carrasco-Ortiz et al. 2017)
an increased speech rate leads to generally shortened segments (cf., e.g., Crystal and House 1990)
segments tend to be shorter in more frequent words (cf., e.g., Jurafsky et al. 2001)
with regard to the morphemic status of the segments (IsMorph), non-morphemic consonants are generally longer than morphemic consonants. This is partially in line with the body of literature (e.g. Plag et al. 2017; Schmitz et al. 2021; Tomaschek et al. 2021) that reported significantly longer durations for non-morphemic word-final /s/ than for morphemic word-final /s/, and it does not contradict those studies that failed to find corresponding effects for morphemic and non-morphemic /t/: in our model, the difference for morphemic and non-morphemic /s/ may be so strong that the predictor IsMorph still emerges as significant even if morphemic and non-morphemic /t/ and /θ/ are also addressed by the predictor[2]
an increased number of segments of the word leads to generally shorter consonant durations. This effect mostly applies to those cases where the target consonant occurs as the final consonant of a complex coda (NumSegConsClus, cf. Umeda 1977). For the sake of brevity, we will not report all significant pairwise contrasts for this variable (11 out of 28 contrasts, see OSF repository) but just note that in words with five or six segments and a complex coda, the target consonant has a significantly shorter duration than in words with fewer segments or without a complex coda
segments that do not occur at a prosodic boundary (i.e., those with UtterancePos = ‘internal’) are significantly shorter than those before a prosodic phrase boundary (‘prepause’, z = 2.850, p = 0.021), in utterance-final position (‘final’, z = 3.871, p < 0.001), or those before a vocalization (‘prevoc’, z = 3.785, p < 0.001). No other pairwise contrast of UtterancePos is significant (cf., e.g., Klatt 1976, for a detailed discussion of boundary effects on segment durations).
As the partial effects plots in Figure 3 work with the same scaling for the vertical axis, we can use the plots to assess the effect strength of the different predictors. As can be seen from comparing the panels, UtterancePos has the strongest effect by far. All other things being equal, a consonant occurring before any prosodic boundary is about 0.035 s longer on average than a consonant in other prosodic environments. The effect size of the other significant predictors is smaller. For the primary variable of interest, the significant differences between ‘compound’ and ‘portmanteau’ and between ‘compound’ and ‘silent’ amount to about 0.015 s and 0.010 s, respectively.
As noted above, the covariates that are significant in the QUAL model are all also significant in the QUANT model; the coefficients of the numerical predictors are highly correlated and the same pairwise contrasts are significant for the categorical predictors. Yet, the primary variable of interest in the QUANT model, NumLetSeg, i.e., the number of letters used to realize the phoneme, was not significant. Apart from that, there is one predictor in the QUANT model that was not significant in the QUAL model: log BiphoneCondProb. As the partial effects plot for this covariate in Figure 4 shows, segments that are highly predictable due to the preceding phone are significantly shorter than those that are less predictable. This finding agrees well with the existing literature that reports generally shorter acoustic durations for linguistic units that occur in highly predictable environments and which are therefore less informative than less predictable equivalents (cf., e.g., Kuperman et al. 2007; Seyfarth 2014).

Partial effects plot of log-transformed predictor BiphoneCondProb in the final QUANT model (scaled).
To summarize the role of the graphemic representation of the target consonants, we found a significant effect of three spelling-related predictors: GraphVariantFreq, OLD20, and GraphemicComplexity. Other things being equal, a more frequent graphemic realization leads to a generally shorter consonant duration in both QUANT and QUAL model. Target consonants in words with high OLD20 values, i.e., words with sparse orthographic neighborhoods, tend to have longer acoustic durations. This is in line with existing results on the effects of phonological neighborhood density on acoustic duration (e.g., Gahl et al. 2012), as well as effects of cross-code competition (cf. Grainger et al. 2005: 982ff; Carrasco-Ortiz et al. 2017: 7ff). The pattern of longer durations in sparse neighborhoods is the mirror image of a prevalent pattern of phonetic reduction in highly frequent or predictable words (Gahl et al. 2012: 801). Finally, the nature of the graphemic realization itself significantly affects the consonant duration but only if operationalized as GraphemicComplexity, not as NumLetSeg, the mere number of letters in the graphemic realization. To illustrate this with an example, the present results suggest that, all else being equal, the word-final /s/ in words like tux appears to be shorter than in words like tucks.
All else being equal, there seems to be an effect of spelling on segment duration. However, it does not translate into a mere quantity effect in which more letters equal longer durations. Instead, we see a pattern that suggests the actual makeup of letter combinations plays a role. Additionally, the covariates in our analysis all replicate known effects, which strongly supports the validity of the present results.
5 Discussion
The present study fails to replicate a straightforward quantity effect as it was reported by Brewer (2008). Since its original publication, this quantity effect, despite being rather tenuous, has been and continues to be referenced by numerous studies. As was discussed, the underlying assumption is as straightforward as it is compelling – more letters, longer durations – which is probably why it still gains traction despite the lack of any true confirmation (see, e.g., Alarifi and Tucker 2023; Grippando 2021; Hammond 2020). The results of the present study, however, suggest that the effect of letter quantity is either of a very small magnitude or nonexistent. Another possibility could be that quantity alone does not capture the whole of the complex relation between sounds and their written form. There is no direct correspondence between letters and phonemes in English (cf. Berg 2019: 22–27). Consequently, an effect that presupposes such a direct correspondence – which a pure quantity effect necessarily does – might just draw on a flawed premise.
So, is the original effect real or just an artifact? As with any null result, lack of evidence does not necessarily mean lack of effect. Yet, the elusiveness of the reported effects in the original study could not be remedied by more powerful analytical tools. Instead, the effect appears to be canceled by a greater statistical control of noise variables and covariates. As mentioned at the outset, however, the inability to replicate a frequently cited effect emphasizes the general importance of replicability of published results. As a scientific community, we can only build on knowledge if we constantly keep track of the reproducibility of our results, especially if we look at effects of small magnitudes (cf. Roettger and Baer-Henney 2019; Strycharczuk 2019). Our re-analysis with a modified variable of interest, GraphemicComplexity, does not dismiss any of the previous work in this direction but instead builds on it and extends the scope. As suspected, our results suggest that the letters used to represent a sound indeed appear to have an influence on its duration, just not in the straightforward way originally suggested by Brewer (2008) or – in part – by Grippando (2021). Instead, speakers appear to realize those consonants with longer durations that have unambiguously aligned direct counterparts and those with shorter durations that do not have a visible surface form or that have more complex graphemic realizations.
Admittedly, our evidence, too, is not overwhelming. One of the main pitfalls might be that we are also looking at an idiosyncrasy of the letter x. This letter is a lexicalized form of a consonant cluster, which is almost unique in its near perfect consistency. Another rightful criticism would be that the evidence we are seeking with this study is necessarily derived from a between-items comparison, which is known to be at risk of overlooking confounding factors. There is one thing we can hold against this to support the generalizability of our results, though. Our data replicated so many well-known effects of segment duration, which can be seen as reinforcing the validity of the overall results and allows us to tentatively conclude that the effect we find is truly located in the word final segment and its spelling.
We do not argue against the possibility that the letter x indeed has a special status and that the effects we see in the data are highlighted by its properties. We maintain, however, that these effects still allow for a valuable insight into the nature of spelling effects that we can generalize from. In the following, we want to briefly discuss some potential explanations, highlight their relevance and implications for existing theories and give an outlook on how these could be further tested and implemented.
5.1 Explanations and theoretical implications: consistency, visibility, or discriminative power?
As was outlined in the Introduction, one of the best-documented effects of spelling arises from (in)consistencies in the spelling-to-sound or sound-to-spelling mapping. Across different languages and a range of paradigms, it has been repeatedly shown that consistent mapping of sound and spelling can facilitate response times while, conversely, diverging pronunciation or spelling will slow speakers down (e.g., Pattamadilok et al. 2007; Perre and Ziegler 2008; Perre et al. 2009; Seidenberg and Tanenhaus 1979; Taft 2011; Taft et al. 2008; Ziegler and Ferrand 1998; Ziegler et al. 1997a, 1997b). Now, as was already mentioned, the letter x exhibits a near perfect consistency. In other words, there is a high likelihood that x in word-final position is going to be realized as /ks/ in speech. This could give speakers a processing advantage, which would help to explain why the /s/ in fox is shorter than the /s/ in toss.
At the same time, another line of evidence from perception studies suggests that the presence of corresponding letters raises speakers’ awareness for sounds, which presumably translates into differences in categorization. Such an effect has been discussed early on, for example, by Ehri and Wilce (1980), who show that literate speakers of English tend to identify one more sound in words like pitch with a tch spelling compared to rich with a ch spelling. Comparable effects were reported by, for example, Dijkstra et al. (1995), or Hallé et al. (2000), Bürki et al. (2012), which were discussed above. Althaus et al. (2022) similarly argue for a reciprocal influence of phonological and orthographic representations, stressing that phonological alternations that are orthographically represented have processing advantages as their discriminative power over the competitor is increased. Althaus et al. (2022: 222) further argue that spelling has the ability to boost discrimination, which they view as an acquired processing advantage of literate speakers. This view would also readily accommodate other findings on heterographic versus homographic homophones (Hellwig and Indefrey 2017), effects of pseudohomophones (Taft 2006), pseudohomographs (Taft et al. 2008), unrepresented phonemes (Nayernia et al. 2019), and cross-modal neighborhood effects (e.g., Carrasco-Ortiz et al. 2017; Grainger et al. 2005). Taking these discussions into account, the shorter duration of /s/ in words like fox compared to toss could arise from the lack of a direct orthographic representation for /s/ in fox or else the unambiguous presence of such a representation in words like toss. More generally, speakers presumably conceptualize sounds (partially) represented by silent letters or with an unclear sound-letter-alignment differently from sounds represented by just one letter or a clearly aligned letter combination.
To complicate matters even further, there is a third possible source of influence. In English, monosyllabic words that end in the letter sequences cs, ks or cks as representations of /ks/ are always morphologically complex, with the s representing either the plural morpheme, third person singular or a non-standard written form of clitic-s. Monosyllabic words ending in x, on the other hand, are not only fewer in number but also all simplex words, such as fox or box. Similarly, monosyllabic English words ending in a double s are most likely simplex words, such as toss or fuss. In other words, word-final /s/ in these words is always non-morphemic. Now, a growing body of studies on the phonetics of word-final /s/ in English suggests that – as a category – non-morphemic /s/ is actually the longest in comparison to third person singular, plural and/or clitic /s/ (cf. Plag et al. 2017; Schmitz et al. 2021; Tomaschek et al. 2021). Do these results contradict each other? Not necessarily. If morphological structure is reflected in the phonetics of words and a word’s spelling also exerts an influence, it might very well be that the two are connected. Strong effects could arise where spelling not only consistently represents sound but additionally reflects morphological structure, as is the case with x. This would render x ‘the odd one out’ in the category of non-morphemic /s/, but, given the small size of the set of words ending in x, it is no wonder that these would not have an overriding effect on the presumably otherwise longer non-morphemic /s/.
To summarize, the shorter durations of segments represented by a single letter or a doubling of unequivocally aligned letters could be a reflex of three presumably interconnected factors: the differences in sound-to-spelling consistency, the differences in sound categorization for different orthographic representations, and the potentially boosted discriminatory effect of consistent spellings and morphological status. Our interpretation is that, taken together, these explanations all support the view that literate language users utilize a bi- or even multimodal processing architecture. In other words, letters matter, but we do not see an isolated effect of spelling. More likely, we are observing an intricate patterning of reciprocal influence of language use in all modalities (cf. Bürki et al. 2012; Qu and Damian 2019). Such a view could also help to explain why Brewer (2008) did not find any quantity effects for pseudowords in both a reading task and a learning task, which she conducted prior to the corpus analysis replicated for the present study. The lack of results for these pseudowords is discussed as evidence for an underlying effect of lexicality. At the same time, the finding is used as a counter argument to the claim that the lengthening effect is merely an effect of shallow visual processing and task-specific to reading. If this were the case, speakers would be expected to treat words and pseudowords equally and distinguish between different spellings with distinct or lengthened articulation (Brewer 2008: 117). We would further argue that the lack of effect for pseudowords supports the view that it is only in connection with an underlying lexical representation and an acquired link between spoken and written language processing, that the orthographic representation can exert discriminative power, which could manifest, for example, in longer pronunciation durations.
How could such a link between spoken and written – and presumably also signed (cf. Cohen-Goldberg 2017) – language in both production and perception be explained within existing frameworks and theories? To date, there is a tug-of-war between the two main accounts that have been put forward (Brewer 2008; Grippando 2021; Perre et al. 2009). As was outlined in the introduction, the on-line activation account assumes that speakers co-activate orthographic information during speech processing and production and that any of the observed effects arise from active cross-talk between the two modalities during lexical retrieval (cf. Grainger and Ferrand 1996; Ziegler and Ferrand 1998). The off-line restructuring account assumes that the acquisition of orthographic information permanently alters the abstract representations and turns them into phonographic representations, enriched with spelling information (Pattamadilok et al. 2014). Beyond that, it was also suggested that spelling might be utilized by speakers in specific contexts where it affords an advantage in discriminating between sounds or words (e.g., Mitterer and Reinisch 2015). Following Brewer (2008: 25), the present study was designed to seek evidence that furthers our understanding of the locus of spelling effects. What have we gained with regard to this? The fact that we find spelling effects in corpus data can be seen as evidence in favor of a view that spelling effects are not simply task-specific (see also, e.g., Pattamadilok et al. 2014 for a more detailed discussion of task-specificity in these effects). With regard to the exact locus of spelling effects, our conclusion is twofold. We do not find a straightforward quantity effect of number of letters on speech duration, which can be seen as evidence against the on-line activation account. Instead, we find a pattern of complexity effects that seem to be in line with the overall rather inconsistent results previously found.
Our interpretation would be that the presumed interaction of spelling with both pronunciation duration and morphological status can be seen as evidence in favor of the off-line restructuring account (cf. Aronoff et al. 2016; Gahl and Plag 2022; Sandra 2010 for a discussion of how morphology and spelling might interact). We would further argue that language users seem to draw on spelling as an additional component of language units, not altering existing fixed (and minimal) abstract representations but instead dynamically evolving them as they acquire and use language in all its modalities and capacities. The result might be described as morpho-phonographic representations.
Given the complexity and dynamicity of these integrations, the theoretical models that would best accommodate such a view are probably exemplar-based models (e.g., Bybee 2006; Pierrehumbert 2002). Such models presume that language use is shaped by a collection of memory tokens or exemplars that are used to “recognize[…] inputs and generate[…] outputs by analogical evaluation across a lexicon of distinct memory traces of remembered tokens of speech” (Gahl and Yu 2006: 213). To the best of our knowledge, however, these models currently do not integrate linguistic experience in all modalities but instead restrict experience to speech, as is illustrated by the above quote. Yet, as Bürki et al. (2012: 463) emphasize, the impact of exposure to written language is not to be underestimated (cf. Bürki et al. 2019). Maybe it is high time that we take this potency into account.
Finally, the present study has several limitations, some of which are inherited from the original study design, while others are inherent in data from spontaneous speech. For example, the choice of word-final consonant in monosyllabic words does not come without issues and it seems that – besides it being a stable environment – the degree of variation introduced by this configuration is hard to control. However, as we are trying to replicate the effects as claimed to be present in the data, we, to some extent, have to live with this weakness in the design. To address this concern at least to some degree, we introduced our variable NumSegConsClus, which captures some of the duration variation in connected speech. Yet, it might be that the target consonants were elided and thus the measurements would not be fully reliable (cf., e.g., Johnson 2004). One way of addressing this in future studies would be, for example, to focus on one specific consonant (and its orthographic realizations) in different phonological environments and then look at more fine-grained parameters of phonetic realization, maybe also beyond the segmental level. This would also address the rightful criticism that individual sounds cannot always be clearly demarcated and segmented.
6 Conclusions
We replicated a seminal study that tested whether presumably homophonous segments represented by heterographic spellings display any systematic durational differences. We fail to replicate a straightforward quantity effect of the number of letters on segment (and word) durations. Instead, we find that the letters used to represent a sound indeed appear to have an influence on its duration, just not in the way originally suggested by Warner and colleagues (2006), Brewer (2008) or Grippando (2021). Our findings presumably support an off-line restructuring account for spelling effects while simultaneously highlighting the need for more research that helps disentangle the intricacy of spelling effects. Presumably, there is no one mechanism that integrates orthographic information into speech perception and production, but instead we have to carefully distinguish between on-line and off-line sources of potential effects (cf. Bürki et al. 2012).
The results of our study may bring up further questions. Does the acquisition of spelling information really boost discrimination? Do visibility effects occur for both vowels and consonants in the same way? Do speakers conceptualize visible sounds differently? Some of these questions have been addressed before, but they need more attention by future research. Paradigms that are capable of testing this would naturally include comparing literate and non-literate or preliterate populations (Grippando 2021: 338). Pseudowords and word-learning paradigms might be equally suitable to gain insights into how exactly spelling is integrated into the language processing system.
Funding source: Deutsche Forschungsgemeinschaft
Award Identifier / Grant number: FOR2373
Award Identifier / Grant number: MU 4366/1-1
Acknowledgements
We would like to thank the two anonymous reviewers for their comments, which helped to substantially improve the manuscript. We are also grateful to the members of the DFG Research Unit FOR2373 and the audience of the Morphology in Production and Perception conference 2022 for their valuable feedback and comments.
Research funding: This research was funded by the DFG Research Unit FOR2373 ‘Spoken Morphology’, grant MU 4366/1-1, which we gratefully acknowledge.
Author contribution: Both authors fulfill the ICMJE Criteria for Authorship. Julia Muschalik (corresponding author) conceived of the presented idea and, together with Gero Kunter, performed the statistical analysis. Both authors wrote and proofread the manuscript, provided critical feedback and helped shape the research, analysis, and manuscript.
Statement of ethics: The Ethics Committee of the Heinrich-Heine-University Düsseldorf waived the need for ethics approval and the need to obtain consent for the analysis and publication of the retrospectively obtained and anonymized, freely available speech data from The Buckeye Speech Corpus (Pitt et al. 2007) for this non-interventional study. No further personal data was collected and/or analyzed.
Conflict of interest: The authors have no conflicts of interest to declare.
Data availability: The data that support the findings of this study are openly available at https://osf.io/pt65r/.
References
Afonso, Olivia, Paz Suárez-Coalla, Nagore González-Martín & Fernando Cuetos. 2018. The impact of word frequency on peripheral processes during handwriting: A matter of age. Quarterly Journal of Experimental Psychology 71(3). 695–703. https://doi.org/10.1080/17470218.2016.1275713.
Alarifi, Abdulaziz & Benjamin V. Tucker. 2023. Orthographic influence in the distributional learning of non-native speech sounds. Second Language Research. https://doi.org/10.1177/02676583231191611.
Alario, François Xavier, Laetitia Perre, Caroline Castel & Johannes Ziegler. 2007. The role of orthography in speech production revisited. Cognition 102(April). 464–475. https://doi.org/10.1016/j.cognition.2006.02.002.
Althaus, Nadja, Sandra Kotzor, Swetlana Schuster & Aditi Lahiri. 2022. Distinct orthography boosts morphophonological discrimination: Vowel raising in Bengali verb inflections. Cognition 222. 104963. https://doi.org/10.1016/j.cognition.2021.104963.
Aronoff, Mark, Kristian Berg & Vera Heyer. 2016. Some implications of English spelling for morphological processing. The Mental Lexicon 11(July). https://doi.org/10.1075/ml.11.2.01aro.
Baayen, R. Harald & Petar Milin. 2010. Analyzing reaction times. International Journal of Psychological Research 3(2). 12–28. https://doi.org/10.21500/20112084.807.
Baayen, R. Harald, Doug J. Davidson & Douglas M. Bates. 2008. Mixed-effects modeling with crossed random effects for subjects and items. Journal of Memory and Language 59(4). 390–412. https://doi.org/10.1016/j.jml.2007.12.005.
Balota, David A., Melvin J. Yap, Michael J. Cortese, Keith A. Hutchison, Brett Kessler, Bjorn Loftis, James H. Neely, Douglas L. Nelson, Greg B. Simpson & Rebecca Treiman. 2007. The English lexicon project. Behavior Research Methods 39(3). 445–459. https://doi.org/10.3758/BF03193014.
Barr, Dale. 2013. Random effects structure for testing interactions in linear mixed-effects models. Frontiers in Psychology 4(June). https://doi.org/10.3389/fpsyg.2013.00328.
Barr, Dale, Roger Levy, Christoph Scheepers & Harry Tily. 2013. Random effects structure for confirmatory hypothesis testing: Keep it maximal. Journal of Memory and Language 68(January). 255–278. https://doi.org/10.1016/j.jml.2012.11.001.
Bates, Douglas, Martin Mächler, Benjamin M. Bolker & Steven C. Walker. 2015. Fitting linear mixed-effects models using lme4. Journal of Statistical Software 67(1). https://doi.org/10.18637/jss.v067.i01.
Bell, Alan, Jason Brenier, Michelle Gregory, Cynthia Girand & Dan Jurafsky. 2009. Predictability effects on durations of content and function words in conversational English. Journal of Memory and Language 60(January). 92–111. https://doi.org/10.1016/j.jml.2008.06.003.
Berg, Kristian. 2019. Die Graphematik der Morpheme im Deutschen und Englischen. Berlin, Boston: De Gruyter. https://doi.org/10.1515/9783110604856.
Berkovits, Rochele. 1993. Utterance-final lengthening and the duration of final-stop closures. Journal of Phonetics 21(4). 479–489. https://doi.org/10.1016/s0095-4470(19)30231-1.
Box, George Edward Pelham & David Roxbee Cox. 1964. An analysis of transformations. Journal of the Royal Statistical Society 26(2). 211–252. https://doi.org/10.1111/j.2517-6161.1964.tb00553.x.
Brewer, Jordan. 2008. Phonetic reflexes of orthographic characteristics in lexical representation. Tucson, AZ: University of Arizona PhD thesis.
Bürki, Audrey, Elsa Spinelli & M. Gareth Gaskell. 2012. A written word is worth a thousand spoken words: The influence of spelling on spoken-word production. Journal of Memory and Language 67(4). 449–467. https://doi.org/10.1016/j.jml.2012.08.001.
Bürki, Audrey, Pauline Welby, Mélanie Clément & Elsa Spinelli. 2019. Orthography and second language word learning: Moving beyond ‘friend or foe?’ The Journal of the Acoustical Society of America 145(4). 265–271. https://doi.org/10.1121/1.5094923.
Bybee, Joan. 2006. From usage to grammar: The mind’s response to repetition. Language 82(4). 711–733. https://doi.org/10.1353/lan.2006.0186.
Byrd, Dani, Jelena Krivokapić & Sungbok Lee. 2006. How far, how long: On the temporal scope of prosodic boundary effects. The Journal of the Acoustical Society of America 120(3). 1589–1599. https://doi.org/10.1121/1.2217135.
Caramazza, Alfonso. 1997. How many levels of processing are there in lexical access? Cognitive Neuropsychology 14(1). 177–208. https://doi.org/10.1080/026432997381664.
Carrasco-Ortiz, Haydee, Katherine Midgley, Jonathan Grainger & Phillip Holcomb. 2017. Interactions in the neighborhood: Effects of orthographic and phonological neighbors on N400 amplitude. Journal of Neurolinguistics 41(February). 1–10. https://doi.org/10.1016/j.jneuroling.2016.06.007.
Cassar, Marie & Rebecca Treiman. 1997. The beginnings of orthographic knowledge: Children’s knowledge of double letters in words. Journal of Educational Psychology 89(4). 631–644. https://doi.org/10.1037/0022-0663.89.4.631.
Chen, Wei-Fan, Pei-Chun Chao, Ya-Ning Chang, Chun-Hsien Hsu & Chia-Ying Lee. 2016. Effects of orthographic consistency and homophone density on Chinese spoken word recognition. Brain and Language 157–158(June). 51–62. https://doi.org/10.1016/j.bandl.2016.04.005.
Cohen-Goldberg, Ariel M. 2017. Informative differences: An argument for a comparative approach to written, spoken, and signed language research. In Sylvie Plane, Charles Bazerman, Fabienne Rondelli, Christiane Donahue, Arthur N. Applebee, Catherine Boré, Paula Carlino, Martine Marquilló Larruy, Paul Rogers & David Russell (eds.), Research on writing: Multiple perspectives, 457–476. Fort Collins: The WAC Clearinghouse. https://doi.org/10.37514/INT-B.2017.0919.2.25.
Cooper, William E. & Martha Danly. 1981. Segmental and temporal aspects of utterance-final lengthening. Phonetica 38(1–3). 106–115. https://doi.org/10.1159/000260017.
Crystal, Thomas H. & Arthur S. House. 1988. Segmental durations in connected-speech signals: Current results. The Journal of the Acoustical Society of America 83(4). 1553–1573. https://doi.org/10.1121/1.395911.
Crystal, Thomas H. & Arthur S. House. 1990. Articulation rate and the duration of syllables and stress groups in connected speech. The Journal of the Acoustical Society of America 88(1). 101–112. https://doi.org/10.1121/1.399955.
Damian, Markus & Jeff Bowers. 2003. Effects of orthography on speech production in a form-preparation paradigm. Journal of Memory and Language 49(July). 119–132. https://doi.org/10.1016/S0749-596X(03)00008-1.
Damian, Markus F. & Jeffrey S. Bowers. 2009. Assessing the role of orthography in speech perception and production: Evidence from picture-word interference tasks. European Journal of Cognitive Psychology 21(4). 581–598. https://doi.org/10.1080/09541440801896007.
Davies, Mark. 2008–. The corpus of contemporary American English (COCA): 560 million words, 1990-present. Available at: https://www.english-corpora.org/coca/.
Dijkstra, Ton, Ardi Roelofs & Steffen Fieuws. 1995. Orthographic effects on phoneme monitoring. Canadian Journal of Experimental Psychology = Revue Canadienne de Psychologie Expérimentale 49(July). 264–271. https://doi.org/10.1037/1196-1961.49.2.264.
Drager, Katie K. 2011. Sociophonetic variation and the lemma. Journal of Phonetics 39(4). 694–707. https://doi.org/10.1016/j.wocn.2011.08.005.
Ehri, Linnea C. & Lee S. Wilce. 1980. The influence of orthography on readers’ conceptualization of the phonemic structure of words. Applied Psycholinguistics 1(4). 371–385. https://doi.org/10.1017/S0142716400009802.
Gahl, Susanne & Ingo Plag. 2022. Spelling errors in English derivational suffixes reflect morphological boundary strength. The Mental Lexicon 14(1). 1–36. https://doi.org/10.1075/ml.19002.gah.
Gahl, Susanne, Yao Yao & Keith Johnson. 2012. Why reduce? Phonological neighborhood density and phonetic reduction in spontaneous speech. Journal of Memory and Language 66(4). 789–806. https://doi.org/10.1016/J.JML.2011.11.006.
Gahl, Susanne & Alan C. L. Yu. 2006. Introduction to the special issue on exemplar-based models in linguistics. Linguistic Review 23(3). 213–216. https://doi.org/10.1515/TLR.2006.007.
Gahl, Susanne. 2008. Time and thyme are not homophones: The effect of lemma frequency on word durations in spontaneous speech. Language 84(September). 474–496. https://doi.org/10.1353/lan.0.0035.
Giegerich, Heinz J. 1992. English phonology: An introduction. Cambridge: CUP. https://doi.org/10.1017/CBO9781139166126.
Grainger, Jonathan & Ludovic Ferrand. 1996. Masked orthographic and phonological priming in visual word recognition and naming: Cross-task comparisons. Journal of Memory and Language 35(5). 623–647. https://doi.org/10.1006/jmla.1996.0033.
Grainger, Jonathan, Mathilde Muneaux, Fernand Farioli & Johannes C. Ziegler. 2005. Effects of phonological and orthographic neighbourhood density interact in visual word recognition. Quarterly Journal of Experimental Psychology Section A: Human Experimental Psychology 58(6). 981–998. https://doi.org/10.1080/02724980443000386.
Gries, Stefan T. 2013. Statistics for linguistics with R: A practical introduction. Berlin: De Gruyter. https://doi.org/10.1515/9783110307474.
Grippando, Shannon. 2021. Japanese orthographic complexity and speech duration in a reading task. Phonetica 78(4). 317–344. https://doi.org/10.1515/phon-2021-2008.
Hallé, Pierre A., Céline Chéreau & Juan Segui. 2000. Where is the /b/ in ‘absurde’ [apsyrd]? It is in French listeners’ minds. Journal of Memory and Language 43(4). 618–639. https://doi.org/10.1006/jmla.2000.2718.
Hammond, Michael. 2020. Prosodic phonology. In Bas Aarts, April McMahon & Lars Hinrichs (eds.), The handbook of English linguistics, 2nd edn., 365–384. Hoboken, NJ: Wiley Blackwell. https://doi.org/10.1002/9781119540618.ch19.
Han, Jeong-Im & Tae-Hwan Choi. 2016. The influence of spelling on the production and storage of words with allophonic variants of /h/ in Korean. Applied Psycholinguistics 37(4). 757–780. https://doi.org/10.1017/S0142716415000235.
Hellwig, Frauke & Peter Indefrey. 2017. Homophones and their representations in the mental lexicon. Paper presented at the Architecture and Mechanisms of Language Processing, 7–9 September 2017, Lancaster, UK.
Johnson, Keith. 2004. Massive reduction in conversational American English. In Kiyoko Yoneyama & Kikuo Maekawa (eds.), Spontaneous speech: Data and analysis. Proceedings of the 1st session of the 10th international symposium, 29–54. Tokyo: The National International Institute for Japanese Language.
Jurafsky, Daniel, Alan Bell, Michelle Gregory & William D. Raymond. 2001. Probabilistic relations between words: Evidence from reduction in lexical production. In Joan Bybee & Paul Hopper (eds.), Frequency and the emergence of linguistic structure, 229–254. Amsterdam: John Benjamins. https://doi.org/10.1075/tsl.45.13jur.
Kessler, Brett & Rebecca Treiman. 2001. Relationships between sounds and letters in English monosyllables. Journal of Memory and Language 44(4). 592–617. https://doi.org/10.1006/JMLA.2000.2745.
Klatt, Dennis H. 1976. Linguistic uses of segmental duration in English: Acoustic and perceptual evidence. Journal of the Acoustical Society of America 59(5). 1208–1221. https://doi.org/10.1121/1.380986.
Kunter, Gero. 2017. Coquery: A free corpus tool. Version 0.10.0. Available at: https://www.coquery.org.
Kuperman, Victor, Mark Pluymaekers, Mirjam Ernestus & Harald Baayen. 2007. Morphological predictability and acoustic duration of interfixes in Dutch compounds. The Journal of the Acoustical Society of America 121(4). 2261–2271. https://doi.org/10.1121/1.2537393.
Kuznetsova, Alexandra, Per B. Brockhoff & Rune H. B. Christensen. 2017. lmerTest package: Tests in linear mixed effects models. Journal of Statistical Software 82(13). 1–26. https://doi.org/10.18637/JSS.V082.I13.
Levelt, Willem J. M., Ardi Roelofs & Antje S. Meyer. 1999. A theory of lexical access in speech production. Behavioural and Brain Sciences 22. 1–75. https://doi.org/10.1017/s0140525x99001776.
Levenshtein, Vladimir I. 1966. Binary codes capable of correcting deletions, insertions, and reversals. Soviet Physics – Doklady 10(8). 845–848.
Linell, Per. 2005. The written language bias in linguistics. Its nature, origins and transformation. London and New York: Routledge. https://doi.org/10.4324/9780203342763.
Logan, Gordon D. & Matthew J. C. Crump. 2011. Hierarchical control of cognitive processes: The case for skilled typewriting. Psychology of Learning and Motivation – Advances in Research and Theory 54. 1–27. https://doi.org/10.1016/B978-0-12-385527-5.00001-2.
Luce, Paul A. & David B. Pisoni. 1998. Recognizing spoken words: The neighborhood activation model. Ear and Hearing 19(9). 1–36. https://doi.org/10.1097/00003446-199802000-00001.
Lüdecke, Daniel. 2022. sjPlot: Data visualization for statistics in social science. R package version 2.8.12. Available at: https://CRAN.R-project.org/package=sjPlot.
McClelland, James L. & Jeffrey L. Elman. 1986. The TRACE model of speech perception. Cognitive Psychology 18(1). 1–86. https://doi.org/10.1016/0010-0285(86)90015-0.
Miceli, Gabriele & Rita Capasso. 2006. Spelling and dysgraphia. Cognitive Neuropsychology 23(1). 110–134. https://doi.org/10.1080/02643290500202730.
Mitterer, Holger & Eva Reinisch. 2015. Letters don’t matter: No effect of orthography on the perception of conversational speech. Journal of Memory and Language 85. 116–134. https://doi.org/10.1016/J.JML.2015.08.005.
Nakagawa, Shinichi & Holger Schielzeth. 2013. A general and simple method for obtaining R2 from generalized linear mixed-effects models. Methods in Ecology and Evolution 4(2). 133–142. https://doi.org/10.1111/j.2041-210x.2012.00261.x.
Nayernia, Leila, Ruben van de Vijver & Peter Indefrey. 2019. The influence of orthography on phonemic knowledge: An experimental investigation on German and Persian. Journal of Psycholinguistic Research 48(6). 1391–1406. https://doi.org/10.1007/s10936-019-09664-9.
Oller, D. Kimbrough. 1973. The effect of position in utterance on speech segment duration in English. The Journal of the Acoustical Society of America 54. https://doi.org/10.1121/1.1914393.
Pattamadilok, Chotiga, José Morais, Paulo Ventura & Régine Kolinsky. 2007. The locus of the orthographic consistency effect in auditory word recognition: Further evidence from French. Language and Cognitive Processes 22(5). 700–726. https://doi.org/10.1080/01690960601049628.
Pattamadilok, Chotiga, José Morais, Cécile Colin & Régine Kolinsky. 2014. Unattentive speech processing is influenced by orthographic knowledge: Evidence from mismatch negativity. Brain and Language 137. 103–111. https://doi.org/10.1016/j.bandl.2014.08.005.
Perre, Laetitia & Johannes C. Ziegler. 2008. On-line activation of orthography in spoken word recognition. Brain Research 1188(1). 132–138. https://doi.org/10.1016/j.brainres.2007.10.084.
Perre, Laetitia, Chotiga Pattamadilok, Marie Montant & Johannes C. Ziegler. 2009. Orthographic effects in spoken language: On-line activation or phonological restructuring? Brain Research 1275. 73–80. https://doi.org/10.1016/j.brainres.2009.04.018.
Pierrehumbert, Janet B. 2002. Word-specific phonetics. In Carlos Gussenhoven & Natasha Warner (eds.), Laboratory Phonology (Phonology and Phonetics; 4,1), 101–140. Berlin, New York: De Gruyter. https://doi.org/10.1515/9783110197105.1.101.
Pitt, Mark A., Laura Dilley, Keith Johnson, Scott Kiesling, William Raymond, Elizabeth Hume & Eric Fosler-Lussier. 2007. Buckeye corpus of conversational speech (2nd release). Columbus, OH: Department of Psychology, Ohio State University.
Plag, Ingo, Julia Homann & Gero Kunter. 2017. Homophony and morphology: The acoustics of word-final S in English. Journal of Linguistics 53(1). 181–216. https://doi.org/10.1017/S0022226715000183.
Plag, Ingo, Arne Lohmann, Sonia Ben Hedia & Julia Zimmermann. 2020. An <s> is an <s’>, or is it? Plural and genitive plural are not homophonous. In Lívia Körtvélyessy & Pavol Štekauer (eds.), Complex words: Advances in morphology, 260–292. Cambridge: CUP. https://doi.org/10.1017/9781108780643.015.
Pluymaekers, Mark, Mirjam Ernestus & R. Harald Baayen. 2005. Articulatory planning is continuous and sensitive to informational redundancy. Phonetica 62(2–4). 146–159. https://doi.org/10.1159/000090095.
Qu, Qingqing & Markus F. Damian. 2019. Orthographic effects in Mandarin spoken language production. Memory and Cognition 47(2). 326–334. https://doi.org/10.3758/s13421-018-0868-7.
R Core Team. 2022. R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing. Available at: https://www.R-project.org/.
RStudio Team. 2022. RStudio. Boston, MA: Integrated Development for R. RStudio, PBC. Available at: http://www.rstudio.com/.
Rapp, Brenda, Lisa Benzing & Alfonso Caramazza. 1997. The autonomy of lexical orthography. Cognitive Neuropsychology 14(1). 71–104. https://doi.org/10.1080/026432997381628.
Rastle, Kathleen, Samantha F. McCormick, Linda Bayliss & Colin J. Davis. 2011. Orthography influences the perception and production of speech. Journal of Experimental Psychology: Learning, Memory, and Cognition 37(6). 1588–1594. https://doi.org/10.1037/a0024833.
Roelofs, Ardi & Victor S. Ferreira. 2019. The architecture of speaking. In Peter Hagoort (ed.), Human language: From genes and brains to behavior, 35–50. Cambridge (MA) & London: The MIT Press. https://doi.org/10.7551/mitpress/10841.003.0006.
Roelofs, Ardi. 2006. The influence of spelling on phonological encoding in word reading, object naming, and word generation. Psychonomic Bulletin & Review 13. 33–37. https://doi.org/10.3758/BF03193809.
Roettger, Timo & Dinah Baer-Henney. 2019. Toward a replication culture: Speech production research in the classroom. Phonological Data & Analysis 1(4). 1–23. https://doi.org/10.3765/pda.v1art4.13.
Sadoski, Mark. 2005. A dual coding view of vocabulary learning. Reading & Writing Quarterly 21(3). 221–238. https://doi.org/10.1080/10573560590949359.
Sandra, Dominiek. 2010. Homophone dominance at the whole-word and sub-word levels: Spelling errors suggest full-form storage of regularly inflected verb forms. Language and Speech 53(3). 405–444. https://doi.org/10.1177/0023830910371459.
Schmitz, Dominic, Dinah Baer-Henney & Ingo Plag. 2021. The duration of word-final /s/ differs across morphological categories in English: Evidence from pseudowords. Phonetica 78(5–6). 571–616. https://doi.org/10.1515/phon-2021-2013.
Seidenberg, Mark & Michael Tanenhaus. 1979. Orthographic effects on rhyme monitoring. Journal of Experimental Psychology: Human Learning & Memory 5(6). 546–554. https://doi.org/10.1037/0278-7393.5.6.546.
Seyfarth, Scott. 2014. Word informativity influences acoustic duration: Effects of contextual predictability on lexical representation. Cognition 133(1). 140–155. https://doi.org/10.1016/J.COGNITION.2014.06.013.
Seyfarth, Scott, Marc Garellek, Gwendolyn Gillingham, Farrell Ackerman & Robert Malouf. 2018. Acoustic differences in morphologically-distinct homophones. Language, Cognition and Neuroscience 33(1). 32–49. https://doi.org/10.1080/23273798.2017.1359634.
Strycharczuk, Patrycja. 2019. Phonetic detail and phonetic gradience in morphological processes. Oxford Research Encyclopedia of Linguistics (March). 1–23. https://doi.org/10.1093/acrefore/9780199384655.013.616.
Taft, Marcus. 2006. Orthographically influenced abstract phonological representation: Evidence from non-rhotic speakers. Journal of Psycholinguistic Research 35(1). 67–78. https://doi.org/10.1007/s10936-005-9004-5.
Taft, Marcus. 2011. Orthographic influences when processing spoken pseudowords: Theoretical implications. Frontiers in Psychology 2. 1–7. https://doi.org/10.3389/fpsyg.2011.00140.
Taft, Marcus, Anne Castles, Chris Davis, Goran Lazendic & Minh Nguyen-Hoan. 2008. Automatic activation of orthography in spoken word recognition: Pseudohomograph priming. Journal of Memory and Language 58(2). 366–379. https://doi.org/10.1016/J.JML.2007.11.002.
Tang, Kevin & Jason A. Shaw. 2021. Prosody leaks into the memories of words. Cognition 210. 104601. https://doi.org/10.1016/j.cognition.2021.104601.
Tomaschek, Fabian, Peter Hendrix & R. Harald Baayen. 2018. Strategies for addressing collinearity in multivariate linguistic data. Journal of Phonetics 71(November). 249–267. https://doi.org/10.1016/J.WOCN.2018.09.004.
Tomaschek, Fabian, Ingo Plag, Mirjam Ernestus & R. Harald Baayen. 2021. Phonetic effects of morphology and context: Modeling the duration of word-final S in English with naïve discriminative learning. Journal of Linguistics 57(1). 123–161. https://doi.org/10.1017/S0022226719000203.
Torreira, Francisco & Mirjam Ernestus. 2009. Probabilistic effects on French [t] duration. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, 448–451. https://doi.org/10.21437/Interspeech.2009-160.
Treiman, Rebecca, John Mullennix, Ranka Bijeljac-Babic & E. Daylene Richmond-Welty. 1995. The special role of rimes in the description, use, and acquisition of English orthography. Journal of Experimental Psychology: General 124(2). 107–136. https://doi.org/10.1037/0096-3445.124.2.107.
Treiman, Rebecca, Brett Kessler & Tatiana Cury Pollo. 2022. Prephonological spelling and its connections with later word reading and spelling performance. Journal of Experimental Child Psychology 218. 105359. https://doi.org/10.1016/j.jecp.2021.105359.
Turk, Alice E. & Stefanie Shattuck-Hufnagel. 2007. Multiple targets of phrase-final lengthening in American English words. Journal of Phonetics 35(4). 445–472. https://doi.org/10.1016/j.wocn.2006.12.001.
Ulicheva, Anastasia, Marco Marelli & Kathleen Rastle. 2021a. Sensitivity to meaningful regularities acquired through experience. Morphology 31(3). 275–296. https://doi.org/10.1007/s11525-020-09363-5.
Ulicheva, Anastasia, Max Coltheart, Oxana Grosseck & Kathleen Rastle. 2021b. Are people consistent when reading nonwords aloud on different occasions? Psychonomic Bulletin & Review 28(5). 1679–1687. https://doi.org/10.3758/s13423-021-01925-w.
Umeda, Noriko. 1977. Consonant duration in American English. The Journal of the Acoustical Society of America 61(3). 846–858. https://doi.org/10.1121/1.381374.
Venables, William N. & Brian D. Ripley. 2002. Modern applied statistics with S, 4th edn. New York: Springer. https://doi.org/10.1007/978-0-387-21706-2.
Venezky, Richard L. 1967. English orthography: Its graphical structure and its relation to sound. Reading Research Quarterly 2(3). 75–105. https://doi.org/10.2307/747031.
Venezky, Richard L. 1999. The American way of spelling: The structure and origins of American English orthography. New York & London: The Guilford Press.
Warner, Natasha, Erin Good, Allard Jongman & Joan Sereno. 2006. Orthographic vs. morphological incomplete neutralization effects. Journal of Phonetics 34(2). 285–293. https://doi.org/10.1016/j.wocn.2004.11.003.
Warner, Natasha, Allard Jongman, Joan Sereno & Rachèl Kemps. 2004. Incomplete neutralization and other sub-phonemic durational differences in production and perception: Evidence from Dutch. Journal of Phonetics 32(2). 251–276. https://doi.org/10.1016/S0095-4470(03)00032-9.
Wei, Taiyun & Viliam Simko. 2021. R package ‘corrplot’: Visualization of a correlation matrix (Version 0.92). Available at: https://github.com/taiyun/corrplot.
Wheeldon, Linda R. & Stephen Monsell. 1992. The locus of repetition priming of spoken word production. The Quarterly Journal of Experimental Psychology Section A 44(4). 723–761. https://doi.org/10.1080/14640749208401307.
White, Laurence, Caroline Floccia, Jeremy Goslin & Joseph Butler. 2014. Utterance-final lengthening is predictive of infants’ discrimination of English accents. Language Learning 64(2). 27–44. https://doi.org/10.1111/lang.12060.
Yanagida, Takuya. 2023. misty: Miscellaneous functions ‘T. Yanagida’. R package version 0.4.7. Available at: https://CRAN.R-project.org/package=misty.
Yarkoni, Tal, David Balota & Melvin Yap. 2008. Moving beyond Coltheart’s N: A new measure of orthographic similarity. Psychonomic Bulletin & Review 15(5). 971–979. https://doi.org/10.3758/PBR.15.5.971.
Yates, Mark. 2005. Phonological neighbors speed visual word processing: Evidence from multiple tasks. Journal of Experimental Psychology: Learning, Memory, and Cognition 31(6). 1385–1397. https://doi.org/10.1037/0278-7393.31.6.1385.
Yates, Mark, Lawrence Locker & Greg B. Simpson. 2004. The influence of phonological neighborhood on visual word perception. Psychonomic Bulletin & Review 11(3). 452–457. https://doi.org/10.3758/BF03196594.
Young-Scholten, Martha. 2002. Orthographic input in L2 phonological development. In Petra Burmeister, Thorsten Piske & Andreas Rohde (eds.), An integrated view of language development – Papers in honor of Henning Wode, 263–279. Trier: Wissenschaftlicher Verlag Trier.
Zee, Tim, Louis ten Bosch, Ingo Plag & Mirjam Ernestus. 2021. Paradigmatic relations interact during the production of complex words: Evidence from variable plurals in Dutch. Frontiers in Psychology 12(September). 1–16. https://doi.org/10.3389/fpsyg.2021.720017.
Zhang, Qingfang & Markus F. Damian. 2012. Effects of orthography on speech production in Chinese. Journal of Psycholinguistic Research 41(4). 267–283. https://doi.org/10.1007/s10936-011-9193-z.
Ziegler, Johannes C. & Ludovic Ferrand. 1998. Orthography shapes the perception of speech: The consistency effect in auditory word recognition. Psychonomic Bulletin & Review 5(4). 683–689. https://doi.org/10.3758/BF03208845.
Ziegler, Johannes C., Marie Montant & Arthur M. Jacobs. 1997a. The feedback consistency effect in lexical decision and naming. Journal of Memory and Language 37(4). 533–554. https://doi.org/10.1006/jmla.1997.2525.
Ziegler, Johannes C., Gregory O. Stone & Arthur M. Jacobs. 1997b. What is the pronunciation for -ough and the spelling for /u/? A database for computing feedforward and feedback consistency in English. Behavior Research Methods, Instruments, and Computers 29(4). 600–618. https://doi.org/10.3758/BF03210615.
© 2023 the author(s), published by De Gruyter, Berlin/Boston
This work is licensed under the Creative Commons Attribution 4.0 International License.