Abstract
There is now a large literature probing syllable affiliation of consonant sequences through phonetic measurements. These studies often use one of two diagnostic measures: (1) temporally stable intervals using relative standard deviation, and (2) compensatory shortening effects. In this study, we argue that it is difficult to draw inferences from either measure without precise theoretically predicted expectations and additional controls. We studied eleven native speakers of North-Central Peninsular Spanish who pronounced disyllabic real/nonce Spanish words with varying consonant sequences. On the face of it, our temporal stability and compensatory shortening results challenge the standard analysis of syllabic affiliation in Spanish phonology, potentially supporting a complex onset analysis for /sl/ and /sm/. However, in post hoc analyses we observed shortening effects outside the target syllable due to consonant sequences, indicating evidence for poly-constituent shortening. Therefore, compensatory shortening effects within a syllable cannot automatically be assumed to be due to syllable structure. Our results and simulations suggest that, despite superficial evidence of a c-centre alignment, the clusters are more consistent with a right-edge alignment once poly-constituent shortening and domain-initial lengthening are taken into consideration.
1 Introduction
The last 40 years have seen many advances in connecting phonetic signatures to syllabic affiliation of segments. Using articulatory methods, and more recently replicated with acoustics, measures of temporal stability, like c-centre-to-anchor and right-edge-to-anchor patterns, have been probed to see if they reflect syllabic affiliation of consonant sequences (Browman and Goldstein 1988; Durvasula et al. 2021; Goldstein et al. 2007; Hermes et al. 2017, 2013; Marin and Pouplier 2014; Shaw et al. 2009; Sotiropoulou et al. 2020). Generally, two patterns of stability have been observed in the literature, each theorized to be characteristic of a particular syllabic affiliation. In languages allowing complex onsets, such as English, studies reveal that typically word-initial consonant sequences temporally reorganize as a unit. They synchronize with the nuclear vowel, meaning that the midpoint of consonantal gestures aligns temporally with the end of the nuclear vowel, maintaining a constant duration between these two points, regardless of the number of consonants in the onset. In contrast, languages that do not admit complex onsets typically exhibit a different articulatory behavior (Durvasula et al. 2021; Goldstein et al. 2007; Hermes et al. 2017; Shaw et al. 2009, 2011). In such languages, the articulatory gesture for the second consonant in word-initial sequences maintains a stable relationship with the following vowel. This results in temporal stability between the second consonant (rather than the entire sequence) and the end of the nuclear vowel. Although these patterns have been identified in various languages, not all languages adhere strictly to these distinctions. Sometimes they can even vary within the same language (Hermes et al. 2013). 
German (Wiese 1996), French (Dell 1995), Hebrew (Bolozky 1997), and Italian (Davis 1987) have all been shown to exhibit different types of temporal stability patterns with different types of consonant sequences, arguably undermining the relationship between stability patterns and syllabic affiliation (an issue we expand on in Section 2).
In this article, we build upon existing research by examining temporal stability patterns within consonant sequences in North-Central Peninsular Spanish. In addition to using acoustic techniques, we also incorporate word-medial environments into our study. More specifically, we look at the temporal stability patterns observed in consonant sequences and the phonetic shortening effect observed in the C2 position of particular word-initial /fl/ sequences and word-medial /fl sl sm/ sequences in Spanish. In contrast to the standard phonological analysis, our findings show patterns that could be interpreted as consistent with complex onsets for all consonant sequences under analysis and in all analyzed word positions. However, post-hoc analyses show shortening effects outside the target syllable as well, indicating that the observed c-centre effects are confounded by poly-constituent shortening and perhaps domain-initial lengthening. Although the examination of poly-constituent shortening was not originally a part of our research design, its consideration in the post-hoc analysis opened an alternative explanation to the patterns uncovered, namely that the observed durational changes are not just local to the syllable and therefore are not evidence of syllabic affiliations. We followed up with simulation work along the lines of Shaw and Gafos (2015) and Shaw et al. (2009) and confirmed that the general direction of the observed temporal stability patterns was possible under both complex and simplex onset organisation, and that this was true even when we modelled in poly-constituent shortening based on our own results. However, as we note below, there is some evidence that the observed temporal stability patterns were more consistent with right-edge-to-anchor stability.
In the next two sub-sections, we review how temporal stability metrics have generally been viewed to relate to onset affiliation, followed by a review of relevant phonological and phonetic facts of Spanish syllable structure, specifically as it relates to consonant sequences. We then present the methods and results of two experiments in Sections 5 and 6, respectively. In Section 7 we present the post-hoc analyses, followed by the relevant simulations in Section 8. We conclude the article with a discussion of some implications of this research in Section 9.
2 Temporal stability metrics and onset affiliation
In an important first assay on the topic, Browman and Goldstein (1988) found that onset consonants in American English showed a temporal stability pattern around the centre of the mid-points of their oral gestures. They termed this abstract centre point the c-centre. Specifically, they analysed articulatory data from the Tokyo x-ray microbeam database, which consisted of sets of nonsense words/phrases (e.g., pi lats vs. pi plats vs. pi splats), and found that irrespective of the number of consonants (1 vs. 2 vs. 3) in the onsets of the second word in such phrases, the c-centre point was in a stable relationship with the following vowel (the anchor). So, the addition of more consonants to the onset did not seem to affect the duration between the c-centre and the anchor. This pattern of stability is schematised in Figure 1 (left), and has since been termed c-centre-to-anchor interval stability. The pattern of stability of the c-centre-to-anchor interval for onset consonants has been replicated for at least some consonant sequences, in American English (Marin and Pouplier 2010), Romanian (Marin and Pouplier 2014), Georgian (Goldstein et al. 2007), Italian (Hermes et al. 2013), Polish (Hermes et al. 2017), and Spanish (Sotiropoulou et al. 2020).

Schematic representations of c-centre-to-anchor interval stability patterns (left) and right-edge-to-anchor interval stability patterns (right) (figure adapted from Durvasula et al. (2021) and Shaw et al. (2009)). The x-axis in the figure represents time. The anchor marks the end of the following vowel, and C1-C2 represent word-initial consonants.
In contrast to the above stability pattern observed in languages that allow complex onsets, languages that do not allow complex onsets typically show a different, right-edge-to-anchor, stability. For example, Tashlhiyt Berber, despite having word-initial consonant sequences, disallows complex onsets. Goldstein et al. (2007) and Hermes et al. (2017) observed that the right-most consonant of a word-initial consonant sequence (the right-edge) is in a stable relationship with the following vowel, as schematised in Figure 1 (right). This pattern of right-edge-to-anchor interval stability has also been observed in Moroccan Arabic (Shaw et al. 2009, 2011) and Jazani Arabic (Durvasula et al. 2021).
In an important result in this line of research, Hermes et al. (2013) showed that the stability patterns can vary within the same language. In Italian, phonologists have argued for different types of onset complexity for different consonant sequences (Davis 1987). For example, there is a clear morpho-phonological pattern related to the definite article that consonant sequences exhibit in Italian – nouns with some types of word-initial sibilant-initial sequence (e.g., /sp/) appear with the allomorph /lo/, while other consonant sequences (e.g., /pr/) and singleton consonants (e.g., /p/) appear with the allomorph /il/. Based on such patterns, it has been argued that the different consonant sequences have different types of onset complexity – wherein some types of word-initial sibilant-initial consonant sequences do not form a complex onset, while the other word-initial consonant sequences do (Davis 1987). In line with this distinction, Hermes et al. (2013) argued that the relevant sibilant-initial consonant sequences show right-edge coordination, but non-sibilant-initial consonant sequences do not; they seem to show a c-centre co-ordination. This particular set of facts from Italian argues against the possibility that the previously observed stability patterns in other languages were simply a language specific setting of gestural co-ordination between pre-vocalic consonants, and allows us to clearly establish the link between the stability patterns and onset complexity.
In summary, the results discussed above suggest a potential linking hypothesis between onset complexity and temporal stability patterns associated with the following vowel, namely, that consonant sequences that form complex onsets have a c-centre-to-anchor interval stability, while those that form simplex onsets have a right-edge-to-anchor interval stability. Consequently, researchers can potentially use the c-centre-to-anchor versus right-edge-to-anchor interval stability pattern to probe onset complexity. If a consonant sequence belongs to the same onset (or syllable), then there should be a c-centre effect. If a consonant sequence has consonants that are not part of the same onset (or syllable), there should not be a c-centre effect.
However, there are some results that contradict the above linking hypothesis. Some arguments in the phonological literature suggest that word-initial consonant sequences in Hebrew (Bolozky 1997), French (Dell 1995) and German (Wiese 1996) form complex onsets. However, the three languages have been observed to show a right-edge alignment, at least for some consonant sequences (Brunner et al. 2014; Pouplier 2012; Tilsen et al. 2012). While the observations might at first blush seem problematic for the linking hypothesis discussed above, there are at least three different ways one could account for them. First, Mücke et al. (2020) argued that the patterns observed in the languages are consistent with a complex onset organisation, and previous research likely misinterpreted the relevant articulatory data. More specifically, they suggest that explicitly modelling the speaker-specific coupling strength between gestures and speaker-specific biomechanical interactions between articulators allows us to still understand the patterns in such languages as stemming from a c-centre organisation. They argue that this is likely the case for at least German. Second, Sotiropoulou et al. (2020) suggest that relevant cues of different syllabic affiliations are in fact distributed over a variety of gestural adjustments within a syllable, and may not show up in each such aspect. They thereby suggest a “global” organisation over syllables. Finally, Durvasula et al. (2021) suggest that, when the intervals are extracted from acoustic measurements, the relevant articulatory data is in fact indirect evidence of the stability patterns present in the acoustics, and that there might be more stability for the c-centre-to-anchor than for the right-edge-to-anchor for the three languages. 
This last possibility receives further support from recent work that did find a c-centre-to-anchor stability pattern using the acoustic method for the same sequences that didn’t show the pattern in the articulations (Franke et al. 2023). All three of the above suggestions raise the possibility that the articulatory stability patterns observed in the three languages may not be counterexamples to the linking hypothesis after all.
While almost all previous related work has studied the phenomenon by observing gestural coordination using articulatory data, in work that is most relevant for the present article, there is clear evidence that acoustic recordings can be used to observe a c-centre-to-anchor interval stability pattern for word-initial consonant sequences in a complex onset language like English (Durvasula et al. 2021; Selkirk and Durvasula 2013; Shaw and Gafos 2015), and a right-edge-to-anchor interval stability pattern for word-initial consonant sequences in a language that does not allow complex onsets, like Jazani Arabic (Durvasula et al. 2021). Durvasula et al. (2021, p. 198) point out that their particular result opens up the possibility of probing such effects both in the lab and in field work, and express hope that the technique “will be employed in a variety of languages and contexts – not only to test its viability, but also to examine its correlation with more traditional analytical techniques for inferring syllable structure.” We follow up on this hope in this paper by probing the syllable structure of word-initial and word-medial consonant sequences in Spanish through acoustic techniques.
3 Syllabic affiliation of consonant sequences in Spanish
The standard analysis of the syllabic affiliation of onset consonant clusters is not generally regarded as a controversial topic in Spanish phonology (Colina 2009, 2012; Harris 1983; Hualde 1991, 2005; Morales-Front 2018; Real Academia Española 2011; Saporta and Contreras 1962). Under this standard view, Spanish onset clusters (either word-initially or word-medially) may have at most two consonants and their structure is very constrained. Licit sequences consist of an obstruent (including a labiodental fricative),[1] /p t k b d g f/,[2] as the first member of the cluster, and a liquid, /l/ or /ɾ/, as the second (see examples in Table 1). All other consonant sequences are standardly described as heterosyllabic. Onset clusters with coronals /d t/ followed by /l/ are typically seen as exceptions to the generalisation, as they are not observed word-initially in most dialects.[3]
Word-initial and word-medial consonant sequences with /r/ (top) and /l/ (bottom).
| Word-initial (gloss) | Word-medial (gloss) |
|---|---|
| arm.masc.sg | hug.masc.sg |
| drains.3sg | adrenaline.fem.sg |
| grade.masc.sg | sacred.masc.sg |
| prize.masc.sg | urgency.masc.sg |
| bring.inf | distract.inf |
| believable.sg | incredible.sg |
| chilly.masc.sg | soft drink.masc.sg |
| block.inf | unblock.inf |
| gluten.masc.sg | swallow.3pl |
| park.fem.sg | postpones.3sg |
| clone.masc.sg | cyclone.masc.sg |
| Spanish cream caramel.masc.sg | inflate.3pl |
Early experimental work on complex onsets looked primarily at the distribution and properties of epenthetic vowels[6] in word-initial consonant sequences. It is only more recently that instrumental studies have directly examined the phonetic consequences of syllabic affiliation.
Three sets of studies are relevant to the research we report on in this article. One strand of laboratory work has examined vowel compression in a number of syllable structures (Aldrich and Simonet 2019; Marchini and Ramsammy 2022a, 2022b). For example, Aldrich and Simonet (2019) examined vowel duration of a mixture of real and nonce words with the templates pVpa (e.g., papa), pVCpa (e.g., palpa), pVCCpa (e.g., panspa), and pCVpa (e.g., plapa and prapa), where V in all cases was the target vowel. With regard to the forms that are of interest for the present study, they found that nuclear vowels in pCVpa were systematically shorter when preceded by a word-initial consonant sequence. That is, the presence of two consonants (a complex onset) caused an acoustic shortening of the nuclear vowel, thus offering evidence for acoustic cues signaling syllabic affiliation. Note, however, that the vowel of interest in their study was always in the word-initial syllable. As a result, complex onsets were only examined word-initially. We build on Aldrich and Simonet (2019) by including word-medial [fC sC] sequences to further probe the acoustic correlates of syllable affiliation of consonant sequences.
Another strand of work examines consonant durations in different positions of the word. Prieto (2002) showed that lateral consonants in complex onsets (e.g., /bl/ in
Finally, articulatory work examining the temporal coordination of consonantal gestures has also probed syllabic affiliation of Spanish consonant sequences. Sotiropoulou et al. (2020) observed a number of gestural adjustments. There are two key insights of this research. First, regarding consonant-lateral sequences, they found that prevocalic laterals in word-initial consonant sequences (e.g.,
4 Linking hypotheses used in this paper
Based on the discussion in the preceding sections, two mutually compatible linking hypotheses seem to be available to probe consonant sequences, which we present in (1). The first linking hypothesis comes directly from the discussion on temporal stability metrics. The second linking hypothesis comes from prior experimental work on consonant sequences in Spanish as discussed in Section 3, though it has also been found to be useful in distinguishing alignment patterns in American English and Jazani Arabic (Durvasula 2023). Note, the second linking hypothesis is also consistent with the first – Sotiropoulou et al. (2020) suggest the relevant temporal stability patterns are only a part of the set of correlates that distinguish global timing stability in complex onset languages from local timing stability in simple onset languages, and really the distinction between the two is simultaneously expressed over a set of different phonetic parameters rather than just through a single measure such as c-centre-to-anchor or right-edge-to-anchor interval stability. It is important to re-iterate, based on the review of the literature in the previous sections, that both these potential linking hypotheses are meant to be true of languages in general, and not specific to just Spanish.
| Linking hypotheses |
| Linking hypothesis 1: A consonant sequence that is a complex onset will show c-centre-to-anchor interval stability, while one that is a simplex onset will show right-edge-to-anchor interval stability. |
| Linking hypothesis 2: The second member (C2) will be shorter in a word containing a C1C2 sequence than in a minimal-pair word without C1, if the sequence is a complex onset in the original word. |
Based on the above linking hypotheses, word-initial and word-medial /fl/ sequences should show a c-centre-to-anchor interval stability and the [l] should shorten when compared to stimuli without the preceding [f]. Similarly, the word-medial sC sequence should show right-edge-to-anchor interval stability and the C following the /s/ should not shorten when compared to stimuli without the preceding /s/.
These are the hypotheses the study was designed to test. However, as we will point out in later sections, they are actually difficult to maintain without further elaboration of the effect of poly-constituent shortening. There is evidence from many languages that as words become longer, individual segments exhibit shortening (Cuenca 1997; Farnetani and Kori 1986; Fowler 1981; Katz 2012; Marin and Pouplier 2010; Munhall et al. 1992). Therefore, shortening of C2 (linking hypothesis 2) may be an artefact of a longer word (by one segment here). As a result, the stability for the right-edge-to-anchor interval would decrease as well, unrelated to the underlying syllable structure. Thus, an observed c-centre-to-anchor interval stability (the putative phonetic signature of complex onsets) could actually be masking a true right-edge alignment. Sections 7 and 8 expand on this issue.
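To make the confound concrete, the following sketch works through the arithmetic for a single hypothetical word pair. All durations are invented for illustration and are not taken from our data, and the function name `intervals` is likewise hypothetical. The sketch assumes that the right edge is the mid-point of the prevocalic consonant, that the c-centre is the mean of the consonant mid-points, and that the anchor is the end of the following vowel.

```python
def intervals(consonant_durs, vowel_dur):
    """Return (c_centre_to_anchor, right_edge_to_anchor) in ms.

    consonant_durs: durations of the prevocalic consonants, in order.
    Time 0 is the onset of the first consonant; the anchor is the vowel offset.
    """
    midpoints = []
    t = 0.0
    for d in consonant_durs:
        midpoints.append(t + d / 2)  # mid-point of this consonant
        t += d
    anchor = t + vowel_dur
    c_centre = sum(midpoints) / len(midpoints)  # mean of consonant mid-points
    right_edge = midpoints[-1]                  # mid-point of the last consonant
    return anchor - c_centre, anchor - right_edge


# Hypothetical durations (ms), chosen only to illustrate the arithmetic.
singleton = intervals([80], 100)             # /l/ + vowel
no_shortening = intervals([100, 80], 100)    # /f/ + /l/ + vowel, nothing shortens
with_shortening = intervals([100, 50], 85)   # /l/ and vowel shorten in the longer word

for label, (cc, re) in [
    ("singleton", singleton),
    ("cluster, no shortening", no_shortening),
    ("cluster, with shortening", with_shortening),
]:
    print(f"{label}: c-centre-to-anchor = {cc} ms, right-edge-to-anchor = {re} ms")
```

With these invented values, the right-edge-to-anchor interval is identical in the singleton and the unshortened cluster (140 ms), as expected under right-edge alignment. Once /l/ and the vowel shorten, that interval drops to 110 ms while the c-centre-to-anchor interval (147.5 ms) ends up closer to the singleton value, which is the masking scenario described above.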
The predictions stemming from the above hypotheses are also affected by domain-initial lengthening, whereby segments at the beginning of higher prosodic domains are longer (Cho et al. 2003; Fougeron and Keating 1997). Specifically, word-initial segments may be lengthened due to domain-initial lengthening, an effect that is reduced or absent for consonants in non-initial positions. Consequently, a word-initial consonant may appear to shorten when in a word-initial complex onset (singleton /l/ vs. /l/ in a Cl sequence, for example). Additionally, the shortening of the consonant increases the stability measure for right-edge-to-anchor. Such an interaction could account for the observation of c-centre-to-anchor stability word-initially, i.e., this interaction could account for the apparent c-centre alignment at word-initial positions. Although domain-initial lengthening is a potential confound, our experimental stimulus design does not allow us to observe it independently from the effect under study. We address this issue further in Section 9, where we discuss its implications for our overall findings.
We first present two experiments that use the above linking hypotheses to study four different consonant sequences in Spanish: word-initial /fl/ sequences, and word-medial /fl/ and /sl/ in Experiment 1, and word-medial /sm/ sequences in Experiment 2.
5 Experiment 1
5.1 Methods
5.1.1 Participants
Prior to data collection, a pre-selection study was set up in Prolific (www.prolific.co) for native speakers of North-Central Peninsular Spanish (self-reported). North-Central Peninsular Spanish was targeted because it is an [s]-preserving dialect (Hualde 2005), which was crucial for ease of consistent demarcation (see below), and because of the availability of participants on the Prolific platform. Note that, in Prolific, we were able to limit the participant pool to those who resided in Spain, held Spanish nationality and reported Spanish as their L1. Additionally, they also reported having lived in Spain their whole lives. The participants were invited to submit recordings of the words in Table 2.
Pre-selection stimuli.
| IPA transcription | Gloss | IPA transcription | Gloss |
|---|---|---|---|
| /feliθidad/ | happiness.fem.sg | /pasta/ | pasta.fem.sg |
| /ensaladas/ | salads.fem.pl | /xiɾafas/ | giraffes.fem.pl |
| /familia/ | family.fem.sg | /espaɲol/ | Spanish.masc.sg |
| /tristeθa/ | sadness.fem.sg | /inkompletas/ | incomplete.fem.pl |
| /xente/ | people.fem.sg | /xabali/ | wild boar.masc.pl |
| /formaθion/ | formation.fem.sg | | |
We took inspiration from Durvasula et al. (2021) in deploying the study over the internet. However, we went beyond the procedure established by them by adding a pre-selection task, for two reasons: (a) we wanted to ensure that we got sufficiently high-quality recordings for us to be able to annotate our recordings for segment boundaries with little confusion; (b) we wanted to ensure that the speakers were indeed from our target population. The stimuli for the pre-selection in Table 2 were chosen specifically to ensure that the speakers were from an /s/-preserving dialect and exhibited /θ/ in their pronunciations, as would be expected for speakers of the target dialect. Of the 31 submissions, 11 were selected based on the quality of the recordings. The spectrograms in Figures 2–5 are representative of the overall corpus of analysis.

Sample annotation of the nonce word

Sample annotation of the nonce word

Sample annotation of the word

Sample annotation of the word
5.2 Materials
Stimuli for the actual experiment consisted of disyllabic real and nonce Spanish words, with a word-initial or word-medial sequence of the form C1C2V(C3), and penultimate stress. In all cases, we were careful in choosing words where there was no morpheme boundary (or the appearance of one for nonce words) between the consonants in the target sequence. For word-initial sequences, the C1 was /f/, C2 was /l/, V was either /a/ or /o/ and C3 was /s/; and for word-medial cases, C1 could also be /s/. Thus, the consonant sequence of interest was in word-initial or word-medial positions (see Table 3). Their paired single-consonant words included a first or final syllable in the form of C1V(C2), where C1 = /l/, V = /a o/ and C2 = /s/. Thus, a word like
Experiment 1 Stimuli.
| Sequence | Real word pair (IPA) | Gloss | Nonce word pair (IPA) |
|---|---|---|---|
| /fl/ word-initial | /ˈfla.ka/, /ˈla.ka/ | skinny.fem.sg, lacquer.fem.sg | /ˈfla.to/, /ˈla.to/ |
| | /ˈflan/, /ˈlan/ | flan.masc, LAN.fem.sg | /ˈfla.pe/, /ˈla.pe/ |
| | /ˈflo.te/, /ˈlo.te/ | float.3sg.sjv, plot-of-land.masc.sg | /ˈflo.ke/, /ˈlo.ke/ |
| /fl/ word-medial | /ˈna.flas/, /ˈna.las/ | | |
| | /ˈba.flos/, /ˈba.los/ | | |
| | /ˈgo.flas/, /ˈgo.las/ | | |
| /sl/ word-medial | /ˈmus.los/, /ˈmu.los/ | thigh.masc.pl, mule.masc.pl | /ˈkes.las/, /ˈke.las/ |
| | /ˈis.las/, /ˈi.las/ | island.masc.pl, thread.2sg | /ˈtos.los/, /ˈto.los/ |
| | /ˈtes.las/, /ˈte.las/ | Tesla.pl, fabric.masc.pl | /ˈpos.las/, /ˈpo.las/ |
We used /f/ and /s/ as C1 because: (a) their acoustic boundaries are easier to demarcate, in comparison to segments such as stops, and (b) we assumed that they allowed us to probe both tautosyllabic and heterosyllabic consonant sequences. The lateral was chosen as C2 in order to avoid the epenthetic vowels often found in the production of Spanish /Cɾ/ sequences (Bradley 2006; Colantoni and Steele 2005). Additionally, using the rhotic as C2 would not allow the comparison of word-internal and word-initial contexts, since only the trill is licit in word-initial position whereas only the tap is licit in complex onsets. When in word-final position, the addition of C3 (C2 in the single-consonant member of the pair; always /s/) ensured that the offset of the vowel following the crucial consonant sequence could be easily identified. Nonce words were included in the study because perfect Spanish minimal pairs were not always possible. There were no observations of productions of these nonce words with variable stress. A set of filler items, both real and nonce, was also included to be used in a different study.
5.3 Procedure
After recruitment on Prolific, participants were directed to a survey on JotForm.[7] We chose this platform because it includes a participant-friendly recording widget. Participants read five repetitions of the test items in Table 3 (as well as fillers).[8] Each repetition was its own pseudo-randomised list, where the words were presented in blocks of about 10. The total number of tokens produced per speaker was 270 (54 stimuli × 5 repetitions), of which 150 were test items (30 stimuli × 5 repetitions).
5.4 Measurements
The recordings were first automatically force-aligned using the Montreal Forced Aligner (McAuliffe et al. 2017), and then the annotations were manually corrected by both authors in Praat (Boersma and Weenink 2023). One author corrected repetitions one, two and three, while the other author corrected repetitions three, four and five. The third repetition, annotated by both authors, allowed us to test for annotation reliability (see below). The annotation contained three tiers: a word tier, a phone tier and a quality tier. If participants misread the item, or there were any other types of disfluencies (e.g., a pause), the token was marked as ‘bad’ on the quality tier; the tier was otherwise left empty. During the first phase of annotation, the label ‘unclear’ was used for cases in which boundaries between segments were not straightforward, and the label ‘weird’ was used for cases where the production was unexpected based on the stimulus prompt. These cases were reviewed by both authors. There were no cases of disagreement. Approximately 11.5 % of the data was removed due to poor quality (defined here as ‘bad/weird’ in our annotation). After these exclusions, a total of 1328 observations were submitted for analysis.
For each token, the focus of annotation was the target C1, C2, V, and C3. The interval for the fricative was identified by the presence of a noisy spectrum; the offset of the interval was identified as the point at which the onset of formant structure for the following segment was observable. The interval for [l], which could be word-initial or word-internal, was identified from the clear onset of a voiced waveform and formants. Finally, the vowel interval was identified based on the strong presence of formant structure, and in the waveform, a noticeably higher intensity when compared to its surrounding segments. When followed by /s/ (e.g., in the word-final syllable, like in
As noted above, in order to determine inconsistencies across annotators, the third repetition was corrected by both authors and their percentage of agreement was calculated at various levels. Segmental boundaries were within 5 ms of each other in 54 % of the cases, within 10 ms in 79 %, and within 15 ms in 89 %. Given these numbers, we assume that there was reasonable consistency across authors in annotation.
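The agreement figures above can be computed along the following lines. This is only a minimal sketch: the function name `agreement_within` and the boundary times are hypothetical, and it assumes that each boundary placed by one annotator is paired with the corresponding boundary from the other annotator and that "within" is inclusive of the threshold.

```python
def agreement_within(annotator_a_ms, annotator_b_ms, thresholds_ms=(5, 10, 15)):
    """Proportion of paired boundaries whose absolute difference is at most each threshold (ms)."""
    if len(annotator_a_ms) != len(annotator_b_ms) or not annotator_a_ms:
        raise ValueError("Both annotators need the same, non-zero number of boundaries.")
    diffs = [abs(a - b) for a, b in zip(annotator_a_ms, annotator_b_ms)]
    return {t: sum(d <= t for d in diffs) / len(diffs) for t in thresholds_ms}


# Hypothetical boundary times (ms) for four boundaries annotated by both annotators.
boundaries_a = [100, 250, 400, 620]
boundaries_b = [103, 258, 412, 640]

result = agreement_within(boundaries_a, boundaries_b)
print(result)
```

Because the thresholds are cumulative, the proportions can only stay the same or grow as the threshold increases, as in the percentages reported above.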
We then extracted: (a) the duration from the mid-point of the right-most prevocalic consonant to the end of the following vowel (i.e., right-edge-to-anchor), and (b) the duration from the mean of the mid-points of the word-initial consonants to the end of the following vowel (i.e., c-centre-to-anchor).
Following Shaw et al. (2009, 2011), we calculated the Relative Standard Deviation (RSD) of the durations for each pair of words using the formula below (Equation (1)) to estimate the spread, and therefore the stability, in the durations. To measure the RSD for each pair of words for each participant, we used all the repetitions of the pair produced by the participant. Note, we used RSDs as our measure of stability since they have been argued to control for the larger variance that is typically associated with longer durations; in contrast, an uncorrected measure such as standard deviation or variance would have an inherent bias against measures involving longer durations (in this case, c-centre-to-anchor) over those involving shorter durations (right-edge-to-anchor).
\[ \mathrm{RSD} = \frac{\sigma}{\mu} \qquad (1) \]
where \(\sigma\) is the standard deviation and \(\mu\) is the mean of the interval durations.
In our data, the RSD value was calculated for the c-centre-to-anchor interval and right-edge-to-anchor interval separately. For each interval, within each word-pair and subject, the standard deviation of all the interval durations was calculated, and then divided by the mean of all the interval durations. The RSD value is expected to be higher when there is a greater increase in interval duration between words with consonantal sequences and those without, and is expected to be 0 when there is absolutely no variation in interval durations.
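The calculation just described can be sketched as follows for one word pair and one speaker. The interval durations are hypothetical and the function name `rsd` is our own label for this illustration. The sketch also assumes the sample standard deviation and no scaling to a percentage, neither of which is specified in the text above.

```python
from statistics import mean, stdev


def rsd(durations):
    """Relative standard deviation: standard deviation divided by the mean.

    Assumption: sample standard deviation (n - 1 in the denominator).
    """
    return stdev(durations) / mean(durations)


# Hypothetical interval durations (ms) pooled over all repetitions of one
# word pair (cluster word + singleton word) from one speaker.
c_centre_to_anchor = [185, 190, 180, 140, 145, 135]
right_edge_to_anchor = [140, 145, 135, 140, 145, 135]

rsd_cc = rsd(c_centre_to_anchor)
rsd_re = rsd(right_edge_to_anchor)

# The interval with the lower RSD is the more stable one for this pair.
more_stable = "c-centre-to-anchor" if rsd_cc < rsd_re else "right-edge-to-anchor"
print(f"RSD c-centre: {rsd_cc:.3f}, RSD right-edge: {rsd_re:.3f}, more stable: {more_stable}")
```

In this invented pair the right-edge-to-anchor durations do not differ between the cluster word and the singleton word, so that interval has the lower RSD. Dividing by the mean is what keeps the comparison from being biased against the inherently longer c-centre-to-anchor interval.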
In cases where the underlying syllable co-ordination is one of c-centre (complex) alignment, the RSD of the c-centre-to-anchor interval is expected to be generally lower than that of the right-edge-to-anchor interval. In cases where the underlying syllable co-ordination is one of right-edge (simplex) alignment, the RSD of the right-edge-to-anchor interval is expected to be generally lower than that of the c-centre-to-anchor interval. However, as Shaw et al. (2009) point out through simulations, the latter expectation needs more nuance. The RSDs for c-centre-to-anchor interval stability can be lower than the RSDs for right-edge-to-anchor interval stability if there is sufficient variance in the durational measurements. Therefore, with sufficient variance in durations, a lower RSD value for c-centre-to-anchor interval stability becomes difficult to interpret. In contrast, if the RSD for the right-edge-to-anchor interval is generally lower in the results, then that can be interpreted as evidence in favour of right-edge (simplex) alignment.
It is important to note here that Shaw and Gafos’s (2015) and Shaw et al.’s (2009) result was a simulation with a very specific set of parametric estimates stemming from the observations in their dataset. As noted in their work and in Gafos et al. (2014), one needs parametric estimates from the relevant production study to identify the threshold/tipping point variance value. Therefore, we are not able to use an a priori threshold to see if the issue they raise is a problem for our results. The issue is further compounded by the phenomenon of poly-constituent shortening that we discussed above and return to in Section 7, which was not part of their simulations. To resolve this issue, we present our own simulations using parametric estimates obtained from the data presented in this manuscript in Section 8, and request the reader’s indulgence for now.
5.5 Results
All plotting and statistical modelling in this article were done using the programming language R (R Core Team 2021) within the RStudio IDE (RStudio Team 2020). The plotting and data munging were done using the package tidyverse (Wickham 2017). For each experiment, we first visually inspected the results and then followed the visual inspection with linear mixed-effects modelling with the packages lme4 (Bates et al. 2015) and lmerTest (Kuznetsova et al. 2017). Finally, the statistical models were converted to LaTeX code using the package stargazer (Hlavac 2018). All the measurements used for analysis in this paper, along with the corresponding Praat scripts, and scripts for the analyses and simulations presented are available at https://osf.io/3xcj4.
As can be seen in Figure 6, the overall RSDs for each pair of consonants are lower for the c-centre-to-anchor interval.

Overall Relative Standard Deviations (RSDs) for Experiment 1. Each boxplot/violin plot represents the RSDs calculated for each CC ∼ C pair (left of each facet = RSDs of c-centre-to-anchor durations, right of each facet = RSDs of right-edge-to-anchor durations). Included statistics are based on mixed-effects models discussed in the prose.
We followed up on the visual inspection of the overall RSDs with linear mixed-effects modelling. As mentioned above, the crucial dependent variable when looking at interval stability is the Relative Standard Deviations (RSD), and the independent variable considered was interval (c-centre-to-anchor, right-edge-to-anchor; baseline = c-centre-to-anchor). The random-effects structure included a random intercept of participant, word-pair, and nonce status.[9] We modelled each consonant pair case separately in order to see which of the two intervals was more stable. The modelling results are presented in Table 4.
Table 4: Linear mixed-effects models for each consonant pair in Experiment 1 (reference: c-centre).

| Consonant pair | Position | | Estimate | S.E. | df | t-value | Pr(>\|t\|) |
|---|---|---|---|---|---|---|---|
| fl ∼ l | Word-initial | (Intercept) | 10.4 | 0.5 | 25.0 | 20.8 | |
| | | right-edge | 3.4 | 0.6 | 109.0 | 5.3 | |
| fl ∼ l | Word-medial | (Intercept) | 9.6 | 0.7 | 7.5 | 12.9 | |
| | | right-edge | 2.0 | 0.8 | 47.0 | 2.4 | 0.02 |
| sl ∼ l | Word-medial | (Intercept) | 9.9 | 0.7 | 13.0 | 14.9 | |
| | | right-edge | 0.9 | 0.5 | 109.0 | 1.7 | 0.09 |
As can be seen in Table 4, the right-edge-to-anchor interval had a higher RSD value for each consonant pair. In the case of /fl ∼ l/, there is a statistically clear difference between the RSD values of the c-centre-to-anchor interval and the right-edge-to-anchor interval.[10] In the case of the word-medial /sl ∼ l/, although the difference is in the same direction, the difference is statistically not clear.[11]
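For concreteness, the model fitted for each consonant pair can be written as below. The notation is ours and is only an illustrative assumption: $I_{\mathrm{RE}}$ equals 1 for the right-edge-to-anchor interval and 0 for the c-centre-to-anchor baseline, and the $u$ terms are the random intercepts named in the text. The worked lines use the word-initial fl/l estimates reported in Table 4.

```latex
\begin{align*}
\mathrm{RSD} &= \beta_0 + \beta_1 I_{\mathrm{RE}}
  + u_{\text{participant}} + u_{\text{word-pair}} + u_{\text{nonce}} + \varepsilon \\
\widehat{\mathrm{RSD}}_{\text{c-centre}} &= \beta_0 = 10.4 \\
\widehat{\mathrm{RSD}}_{\text{right-edge}} &= \beta_0 + \beta_1 = 10.4 + 3.4 = 13.8
\end{align*}
```

Read this way, a positive right-edge coefficient means that the right-edge-to-anchor interval is the less stable of the two for that consonant pair.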
We then looked at the acoustic durations of the lateral consonant in both members of each pair. As a reminder, Prieto (2002) and Sotiropoulou et al. (2020) observe that the pre-vocalic consonant is shorter in the case of a complex onset alignment. The durations in Figure 7 suggest that there is a general shortening of the pre-vocalic consonant in words with a sequence of two consonants in the relevant position for all three cases.

Durations of the prevocalic consonants for the consonant pairs in Experiment 1. Included statistics are based on mixed-effects models discussed in the prose.
Again, we followed up the visual inspection with linear mixed-effects modelling. The dependent variable was the duration of the pre-vocalic consonant, and the independent variable considered was length of consonant sequence (Length-1, Length-2; baseline = Length-1). The random-effects structure included random intercepts of participant and word-pair.[12] The modelling results are presented in Table 5. The results showed a statistically clear shortening of the pre-vocalic consonant in the consonant sequences for all three cases. Note further that the unstandardised effect sizes in each of the cases observable in the table (namely, −42.3 ms, −15.8 ms, −10.5 ms) are substantial, and are therefore likely to be practically significant.
Table 5: Linear mixed-effects models for the duration of the pre-vocalic consonant for each consonant pair in Experiment 1 (reference: Length-1).

| Consonant pair | Position | | Estimate (ms) | S.E. | df | t-value | Pr(>\|t\|) |
|---|---|---|---|---|---|---|---|
| fl ∼ l | Word-initial | (Intercept) | 156.1 | 8.0 | 11.7 | 19.5 | |
| | | Length-2 | −42.3 | 2.0 | 498.2 | −21.6 | |
| fl ∼ l | Word-medial | (Intercept) | 94.8 | 4.7 | 16.8 | 20.0 | |
| | | Length-2 | −15.8 | 1.6 | 258.0 | −9.8 | |
| sl ∼ l | Word-medial | (Intercept) | 89.3 | 4.3 | 13.7 | 20.7 | |
| | | Length-2 | −10.5 | 1.2 | 532.5 | −8.8 | |
On the recommendation of a reviewer, we fitted a post-hoc model to all the fl ∼ l nonce word pairs, with length of consonant sequence and position and the interaction between them as fixed effects, and the same random effects structure as the above models (Table 6). There is a statistically clear interaction between length of consonant sequence and position, suggesting that the degree of C2 shortening observed in the word-initial case is larger than in the word-medial case.
Table 6: Linear mixed-effects models for the duration of the pre-vocalic consonant for fl ∼ l nonce words in Experiment 1 (reference: Length-1, Word-medial).

| | Estimate (ms) | S.E. | df | t-value | Pr(>\|t\|) |
|---|---|---|---|---|---|
| (Intercept) | 95.0 | 6.4 | 20.1 | 14.8 | |
| Length-2 | −16.2 | 2.4 | 511.0 | −6.7 | |
| Position (Word-initial) | 59.8 | 5.4 | 511.1 | 11.1 | |
| Length-2: Position (Word-initial) | −25.1 | 3.5 | 511.2 | −7.1 | |
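To make the interaction concrete, the fixed-effect estimates of the post-hoc model can be combined as follows. This is a worked reading of the reported coefficients only, with the word-medial, Length-1 condition as the baseline.

```latex
\begin{align*}
\text{word-medial shortening} &= -16.2\ \text{ms} \\
\text{word-initial shortening} &= -16.2 + (-25.1) = -41.3\ \text{ms} \\
\text{word-initial, Length-1 duration} &= 95.0 + 59.8 = 154.8\ \text{ms} \\
\text{word-initial, Length-2 duration} &= 95.0 + 59.8 - 16.2 - 25.1 = 113.5\ \text{ms}
\end{align*}
```

The two shortening values are close to, though not identical with, the estimates from the separate models reported above (−15.8 ms word-medially and −42.3 ms word-initially).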
5.6 Discussion
The results of the current experiment suggest that the c-centre-to-anchor interval is more stable than the right-edge-to-anchor interval for both word-initial and word-medial /fl/ sequences. Furthermore, there is some statistically unclear evidence that the same is true for word-medial /sl/ sequences. Based on the linking hypotheses, the pattern of interval stability in turn would be interpreted as showing that both word-initial and word-medial /fl/ sequences are consistent with being complex onset clusters, while there is no statistically clear difference in the stability results for the word-medial /sl/ case.
The results from the analysis of the duration of the pre-vocalic consonant would lead to a similar inference. In all three cases, the presence of a preceding consonant shortens the pre-vocalic consonant. The pattern of pre-vocalic consonant duration shortening is consistent with a pattern of complex onset clusters for all three cases, despite standard analyses of tautosyllabic versus heterosyllabic constituency.
One potential reason for the lack of a clear statistical difference with /sl/ when looking at the stability metrics is the general issue of annotating the boundary between /sl/. More specifically, there were cases of short acoustic silences between the /s/ and the /l/, which we interpreted as excrescent stops. While we consistently included them as part of the preceding fricative, one could argue that this is inappropriate. Our worry was that there is no “right” annotation scheme in such cases, and so we simply proceeded in a consistent fashion. This issue could however have led to noisy measurements of the relevant intervals for the /sl/ case, thereby affecting the effect size and the p-value calculations.
It is for this reason that we turn to word-medial /sm/ sequences in Experiment 2.
6 Experiment 2
In Experiment 1, we observed that the consonantal sequences considered generally appeared to show c-centre alignment, both word-medially and word-initially. However, the interval stability pattern showed no statistically clear difference in the case of /sl/ sequences. As stated above, we conjectured that a part of the problem with these sequences lay in annotating the boundary between /s/ and /l/. For this reason, in this experiment, we turned to /sm ∼ m/ pairs. Additionally, given that word-medial effects in Experiment 1 were notably smaller than word-initial effects, in Experiment 2 we only examine word-medial cases.
6.1 Methods
6.1.1 Participants
Participants in Experiment 1, and only those participants, were invited to take part in Experiment 2. The participants were re-invited for this experiment through Prolific and directed to a survey on JotForm.[13] Of the original eleven subjects in Experiment 1, eight returned; therefore, Experiment 2 had eight participants. Participant label was retained; that is, Subject 1 in the first experiment is Subject 1 in this second one.
6.2 Materials
Stimuli consisted of real words, with word-medial /sm/ sequences and the singleton counterpart /m/, and penultimate stress. We used this sequence, instead of /sn/ due to availability of real word pairs. Table 7 presents the list of target words. A set of fillers of real words were also included to be used in a different study (see Table 13 in the appendix).
Table 7: Experiment 2 Stimuli.

| Real word pair | IPA transcription | Gloss |
|---|---|---|
| | /ˈkos.mos/, /ˈko.mos/ | cosmos.masc.sg, how.masc.pl |
| | /ˈθis.mas/, /ˈθi.mas/ | schism.masc.pl, summit.masc.pl |
| | /ˈas.mas/, /ˈa.mas/ | asthma.masc.sg, love.2sg.ind |
| | /us.ˈme.as/, /u.ˈme.as/ | sniff.2sg.ind, smoke.2sg.ind |
| | /ˈmis.mas/, /ˈmi.mas/ | same.fem.pl, pamper.2sg.ind |
| | /ˈmis.mos/, /ˈmi.mos/ | same.masc.pl, cuddle.masc.pl |
6.2.1 Procedure
Participants read five repetitions of the stimuli in isolation, each in its own pseudo-randomised list, with items presented one at a time. Each participant produced a total of 135 test items (27 stimuli × 5 repetitions), of which 60 were the test words analysed for this experiment (12 stimuli × 5 repetitions).
6.2.2 Measurements
Following Experiment 1, recordings were automatically forced-aligned with the Montreal Forced Aligner (McAuliffe et al. 2017) and manually corrected by both authors. For revising the automatic annotation, the same division of labour among authors was followed in this experiment as well (repetitions one, two and three corrected by one author, repetitions three, four and five by the other one, with repetition three serving as inter-annotator reliability). The annotation scheme of three tiers (word, segment and quality) used in Experiment 1 was adopted here as well. The same measurements as in Experiment 1 were taken for Experiment 2. Approximately 9.5 % of the data was removed due to poor quality. A total of 431 observations were submitted for analysis.
Much as in Experiment 1, the third repetition allowed us to assess inter-annotator reliability. Segmental boundaries were within 5 ms of each other in 51 % of the cases, within 10 ms in 78 %, and within 15 ms in 90 %.
6.3 Results
As with all the consonant pairs in Experiment 1, it can be seen in Figure 8 that the RSDs for the word-medial /sm ∼ m/ pair are generally lower for the c-centre-to-anchor interval.

Overall Relative Standard Deviations for Experiment 2. Included statistics are based on mixed-effects models discussed in the prose.
Again, as with the data in Experiment 1, we followed up on the inspection of the overall RSDs with linear mixed-effects modelling, where the crucial dependent variable is the Relative Standard Deviations (RSD), and the independent variable considered was interval (c-centre-to-anchor, right-edge-to-anchor; baseline = c-centre-to-anchor). The random-effects structure included a random intercept of participant and word-pair.
The right-edge-to-anchor interval had a higher RSD than the c-centre-to-anchor interval for the word-medial /sm ∼ m/ pair, and the difference was statistically clear (see Table 8).
Table 8: Linear mixed-effects models for the word-medial /sm ∼ m/ pair in Experiment 2.

| | Estimate | S.E. | df | t-value | Pr(>\|t\|) |
|---|---|---|---|---|---|
| (Intercept) | 8.4 | 0.7 | 11.7 | 12.7 | |
| right-edge | 3.4 | 0.5 | 82 | 6.3 | |
As with Experiment 1, we then looked at the acoustic duration of the consonant immediately before the crucial vowel. The durations in Figure 9 again suggest that there is a general shortening of the pre-vocalic consonant, i.e., /m/, in word-medial /sm/ cases.
![Figure 9: Durations of the prevocalic consonants for the word-medial [sm ∼ m] pair in Experiment 2. Included statistics are based on mixed-effects models discussed in the prose.](/document/doi/10.1515/phon-2024-0004/asset/graphic/j_phon-2024-0004_fig_009.jpg)
Durations of the prevocalic consonants for the word-medial [sm ∼ m] pair in Experiment 2. Included statistics are based on mixed-effects models discussed in the prose.
We followed up the visual inspection with linear mixed-effects modelling, with the duration of the pre-vocalic consonant as the dependent variable, and the length of consonant sequence (Length-1, Length-2; baseline = Length-1) as the independent variable. The random-effects structure included random intercepts of participant and word-pair. The results showed a statistically clear shortening of the pre-vocalic consonant in the consonant sequences (see Table 9). Note further that, as with Experiment 1, the unstandardised effect size (namely, −23.5 ms) is substantial, and is therefore likely to be practically significant.
Table 9: Linear mixed-effects models for the duration of the pre-vocalic consonant for the word-medial [sm ∼ m] pair in Experiment 2 (reference: Length-1).

| | Estimate (ms) | S.E. | df | t-value | Pr(>\|t\|) |
|---|---|---|---|---|---|
| (Intercept) | 116.3 | 3.3 | 33.6 | 35.6 | |
| Length-2 | −23.5 | 1.6 | 419.3 | −14.9 | |
6.4 Discussion
The results of the current experiment replicate the interval stability patterns observed in Experiment 1 for both the word-initial and word-medial sequences, in that the c-centre-to-anchor interval is more stable than the right-edge-to-anchor interval. They also show the same pattern of shortening of the pre-vocalic consonant, here /m/ in /sm/ sequences. At first blush, both observed patterns appear to be consistent with the temporal stability patterns of word-medial complex onsets in other languages.
If correct, our results are quite intriguing, as they stand in contrast to standard analyses of syllable structure in Spanish. However, as we point out below, such an inference would be premature, as the analyses have not controlled for potential confounds.
7 Post-hoc analyses reveal that the C-centre effect is confounded by poly-constituent shortening
In Experiments 1 and 2, /sm/ and /sl/ were observed to have a c-centre-to-anchor stability and a shortening of the C2 similar to complex onsets. This is superficially consistent with a c-centre-to-anchor organisation. As per standard accounts /sm/ and /sl/ are hetero-syllabic, and consequently, our findings are quite surprising given the traditional analysis of syllabic affiliation in Spanish. One possible way to interpret our results is that the standard analysis of Spanish syllables, particularly that for word-medial /sm/ and /sl/ cases, is wrong, and that indeed such word-medial sequences are tauto-syllabic and form complex onsets. However, such an inference would be rather hasty, in our opinion, if the experimental probe has not been sufficiently vetted against possible confounds. For this reason, we wanted to explore if there were potential confounds that could explain our findings.
One potential confound discussed earlier is poly-constituent shortening, which is observed in many languages. As words become longer, individual segments tend to shorten (Cuenca 1997; Farnetani and Kori 1986; Fowler 1981; Katz 2012; Marin and Pouplier 2010; Munhall et al. 1992). Thus, the shortening of C2 may simply be an artefact of a longer word (by one segment, in our case). If C2 in CC sequences shortens due to poly-constituent shortening (or other factors), that could make the right-edge-to-anchor interval appear less stable across words with and without consonant sequences, accounting for reduced interval stability. In order to test this possibility, we examined the duration of non-target segments in each experiment. If these segments show no changes in acoustic durations, this confound can be ruled out.
A second potential confound brought up earlier is that of domain-initial lengthening (Cho et al. 2003; Fougeron and Keating 1997). As mentioned earlier, our experimental stimulus design doesn’t allow us to observe domain-initial lengthening independent of the effect under study. However, we return to the issue in the conclusion (Section 9), where we suggest that the effect has a bearing on the full account of the facts.
7.1 Post-vocalic consonant durations
In this sub-section, we specifically look at the duration of the consonant following the crucial vowel (e.g., /t/ in
In the case of word-initial /fl ∼ l/ sequences, the post-vocalic consonant is standardly analysed as part of the following syllable (e.g., /ˈfla.to/-/ˈla.to/). In contrast, for the word-medial /fl ∼ l/ and /sl ∼ l/ sequences in Experiment 1 and the word-medial /sm ∼ m/ sequences in Experiment 2, the post-vocalic consonant is standardly analysed as part of the same syllable as the pre-vocalic consonant (e.g., /ˈna.flas/-/ˈna.las/, /ˈmus.los/-/ˈmu.los/, and /ˈsis.mas/-/ˈsi.mas/, respectively).
A visual inspection of all the relevant cases suggested a consistent shortening of the post-vocalic consonant in all the cases we looked at (Figure 10).

Durations of the post-vocalic consonants for the relevant pairs in Experiments 1 and 2. Included p-values are based on mixed-effects models discussed in the prose.
A linear mixed-effects model was fitted for each of the cases with the number of consonants in the target consonant sequence (length) as the independent variable, and random intercepts of participants and word-pair. These models are shown in Table 10. Importantly, irrespective of syllabic affiliation of the target consonant, each case shows a shortening effect correlated with the presence of an additional consonant in the target sequence. This suggests that the pre-vocalic consonant shortening observed in the main experiments is confounded by poly-constituent shortening across the whole word, i.e., the pre-vocalic consonant shortening could have been from general poly-constituent shortening that applies beyond the syllable and not due to syllable-structure per se. Consequently, the c-centre-to-anchor stability pattern observed may also be confounded by the same finding.
Table 10: Linear mixed-effects models for the duration of the post-vocalic consonant for the word-initial and word-medial /fl ∼ l/ pairs and the /sl ∼ l/ pair in Experiment 1, and the /sm ∼ m/ pair in Experiment 2 (reference: Length-1).

| Experiment | | Estimate (ms) | S.E. | df | t-value | Pr(>\|t\|) |
|---|---|---|---|---|---|---|
| Exp. 1 /fl ∼ l/ Word-initial | (Intercept) | 140.7 | 8.1 | 14.2 | 17.3 | |
| | Length | −8.9 | 1.9 | 493.4 | −4.6 | |
| Exp. 1 /fl ∼ l/ Word-medial | (Intercept) | 181.0 | 11.6 | 12.3 | 15.6 | |
| | Length | −16.7 | 3.0 | 260.0 | −5.6 | |
| Exp. 1 /sl ∼ l/ | (Intercept) | 181.6 | 10.7 | 11.1 | 17.0 | |
| | Length | −17.8 | 2.1 | 532.4 | −8.4 | |
| Exp. 2 /sm ∼ m/ | (Intercept) | 182.4 | 15.4 | 8.0 | 11.9 | |
| | Length | −15.2 | 2.5 | 419.2 | −6.0 | |
7.2 Word-initial consonant durations when looking at medial c-centre effects
In this post-hoc analysis, we looked at the word-initial consonant durations in the cases where the c-centre effect being probed was word-medial. For Experiment 1, we chose the word-pairs /naflas ∼ nalas/, /muslo ∼ mulo/; and, for Experiment 2, we examined the word-pairs /mismas ∼ mimas/, /mismos ∼ mimos/. We specifically chose these words as they begin with a nasal consonant, for which the annotation of the acoustic onset/offset is cleaner than the other word-initial consonants (e.g., mulos vs. telas) in our stimuli.
A visual inspection of all the relevant cases suggested a consistent shortening of the initial consonant in all the cases we looked at (Figure 11).

Durations of the initial consonants for the relevant pairs in Experiments 1 and 2. Included statistics are based on mixed-effects models discussed in the prose.
Table 11 shows the mixed effects model results for each of the pairs. There is a consistent shortening of the word-initial consonant in cases where there was an additional consonant word-medially. Crucially, this result cannot be attributed to syllabic structure, per se, as it is both outside the syllable under study and not even immediately adjacent to the C(C)V sequence under study. As with the previous post-hoc analysis, the results here suggest that the findings observed in the main experiments should be interpreted with caution, as they are consistent with a poly-constituent shortening effect from the presence of an additional segment in the word.
Table 11: Linear mixed-effects models for the duration of the word-initial consonant for the word-medial /fl ∼ l/ and /sl ∼ l/ pairs in Experiment 1, and the /sm ∼ m/ pair in Experiment 2 (reference: Length-1).

| Experiment | | Estimate (ms) | S.E. | df | t-value | Pr(>\|t\|) |
|---|---|---|---|---|---|---|
| Exp. 1 /fl ∼ l/ | (Intercept) | 127.5 | 11.5 | 25.8 | 11.1 | 0 |
| | Length | −12.8 | 5.1 | 83.1 | −2.5 | 0.01 |
| Exp. 1 /sl ∼ l/ | (Intercept) | 120.8 | 11.8 | 22.9 | 10.2 | 0 |
| | Length | −13.3 | 5.0 | 77.2 | −2.7 | 0.01 |
| Exp. 2 /sm ∼ m/ | (Intercept) | 122.8 | 10.5 | 18.9 | 11.7 | 0 |
| | Length | −14.4 | 4.6 | 133.3 | −3.1 | 0.002 |
8 Model simulations of onset alignment with poly-constituent shortening
As pointed out by Shaw and Gafos (2015) and Shaw et al. (2009), RSD measurements need to be understood in a nuanced way. They show through simulations that: (a) with sufficient variation, the RSD for the c-centre-to-anchor interval will be less than the RSD for the right-edge-to-anchor interval, even if the underlying structure actually has right-edge alignment; (b) the RSD for the c-centre-to-anchor interval will always be less than the RSD for the right-edge-to-anchor interval, if the underlying structure has c-centre alignment. That is, if there is sufficient variance in the duration measurements, a lower RSD value for the c-centre-to-anchor interval is difficult to interpret as evidence that aligns with a specific underlying stability pattern (c-centre vs. right-edge alignment), but a lower RSD value for the right-edge-to-anchor interval is always interpretable as evidence that aligns with right-edge alignment. In short, with sufficient variation in durations, the RSD measure becomes an asymmetric evidentiary source.
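The asymmetry in (a) and (b) can be illustrated with a deliberately simplified idealisation. This is a back-of-the-envelope sketch under stated assumptions, not the model in Shaw and Gafos (2015) or Shaw et al. (2009). Assume equal numbers of C and CC tokens, a landmark-to-anchor duration $d$ with normally distributed noise of standard deviation $\sigma$, a fixed distance $\Delta$ between the c-centre and the right-edge landmark in CC words (the two coincide in C words), population standard deviations, and no poly-constituent shortening. Pooling the two word types then adds a between-type variance of $(\Delta/2)^2$ to whichever interval is not the aligned one:

```latex
\begin{align*}
\text{Right-edge alignment:}\quad
  \mathrm{RSD}_{\mathrm{RE}} &= \frac{\sigma}{d}, &
  \mathrm{RSD}_{\mathrm{CC}} &= \frac{\sqrt{\sigma^2 + \Delta^2/4}}{d + \Delta/2} \\
\text{C-centre alignment:}\quad
  \mathrm{RSD}_{\mathrm{CC}} &= \frac{\sigma}{d}, &
  \mathrm{RSD}_{\mathrm{RE}} &= \frac{\sqrt{\sigma^2 + \Delta^2/4}}{d - \Delta/2} \\
\text{Right-edge alignment:}\quad
  \mathrm{RSD}_{\mathrm{CC}} < \mathrm{RSD}_{\mathrm{RE}}
  &\iff \sigma^2 > \frac{d^2 \Delta}{4d + \Delta}
\end{align*}
```

Under c-centre alignment the numerator of $\mathrm{RSD}_{\mathrm{RE}}$ is at least $\sigma$ and its denominator is smaller than $d$ (for $0 < \Delta < 2d$), so the c-centre-to-anchor RSD is lower for every $\sigma$. Under right-edge alignment the ordering flips once $\sigma$ is large enough, because the c-centre-to-anchor interval is divided by a larger mean.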
However, Shaw and Gafos’s (2015) and Shaw et al.’s (2009) inferences were based on simulations with a very specific set of parametric values stemming from the observations in their dataset; consequently, it is not clear if the issue they observe in (a) above is also true for our data. Furthermore, given our results in the previous sections, the inclusion of poly-constituent shortening is important to assess whether their inferences hold in this study as well.
To address the above two concerns, we undertook modelling and simulations that include poly-constituent shortening, based on parameter estimates from our own results. We took inspiration from Shaw and Gafos (2015) and Shaw et al. (2009), who modeled c-centre and right-edge alignment using estimates from articulatory data. But, since we used acoustic measurements, we had to modify their original model for our purposes. In order to estimate parameter values, we used average acoustic measurements from word-medial /fl ∼ l/ pairs.
In Figure 12, we show the specifics of the modelling. For modelling purposes, we follow Durvasula et al. (2021) in making the assumption that the consonantal acoustic intervals identified during annotation represent the achievement of the articulatory plateau (target to release).
We modelled the target achievement of the pre-vocalic consonant (C2 in the figure) as the zero point. To generate the release of C2, the average pre-vocalic (acoustic) consonant duration in VCV words was used as an estimate (
The vowel duration estimation was a bit more complicated, as both the right-edge and c-centre alignment theories posit that a part of the vowel plateau is masked by the acoustics of consonants. Consequently, the acoustic onset of the vowel in our measurements cannot represent the true target achievement of the vowel. In order to estimate the vowel duration, we used the acoustic duration of the vowel in the VCV words, and then added half the average duration of the consonant in such words (this is appropriate since the right-edge and c-centre points coincide in such words). For right-edge alignment, this estimated vowel duration ($d_V = 192$ ms) was added from the right-edge point along with some normally-distributed variation estimated as the standard deviation of the acoustic vowel durations ($\sigma_V = 32$ ms). For c-centre alignment, the same estimated vowel duration ($d_V = 192$ ms) was added from the c-centre point along with the same normally-distributed variation ($\sigma_V = 32$ ms).
Beyond this, we also estimated poly-constituent shortening on the relevant segments based on the proportional change from VCV to VCCV words: pre-vocalic consonant proportional change (0.88), and vowel duration proportional change (0.8).
We first wanted to verify that we are able to reproduce the main insights in Shaw and Gafos (2015) and Shaw et al. (2009), even when we factor in poly-constituent shortening. In Figure 13, we vary the standard deviation of the vowel duration ($\sigma_V$) from 0 to 20 ms, in steps of 1 ms. For each standard deviation value, we simulated 1,000 word pairs, and then calculated the RSD values over them.
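A sweep of this kind can be sketched in Python as follows. This is not the authors' simulation (their scripts are in R and are available at the OSF link given earlier), and several details are assumptions made for illustration: the C1 duration of 100 ms is hypothetical, the C2 duration of 95 ms is taken from the word-medial /l/ intercept in the tables, tokens of both words are pooled before computing a population RSD, and the shortening factors default to 1.0 (no shortening), since the exact way the proportional changes enter the authors' model is not spelled out here. The crossover point it produces should therefore not be expected to match the published figure.

```python
import random
import statistics


def rsd(values):
    """Relative standard deviation in percent (population SD / mean)."""
    return 100 * statistics.pstdev(values) / statistics.fmean(values)


def simulate(alignment, sigma_v, n_pairs=1000, d_v=192.0, c1=100.0, c2=95.0,
             c2_factor=1.0, v_factor=1.0, seed=1):
    """Simulate C ~ CC word pairs; return (rsd_c_centre, rsd_right_edge).

    Time zero is the target achievement of the pre-vocalic consonant.
    c2_factor and v_factor scale the consonant and vowel in CC words only.
    """
    if alignment not in ("c-centre", "right-edge"):
        raise ValueError("alignment must be 'c-centre' or 'right-edge'")
    rng = random.Random(seed)
    cc_intervals, re_intervals = [], []
    for _ in range(n_pairs):
        for is_cluster in (False, True):
            c2_dur = c2 * (c2_factor if is_cluster else 1.0)
            v_dur = d_v * (v_factor if is_cluster else 1.0)
            right_edge = c2_dur / 2            # mid-point of the pre-vocalic consonant
            if is_cluster:
                c1_mid = -c1 / 2               # C1 ends where C2 begins (time zero)
                c_centre = (c1_mid + right_edge) / 2
            else:
                c_centre = right_edge          # landmarks coincide in singleton words
            start = right_edge if alignment == "right-edge" else c_centre
            anchor = start + v_dur + rng.gauss(0.0, sigma_v)
            cc_intervals.append(anchor - c_centre)
            re_intervals.append(anchor - right_edge)
    return rsd(cc_intervals), rsd(re_intervals)


if __name__ == "__main__":
    for sigma in range(0, 21):
        for alignment in ("c-centre", "right-edge"):
            cc, re_ = simulate(alignment, float(sigma))
            print(f"sigma={sigma:2d} {alignment:10s} RSD cc={cc:5.1f} re={re_:5.1f}")
```

Passing `c2_factor=0.88` and `v_factor=0.8` applies the proportional changes reported above to the CC words; how closely that mirrors the authors' treatment of poly-constituent shortening is, again, an assumption.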

Schematic representations of time stamp generation for c-centre alignment (left) and right-edge alignment (right). The x-axis in the figure represents time. The V marks the vowel, and C1-C2 represent word-initial consonants.
As can be seen in Figure 13, for right-edge alignment (right), after the vowel duration standard deviation goes past about 9 ms, the RSD for the c-centre-to-anchor interval is lower. In contrast, for c-centre alignment (left), the RSD for the c-centre-to-anchor interval is always lower.
Given that the estimated standard deviation of the vowel plateau duration ($\sigma_V$) was about 32 ms in our data, the simulated results in Figure 13 already suggest that the RSD measurements for the c-centre-to-anchor interval are expected to be lower than the RSD measurements for the right-edge-to-anchor interval, irrespective of the underlying alignment for word-medial /fl/ sequences in Spanish. To further establish this fact, we simulated word pairs with the across-word-pair variation that we observed in our own data. We estimated the standard deviations of the average duration of each segment in each word pair [pre-consonantal consonant (sd = 8 ms), pre-vocalic consonant (sd = 1.7 ms), and the vowel (sd = 7.4 ms)], and used these values to simulate 1,000 word pairs separately for right-edge and c-centre alignments. These results are shown in Figure 14. As gleaned from the previous simulation, for our data, independent of the underlying organisation, the RSD for the c-centre-to-anchor interval duration is expected to be generally lower than the RSD for the right-edge-to-anchor interval duration.

Simulated Relative Standard Deviations values for c-centre and right-edge alignment with increasing standard deviation of vowel duration (replicating the simulation results in Shaw and Gafos (2015) and Shaw et al. (2009)).

Simulated Relative Standard Deviations values based on observed parameter values in our data.
To summarise, we found that for our data, the RSD for c-centre-to-anchor interval duration was lower; however, our modelling results suggest that, when we include an estimate of poly-constituent shortening, both complex and simplex onsets are expected to show generally lower RSD values for c-centre-to-anchor interval duration. The above results suggest that, even when we factor in poly-constituent shortening, RSD measurements are difficult to interpret when the value corresponding to c-centre-to-anchor interval duration is lower. Note, in contrast, when the RSD measurements are generally lower for the right-edge-to-anchor interval duration, then one can in fact infer a right-edge alignment (as is true in Moroccan and Jazani Arabic in prior research).
Above we say “difficult”, not impossible, since there is some information above to suggest that word-medial /fl/ sequences may have a simplex organisation. Note, in Figure 14, the simulated RSD values for both the c-centre-to-anchor and right-edge-to-anchor interval durations under a right-edge alignment scenario are in the same range as the observed values in our data. In contrast, for the c-centre alignment scenario, the simulated RSD values for right-edge-to-anchor interval duration are far above our observed values, suggesting that if the underlying organisation were indeed c-centre alignment for word-medial /fl/ sequences, then we would have observed far more overlap of the pre-vocalic consonant and the vowel (and subsequent shortening of right-edge-to-anchor interval duration).
9 Conclusions
In the current paper, we presented two experiments and explored two different types of measures to examine c-centre-to-anchor versus right-edge-to-anchor interval stability for word-initial and word-medial consonant sequences in Spanish: (a) interval stability measured in terms of RSDs, and (b) pre-vocalic consonant duration.
The findings in both experiments showed that the c-centre-to-anchor interval was more stable than the right-edge-to-anchor interval for both word-initial and word-medial /fl/ and word-medial /sl/ and /sm/ sequences. Additionally, acoustic durations of the prevocalic consonants (that is, /l/ in /fl/ and /sl/ and /m/ in /sm/) exhibited shortening when part of a consonantal sequence. Before discussing the potential implications of these findings, it is worth noting that we have no a priori reasons to believe that the results pertain only to the dialect under study, namely, north-central Peninsular Spanish. There is no independent evidence that we are aware of that the observed patterns are part of a larger systematic sound change in this dialect area. This suggests to us that our findings are very likely generalisable to other dialects of Spanish.
At first blush, our findings raise intriguing implications regarding the traditional analysis of the syllabic affiliation of consonant sequences in Spanish. If we assume that there is a universal mapping of phonetic signatures to syllabic structure and that the proposed phonetic diagnostics allow us to infer the latter, our results suggest that word-medial consonant sequences (e.g., /sm sl/) that are traditionally assumed to be heterosyllabic may in fact be tautosyllabic. This interpretation is also suggested by Aldrich and Simonet (2019), who highlighted the possibility of such syllabic reanalysis in their analysis of vowel compression. Their investigation found no clear differences in the durations of initial vowels in stimuli containing various consonant sequences (pVpa, pVCpa, and pVCCpa) – i.e., the relevant coda consonants (C) and coda clusters (CC), as per standard analyses, did not result in shorter initial vowels, when compared to the initial vowel in an open syllable. This stands in contrast to observations in English by Katz (2012) and Munhall et al. (1992), where substantial vowel shortening occurs in the presence of a coda consonant. Moreover, it is crucial to acknowledge that much of the evidence regarding syllabic affiliation in Spanish stems from speaker intuitions, which have generally exhibited consistency. However, reliance solely on meta-linguistic tasks, such as speaker intuitions, may introduce confounding factors, particularly concerning word-edge and morpheme-edge judgments across languages. Furthermore, the reliance on speaker intuitions prompts consideration of the extent to which these judgments are informed by orthographic knowledge and other word-edge phonotactics as opposed to their knowledge of syllable structure per se. 
This distinction is particularly relevant for a language like Spanish, where orthographic representations, extensively emphasized in school for literacy purposes, may influence speaker intuitions about syllabic structure – to our knowledge, no study has systematically explored this potential confound.
Despite the logical possibility raised above, following the general advice of Shaw and Gafos (2015), we caution against interpreting our experimental results as evidence for a particular onset structure analysis without precise quantitative modelling or appropriate controls. There are two main reasons for this. First, the prevocalic consonant (C2) shortening we observed could actually be from poly-constituent shortening, which extends beyond the syllable. As a reminder, we not only found that the target pre-vocalic consonant shortened in the presence of a preceding consonant (C1C2, which would be standardly taken as evidence of a complex syllable), but we also observed similar shortening outside the syllable. Therefore, the shortening cannot be attributed to syllable structure without more precise specification of the effects. Second, we showed with simulations, using estimated parameter values from our own data, that a lower RSD for the c-centre-to-anchor intervals is possible when the actual underlying temporal organisation is one of either c-centre alignment or right-edge alignment. That is, right-edge alignment languages, under the right circumstances, can be observed to have more c-centre-to-anchor interval stability. This generally replicates the findings of Shaw and Gafos (2015) and Shaw et al. (2009), but extends them to show that the issue persists even when we factor in poly-constituent shortening in the simulations. Both of the above reasons instead suggest that there is a need for a quantitatively more precise theoretical understanding of the phenomenon of poly-constituent shortening in order to make progress on understanding the connection between syllable structure and temporal organisation. The challenge for future work is that poly-constituent shortening has been observed to be different across different consonant sequences in the same language (Katz 2012). 
Therefore, a more general statement of the effect is not possible given current knowledge; instead, appropriate controls need to be used with the experiment to estimate the effect of poly-constituent shortening as relevant to the specific stimuli used.
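The second point can be illustrated with a deliberately simplified stochastic sketch. This is not the simulation reported in the paper: every duration, standard deviation and shortening value below is invented for illustration, the anchor is assumed to be the end of the vowel, and the c-centre is assumed to be the mean of the consonant midpoints. The toy model generates singleton (C2V) and cluster (C1C2V) tokens under right-edge alignment and optionally applies a hypothetical shortening to both C2 and the following vowel in cluster contexts.

```python
import random
from statistics import mean, stdev


def rsd(values):
    """Relative standard deviation in percent: 100 * sample SD / mean."""
    return 100.0 * stdev(values) / mean(values)


def simulate_right_edge(n, c2_short, v_short, seed=1):
    """Simulate n singleton and n cluster tokens under right-edge alignment.

    c2_short, v_short: hypothetical shortening (ms) applied to C2 and to the
    vowel in cluster contexts. Returns (RSD c-centre, RSD right-edge).
    """
    rng = random.Random(seed)
    cc, re = [], []
    for cluster in (False, True):
        for _ in range(n):
            # Invented durations in ms (mean, SD); not estimated from data.
            c2 = rng.gauss(80 - (c2_short if cluster else 0), 10)
            v = rng.gauss(150 - (v_short if cluster else 0), 15)
            # Right-edge alignment: the C2 offset is timed to the vowel, so
            # the right-edge-to-anchor interval is just the vowel interval.
            re.append(v)
            if cluster:
                c1 = rng.gauss(100, 12)
                # Mean of the C1 and C2 midpoints, measured back from the
                # C2 offset, with C1 assumed to directly precede C2.
                cc.append(v + (c2 / 2 + c2 + c1 / 2) / 2)
            else:
                cc.append(v + c2 / 2)
    return rsd(cc), rsd(re)


for label, c2_short, v_short in [("no shortening", 0, 0),
                                 ("with shortening", 15, 20)]:
    rsd_cc, rsd_re = simulate_right_edge(5000, c2_short, v_short)
    print(f"{label}: RSD c-centre = {rsd_cc:.1f}%, "
          f"RSD right-edge = {rsd_re:.1f}%")
```

With these invented values, the right-edge-to-anchor interval has the lower RSD when no shortening is applied, whereas the c-centre-to-anchor interval has the lower RSD once shortening is included, even though the generating model is right-edge alignment in both runs. The direction of the result depends on the assumed parameter values, which is why controls that estimate the shortening for the specific stimuli are needed.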
When we looked at prevocalic consonant (C2) shortening effects, there was one case where the effect size was larger than the rest, namely the word-initial /fl/ case. As can be seen in Table 5, the shortening effect on the /l/ in /fl/ sequences is about 42 ms – in post-hoc tests, this was statistically clearly larger than any of the other shortening effects. In contrast, the shortening effects observed in the post-hoc comparisons probing the effect of poly-constituent shortening (Table 10) are all less than 18 ms. So, one could argue that while the word-medial cases may be due to poly-constituent shortening, the word-initial case constitutes some evidence of a complex onset related shortening effect. However, even here, caution is warranted. In the word-initial case, there is the possibility of another well-studied factor, namely, domain-initial lengthening. One could argue that the word-initial /l/ in words such as ⟨lato⟩ is subject to a domain-initial lengthening effect (Fougeron and Keating 1997). Consequently, the additional shortening of /l/ observed in the case of word-initial /fl/ could simply be the result of the /l/ being non-initial. As a result, the larger effect size of C2 shortening seen in the word-initial /fl/ context could simply be the result of not adjusting for domain-initial lengthening and poly-constituent shortening. In short, all the observed patterns in our studies can be explained as an interaction between poly-constituent shortening and domain-initial lengthening, while maintaining that all the consonant sequences studied have right-edge alignment. Despite the observation of c-centre-to-anchor stability, there is no need to posit an underlying c-centre alignment for the consonant sequences, and there is in fact some evidence against it.
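The logic of this paragraph can be sketched as a back-of-the-envelope decomposition. The sketch assumes, purely for illustration, that the two effects are additive and that the upper bound on poly-constituent shortening from Table 10 carries over to the word-initial context. The symbols are labels introduced here, and neither assumption is tested in the paper.

```latex
\Delta_{\text{obs}} \approx \Delta_{\text{PCS}} + \Delta_{\text{DIL}},
\qquad
\Delta_{\text{obs}} \approx 42\ \text{ms},\quad
\Delta_{\text{PCS}} < 18\ \text{ms}
\;\Longrightarrow\;
\Delta_{\text{DIL}} > 24\ \text{ms}
```

Here $\Delta_{\text{obs}}$ is the observed shortening of /l/ in word-initial /fl/, $\Delta_{\text{PCS}}$ the portion attributable to poly-constituent shortening, and $\Delta_{\text{DIL}}$ the portion attributable to the loss of domain-initial lengthening. Under these assumptions, a domain-initial lengthening effect of roughly 24 ms or more on singleton /l/ would be enough to account for the larger word-initial effect without appeal to a complex onset.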
An important observation stems from our results and the above discussion – poly-constituent shortening and domain-initial lengthening themselves appear to be language-specific (or minimally, in need of further study). While it is possible to account for some of the c-centre-to-anchor stability patterns as a result of not adjusting for poly-constituent shortening or domain-initial lengthening, it cannot be the case that such effects are present in all languages (at least, not within the same phonological domain). As noted earlier, there are consonant sequences in Moroccan Arabic, Jazani Arabic and Italian where there is right-edge-to-anchor stability – i.e., the addition of a preceding consonant to a word-initial C2V sequence to create #C1C2V sequences does not result in any shortening of the pre-vocalic consonant (C2) or, for that matter, the vowel, which is the reason there is right-edge-to-anchor stability. If poly-constituent shortening and domain-initial lengthening played a role in these languages as they do in Spanish, then the right-edge-to-anchor stability effect would not have been possible. In fact, Durvasula (2023) shows that there is no clear change in the pre-vocalic consonant duration due to the addition of consonants word-initially in Jazani Arabic. We point this issue out here because, if poly-constituent shortening and domain-initial lengthening indeed have to be understood better in their own right, the quest to probe for consistent temporal stability patterns related to syllable structure is further complicated.
There are two important implications that stem from the discussion in the previous paragraphs. First, we have a new possibility for why such stability patterns have been inconsistently observed in specific languages and segmental contexts – it is possible that other phonetic factors (including poly-constituent shortening and domain-initial lengthening) have not been appropriately adjusted for. Second, and more importantly, it is possible to view all observations of c-centre alignment in the literature as an interaction of the language-specific effects of poly-constituent shortening and domain-initial lengthening; if so, it is possible that c-centre alignment is always simply a mirage that stems from such an interaction, and that right-edge alignment is the only alignment that is underlyingly present. This latter implication is a possibility that we think is particularly exciting, given that it suggests a uniform temporal stability pattern across languages once other factors are controlled for – we leave this to be explored in future work.
- Author contributions: The corresponding author Karthik Durvasula and co-author Silvina Bongiovanni are responsible for the conceptualization and design of the study, and carried out participant recruitment, data collection and analysis and writing of the manuscript. Both authors have contributed equally to this manuscript.
- Competing interests: The authors have no conflicts of interest to declare.
- Research ethics: The study was conducted according to the guidelines of the Declaration of Helsinki and approved by the Institutional Review Board at Michigan State University (study 00005619). Informed consent was obtained from all subjects involved in the study.
Fillers, experiment 1.
| Real word pair | Glosses | Nonce word pair |
|---|---|---|
| | milk.masc.pl, leyes.masc.pl | |
| | male.masc.sg, May.masc.sg | |
| | pity.fem.sg, comb.3sg | |
| | reindeer.masc.sg, kingdom.masc.sg | |
| | step.masc.sg | |
| | well.masc.sg | |
| | silo.masc.sg | |
| | zeal.masc.sg | |
Fillers, experiment 2.
| Real word pair | Glosses |
|---|---|
| | married.masc.sg, hunted.masc.sg |
| | do.IMP.2sg, have.2sg |
| | graze.3sg, pink.sg |
| | mug.fem.sg, rate.fem.sg |
| | seat.masc.sg, relinquish.3sg |
| | located.masc.sg, make an appointment.1sg |
| | temple.masc.sg, one hundred.sg |
| | voice.fem.sg, you.pro.2sg |
| | riverbed.masc.sg, cause.subj.1sg |
| | recently.adj.sg, resent.3sg |
| | close.3sg, mountain.fem.sg |
| | towards.prep, Asia |
| | time.fem.sg, see.2sg |
| | strain.fem.sg, know.subj.3sg |
| | one hundred.sg, feel.1sg |
References
Aldrich, Alexander C. & Miquel Simonet. 2019. Duration of syllable nuclei in Spanish. Studies in Hispanic and Lusophone Linguistics 12(2). 247–280. https://doi.org/10.1515/shll-2019-2012.
Bates, Douglas, Martin Mächler, Ben Bolker & Steve Walker. 2015. Fitting linear mixed-effects models using lme4. Journal of Statistical Software 67(1). 1–48. https://doi.org/10.18637/jss.v067.i01.
Boersma, Paul & David Weenink. 2023. Praat: Doing phonetics by computer [computer program]. Version 6.3.09, retrieved 2 March 2023 from http://www.praat.org/.
Bolozky, Shmuel. 1997. Israeli Hebrew phonology. In A. S. Kaye (ed.), Phonologies of Asia and Africa (including the Caucasus). Vol. 1, 287–311. Winona Lake, Ind.: Eisenbrauns.
Bradley, Travis G. 2006. Spanish complex onsets and the phonetics-phonology interface. In Fernando Martinez Gil & Sonia Colina (eds.), Optimality-theoretic studies in Spanish phonology, 15–38. Amsterdam: John Benjamins. https://doi.org/10.1075/la.99.02bra.
Browman, Catherine P. & Louis H. Goldstein. 1988. Some notes on syllable structure in articulatory phonology. Phonetica 45. 140–155. https://doi.org/10.1159/000261823.
Brunner, Jana, Christian Geng, Stavroula Sotiropoulou & Adamantios I. Gafos. 2014. Timing of German onset and word boundary clusters. Laboratory Phonology 5. 403–454. https://doi.org/10.1515/lp-2014-0014.
Cho, Taehong, Patricia Keating, Cécile Fougeron & Chai-shune Hsu. 2003. Domain-initial strengthening in four languages. In Laboratory phonology VI: Phonetic interpretation, 145–163. Cambridge: Cambridge University Press.
Colantoni, Laura & Jeffrey Steele. 2005. Phonetically-driven epenthesis asymmetries in French and Spanish obstruent-liquid clusters. In R. Gess & E. Rubin (eds.), Experimental and theoretical approaches to Romance linguistics, 77–96. Amsterdam; Philadelphia: John Benjamins Pub. Co. https://doi.org/10.1075/cilt.272.06col.
Colina, Sonia. 2009. Spanish phonology: A syllabic perspective. Washington, DC: Georgetown University Press.
Colina, Sonia. 2012. Syllable structure. In Erin O’Rourke, José Ignacio Hualde & Antxon Olarrea (eds.), The handbook of Hispanic linguistics, 133–151. Wiley Online Library. https://doi.org/10.1002/9781118228098.ch7.
Colina, Sonia. 2016. On onset clusters in Spanish: Voiced obstruent underspecification and /f/. In Rafael A. Nuñez Cedeño (ed.), The syllable and stress: Studies in honor of James W. Harris, 107–137. Boston: Mouton de Gruyter. https://doi.org/10.1515/9781614515975-006.
Cuenca, María Heliodora. 1997. Análisis instrumental de la duración de las vocales en español. Philologia hispalensis 11(1). 295–307. https://doi.org/10.12795/PH.19961997.v11.i01.20.
Davis, Stuart Michael. 1987. Italian onset structure and the distribution of “il” and “lo”. In Proceedings of the annual Eastern States conference on linguistics, 64–74. Columbus, OH: State University.
Dell, François. 1995. Consonant clusters and phonological syllables in French. Lingua 95(1). 5–26. url: http://www.sciencedirect.com/science/article/pii/0024384195900993. https://doi.org/10.1016/0024-3841(95)90099-3.
Durvasula, Karthik. 2023. A simple acoustic measure of onset complexity. In Radek Skarnitzl & Jan Volín (eds.), Proceedings of the 20th International Congress of Phonetic Sciences (ICPhS 2023), 2010–2013. Prague, The Czech Republic: Guarant International. url: https://drive.google.com/file/d/15U2l2y4_-9lyZAgmiccQYXYj9zBi_CAu/view.
Durvasula, Karthik, Mohammed Qasem Ruthan, Sarah Heidenreich & Yen-Hwei Lin. 2021. Probing syllable structure through acoustic measurements: Case studies on American English and Jazani Arabic. Phonology 38(2). 173–202. https://doi.org/10.1017/S0952675721000142.
Dushoff, Jonathan, Morgan P. Kain & Benjamin M. Bolker. 2019. I can see clearly now: Reinterpreting statistical significance. Methods in Ecology and Evolution 10(6). 756–759. https://doi.org/10.1111/2041-210x.13159.
Farnetani, Edda & Shiro Kori. 1986. Effects of syllable and word structure on segmental durations in spoken Italian. Speech Communication 5(1). 17–34. https://doi.org/10.1016/0167-6393(86)90027-0.
Fougeron, Cécile & Patricia Keating. 1997. Articulatory strengthening at edges of prosodic domains. Journal of the Acoustical Society of America 101(6). 3728–3740. https://doi.org/10.1121/1.418332.
Fowler, Carol A. 1981. A relationship between coarticulation and compensatory shortening. Phonetica 38(1-3). 35–50. https://doi.org/10.1159/000260013.
Franke, Mona, Philip Hoole & Simone Falk. 2023. Temporal organization of syllables in paced and unpaced speech in children and adolescents who stutter. Journal of Fluency Disorders 76. 105975. https://doi.org/10.1016/j.jfludis.2023.105975.
Gafos, Adamantios I., Simon Charlow, Jason A. Shaw & Philip Hoole. 2014. Stochastic time analysis of syllable-referential intervals and simplex onsets. Journal of Phonetics 44. 152–166. https://doi.org/10.1016/j.wocn.2013.11.007.
Gelman, Andrew & John Carlin. 2014. Beyond power calculations: Assessing type S (sign) and type M (magnitude) errors. Perspectives on Psychological Science 9(6). 641–651. https://doi.org/10.1177/1745691614551642.
Gili y Gaya, Samuel. 1921. La r simple en la pronunciación española [The tap in Spanish pronunciation]. Revista de Filología Española 8. 271–280.
Goldstein, Louis H., Ioana Chitoran & Elisabeth Selkirk. 2007. Syllable structure as coupled oscillator modes: Evidence from Georgian vs. Tashlhiyt Berber. In Proceedings of the 16th International Congress of Phonetic Sciences, 241–244. Saarbrücken: Saarland University.
Harris, James Wesley. 1983. Syllable structure and stress in Spanish. A nonlinear analysis. Cambridge, MA: MIT Press.
Hermes, Anne, Doris Mücke & Martine Grice. 2013. Gestural coordination of Italian word-initial clusters: The case of ‘impure s’. Phonology 30(1). 1–25. https://doi.org/10.1017/S095267571300002X.
Hermes, Anne, Doris Mücke & Bastian Auris. 2017. The variability of syllable patterns in Tashlhiyt Berber and Polish. Journal of Phonetics 64. 127–144. https://doi.org/10.1016/j.wocn.2017.05.004.
Hlavac, Marek. 2018. stargazer: Well-formatted regression and summary statistics tables. R package version 5.2.2. Bratislava, Slovakia: Central European Labour Studies Institute (CELSI). url: https://CRAN.R-project.org/package=stargazer.
Hualde, José Ignacio. 1991. On Spanish syllabification. Current studies in Spanish linguistics 475(493). 182–198.
Hualde, José Ignacio. 2005. The sounds of Spanish. Cambridge, UK: Cambridge University Press.
Katz, Jonah. 2012. Compression effects in English. Journal of Phonetics 40(3). 390–402. https://doi.org/10.1016/j.wocn.2012.02.004.
Kuznetsova, Alexandra, Per B. Brockhoff & Rune H. B. Christensen. 2017. lmerTest package: Tests in linear mixed effects models. Journal of Statistical Software 82(13). 1–26. https://doi.org/10.18637/jss.v082.i13.
Marchini, Gilly & Michael Ramsammy. 2022a. Dialect-specific acoustic correlates of stress in Spanish: The role of vowel compression and syllable structure. University of Pennsylvania Working Papers in Linguistics 28(1). 12.
Marchini, Gilly & Michael Ramsammy. 2022b. Vowel compression in altiplateau Mexican Spanish. Isogloss. Open Journal of Romance Linguistics 8(4). 1–27. https://doi.org/10.5565/rev/isogloss.171.
Marin, Stefania & Marianne Pouplier. 2010. Temporal organization of complex onsets and codas in American English: Testing the predictions of a gestural coupling model. Motor Control 14(3). 380–407. https://doi.org/10.1123/mcj.14.3.380.
Marin, Stefania & Marianne Pouplier. 2014. Articulatory synergies in the temporal organization of liquid clusters in Romanian. Journal of Phonetics 42. 24–36. https://doi.org/10.1016/j.wocn.2013.11.001.
Martínez-Gil, Fernando. 2001. Sonority as a primitive phonological feature. In K. Zagona, J. Herschensohn & E. Mallen (eds.), Features and interfaces in Romance: Essays in honor of Heles Contreras, 203–222. Amsterdam & Philadelphia: John Benjamins. https://doi.org/10.1075/cilt.222.14mar.
McAuliffe, Michael, Michaela Socolof, Sarah Mihuc, Michael Wagner & Morgan Sonderegger. 2017. Montreal forced aligner: Trainable text-speech alignment using Kaldi. In 18th Annual Conference of the International Speech Communication Association (Interspeech 2017), 498–502. Stockholm, Sweden: International Speech Communication Association (ISCA). https://doi.org/10.21437/Interspeech.2017-1386.
Morales-Front, Alfonso. 2018. The Spanish syllable. In Kimberly L. Geeslin (ed.), The Cambridge handbook of Spanish linguistics, 190–210. Cambridge, UK: Cambridge University Press. https://doi.org/10.1017/9781316779194.010.
Mücke, Doris, Anne Hermes & Sam Tilsen. 2020. Incongruencies between phonological theory and phonetic measurement. Phonology 37(1). 133–170. https://doi.org/10.1017/S0952675720000068.
Munhall, Kevin, Carol Fowler, Sarah Hawkins & Elliot Saltzman. 1992. “Compensatory shortening” in monosyllables of spoken English. Journal of Phonetics 20(2). 225–239. https://doi.org/10.1016/s0095-4470(19)30624-2.
Pouplier, Marianne. 2012. The gestural approach to syllable structure: Universal, language- and cluster-specific aspects. In Susanne Fuchs, Melanie Weirich, Daniel Pape & Pascal Perrier (eds.), Speech planning and dynamics, 63–96. Frankfurt am Main: Peter Lang.
Prieto, Monica. 2002. Acoustic correlates of the syllable: Evidence from Spanish. Urbana-Champaign, IL: University of Illinois at Urbana-Champaign PhD thesis.
R Core Team. 2021. R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing. url: http://www.R-project.org.
Real Academia Española. 2011. Nueva gramática de la lengua española. Vol 3: Fonética y fonología. Madrid: Espasa Calpe.
RStudio Team. 2020. RStudio: Integrated development environment for R. Boston, MA: RStudio, PBC. url: http://www.rstudio.com/.
Saporta, Sol & Heles Contreras. 1962. A phonological grammar of Spanish. Seattle: University of Washington Press.
Selkirk, Elisabeth & Karthik Durvasula. 2013. Acoustic correlates of consonant gesture timing in English. Paper presented at the 166th Meeting of the Acoustical Society of America, San Francisco, USA. https://doi.org/10.1121/1.4831423.
Shaw, Jason A. & Adamantios I. Gafos. 2015. Stochastic time models of syllable structure. PLoS One 10(5). e0124714. https://doi.org/10.1371/journal.pone.0124714.
Shaw, Jason A., Adamantios I. Gafos, Philip Hoole & Chakir Zeroual. 2009. Syllabification in Moroccan Arabic: Evidence from patterns of temporal stability in articulation. Phonology 26(1). 187–215. https://doi.org/10.1017/S0952675709001754.
Shaw, Jason A., Adamantios I. Gafos, Philip Hoole & Chakir Zeroual. 2011. Dynamic invariance in the phonetic expression of syllable structure: A case study of Moroccan Arabic consonant clusters. Phonology 28(3). 455–490. https://doi.org/10.1017/s0952675711000224.
Sotiropoulou, Stavroula, Mark Gibson & Adamantios I. Gafos. 2020. Global organization in Spanish onsets. Journal of Phonetics 82. 100995. url: http://www.sciencedirect.com/science/article/pii/S0095447020300863. https://doi.org/10.1016/j.wocn.2020.100995.
Tilsen, Sam, Draga Zec, Christina Bjorndahl, Becky Butler, Marie-Josee L’Esperance, Alison Fisher, Linda Heimisdottir, Margaret Renwick & Chelsea Sanker. 2012. A cross-linguistic investigation of articulatory coordination in word-initial consonant clusters. Cornell Working Papers in Phonetics and Phonology 2012. 51–81.
Wickham, Hadley. 2017. tidyverse: Easily install and load the ‘tidyverse’. R package version 1.2.1. url: https://CRAN.R-project.org/package=tidyverse. https://doi.org/10.32614/CRAN.package.tidyverse.
Wiese, Richard. 1996. The phonology of German. Oxford: Clarendon.
© 2025 the author(s), published by De Gruyter, Berlin/Boston
This work is licensed under the Creative Commons Attribution 4.0 International License.