The ubiquity of word-internal pauses

Ludger Paschen

doi:10.1515/lingty-2024-0093

Article Open Access

Published/Copyright: December 19, 2025

From the journal Linguistic Typology

Abstract

This paper presents a cross-linguistic survey of 738 word-internal pauses in 44 languages from DoReCo, a multilingual corpus of spontaneous speech. Word-internal pauses are attested in a variety of constructions and may occur at any morphological juncture, but are particularly frequent in polysynthetic verb forms and around prefixes and proclitics. Around one fifth of all word-internal pauses in the sample occur within minimal morphological forms rather than at morphological junctures. Word-internal pauses are sometimes ambiguous with regard to the domains they delineate and do not necessarily align with phonological word boundaries. The ubiquity of word-internal pauses and the fact that they can disrupt minimal morphological forms suggest a high degree of flexibility in how natural speech is produced, and call into question the status of pausability as a reliable criterion for wordhood.

Keywords: pause; morphology; phonetics; suffixing preference; endangered languages

1 Introduction

Two of the most basic and universal building blocks of human language are words and pauses. While there is considerable debate about the concept of “wordhood” (Aikhenvald et al. 2020b; Haspelmath 2011; Tallman and Auderset 2023), words and pauses are largely considered mutually exclusive: pauses divide the stream of speech into smaller chunks that often correspond to syntactic clauses of various sizes, while words are typically not interruptible by pauses. There appear to be only two scenarios in which word-internal pausing is acceptable. First, in disfluent speech, speakers may pause briefly after a portion of a word and then begin the word anew, a phenomenon referred to as “false start”, “repair”, or “reset”. Second, in some (polysynthetic) languages, speakers may occasionally insert a pause at certain morph junctures without rendering the interrupted word form ungrammatical.

The latter observation has been made independently for several languages of northern Australia. Evans et al. (2008) report that “intraword” pauses occur in around 4 % of polysynthetic verbs in Dalabon (ngal1292, Gunwinyguan). These pauses only occur after certain prefixes, and their placement obeys moraic and syllabic minimality restrictions for both the detached and the remaining units. Evans et al. (2008) note that these pauses are on average shorter than other pauses and that verb forms can be interrupted by more than one word-internal pause at the same time. Example (1) illustrates a sequence of three detached prefixes in a complex verb form from Dalabon (Ponsonnet 2024), with the word-internal pause indicated as <<wip>>.

(1)

Dalabon (Ponsonnet 2024: doreco_ngal1292_20100723_001_MT)
balahlng `<<wip>>` worhdi
bala-	h-	lng-	`<<wip>>`	worhdi
3pl-	r-	seq-	`<<wip>>`	stand:pr
‘they’re (all) standing there’					(see (3) for full example)

Bundgaard-Nielsen and Baker (2020) report on results from an acceptability experiment with speakers of Wubuy (nung1290), another polysynthetic Gunwinyguan language. In the study, speakers were found to rate word-internal pauses as more acceptable when they coincided with morpheme boundaries than when they occurred within minimal morphological forms. Baker (2018) emphasizes the difficulty of distinguishing words from phrases in Gunwinyguan languages and cites the pausability criterion as one of the reasons why the distinction between words and phrases is complicated for this group of languages. Mansfield (2023) discusses word-internal pausing in the context of delineating phonological words in three polysynthetic northern Australian languages: Bininj Gun-wok (gunw1252, Gunwinyguan), Ngalakgan (ngal1293, Gunwinyguan), and Murrinhpatha (murr1259, Southern Daly). The argument put forward in Mansfield (2023) is that pausability within the verbal complex implies that polysynthetic words map to prosodic phrases, which in turn comprise several prosodic words that can be separated by pauses.

The notion of word-internal pauses obviously depends on cross-linguistically valid concepts of wordhood and pauses. The latter, although it comes with its own complications, can be operationalized in acoustic terms and identified in natural speech recordings with relative ease (see below). The former, however, remains a debated concept that directly relates to the broader question of how clearly morphology and syntax can be separated. Although most linguists have intuitions about what counts as a word, reaching a consensus on how they should be defined has proven difficult (Wray 2015). A (non-exhaustive) list of criteria circulating for wordhood is discussed in Haspelmath (2011) and includes potential pauses, free occurrence, mobility, uninterruptibility, non-selectivity, non-coordinatability, anaphoric islandhood, nonextractability, morphophonological idiosyncrasies, and deviations from bi-uniqueness. Yet, as Haspelmath (2011) argues, neither individual criteria nor their combination provide a satisfactory definition of wordhood. Similarly, in a quantitative study of eight languages, Tallman and Auderset (2023) reach rather pessimistic conclusions regarding the possibility of a cross-linguistically robust division between morphology and syntax, and thus of a generally valid definition of wordhood. By contrast, Dixon and Aikhenvald (2003) and Aikhenvald et al. (2020b) maintain that wordhood is a useful comparative concept, diagnosable on phonological, morphological, syntactic, and semantic grounds, whereby the criteria and their weightings may differ across languages.

An important refinement is the distinction between phonological and grammatical words (Aikhenvald et al. 2020b; Hildebrandt 2015). Phonological words (or prosodic words) are defined as minimally pronounceable units that delineate a domain in which certain segmental rules (such as assimilation) or suprasegmental processes (such as stress assignment) apply (Hall 1999; Hildebrandt 2015; Nespor and Vogel 2007). Grammatical words, by contrast, are defined in purely morphosyntactic terms. For example, Aikhenvald et al. (2020b) describe grammatical words as units occupying a position between morphemes and phrases with their own lexical bases and conventionalized meanings. While Aikhenvald et al. (2020b) state that phonological and grammatical words coincide “[i]n every language […] in most cases” (p. 7), there is ample evidence that mismatches are, in fact, common (Downing 1999; Elkins 2020; Harris 2002). The very notion of distinguishing between phonological and grammatical words has also been questioned. Haspelmath (2023) proposes that words should be defined solely on the ground of grammatical terms such as “morph”, “clitic” or “root” (p. 285), without recourse to prosodic criteria. Similarly, Tallman (2020) challenges the typological usefulness of the dichotomy between grammatical and phonological words.

There is no universal agreement about the status of pausability as a criterion for wordhood, but there is a certain consensus that pause insertion crucially relies on the distinction between grammatical and phonological words. Dixon and Aikhenvald (2003) argue that pausability is a necessary but not a sufficient criterion for wordhood, and that pausing is possible within grammatical but not phonological words (see van Gijn and Zúñiga 2014; Aikhenvald et al. 2020b for similar arguments). Dixon and Aikhenvald (2003) write that speakers may insert pauses at morph boundaries, as in the (hypothetical) English example un (.) suitable (Dixon and Aikhenvald 2003: 11). They further hypothesize that the likelihood of encountering word-internal pauses increases with the degree of synthesis and the overall length of a word. Haspelmath (2011), on the other hand, explicitly rejects pauses as a necessary or sufficient criterion for wordhood. Haspelmath (2011: 38f) states that “linguists cannot easily ask speakers about their intuitions concerning pauses”, a claim that probably needs to be revised in light of studies such as Bundgaard-Nielsen and Baker (2020) and Nakamoto (2024) which demonstrate that speakers do have intuitions about the acceptability of pauses and fillers. Tallman (2020) and Haspelmath (2023) do not mention pauses in their discussions of wordhood.

The idea that pauses are a helpful diagnostic for phonological wordhood has been echoed in numerous case studies (Aikhenvald et al. 2020a), including Allison (2020) for Makary Kotoko (mpad1242, Afro-Asiatic), Jarkey (2020) for Japanese (nucl1643, Japonic), and Enfield (2020) for Lao (laoo1244, Tai-Kadai). White (2020) claims that in Hmong (hmon1333, Hmong-Mien), “pauses do not surface inside phonological words, even in running speech where false starts and pauses for planning often occur, and even for the most hesitant speakers” (p. 234). A similar situation is described by Woodbury (2024) for Cup’ik (hoop1234, Eskimo-Aleut), where speakers tend to start the word again rather than continuing where they left off when a pause or speech error occurs within a word. In contrast, repairs at preverb boundaries appear to be fairly common in Cree (cree1272, Algic), occurring at about the same frequency as between two independent words (Russell 1999). A language where pause insertion appears to be more flexible is Chamacoco (cham1315, Zamucoan), where certain enclitics can be detached from their host by pauses (Ciucci 2020).

Both grammatical and phonological wordhood are complex notions that can only be established through detailed analysis of a language’s morphosyntax and phonology. Consequently, cross-linguistic surveys of wordhood are heavily constrained by the available data. This limitation is particularly acute for spontaneous speech data, where systematic documentation exists for only a minority of the world’s languages. In this context, language documentation data is instrumental in advancing cross-linguistic studies into actual language use and corpus-based typology (Schnell et al. 2022). This study will use data from the DoReCo corpus (Seifart et al. 2024), a resource with annotated speech data from more than 50 languages originating from language documentation projects (see Section 2). Crucially, DoReCo contains time-aligned annotations of words, morphs, and word-internal pauses. Consequently, the unit of study will be the orthographic word: a unit separated by spaces from other words in transcribed texts. Transcriptions in DoReCo usually follow practical orthographies, which are often developed in collaboration with the speaker community (Cahill 2018; Lüpke 2011).^[1] In an ideal scenario, these orthographies should reflect the innate knowledge of speakers about what a word is, but in reality, speakers rarely converge on a clear intuition of wordhood (Wray 2015). Also, writing conventions established by non-native-speaker linguists may directly disagree with speaker intuitions (Russell 1999). Orthographic words may thus align more closely with grammatical words in some languages than others. However, there is no real alternative to investigating units transcribed as orthographic words and their relation to the occurrence of pauses if data from DoReCo is to be used.

Pauses are an integral part of human communication. Both the occurrence and duration of pauses are linked to prosodic and syntactic structure (Andreeva et al. 2020; Krivokapić 2007; Peck and Becker 2024; Ross 2011). Pauses are also recognized as multifaceted pragmatic devices (Koshik 2005; Sacks 1989). Research into disfluent speech has revealed that speakers show considerable variation in where they insert silent pauses and filler particles (such as uhm), sometimes referred to as “filled pauses”, into an utterance (Belz 2023; Gósy 2023). Despite the attention given to word-external pauses in the literature, and the fact that pausability is regularly discussed with relation to wordhood, the cross-linguistic distribution of word-internal pauses is still poorly understood.

Pauses have been argued to play a crucial role in understanding the suffixing preference, i.e. the cross-linguistic tendency for bound morphs to appear after their hosts. Himmelmann (2014) argues that pauses tend to occur more frequently after function words than before them, making grammaticalization as suffixes more likely. Another strand of research highlights a psycholinguistic account, arguing that the suffixing preference reflects the preferred order of elements in language processing (Asao 2015; Berg 2020; Cutler et al. 1985), though such an explanation has also been met with some skepticism (Harris and Samuel 2024). The placement of word-internal pauses might reveal further insights into the suffixing preference. If cross-linguistically, word-internal pauses occur predominantly after prefixes (as in the Gunwinyguan languages cited above), this would add another piece to the puzzle, linking the suffixing preference to a prosodically looser connection between preposed morphs and their hosts.

Zingler (2022) discusses various criteria for defining words, clitics, affixes, and other dependent morph types. The author argues that clitics are best defined in terms of their syntactic distribution and prosodic dependence. He adds that another potential parameter of interest in relation to the latter criterion is pausability, but acknowledges the difficulty of obtaining naturalistic speech data from a broad enough set of languages to find evidence for word-internal pauses. Zingler (2022: 18) writes:

Hence, the inclusion of “pausability” as a parameter of wordhood would apparently create yet another class of elements that are neither prototypical words nor prototypical affixes. With regard to this potential parameter, however, it has to be concluded that there are simply not enough spoken corpora of underdocumented languages to perform a meaningful cross-linguistic assessment of the notion of “pausability.”

While it is true that to date, languages for which annotated speech data is available are vastly outnumbered by those without such resources, recent years have seen considerable advancements in the areas of language documentation and making fieldwork data from low-resourced languages usable for cross-linguistic studies (Babinski et al. 2019; Chodroff et al. 2025; Schnell et al. 2022).^[2] Based on the DoReCo corpus, the present paper aims to offer a cross-linguistic overview of word-internal pauses as they occur in natural speech data. The survey will highlight the ubiquity and diversity of word-internal pauses and the contexts in which they appear. The study is mostly explorative, but it hopes to inform ongoing debates about wordhood, the contiguity of phonological words, and the suffixing preference.

Word-internal pauses have been referred to as “intraword pauses” and “within-word pauses” in the literature, and I will use these terms interchangeably, even though I will mostly refer to them as “word-internal pauses”. I will distinguish two formal types of word-internal pauses. The first, disruptive pauses, occur between two morphs, e.g. between a prefix and a root (1). This type of pause has been previously discussed for languages such as Dalabon and Wubuy. A more extreme case is presented by pauses that I refer to as ultra-disruptive (2). These pauses occur within a morph, splitting it into two parts that have no independent meaning on their own.

(2)

Svan (Gippert 2024: doreco_svan1243_svv35_02)
la `<<wip>>` šxs
la	`<<wip>>`	šx	-s
Lashketi₁	`<<wip>>`	Lashketi₂	-dat
‘in Lashketi’				(see (20) for full example)
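
The two formal types can be told apart mechanically once a word's morph segmentation and the position of the pause are known. The following Python sketch illustrates this under simplifying assumptions: morphs are given as plain character strings, the pause position is a character offset into the unsegmented word, and the function name `classify_pause` is hypothetical. It is not a description of how DoReCo encodes its annotations.

```python
from itertools import accumulate


def classify_pause(morphs, pause_offset):
    """Classify a word-internal pause by its position relative to morph boundaries.

    morphs: list of morph strings in linear order (toy representation).
    pause_offset: number of characters preceding the pause in the full word.
    """
    word_length = sum(len(m) for m in morphs)
    if not 0 < pause_offset < word_length:
        raise ValueError("pause must fall strictly inside the word")
    # Character offsets at which one morph ends and the next begins.
    boundaries = set(accumulate(len(m) for m in morphs[:-1]))
    if pause_offset in boundaries:
        return "disruptive"        # pause falls between two morphs
    return "ultra-disruptive"      # pause splits a single morph


# Example (1): bala-h-lng- <<wip>> worhdi, pause after the third prefix.
dalabon = classify_pause(["bala", "h", "lng", "worhdi"], 8)

# Example (2): la <<wip>> šx-s, pause inside the stem.
svan = classify_pause(["lašx", "s"], 2)
```

In this sketch, the Dalabon form is classified as disruptive and the Svan form as ultra-disruptive, matching the labels used for examples (1) and (2).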

The following sections introduce the corpus and the methodology (Section 2), illustrate word-internal pauses with a variety of examples from the corpus (Section 3), present quantitative analyses (Section 4), and offer some further discussion (Section 5) and a conclusion (Section 6).

2 Methods and data

Data for this study come from the DoReCo corpus version 2.0 (Seifart et al. 2024). DoReCo 2.0 contains transcribed and annotated speech data from 53 languages covering a broad range of populations from all inhabited continents. DoReCo datasets include time-aligned transcriptions of spontaneous speech, predominantly personal and traditional narratives, with most datasets comprising around 10,000 word tokens. The majority of datasets also include time-aligned tokenization at the morph level with accompanying glosses and part-of-speech tags. The DoReCo workflow involves a number of manual and automatic processing steps (see Paschen et al. 2020 for details), as well as manual labeling of silent pauses, disfluencies (including filled pauses and false starts), code-switching, and other discourse events. The corpus distinguishes two types of silent pauses: “regular” speech pauses occurring before and after orthographic words, and word-internal pauses occurring within orthographic words. Speech pauses were first detected by the WebMAUS forced aligner (Kisler et al. 2012) and then manually verified by the DoReCo team, including resizing, adding or removing pause intervals when they were not correctly identified by the forced aligner. The criteria for pausehood were a visible stretch of non-verbalization in the acoustic signal and an auditory impression of prosodic discontinuity. Word-internal pauses were manually annotated by the DoReCo team, again based on acoustic and auditory criteria. There is no lower or upper threshold for pause duration in DoReCo; however, short stretches of silence occurring naturally during the closure phase of stop consonants are not labeled as pauses in the corpus. Word-internal pauses are labeled <<wip>> in DoReCo, and this abbreviation will also be used occasionally throughout this paper. Note that word-internal pauses, as they are annotated in DoReCo, only refer to silent pauses. Hesitations or fillers accompanying word-internal pauses, if any, are annotated separately (see Section 4.7). Detachment by intervening lexical or clausal material is not considered in this study.

Of the 53 datasets in DoReCo 2.0, 44 (83 %) contain at least one <<wip>> label. There were a total of 738 <<wip>> annotated across those 44 languages. These 44 languages, coming from 25 genealogical families, are listed in Table 1, with their approximate geographic locations shown in Figure 1. Since not all word units with a <<wip>> label were fully annotated, missing glosses for the respective words were added manually, using available published resources and consulting with language experts when necessary.^[3] Part-of-speech tags were standardized to represent three broad categories: nouns, verbs, and other parts of speech. For the quantitative analyses in Section 4, data was processed with R 4.4.0 (R Core Team 2024) using RStudio (RStudio Team 2024).^[4]
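
The counting step behind these figures can be sketched as follows. The sketch is in Python for illustration only (the analyses in this paper were run in R), and it assumes a toy input in which each language is mapped to a flat list of token labels. The function names, the toy data, and the placeholder language without any <<wip>> label are hypothetical and do not reflect the actual DoReCo file format.

```python
from collections import Counter

WIP = "<<wip>>"  # label for word-internal pauses, as used in the corpus


def count_wip(tokens_by_language):
    """Count <<wip>> labels per language in a {language: [tokens]} mapping."""
    counts = Counter()
    for language, tokens in tokens_by_language.items():
        counts[language] = sum(1 for token in tokens if token == WIP)
    return counts


def languages_with_wip(counts):
    """Return the languages with at least one <<wip>> label, sorted by name."""
    return sorted(lang for lang, n in counts.items() if n > 0)


# Toy data: two tokenized snippets from examples in this paper, plus one
# hypothetical dataset that contains no word-internal pause.
toy = {
    "Dalabon": ["balahlng", WIP, "worhdi"],
    "Warlpiri": ["yanurnu", "kurdu", WIP, "-wita"],
    "ToyLanguage": ["a", "b"],
}
counts = count_wip(toy)
attested = languages_with_wip(counts)

# Share of DoReCo 2.0 datasets with at least one <<wip>> label (44 of 53).
share_percent = round(100 * 44 / 53)
```

Applied to the full corpus, a count of this kind yields the 44 languages listed in Table 1; the last line reproduces the 83 % figure from the reported totals.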

Table 1:

Languages with at least one occurrence of a word-internal pause in DoReCo 2.0, with glottocodes and top-level families from Glottolog (Hammarström et al. 2024), and total number of word tokens in the core datasets.

	Language	Glottocode	Family	Words	Source
1.	Anal	anal1239	Sino-Tibetan	13,190	Ozerov (2024)
2.	Arapaho	arap1274	Algic	4,612	Cowell (2024)
3.	Asimjeeg Datooga	tsim1256	Nilotic	8,782	Griscom (2024)
4.	Baïnounk Gubëeher	bain1259	Atlantic-Congo	11,423	Cobbinah (2024)
5.	Beja	beja1238	Afro-Asiatic	15,427	Vanhove (2024)
6.	Bora	bora1263	Boran	8,364	Seifart (2024a)
7.	Cabécar	cabe1245	Chibchan	10,613	Quesada et al. (2024)
8.	Cashinahua	cash1254	Pano-Tacanan	9,655	Reiter (2024)
9.	Daakie	port1286	Austronesian	11,880	Krifka (2024)
10.	Dalabon	ngal1292	Gunwinyguan	3,340	Ponsonnet (2024)
11.	Dolgan	dolg1241	Turkic	8,775	Däbritz et al. (2024)
12.	English (S. England)	sout3282	Indo-European	8,914	Schiborr (2024)
13.	Evenki	even1259	Tungusic	8,310	Kazakevich and Klyachko (2024)
14.	Fanbyak	orko1234	Austronesian	10,162	Franjieh (2024)
15.	Gorwaa	goro1270	Afro-Asiatic	10,667	Harvey (2024)
16.	Gurindji	guri1247	Pama-Nyungan	5,639	Meakins (2024)
17.	Hoocąk	hoch1243	Siouan	7,325	Hartmann (2024)
18.	Jahai	jeha1242	Austroasiatic	7,504	Burenhult (2024)
19.	Jejuan	jeju1234	Koreanic	7,270	Kim (2024)
20.	Kakabe	kaka1265	Mande	9,528	Vydrina (2024)
21.	Kamas	kama1351	Uralic	10,047	Gusev et al. (2024)
22.	Light Warlpiri	ligh1234	(Mixed)	8,669	O’Shannessy (2024a)
23.	Lower Sorbian	lowe1385	Indo-European	10,238	Bartels and Szczepański (2024)
24.	Mojeño Trinitario	trin1278	Arawakan	7,746	Rose (2024)
25.	Movima	movi1243	(Isolate)	10,180	Haude (2024)
26.	Nafsan	sout2856	Austronesian	10,232	Thieberger (2024)
27.	Nisvai	nisv1234	Austronesian	11,858	Aznar (2024)
28.	Northern Alta	nort2875	Austronesian	8,553	Garcia-Laguia (2024)
29.	Pnar	pnar1238	Austroasiatic	8,502	Ring (2024)
30.	Resígaro	resi1247	Arawakan	7,979	Seifart (2024b)
31.	Ruuli	ruul1235	Atlantic-Congo	8,243	Witzlack-Makarevich et al. (2024)
32.	Sadu	sadu1234	Sino-Tibetan	11,752	Xu and Bai (2024)
33.	Sanzhi Dargwa	sanz1248	Nakh-Dagh.	5,140	Forker and Schiborr (2024)
34.	Savosavo	savo1255	(Isolate)	11,378	Wegener (2024)
35.	Svan	svan1243	Kartvelian	9,883	Gippert (2024)
36.	Tabaq (Karko)	kark1256	Nubian	9,268	Hellwig et al. (2024)
37.	Tabasaran	taba1259	Nakh-Dagh.	5,057	Bogomolova et al. (2024)
38.	Teop	teop1238	Austronesian	12,134	Mosel (2024)
39.	Totoli	toto1304	Austronesian	9,755	Bardají et al. (2024)
40.	Urum	urum1249	Turkic	10,032	Skopeteas et al. (2024)
41.	Warlpiri	warl1254	Pama-Nyungan	7,117	O’Shannessy (2024b)
42.	Yali	apah1238	Nuclear TNG	7,476	Riesberg (2024)
43.	Yongning Na	yong1270	Sino-Tibetan	7,479	Michaud (2024)
44.	Yurakaré	yura1255	(Isolate)	20,236	Gipper and Ballivián Torrico (2024)

Figure 1:

Map showing the location of the 44 languages included in the sample. Image created with Cartopy (Met Office 2023) and Matplotlib (Hunter 2007).

3 A cross-linguistic survey of word-internal pauses

This section presents concrete examples of word-internal pauses to illustrate the diversity of constructions in which word-internal pauses occur. The first subsection discusses disruptive word-internal pauses appearing before or after affixes, roots, and clitics. The second subsection deals with ultra-disruptive pauses, which are overall smaller in number, making up ca. 20 % of the sample (see Section 4). The third subsection considers cases which are ambiguous between a morph-internal and a morph-external analysis.

3.1 Disruptive word-internal pauses

Disruptive word-internal pauses are defined as those pauses that occur between two morphs within a word. In the 44-language sample, disruptive pauses are attested in the vicinity of prefixes, suffixes, infixes, proclitics, enclitics, roots, and within reduplicated forms. In other words, the sample includes cases of detachment for all broad morph types that are typically coded according to standard glossing conventions.^[5] Disruptive pauses may appear at the boundary between elements of the same morphological type (e.g. between two prefixes) or between elements of different types (e.g. a prefix and a root). In the remainder of this section, several examples of word-internal pauses occurring around affix, clitic and root elements will be shown. In addition, cases of pauses occurring with non-concatenative morphology (reduplication, infixation) and in code-mixing contexts will also be elucidated. The examples were selected to cover a wide range of languages and morphological systems, illustrating the diversity of word-internal pausing.

3.1.1 Prefix detachment

Prefix detachment has been described for a number of languages from northern Australia, including Dalabon (Evans et al. 2008). The DoReCo corpus contains an annotated Dalabon dataset (Ponsonnet 2024), and, unsurprisingly, numerous examples of detached prefixes can be found in the corpus. Two representative examples are given in (3) and (4), with the respective spectrograms shown in Figures 2 and 3.^[6] In (3), a pause of 280 ms appears before the verbal stem worhdi ‘stand:pr’, separating it from the prefixal domain which in this case spans three prefixes. As can be seen from Figure 2, the pause interval is silent and its duration is slightly shorter than either of the two word chunks. Examples like (3) are abundant in the corpus, and mirror previous descriptions of Dalabon in the literature.^[7]

Figure 2:

A pause separating three prefixes and a verbal stem in Dalabon.

Figure 3:

Multiple pauses within a complex verb form in Dalabon.

(3)

Dalabon (Ponsonnet 2024: doreco_ngal1292_20100723_001_MT)
mak barraboni kanidjah [balahlng `<<wip>>` worhdi]
bala-	h-	lng-	`<<wip>>`	worhdi
3pl-	r-	seq-	`<<wip>>`	stand:pr
‘them two didn’t keep going, they’re (all) standing there’

Grammatical words may, in some cases, be interrupted by multiple pauses. This was reported by Baker (2002) for Ngalakgan and by Evans et al. (2008) for Dalabon and can also be observed for various languages in DoReCo. Example (4) illustrates how the prefixal domain in Dalabon can be detached from a complex stem, and that stem can in turn be split at another morph juncture. In this case, the incorporated noun kodj ‘head’ is split from the verb root and the past perfective suffix. The two <<wip>> are relatively long, at around 700 ms and 500 ms, respectively (cf. Figure 3).

(4)

Dalabon (Ponsonnet 2024: doreco_ngal1292_20110614_007_LB)
njing yibung dahnaHnan [djah `<<wip>>` kodj `<<wip>>` boledminj]
dja-	h-	`<<wip>>`	kodj	`<<wip>>`	boled	-minj
2sg-	r-	`<<wip>>`	head	`<<wip>>`	turn_around	-pp
‘you’re looking at him and you’ve turned your head around’

Dalabon is not the only language where prefixal detachment is frequently observed. Another language where prefixes are regularly separated from the rest of polysynthetic verb forms is Arapaho. In (5), a 300 ms pause appears after the prefix ne’- ‘then-’, separating the prefixal domain from a reduplicated verb stem (Figure 4).

Figure 4:

A pause separating two prefixes and a verbal stem in Arapaho.

(5)

Arapaho (Cowell 2024: doreco_arap1274_32b)
tih’etniixoo’utei’i, [heetne’ `<<wip>>` ceece’einou’u]
heet-	ne’-	`<<wip>>`	cee∼	ce’ein	-ou’u
fut-	then-	`<<wip>>`	red∼	put_in_bag	-3pl
‘once they are dried out, then they will put them in containers’

Almost all of the disruptive pauses in Arapaho occur after prefixes, and many involve the past imperfective marker nih’ii-. A typical example is given in (6), where a relatively long pause of 980 ms separates nih’ii- from the rest of the word (Figure 5). The label <<fs>hi> indicates a false start with the segmental string [hi].

Figure 5:

A pause separating the prefix nih’ii- and a verbal stem in Arapaho.

(6)

Arapaho (Cowell 2024: doreco_arap1274_92e)
teecxo’ `<<fs>hi>` nuhu’ beh’eihoho’, betebihoho’, [nih’ii `<<wip>>` niistoneiθi’] hooo
nih’ii-	`<<wip>>`	niiston	-iθi’
pst.ipfv-	`<<wip>>`	make_for	-3pl/1s
‘A long time ago the old men and old women, they made a bed for them.’

The final example of prefixal detachment comes from Lower Sorbian. Like other Slavic languages, Lower Sorbian has an extensive system of verbal prefixes with a semantic range from compositional to highly lexicalized meanings. In example (7), two inflected verb forms are disrupted by a silent pause after the perfective prefix wu- (Figure 6). The two short phrases, with their similar pauses and meanings, create a parallel structure that can best be interpreted as an emphatic rhetorical device, echoing the abrupt motion of a fishing rod being swiftly retracted. The Lower Sorbian DoReCo dataset attests a total of eight word-internal pauses, all occurring after verbal prefixes. The detachable nature of verbal prefixes in Lower Sorbian may in part be influenced by German, the major contact language, where detachment (and movement) of prefixes is highly systematic. The role of code-mixing with regard to the occurrence of <<wip>> will be discussed in more detail below.

Figure 6:

Two instances of a pause separating the perfective prefix wu- from participle stems in Lower Sorbian.

(7)

Lower Sorbian (Bartels and Szczepański 2024: doreco_lowe1385_MEW-126-20130830)
a wšykne ryby su se, jo su [wu `<<wip>>` łojone abo wu `<<wip>>` wuźone]
wu-	`<<wip>>`	łojone	abo	wu-	`<<wip>>`	wuźone
pfv-	`<<wip>>`	catch:ptcp	or	pfv-	`<<wip>>`	fish:ptcp
‘And all the fish were caught or fished out.’

3.1.2 Proclitic detachment

Prefixes are not the only preposed morph type that can be prosodically detached. Pausing between a proclitic and its host is frequently observed in the DoReCo corpus, especially in Jahai and Pnar, two languages from the Austroasiatic phylum. Jahai has an extensive system of proclitics that can be hosted by verbs, noun phrases, and other constituents. Certain proclitics such as ba= ‘goal=’, k= ‘loc=’, and ya= ‘irr=’ occur detached from their hosts numerous times in the sample. Out of the 127 word-internal pauses attested for Jahai in DoReCo 2.0, all but three occur at a clitic boundary. The 124 pauses after proclitics in Jahai account for around one sixth of all word-internal pauses in the sample. Example (8) shows the source proclitic can= separated from its nominal host by a word-internal pause of 320 ms (Figure 7).
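
The proportion cited above follows directly from the reported counts, as the following short Python check illustrates. The variable names are hypothetical, and the numbers are taken from the text.

```python
# Counts reported for Jahai and for the full 44-language sample.
jahai_total = 127        # word-internal pauses attested for Jahai
jahai_non_clitic = 3     # Jahai pauses not at a clitic boundary
sample_total = 738       # word-internal pauses in the whole sample

# Pauses after proclitics in Jahai, and their share of the sample.
jahai_proclitic = jahai_total - jahai_non_clitic
jahai_share = jahai_proclitic / sample_total

print(f"{jahai_proclitic} of {sample_total} = {jahai_share:.1%}")
```

The resulting share of just under 17 % is what the text rounds to one sixth.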

Figure 7:

A word-internal pause after the proclitic can= in Jahai.

(8)

Jahai (Burenhult 2024: doreco_jeha1242_NarrLandscape2)
ʔoʔ sət [can `<<wip>>` cabaŋ] wɔŋ ʔoʔ, habis
can=	`<<wip>>`	cabaŋ
source=	`<<wip>>`	branch
‘it has already dried up from its small branch, it’s gone’

Given the abundance of proclitic detachment in Jahai, it appears reasonable to ask whether these elements should be reanalyzed as prepositions or particles. Burenhult (2005) argues that, in addition to their promiscuity, the two main reasons to consider them clitics rather than words are (i) their inability to bear stress, and (ii) their non-adherence to minimum word size, which in Jahai is CVC. For can=, the only proclitic that constitutes a CVC syllable and would thus obey minimality, Burenhult rejects assigning wordhood on phonetic grounds: “The one clitic that does represent a heavy syllable with a phonemic nucleus and which might therefore be considered a possible word on canonical grounds – the prepositional proclitic allomorph /can=/ ‘source’ […] – does not qualify as a word as it cannot receive stress and does not behave phonetically in a word-like manner, as its final nasal segment is realised phonetically as a simple nasal [-n] and not as the prestopped allophone [-ᵈn] typical of word-final position […].” (Burenhult 2005: 66).

Curiously, proclitics with no underlying vowels can also be detached in Jahai, in which case the epenthetic vowel that would surface in the non-split word form surfaces in the clitic and may even be lengthened.^[8] Consider example (9), where the identification clitic l= is detached from its nominal host.^[9] The epenthetic vowel [ə], which is expected to be inserted in the first (unstressed) syllable of a full sesquisyllabic word form, surfaces despite the prosodic detachment of the clitic (Figure 8).^[10]

Figure 8:

A word-internal pause after the proclitic l= in Jahai.

(9)

Jahai (Burenhult 2024: doreco_jeha1242_NarrLandscape2)
tə̃h [ lə= `<<wip>>` hɛ̃ɲ] ʔoʔ ʔə̃h, tə̃h l=ʔnteŋ ʔoʔ
l=	`<<wip>>`	hɛ̃ɲ
id=	`<<wip>>`	tooth
‘this is its teeth, this is its ears’

A similar situation is presented in Pnar, a Khasi-Palaung language of India and Bangladesh. In Pnar, most <<wip>> occur between a gender proclitic and a nominal host. Ring (2014) cites lack of stressability and obligatoriness as the main diagnostics for clitichood in Pnar. The example in (10) showcases a word-internal pause between a neuter gender clitic and a ‘non-visible’ deictic. At 1020 ms, this pause is exceptionally long, and the detached part is not prosodically integrated into the rest of the utterance (Figure 9).

Figure 9:

Detachment of a proclitic from a demonstrative in Pnar.

(10)

Pnar (Ring 2024: doreco_pnar1238_04_Areal_History)
tæ chna kichnong kibru chnong chna yung ki ya [i= `<<wip>>` tæ] æm itu iyung i khien wan
i=	`<<wip>>`	tæ
n=	`<<wip>>`	nvis
‘the village people made a home for them, a small home there to stay’

3.1.3 Suffix detachment

The examples shown so far have illustrated pausability following prefixes and proclitics. However, pausability before suffixes is also attested in the 44-language sample. Consider example (11) from Warlpiri, a non-polysynthetic Pama-Nyungan (Desert Nyungic) language spoken in the Northern Territory of Australia. The crucial difference between Dalabon and Warlpiri is that most of the pauses in Warlpiri occur between a nominal stem and a suffix, rather than within the prefix domain or between a prefix and a verbal stem. In (11), the derivational suffix -wita ‘small’ is separated from the nominal stem kurdu ‘child’ by a silent pause of 260 ms (Figure 10). wita can also appear as an independent noun meaning ‘small’ (Hale 1995), suggesting it has only recently been grammaticalized. The final vowel of kurdu is considerably lengthened compared to the vowel in the initial syllable, indicating a major prosodic break accompanied by final lengthening. Most word-internal pauses in Warlpiri occur between a nominal stem and an inflectional suffix such as -rla ‘-loc’ or -pala ‘-du’, or between a nominal stem and a derivational suffix such as -wita ‘-small’ or -wiri ‘-big’.

Figure 10:

A pause separating a derivational suffix from a nominal stem in Warlpiri.

(11)

Warlpiri (O’Shannessy 2024b: doreco_warl1254_2010ERGstoryNWA04)
yanurnu [kurdu `<<wip>>` -wita]
kurdu	`<<wip>>`	-wita
child	`<<wip>>`	-small
‘the little child came’

Disruptive pauses before inflectional endings such as case suffixes are attested for several languages in the corpus. In example (12) from Urum, a pause of 250 ms intervenes between two nominal suffixes, creating a short gap between the plural suffix -lar and the instrumental/comitative case suffix -(I)nAn/-(I)nIn (Figure 11). It stands to reason that pausing at this juncture is facilitated by the fact that the remaining nominal element turklar constitutes a well-formed word in Urum (nominatives are not overtly marked in Urum), and the sentence could in theory have been continued with turklar as the subject; with the short pause, the speaker likely bought themself time to decide which role the first noun form should have in the sentence. This suggests that word-internal pauses can and do occur before suffixes that are attached to forms which constitute fully-fledged grammatical words on their own.^[11]

Figure 11:

A pause separating an inflectional suffix from a suffixed noun in Urum.

(12)

Urum (Skopeteas et al. 2024: doreco_urum1249_UUM-TXT-AN-00000-B12)
orda [turklar `<<wip>>` ınan] bašlandi dögüš
turk	-lar	`<<wip>>`	-ınan
Turk	-pl	`<<wip>>`	-instr
‘Then war began with Turkey.’

3.1.4 Enclitic detachment

Let us now consider a case of enclitic detachment, which is a fairly rare phenomenon in the DoReCo corpus (see Section 4). In Hoocąk, /r/ has a predictable allophone [n] after nasal vowels within the same word (13ab). This rule only applies when /r/ is adjacent to a nasal vowel and no segments intervene (13c).

(13)

a.	xawanį	=ra	→	xawanįna	[Hartmann 2024]
	be_lost	=nmlz

b.	cąąnį	=regi	→	cąąnįnegi
	autumn	=sim/loc

c.	wąąk	=ra	→	wąąkra
	man	=nmlz

In the Hoocąk DoReCo dataset (Hartmann 2024), there are cases of a word-internal pause occurring between a suffix or a clitic starting with /r/ and a stem ending with a nasalized vowel. In these cases, despite the pause, the rhotic systematically surfaces as a nasal. One example is presented in (14), where the nominalizer =ra is realized as =na (Figure 12).

Figure 12:

A word-internal pause fails to block nasalization in Hoocąk.

(14)

Hoocąk (Hartmann 2024: doreco_hoch1243_ED_05)
hegų eegi hąąke ųųsge wažą [wažą `<<wip>>` na] roohąxjį hikoroho waši waši yaakikoroho
wažą	`<<wip>>`	=na ( < =ra)
something	`<<wip>>`	=nmlz
‘there are a lot of things… the way I dressed for dancing…’

What makes the behavior of nasalization in Hoocąk stand out is that the phonological rule appears to apply across a word-internal pause, which is generally assumed to correspond to a prosodic word boundary. This raises the question of how strongly word-internal pauses and prosodic words are correlated. This issue will be taken up again in Section 5 below.

3.1.5 Root detachment

The previous examples have illustrated several pauses between affixes or clitics and lexical roots. However, disruptive pauses may also occur between two adjacent roots. Potential grammatical contexts for disrupted root-root sequences are compounds, serial verbs, and incorporation constructions. One example of an incorporation construction involving a <<wip>> was shown in (4) above.

An example of a pause occurring between two elements of a nominal compound from Dalabon is shown in (15) and Figure 13. The disrupted unit consists of two nominal elements and a comitative suffix, the literal translation being ‘with tears in [her] eyes’. The nominal root mumu ‘eyes’ is further part of several other expressions conveying a range of emotions, e.g. mumu-bruk (eyes-dry) ‘be serious’ and mumu-kol (eyes-pretend) ‘seduce, flirt’ (Ponsonnet 2014: 348). In example (15), the element mumu ‘eyes’ has a duration similar to that of the pause, while the second element malu ‘tears’ is slightly reduced.

Figure 13:

A pause occurring within a nominal compound in Dalabon.

(15)

Dalabon (Ponsonnet 2024: doreco_ngal1292_20110518a_002_QB)
kahdjalngdjudminj [mumu `<<wip>>` malu-dorrungh] kahyurdminj nahda
mumu	`<<wip>>`	malu	-dorrungh
eyes	`<<wip>>`	tears	-com
‘she loses it, starts crying, and runs away’

3.1.6 Non-concatenative morphology: reduplication and infixation

Apart from concatenative morphology, which involves the addition of segments to the left or right of a stem without any changes to the stem itself, many languages make use of non-concatenative processes such as mutation, reduplication, and infixation as exponents of morphological categories (Bermúdez-Otero 2012). In (5) above, we saw an example of a <<wip>> separating two prefixes from a reduplicated stem in Arapaho. Pauses may, however, also intervene directly between a reduplicant and a base. One such case from Yurakaré is shown in (16) and in Figure 14. Yurakaré has a process of CV(CV)-reduplication expressing intensification (Van Gijn 2006: 53–58). The pause between the base and the reduplicant has a duration of 270 ms, and the reduplicant is clearly identifiable as a separate prosodic domain due to the presence of frication noise after the final reduplicant vowel, which is an optional process at the end of phonological phrases in Yurakaré. This frication noise is visible in the spectrogram and has also been indicated orthographically by an extra <j> in the transcription (adyijadyindyi) by the corpus creators (Gipper and Ballivián Torrico 2024).

Figure 14:

A pause separating base and reduplicant in Yurakaré.

(16)

Yurakaré (Gipper and Ballivián Torrico 2024: doreco_yura1255_SocCog-YUZ104-1)
[adyij `<<wip>>` adyindyi] tütü nish isapattu tütü lacha
adyi∼	`<<wip>>`	adyindyi	-ø
ints∼	`<<wip>>`	sad	-3sg.sbj
‘He is very sad. He also doesn’t have any shoes.’

Another intricate case of disruptive pauses interacting with non-concatenative morphology is presented in (17) from Movima (Figure 15). Movima has various processes of reduplication used for nominalization, marking of inalienability, and other functions (Haude 2006: 84–95). Depending on the function, reduplication in Movima can be prefixing, suffixing, or infixing. For example, the reduplicative morpheme expressing inalienability is usually suffixing, but may also be infixed into certain stem forms (see Yu 2007 for further discussion of affixes with variable placement). In (17), the stem kamay ‘scream’ is augmented by a middle voice (md) marker, which always appears before the final syllable of the stem it attaches to, leading to infixation in polysyllabic stems. The form kama:may constitutes a phonological word domain, as evidenced by lengthening of the vowel in the penultimate syllable, which is one of the diagnostics for wordhood in Movima (Haude 2006: 47).^[12] A pause of 220 ms occurs between the infixed reduplicant and the second part of the stem. The stem is thus, in a sense, interrupted twice: first, by the (regular process of) infixation, and second, by the insertion of a word-internal pause following the infix.

Figure 15:

A pause separating a reduplicated infix from its base in Movima.

(17)

Movima (Haude 2024: doreco_movi1243_HRR_120808_tigregente)
[kama: `<<wip>>` may]
ka	`<ma:`∼`>`	`<<wip>>`	may
scream₁	<md∼	`<<wip>>`	scream₂
‘[the calf] screamed’

A notable feature of (16) and (17) is that in these reduplicated word forms, the pause does not disrupt a sequence of otherwise unrelated elements, but a pair of closely connected morphs: without reference to the base, the reduplicant cannot exist. In this context, it is worth pointing out that Bundgaard-Nielsen and Baker (2020) mention they observed one speaker of Wubuy sporadically insert pauses between the two elements of pseudo-reduplicated forms, i.e. forms where neither copy can occur on its own. The fact that both productive and inherent reduplication allow pause insertion between base and reduplicant suggests that the reduplicated forms in (16) and (17) may be represented as whole units in the mental lexicon (see Testelets and Lander 2017 for a related argument), and invites further research into the processing of different types of reduplication.

3.1.7 Code-mixing contexts

Multilingualism is not unusual across the world’s speech communities, and at least half of the global population is estimated to be multilingual (Tucker 1999). Code-mixing and code-switching are fairly common with multilingual speaker communities and may occur at the phrasal, lexical, and also the morphological level (Meakins et al. 2019). Unsurprisingly, morphological boundaries between two elements from different source languages are possible loci for word-internal pauses, and there are indeed 55 pauses separating two morphs from different source languages attested in the 44-language sample. One such example comes from Evenki, a Tungusic language exposed to extensive contact with Russian. The disruptive pause in (18) occurs between the Russian noun zoow’ətt’ehnikum ‘veterinary technician school’ and the Evenki accusative case marker -mə. The 420 ms pause disrupts a sequence of consonants that would otherwise have coalesced into a geminate consonant (Figure 16).

Figure 16:

A word-internal pause in a code-mixing context in Evenki.

(18)

Evenki (Kazakevich and Klyachko 2024: doreco_even1259_2008_Kislokan_Udygir_Valentina_LAv)
[zoow’ətt’ehnikum `<<wip>>` mə] manam
zoow’ətt’ehnikum	`<<wip>>`	-mə
veterinary_technician_school(ru)	`<<wip>>`	-acc(ev)
‘[I] graduated from the veterinary technician school.’

Another example of a pause interrupting a word with mixed content comes from Gurindji, a Pama-Nyungan language of Australia. Language mixing is extremely common in Gurindji, even in the non-Kriol variety (Meakins and McConvell 2021). In (19), the English noun grave is integrated into a sentence that otherwise contains only Gurindji grammar and lexical items. The English noun grave hosts a Gurindji topic clitic =ma. Figure 17 shows that the clitic is separated from the noun by a 200 ms pause.

Figure 17:

A word-internal pause in a code-mixing context in Gurindji.

(19)

Gurindji (Meakins 2024: doreco_guri1247_FM14_a206)
yalangkarni ngulu wuyarni na [grave `<<wip>>` =ma] yalangkarni
grave	`<<wip>>`	=ma
grave(en)	`<<wip>>`	=top
‘They placed her in a grave right there then.’

3.2 Ultra-disruptive word-internal pauses

While the majority of word-internal pauses coincide with a morphological boundary, there is a considerable number of word-internal pauses that split a minimal morph into two smaller parts that do not have a meaning on their own. These ultra-disruptive pauses are less systematic than disruptive pauses and there appear to be no constraints on their position or the shape of the split parts. Ultra-disruptive pauses seem to mostly occur in highly disfluent stretches of speech. One example of an ultra-disruptive pause comes from Svan. Svan allows highly complex consonant clusters in the coda, but syllables in Svan must contain a vocalic nucleus (Tuite 1998). In example (20), an ultra-disruptive <<wip>> of 300 ms appears in the middle of the nominal root lašx, separating the initial CV sequence from the rest of the root and the following dative suffix -s. The resulting triconsonantal sequence šxs is not a well-formed phonological word in Svan (Figure 18). This pause may be related to the speaker contemplating the choice between multiple conflicting case endings, similar to the Urum example in (12).

Figure 18:

An ultra-disruptive pause in Svan. Note the creakiness in the final part of the vowel, and a glottal pulse occurring early into the pause interval.

(20)

Svan (Gippert 2024: doreco_svan1243_svv35_02)
xûizge, uḳûe ešṭusḳûa zä li, [la `<<wip>>` šxs]
la	`<<wip>>`	šx	-s
Lashketi₁	`<<wip>>`	Lashketi₂	-dat
‘It is already 16 years, I live in Lashketi.’

Another ultra-disruptive pause is presented in example (21) from Sanzhi Dargwa. While this pause does not create an illicit syllable structure, its placement is nonetheless remarkable: the pause occurs before the final segment of χurejg ‘food’, separating a single consonant /g/ (and the following suffixes) from the rest of the nominal root. At 550 ms, this pause is relatively long (Figure 19), creating an auditory impression of non-cohesiveness between the fragments before and after the pause.

Figure 19:

An ultra-disruptive pause in Sanzhi Dargwa.

(21)

Sanzhi Dargwa (Forker and Schiborr 2024: doreco_sanz1248_sanzhi_bazhuk)
c’il rik’ulcar dam arillalij [χurej `<<wip>>` glij] wiq:anda
χurej	`<<wip>>`	g	-li	-j
food₁	`<<wip>>`	food₂	-obl	-dat
‘“Then,” she said, “I will carry him as food for myself during the day.”’

3.3 Borderline cases

The word-internal pauses discussed so far could with relative certainty be identified as occurring within grammatical words, and classified as either disruptive or ultra-disruptive. However, as with all categorical distinctions, there is a group of cases that do not lend themselves to a straightforward classification. For example, in Pnar, the clitichood of some elements glossed as proclitics is not always as clear as for the class of gender proclitics discussed above in example (10). Locative proclitics in particular appear to have a borderline status between clitics and prepositions, as Ring (2014: 1206) admits: “It is not clear whether these morphemes are better identified as clitics – like pronominals they tend to lose stress when followed directly by a nominal, though occasionally they appear as stressed in my data. Here I gloss them as clitics, but native speakers may or may not consider them separate words.” In the Pnar DoReCo dataset (Ring 2024), only two detached locative elements are attested, out of a total of 43 detached elements which are mostly gender and number proclitics.

A concrete example of an ambiguous pause regarding the classification as disruptive or ultra-disruptive comes from Daakie, an Austronesian language of Vanuatu (22). The word-internal pause in question spans 240 ms and separates pyang and bisi in pyangbisi, a verb meaning ‘watch, observe, look attentively; study’ (Krifka 2017: 169). It is not clear whether pyangbisi can be further decomposed into smaller meaningful elements. The dictionary in Krifka (2017) also contains entries for both pyang and bisi individually. The former is translated as ‘fire’ or ‘feast’, while -bisi is described as a suffix meaning ‘hold, keep’. It is entirely possible that a compositional meaning ‘to keep a fire’ from the combination of pyang and -bisi has evolved into a meaning ‘to observe’ via semantic change, as one aspect of keeping a fire is to watch it attentively and for an extended period of time. However, whether pyangbisi is actually related to pyang or -bisi is unknown (Manfred Krifka pers. comm.). It can therefore not be said with certainty whether labeling this pause as a <<wip>> is justified. For the purpose of the quantitative analysis in Section 4, I have opted to treat it as ultra-disruptive, based on the fact that the lemma pyangbisi is listed in Krifka (2017), but there remains a degree of ambiguity with cases like this.

(22)

Daakie (Krifka 2024: doreco_port1286_09-03-11_Andri_2)
kolom du kolom [pyang `<<wip>>` bisi] too kiye van kolom lehe vih soo
pyang	`<<wip>>`	bisi	/	pyang	`<<wip>>`	-bisi
watch₁	`<<wip>>`	watch₂	/	fire	`<<wip>>`	-keep
‘They kept checking out the garden, they saw a banana tree.’

4 Quantitative analysis

This section offers some statistical insights on word-internal pauses as they appear in the corpus. Despite the fact that the sample is not typologically balanced, the quantitative analyses do reveal a number of interesting cross-linguistic trends.

4.1 Disruptive and ultra-disruptive pauses

Table 2 gives an overview of how many disruptive and ultra-disruptive pauses occur in the sample, and indicates the percentage of word-internal pauses among all silent pauses for each language (“wip ratio”). In total, there were 738 <<wip>> attested in the sample. Among the languages with the highest <<wip>> proportions were highly synthetic and polysynthetic languages of Australia and North America (Dalabon, Arapaho, Hoocąk), as well as two Austroasiatic languages with frequent detachment of proclitics (Jahai, Pnar).

Table 2:

Percentage of word-internal pauses among all pauses for each of the 44 datasets, along with the raw counts of disruptive and ultra-disruptive pauses. Total counts include borderline cases which could not be classified as disruptive or ultra-disruptive.

	Language	wip ratio	Disr.	Ultra-disr.	Total wip count
1.	Jahai	4.75 %	121	3	127
2.	Dalabon	3.63 %	73	2	75
3.	Arapaho	2.81 %	102	24	128
4.	Pnar	2.17 %	41	1	43
5.	Nisvai	1.83 %	34	6	40
6.	Warlpiri	1.51 %	44	7	55
7.	Hoocąk	1.34 %	32	16	51
8.	Svan	0.57 %	12	2	14
9.	Cashinahua	0.56 %	15	2	17
10.	Bora	0.38 %	10	11	21
11.	Nafsan (South Efate)	0.31 %	8	3	11
12.	Lower Sorbian	0.30 %	8	0	8
13.	Gurindji	0.28 %	8	0	8
14.	Sanzhi Dargwa	0.24 %	2	1	3
15.	Evenki	0.23 %	8	2	10
16.	Tabaq	0.23 %	5	3	8
17.	Dolgan	0.23 %	3	3	6
18.	Sadu	0.22 %	1	3	4
19.	Tabasaran	0.22 %	1	2	3
20.	Kamas	0.19 %	7	4	11
21.	Ruuli	0.19 %	5	2	7
22.	Jejuan	0.19 %	6	0	6
23.	Movima	0.19 %	4	2	6
24.	Baïnounk Gubëeher	0.17 %	7	2	9
25.	Resígaro	0.17 %	4	2	6
26.	Northern Alta	0.16 %	4	1	5
27.	Urum	0.14 %	4	3	7
28.	Light Warlpiri	0.14 %	3	0	3
29.	Fanbyak	0.13 %	1	4	5
30.	Daakie	0.13 %	0	3	3
31.	Savosavo	0.11 %	3	0	3
32.	Anal	0.10 %	2	2	4
33.	Asimjeeg Datooga	0.10 %	1	1	2
34.	Yongning Na	0.10 %	2	0	2
35.	Cabécar	0.09 %	0	3	3
36.	Mojeño Trinitario	0.08 %	1	2	3
37.	Beja	0.07 %	3	0	3
38.	Teop	0.07 %	2	0	2
39.	English (S. England)	0.07 %	0	1	1
40.	Yurakaré	0.06 %	5	4	9
41.	Gorwaa	0.06 %	0	3	3
42.	Yali	0.03 %	1	0	1
43.	Totoli	0.03 %	1	0	1
44.	Kakabe	0.02 %	1	0	1
	Total		595	130	738

Figure 20 visualizes the prevalence of word-internal pauses across the 44 languages and the proportion of disruptive versus ultra-disruptive pauses. The counts (Table 2) follow a Zipfian distribution: there were two languages with over 100 <<wip>>, 11 languages with at least 10 but fewer than 100 <<wip>>, and the remaining languages had fewer than 10 <<wip>> occurrences. A similar picture emerges when looking at the percentages of word-internal pauses among all silent pauses: in two datasets, word-internal pauses made up more than 3 % of pauses, while in the majority of languages, they made up less than 0.5 %. The majority of pauses coincided with morphological boundaries: of the 725 word-internal pauses that could be classified, 82 % were disruptive and 18 % were ultra-disruptive. Only in a small subset of languages (e.g. Tabasaran, Daakie) did ultra-disruptive pauses outnumber disruptive pauses.

Figure 20:

Percentage of word-internal pauses among all silent pauses across the 44-language sample. Black coloring indicates ultra-disruptive pauses and gray coloring indicates disruptive pauses.
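
The tallies discussed in this subsection can be recomputed from Table 2. The following is a minimal sketch in Python, not part of the original analysis: the list copies the “Total wip count” column in table order, the two constants are the column sums from the last row, and all function and variable names are placeholders. The shares are computed over classified pauses only, which is an assumption about the denominator, since the totals also include borderline cases.

```python
# "Total wip count" column of Table 2, one entry per language, in table order.
TOTAL_WIP = [
    127, 75, 128, 43, 40, 55, 51, 14, 17, 21, 11,
    8, 8, 3, 10, 8, 6, 4, 3, 11, 7, 6,
    6, 9, 6, 5, 7, 3, 5, 3, 3, 4, 2,
    2, 3, 3, 3, 2, 1, 9, 3, 1, 1, 1,
]
# Column sums taken from the last row of Table 2.
DISRUPTIVE, ULTRA_DISRUPTIVE = 595, 130


def count_bins(totals):
    """Tally languages by their total number of word-internal pauses."""
    return {
        "100_or_more": sum(1 for n in totals if n >= 100),
        "10_to_99": sum(1 for n in totals if 10 <= n < 100),
        "under_10": sum(1 for n in totals if n < 10),
    }


def share_of_classified(disruptive, ultra):
    """Shares of the two pause types among pauses that could be classified.

    Assumption: borderline cases are left out of the denominator.
    """
    classified = disruptive + ultra
    return disruptive / classified, ultra / classified


bins = count_bins(TOTAL_WIP)
disr_share, ultra_share = share_of_classified(DISRUPTIVE, ULTRA_DISRUPTIVE)
print(bins)
print(f"disruptive: {disr_share:.1%}, ultra-disruptive: {ultra_share:.1%}")
```

Using the total count of 738 as the denominator instead would lower both shares slightly, because the borderline cases would then count towards neither type.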

4.2 Morph type

Word-internal pauses may appear within and next to a variety of different morph types. In the sample analyzed, there was a clear preference for disruptive pauses to appear within the (pre)-stem domain: there were considerably more pauses between proclitics, prefixes, and root elements than between roots, suffixes, and enclitics. Figure 21 shows the percentage of morphological junctures disrupted by a pause for each of the relevant morph type combinations.^[13] In the sample analyzed, junctures involving proclitics had the highest likelihood of involving an inserted pause, followed by prefixes. At 1 %, pauses between root elements were also relatively common. Junctures involving suffixes or enclitics were far less likely to be disrupted by a word-internal pause.

Figure 21:

Percentage of word-internal pauses occurring at root, affix, and clitic junctures. The y-axis indicates what percentage of the respective juncture types are interrupted by a pause in the sample.

Figure 22 shows the respective ratios for ultra-disruptive pauses. The percentage values are considerably lower than for disruptive pauses due to the relative scarcity of ultra-disruptive pauses. Ultra-disruptive pauses are most common with roots and prefixes, while suffixes and enclitics are less likely to be internally disrupted. In the sample analyzed, no instance of a disrupted proclitic was attested.

Figure 22:

The percentage of ultra-disruptive pauses for five morph types. The y-axis indicates what percentage of the respective morph types are interrupted by a pause in the sample.

4.3 Parts of speech

As the examples in Section 3 showed, there is no general restriction as to which word classes can be affected by word-internal pauses. In the sample analyzed, 51 % of word-internal pauses occur with verbs, 30 % with nouns, and 16 % with other parts of speech.^[14] Figure 23 depicts the distribution of word-internal pauses occurring with verbs, nouns, and other parts of speech, for the 10 languages with the highest <<wip>> count overall.

Figure 23:

Number of word-internal pauses occurring in verbs (black), nouns (dark gray), and other (light gray) parts of speech.

As Figure 23 shows, word-internal pauses tend to occur predominantly within verbs if a language has highly synthetic verbal morphology (e.g. Dalabon, Arapaho, Hoocąk). In moderately synthetic languages, nouns are at least equally strong, if not stronger, attractors of prosodic detachment. Note also the stark contrast between the two Australian languages Dalabon (Gunwinyguan, polysynthetic) and Warlpiri (Pama-Nyungan, moderately synthetic), with the former showing <<wip>> mostly after verbal prefixes and the latter before nominal suffixes.

4.4 Morphological synthesis

It has been speculated that word-internal pausing becomes more likely the higher the degree of morphological synthesis in a language (Dixon and Aikhenvald 2003). The data presented here support this claim, as polysynthetic languages are among those with the highest number of word-internal pauses. To quantify the correlation between the degree of synthesis and the prevalence of word-internal pauses, Figure 24 plots synthesis scores against the percentage of word-internal pauses for a subset of 32 languages for which morphological annotations are available in DoReCo. Synthesis scores were calculated based on the average number of morphs across all word types in each dataset (see Payne 2017 for a discussion of various indices in morphological typology). Disfluencies, other labeled content, and word chunks in the vicinity of word-internal pauses were filtered before calculating the synthesis scores. The graph reveals a weak positive correlation (Pearson’s correlation coefficient R = 0.24) between the degree of synthesis and the percentage of word-internal pauses. However, the confidence bands show a high degree of overlap, so this correlation cannot be taken as reliable evidence of a real effect.

Figure 24:

Scatterplot showing the relation between degree of synthesis and occurrences of word-internal pauses including clitics.
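
The synthesis score described above, i.e. the average number of morphs across word types, can be sketched as follows. This is a simplified Python illustration under stated assumptions, not the procedure actually used for Figure 24: the function name and input format are hypothetical, the filtering of disfluencies and other labeled content is omitted, and the toy input mixes forms from different languages cited in Section 3 purely to exercise the function.

```python
def synthesis_score(segmented_tokens):
    """Mean number of morphs per word type.

    `segmented_tokens` is an iterable of word tokens, each given as a tuple
    of morphs. Tokens are collapsed to types first, so that a frequent word
    form is counted only once.
    """
    types = set(segmented_tokens)
    if not types:
        raise ValueError("no word types to score")
    return sum(len(morphs) for morphs in types) / len(types)


# Toy input built from forms cited in Section 3; NOT a real dataset.
tokens = [
    ("kurdu", "-wita"),         # Warlpiri, example (11)
    ("turk", "-lar", "-ınan"),  # Urum, example (12)
    ("grave", "=ma"),           # Gurindji, example (19)
    ("kurdu", "-wita"),         # repeated token, same type as above
    ("yanurnu",),               # left unsegmented here for illustration
]
print(synthesis_score(tokens))
```

Whether clitics are counted as morphs of their host is a design choice; in this sketch they are, which raises the score for languages with frequent cliticization.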

The lack of a strong positive correlation could be due to the fact that the sample contained several languages with lower synthesis but frequent pausing after proclitics (e.g., Jahai). Therefore, the percentage of word-internal pauses occurring at root or affix but not clitic boundaries was also plotted against the synthesis scores (Figure 25). Here, a considerably stronger positive correlation (Pearson’s correlation coefficient R = 0.43) was found. This suggests that there is a correlation between word-internal pausing and the degree of affixal synthesis that gets partially obscured by the frequent detachment of clitics in certain less synthetic languages.

Figure 25:

Scatterplot showing the relation between degree of synthesis and occurrences of word-internal pauses not involving a clitic.

4.5 Pause duration

Pause duration is often considered a proxy for boundary strength (Andreeva et al. 2020), with stronger boundaries corresponding to higher-level domains such as phrases and weaker boundaries aligning with lower-level domains such as words. Like that of regular silent pauses, the duration of word-internal pauses varies considerably across speakers and communicative contexts. Figure 26 displays the distribution of pause durations for regular speech pauses and word-internal pauses across the 44-language sample. Both distributions peak at 200–300 ms and are right-skewed, but there are notable differences between the two pause types. Among word-internal pauses, more than half (62 %) have a duration of less than 500 ms, about a quarter (25 %) range between 500 ms and 1000 ms, and only a minority (13 %) are longer than 1000 ms. Regular pauses, on the other hand, show a less steep decline in frequency with longer durations, i.e. there are relatively more regular pauses with longer durations compared to word-internal pauses. 38 % of regular pauses have a duration of less than 500 ms, 29 % fall between 500 ms and 1000 ms, and 33 % are above 1000 ms. These results echo previous findings that most silent pauses have a duration of 500 ms or less (Campione and Véronis 2002). For regular pauses, unlike word-internal pauses, durations above 1000 ms are still fairly common. The different distributions align well with the view that pause duration is coupled with domain size: regular pauses may divide the speech stream into chunks of any size, ranging from prosodic words to intonational phrases or turns, while word-internal pauses tend to operate at the level of prosodic words or smaller phrases at most.

Figure 26:

Density lines for duration measurements of regular (black) and word-internal (gray) speech pauses.
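
The three duration bands used above (below 500 ms, 500 ms to 1000 ms, above 1000 ms) can be computed along the following lines. This is a minimal Python sketch with a hypothetical function name; assigning durations of exactly 500 ms or 1000 ms to the middle band is an assumption. The input consists only of the ten pause durations quoted for examples in Section 3, so the resulting shares illustrate the binning and say nothing about the full sample.

```python
def duration_profile(durations_ms):
    """Shares of pauses below 500 ms, from 500 to 1000 ms, and above 1000 ms.

    Assumption: values of exactly 500 ms or 1000 ms fall into the middle band.
    """
    if not durations_ms:
        raise ValueError("no durations given")
    n = len(durations_ms)
    short = sum(1 for d in durations_ms if d < 500)
    long_ = sum(1 for d in durations_ms if d > 1000)
    mid = n - short - long_
    return short / n, mid / n, long_ / n


# Durations (ms) quoted in Section 3 for examples
# (10), (11), (12), (16), (17), (18), (19), (20), (21), and (22).
SECTION_3_EXAMPLES = [1020, 260, 250, 270, 220, 420, 200, 300, 550, 240]
print(duration_profile(SECTION_3_EXAMPLES))
```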

4.6 Pre-pausal lengthening

Word-internal pauses can sometimes be ambiguous with regard to the prosodic status of the two separated elements. Speech pauses do, however, always constitute some sort of prosodic break, and in some of the examples in Section 3, we saw substantial lengthening of vowels in the vicinity of the word-internal pause (see Figures 8, 10, and 18). This lengthening could simply be a manifestation of disfluent speech, but it could also be directly related to the presence of a prosodic boundary. The duration of segments preceding a prosodic boundary is often increased, a process known as final lengthening (Fletcher 2010) that has been demonstrated to be robust across a wide range of phonologically diverse languages (Paschen et al. 2022). Figure 27 shows the duration of vowels, pooled from the 44 datasets in the sample, in three positions: syllables before regular speech pauses, syllables before word-internal pauses, and all other (i.e. non-final) positions.^[15] Vowels before word-internal pauses were considerably longer than non-final vowels, and also longer than vowels in regular pre-pausal position. In non-final positions, median vowel duration was 70 ms, while before regular speech pauses, it was 93 ms. Before word-internal pauses, median vowel duration was 137 ms.

Figure 27:

Vowel duration in syllables before pauses and in non-final syllables.
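
To make the size of the lengthening effect easier to compare across positions, the reported medians can be expressed as ratios. The short Python sketch below does only this arithmetic; the dictionary keys and the function name are hypothetical labels, and ratios of medians are a purely descriptive summary that does not control for segment, speech rate, or syllable structure.

```python
# Median vowel durations (ms) as reported in the text for the three positions.
MEDIAN_MS = {"non_final": 70, "pre_pause": 93, "pre_wip": 137}


def lengthening_ratio(position, baseline="non_final"):
    """Median duration in `position` relative to the median in `baseline`."""
    return MEDIAN_MS[position] / MEDIAN_MS[baseline]


for pos in ("pre_pause", "pre_wip"):
    print(f"{pos}: {lengthening_ratio(pos):.2f} x non-final median")
print(f"pre_wip vs pre_pause: {lengthening_ratio('pre_wip', 'pre_pause'):.2f}")
```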

A linear mixed model (23) with position, segment, local speech rate, and syllable structure as fixed effects, and with random intercepts for language and for speaker nested within language, revealed significant effects of position (p < 0.001 ***).^[16] In sum, there was significant lengthening of vowels before speech pauses, and significantly more lengthening before word-internal pauses compared to regular pauses. This suggests that word-internal pauses do not simply constitute prosodic breaks, but are also associated with a generally slower and less fluent stretch of speech.

(23)

lmer(log(duration) ~ position + segment + speech_rate + syllable_structure + (1 | language/speaker))

4.7 Disfluencies

Word-internal pausing, especially in the case of ultra-disruptive pauses, may point to difficulties with speech planning and a generally disfluent stretch of speech, especially in languages where speakers use word-internal pauses only sporadically. Section 3 provided some discussion of word-internal pauses accompanied by disfluencies such as false starts or fillers (see example (6)). To determine whether false starts and fillers are more common in the vicinity of word-internal pauses compared to regular silent pauses, the log-likelihood over the two event likelihoods was determined. Temporal proximity between a pause and a disfluency event was operationalized as a distance of up to two segments. The likelihood of encountering a disfluency in the vicinity of a silent pause was 0.012, while the corresponding likelihood for word-internal pauses was 0.014.

(24)

log-likelihood_{pause_over_wip} = log(likelihood_pause / likelihood_wip)

The resulting log-likelihood value, calculated in (24), is −0.158, indicating a slight preference for encountering a false start or filled pause near a word-internal pause. This means that the likelihood of a disfluency occurring near a word-internal pause increases by approximately 17 % compared to a location near a regular silent pause. Thus, word-internal pausing and other types of disfluencies show a slight tendency to occur together, which is expected given that some word-internal pauses likely stem from difficulties in speech planning.
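
The arithmetic behind (24) can be retraced with the short Python sketch below. It assumes a natural logarithm, which is not stated explicitly in the text but is consistent with the reported increase of roughly 17 %, and the function names are hypothetical. Note that the rounded likelihoods given above (0.012 and 0.014) yield a value of about −0.154 rather than −0.158; the small difference presumably reflects rounding of the published likelihoods.

```python
import math

# Likelihoods as reported (rounded) in the text.
LIKELIHOOD_PAUSE = 0.012  # disfluency near a regular silent pause
LIKELIHOOD_WIP = 0.014    # disfluency near a word-internal pause


def log_ratio(p_pause, p_wip):
    """Natural log of the ratio in (24); the log base is an assumption."""
    return math.log(p_pause / p_wip)


def relative_increase(log_value):
    """Relative increase for word-internal pauses implied by the log value."""
    return math.exp(-log_value) - 1


value = log_ratio(LIKELIHOOD_PAUSE, LIKELIHOOD_WIP)
print(f"log ratio from rounded inputs: {value:.3f}")
print(f"implied increase: {relative_increase(value):.1%}")
print(f"increase implied by the reported -0.158: {relative_increase(-0.158):.1%}")
```

A negative value means that the likelihood in the denominator, i.e. the one for word-internal pauses, is the larger of the two.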

4.8 Interim summary

The results presented in this section can be summarized as follows. The distribution of word-internal pauses aligns with a Zipfian pattern: in two languages, more than 3 % of all silent pauses are word-internal pauses; in seven languages, the ratio is between 0.5 % and 3 %, and for the majority of languages, the ratio is below 0.5 %. Word-internal pauses predominantly occur at morphological boundaries, especially in the pre-stem domain, though some also appear within morphemes, typically affecting root elements. Across word classes, word-internal pauses are observed in all parts of speech; however, they tend to be more frequent in verbs for polysynthetic languages. Additionally, the likelihood of encountering word-internal pauses increases with the level of morphological synthesis. In terms of duration, word-internal pauses cluster around 200–300 ms, with only a minority exceeding 500 ms. This contrasts with regular silent pauses, which display a more even distribution, and suggests a closer connection between the two parts split by a word-internal pause. Significant vowel lengthening was observed prior to word-internal pauses, likely influenced by final lengthening. Disfluency markers were also more commonly found near word-internal pauses.

5 Discussion

Word-internal pauses are not geographically or genealogically restricted but appear globally across languages. Much of the existing literature on intraword pauses has focused on polysynthetic languages of Australia (Baker 2018; Evans et al. 2008; Mansfield 2023), but the phenomenon is far from unique to this region. There are also considerable differences among Australian languages. Out of the languages in the sample, the polysynthetic language Dalabon (Gunwinyguan) demonstrates systematic word-internal pauses, particularly within complex verb forms. By contrast, in Gurindji, Warlpiri (both Pama-Nyungan), and Light Warlpiri (a mixed language), word-internal pauses occur with lower frequency and often outside the verbal complex.

Word-internal pauses are also highly relevant for the well-known cross-linguistic suffixing preference. This typological asymmetry has been attributed to ease of processing in that incremental parsing relies most heavily on the beginning of words, favoring the presentation of lexical stems before affixes (Berg 2020; Cutler et al. 1985). A related argument has been brought forward by St. Clair et al. (2009) for language acquisition. By contrast, Himmelmann (2014) emphasizes the role of prosody, arguing that the preferential placement of pauses after function words facilitates fusion of postposed material with the preceding stem. The data presented in this study show that prefixes are markedly more prone than suffixes to be separated from their bases by a pause. This finding can inform the debate surrounding the suffixing preference: not only are postposed elements prosodically favored to become grammaticalized, as argued by Himmelmann (2014), but preposed elements appear to be less likely to be grammaticalized due to their less tight prosodic integration. The greater prosodic independence of prefixes is evidenced by the distribution of word-internal pauses, but has also been noted in relation to numerous phonological processes such as harmony, tone spread, and syllabification (Elkins 2020). Related to this, prefixes have also been argued to be more lexical in nature than suffixes (Berg 2020), which again speaks to their relative independence. The abundance of pauses occurring around prefixes and their scarcity around suffixes may also hint at speech preparation being anchored to the lexical stem, which carries the highest load in terms of processing, and at the relative difficulty of processing prefixes (Asao 2015). By extension, the same considerations may apply to the enclitic preference (Asao 2015; Cysouw 2005), as the present study revealed a larger proportion of pauses occurring after proclitics than before enclitics.

The skewed distribution of word-internal pauses thus appears to be relevant to the suffixing preference in terms of an inhibition of fusion. Reinöhl and Casaretto (2018) present a similar argument for local particles in Indo-Aryan, which they argue were prevented from grammaticalizing into adpositions due to prosodic chunking and accentuation preferences. A related question is whether recurrent pausing after prefixes might even lead to degrammaticalization, as suggested by Evans et al. (2008). While the precise mechanisms through which pausing could foster splitting and (re-)lexicalization of prefix material remain unclear, at least at a cross-linguistic level, the present study underscores the need to integrate prosodic and psycholinguistic perspectives into accounts of typological asymmetries.

Another important outcome of the present study is the indeterminacy of phonological domains, as there was often conflicting evidence as to which domains word chunks separated by disruptive pauses correspond to. While it is often assumed that word-internal pauses automatically delineate phonological words (Evans et al. 2008; Mansfield 2023), the examples discussed in the previous sections demonstrated that this assumption is false. In example (16) from Yurakaré, phonetic evidence indicated that a detached prefix constitutes a prosodic phrase. Similarly, example (11) from Warlpiri showed considerable vowel lengthening before a word-internal pause, which can be taken as evidence for a major prosodic boundary corresponding to the phrase level. In contrast, example (9) from Jahai showed the insertion of an epenthetic vowel as though the word was uninterrupted by a pause, and example (17) from Movima included a word-level rule applying to an interrupted word form as if no pause was present, suggesting that the entire grammatical word including the internal pause is part of the same prosodic word domain. The same is true of example (14) from Hoocąk, where the word-internal pause appears to be invisible to a segmental allophony rule. In the Svan example in (20), by contrast, one of the detached elements consists solely of consonants and can hardly be considered a phonological word. Moreover, as shown in Section 4.5, the duration of word-internal pauses varied substantially, and shorter pauses (200–300 ms) were generally less likely to delineate two distinct prosodic constituents. All of the above make it clear that the relation between word-internal pauses and phonological constituents is far from trivial, and there is no universal mapping between word-internal pauses and phonological word boundaries.

The question of why speakers pause within words at all remains unanswered for now, and would need to be approached with an experimental study design. In general, the occurrence of pauses is a multifactorial phenomenon. First, pauses arise from physiological necessity, since speech is naturally constrained by the respiratory cycle (Werner 2023). Second, pauses reflect syntactic and prosodic structure, with boundaries in speech often coinciding with clause or phrase junctures (Krivokapić 2007; Peck and Becker 2024). Third, pauses can be related to cognitive load and are sometimes grouped together with filled pauses as manifestations of disfluent speech (Belz 2023). Fourth, they serve as interactional signals regulating turn-taking (ten Bosch et al. 2005). Finally, pauses can be used for rhetorical or emphatic purposes, drawing attention to key information or functioning as conventionalized pragmatic devices (Fuchs and Paschen 2021). Word-internal pauses appear to be mostly related to cognitive demands, and most of them also respect morphological boundaries. Ideally, the factors leading to word-internal pausing ought to be further scrutinized in controlled experiments with speakers of various languages.

A compelling yet difficult-to-resolve question is whether some ultra-disruptive pauses could occur at locations that were former morphological boundaries. If so, these pauses might not be as randomly distributed as an account purely in terms of speech planning difficulties may suggest. However, such morph junctures may no longer be identifiable without knowledge of earlier states of a given language. While many ultra-disruptive pauses likely stem from real-time speech planning difficulties, others may reflect deeper, diachronic processes. Further investigation into this phenomenon could yield valuable insights into both synchronic processing and historical morphology.

6 Conclusions

Zingler (2022) raised the question of whether pausability might be a practical criterion for wordhood (see Section 1). The findings here suggest a tentative “no”. While some languages such as Arapaho, Dalabon, and Jahai show a clear preference for pausing at prefix and proclitic boundaries, speakers across diverse linguistic and cultural backgrounds demonstrate considerable latitude in choosing where to pause within words, and may even do so within minimal morphological forms. Pausability appears to be overall too variable and too strongly influenced by speaker preference to serve as a definitive marker of wordhood, or of domain-hood on any level. As such, the overarching implication of this study is that in natural speech, words are non-monolithic, flexible units. Speakers show adaptability in segmenting their speech at junctures that sometimes align with, and sometimes diverge from, established phonological or grammatical boundaries in order to accommodate the dynamic nature of spoken language.

Corresponding author: Ludger Paschen [ˈlʊd̥gɐ ˈpʰaːʃən], Leibniz-Zentrum Allgemeine Sprachwissenschaft (ZAS), Pariser Str. 1, 10719 Berlin, Germany; and Universität Regensburg, Universitätsstraße 31, 93053 Regensburg, Germany, E-mail: paschen@leibniz-zas.de

Funding source: Deutsche Forschungsgemeinschaft

Award Identifier / Grant number: 496083013

Acknowledgments

This research was funded by a grant from the Deutsche Forschungsgemeinschaft (DFG PA 2368/1-1, project number: 496083013). I am grateful to the audiences of the workshop “Efficiency in grammar: Patterns and Explanations” at University of Freiburg and the 21st International Congress of Linguists (ICL 2024) in Poznań for helpful discussion and feedback. I am especially grateful to Aleksandr Schamberger for his support at various stages of this study. Finally, I would like to thank three anonymous reviewers for many valuable comments. Any remaining errors are my own.

Abbreviations

advz: adverbializer
en: English
ev: Evenki
id: identification
instr: instrumental
ints: intensifier
md: middle voice
nvis: non-visible
pfv: perfective aspect
pp: past perfective
pr: present tense
r: realis
red: reduplication
ru: Russian
seq: sequential
sim: simultaneous aspect
<<wip>>: word-internal pause

References

Aikhenvald, Alexandra Y., R. M. W. Dixon & Nathan M. White (eds.). 2020a. Phonological word and grammatical word: A cross-linguistic typology. Oxford: Oxford University Press. https://doi.org/10.1093/oso/9780198865681.001.0001.

Aikhenvald, Alexandra Y., R. M. W. Dixon & Nathan M. White. 2020b. The essence of ‘word’. In Alexandra Y. Aikhenvald, R. M. W. Dixon & Nathan M. White (eds.), Phonological word and grammatical word: A cross-linguistic typology, 1–24. Oxford: Oxford University Press. https://doi.org/10.1093/oso/9780198865681.003.0001.

Allison, Sean. 2020. The notion of ‘word’ in Makary Kotoko. In Alexandra Y. Aikhenvald, R. M. W. Dixon & Nathan M. White (eds.), Phonological word and grammatical word: A cross-linguistic typology, 260–284. Oxford: Oxford University Press. https://doi.org/10.1093/oso/9780198865681.003.0009.

Andreeva, Bistra, Bernd Möbius & James Whang. 2020. Effects of surprisal and boundary strength on phrase-final lengthening. In Proc. speech prosody 2020, 146–150. Tokyo, Japan. https://doi.org/10.21437/SpeechProsody.2020-30.

Asao, Yoshihiko. 2015. Left-right asymmetries in words: A processing-based account. Buffalo: State University of New York dissertation.

Aznar, Jocelyn. 2024. Nisvai DoReCo dataset. In Frank Seifart, Ludger Paschen & Matthew Stave (eds.), Language documentation reference corpus (DoReCo) 2.0. Lyon: Laboratoire Dynamique Du Langage (UMR5596, CNRS & Université Lyon 2). Available at: https://doreco.huma-num.fr/languages/nisv1234.

Babinski, Sarah, Rikker Dockum, J. Hunter Craft, Anelisa Fergus, Dolly Goldenberg & Claire Bowern. 2019. A Robin Hood approach to forced alignment: English-trained algorithms and their use on Australian languages. Proceedings of the Linguistic Society of America 4(3). 1–12. https://doi.org/10.3765/plsa.v4i1.4468.

Baker, Brett. 2002. How referential is agreement? The interpretation of polysynthetic dis-agreement morphology in Ngalakgan. In Nicholas Evans & Hans-Jürgen Sasse (eds.), Problems of polysynthesis, 51–86. Berlin: Akademie Verlag.

Baker, Brett. 2018. Super-complexity and the status of ‘word’ in Gunwinyguan languages of Australia. In Geert Booij (ed.), The construction of words: Advances in construction morphology, vol. 4 (Studies in Morphology), 255–286. Cham: Springer International Publishing. https://doi.org/10.1007/978-3-319-74394-3_10.

Bardají, Maria, Christoph Bracks, Claudia Leto, Datra Hasan, Sonja Riesberg, Winarno S. Alamudi & Nikolaus P. Himmelmann. 2024. Totoli DoReCo dataset. In Frank Seifart, Ludger Paschen & Matthew Stave (eds.), Language documentation reference corpus (DoReCo) 2.0. Lyon: Laboratoire Dynamique Du Langage (UMR5596, CNRS & Université Lyon 2). Available at: https://doreco.huma-num.fr/languages/toto1304.

Bartels, Hauke & Marcin Szczepański. 2024. Lower Sorbian DoReCo dataset. In Frank Seifart, Ludger Paschen & Matthew Stave (eds.), Language documentation reference corpus (DoReCo) 2.0. Lyon: Laboratoire Dynamique Du Langage (UMR5596, CNRS & Université Lyon 2). Available at: https://doreco.huma-num.fr/languages/lowe1385.

Belz, Malte. 2023. Defining filler particles: A phonetic account of the terminology, form, and grammatical classification of “filled pauses”. Languages 8(1). 57. https://doi.org/10.3390/languages8010057.

Berg, Thomas. 2020. Ordering biases in cross-linguistic perspective: The interaction of serial order and structural level. Linguistic Typology 24(2). 353–397. https://doi.org/10.1515/lingty-2019-2031.

Bermúdez-Otero, Ricardo. 2012. The architecture of grammar and the division of labour in exponence. In Jochen Trommer (ed.), The morphology and phonology of exponence, 8–83. Oxford: Oxford University Press. https://doi.org/10.1093/acprof:oso/9780199573721.003.0002.

Bogomolova, Natalia, Dmitry Ganenkov & Nils Norman Schiborr. 2024. Tabasaran DoReCo dataset. In Frank Seifart, Ludger Paschen & Matthew Stave (eds.), Language documentation reference corpus (DoReCo) 2.0. Lyon: Laboratoire Dynamique Du Langage (UMR5596, CNRS & Université Lyon 2). Available at: https://doreco.huma-num.fr/languages/taba1259.

ten Bosch, Louis, Nelleke Oostdijk & Lou Boves. 2005. On temporal aspects of turn taking in conversational dialogues. Speech Communication 47(1). 80–86. https://doi.org/10.1016/j.specom.2005.05.009.

Bundgaard-Nielsen, Rikke L. & Brett J. Baker. 2020. Pause acceptability indicates word-internal structure in Wubuy. Cognition 198. 104167. https://doi.org/10.1016/j.cognition.2019.104167.

Burenhult, Niclas. 2005. A grammar of Jahai, vol. 566 (Pacific Linguistics). Canberra: The Australian National University.

Burenhult, Niclas. 2024. Jahai DoReCo dataset. In Frank Seifart, Ludger Paschen & Matthew Stave (eds.), Language documentation reference corpus (DoReCo) 2.0. Lyon: Laboratoire Dynamique Du Langage (UMR5596, CNRS & Université Lyon 2). Available at: https://doreco.huma-num.fr/languages/jeha1242.

Cahill, Michael. 2018. Orthography design and implementation for endangered languages. In Kenneth L. Rehg & Lyle Campbell (eds.), The Oxford handbook of endangered languages, 327–348. Oxford: Oxford University Press. https://doi.org/10.1093/oxfordhb/9780190610029.013.17.

Campione, Estelle & Jean Véronis. 2002. A large-scale multilingual study of silent pause duration. In Proc. speech prosody 2002, 199–202. Aix-en-Provence, France. https://doi.org/10.21437/SpeechProsody.2002-35.

Chodroff, Eleanor, Emily P. Ahn & Hossep Dolatian. 2025. Comparing language-specific and cross-language acoustic models for low-resource phonetic forced alignment. Language Documentation & Conservation 19. 201–223.

Ciucci, Luca. 2020. Wordhood in Chamacoco. In Alexandra Y. Aikhenvald, R. M. W. Dixon & Nathan M. White (eds.), Phonological word and grammatical word: A cross-linguistic typology, 78–120. Oxford: Oxford University Press. https://doi.org/10.1093/oso/9780198865681.003.0004.

Cobbinah, Alexander Yao. 2024. Baïnounk Gubëeher DoReCo dataset. In Frank Seifart, Ludger Paschen & Matthew Stave (eds.), Language documentation reference corpus (DoReCo) 2.0. Lyon: Laboratoire Dynamique Du Langage (UMR5596, CNRS & Université Lyon 2). Available at: https://doreco.huma-num.fr/languages/bain1259.

Cowell, Andrew. 2024. Arapaho DoReCo dataset. In Frank Seifart, Ludger Paschen & Matthew Stave (eds.), Language documentation reference corpus (DoReCo) 2.0. Lyon: Laboratoire Dynamique Du Langage (UMR5596, CNRS & Université Lyon 2). Available at: https://doreco.huma-num.fr/languages/arap1274.

Cutler, Anne, John A. Hawkins & Gary Gilligan. 1985. The suffixing preference: A processing explanation. Linguistics 23(5). 723–758. https://doi.org/10.1515/ling.1985.23.5.723.

Cysouw, Michael. 2005. Morphology in the wrong place: A survey of preposed enclitics. In Wolfgang U. Dressler, Dieter Kastovsky, Oskar E. Pfeiffer & Franz Rainer (eds.), Morphology and its demarcations. Selected papers from the 11th Morphology Meeting, Vienna, February 2004, 17–37. Amsterdam & Philadelphia: John Benjamins. https://doi.org/10.1075/cilt.264.02cys.

Däbritz, Chris Lasse, Nina Kudryakova, Eugénie Stapert & Alexandre Arkhipov. 2024. Dolgan DoReCo dataset. In Frank Seifart, Ludger Paschen & Matthew Stave (eds.), Language documentation reference corpus (DoReCo) 2.0. Lyon: Laboratoire Dynamique Du Langage (UMR5596, CNRS & Université Lyon 2). Available at: https://doreco.huma-num.fr/languages/dolg1241.

Dixon, R. M. W. & Alexandra Y. Aikhenvald (eds.). 2003. Word: A cross-linguistic typology. Cambridge: Cambridge University Press. https://doi.org/10.1017/CBO9780511486241.

Downing, Laura J. 1999. Prosodic stem ≠ prosodic word in Bantu. In Tracy Alan Hall & Ursula Kleinhenz (eds.), Studies on the phonological word, vol. 174 (Current Issues in Linguistic Theory), 73–98. Amsterdam & Philadelphia: John Benjamins. https://doi.org/10.1075/cilt.174.05dow.

Elkins, Noah. 2020. Prefix independence: Typology and theory. Los Angeles: University of California MA thesis.

Enfield, Nick J. 2020. Word in Lao. In Alexandra Y. Aikhenvald, R. M. W. Dixon & Nathan M. White (eds.), Phonological word and grammatical word: A cross-linguistic typology, 176–212. Oxford: Oxford University Press. https://doi.org/10.1093/oso/9780198865681.003.0007.

Evans, Nicholas, Janet Fletcher & Belinda Ross. 2008. Big words, small phrases: Mismatches between pause units and the polysynthetic word in Dalabon. Linguistics 46(1). 89–129. https://doi.org/10.1515/LING.2008.004.

Fletcher, Janet. 2010. The prosody of speech: Timing and rhythm. In William J. Hardcastle, John Laver & Fiona E. Gibbon (eds.), The handbook of phonetic sciences, 2nd edn., 521–602. Chichester: Blackwell Publishing. https://doi.org/10.1002/9781444317251.ch15.

Forker, Diana & Nils Norman Schiborr. 2024. Sanzhi Dargwa DoReCo dataset. In Frank Seifart, Ludger Paschen & Matthew Stave (eds.), Language documentation reference corpus (DoReCo) 2.0. Lyon: Laboratoire Dynamique Du Langage (UMR5596, CNRS & Université Lyon 2). Available at: https://doreco.huma-num.fr/languages/sanz1248.

Franjieh, Michael. 2024. Fanbyak DoReCo dataset. In Frank Seifart, Ludger Paschen & Matthew Stave (eds.), Language documentation reference corpus (DoReCo) 2.0. Lyon: Laboratoire Dynamique Du Langage (UMR5596, CNRS & Université Lyon 2). Available at: https://doreco.huma-num.fr/languages/orko1234.

Fuchs, Susanne & Ludger Paschen. 2021. (Non)conventional aspects of language and their relation to general linguistics. Theoretical Linguistics 47(1–2). 75–84. https://doi.org/10.1515/tl-2021-2007.

Garcia-Laguia, Alexandro. 2024. Northern Alta DoReCo dataset. In Frank Seifart, Ludger Paschen & Matthew Stave (eds.), Language documentation reference corpus (DoReCo) 2.0. Lyon: Laboratoire Dynamique Du Langage (UMR5596, CNRS & Université Lyon 2). Available at: https://doreco.huma-num.fr/languages/nort2875.

van Gijn, Rik & Fernando Zúñiga. 2014. Word and the Americanist perspective. Morphology 24. 135–160. https://doi.org/10.1007/s11525-014-9242-z.

Gipper, Sonja & Jeremías Ballivián Torrico. 2024. Yurakaré DoReCo dataset. In Frank Seifart, Ludger Paschen & Matthew Stave (eds.), Language documentation reference corpus (DoReCo) 2.0. Lyon: Laboratoire Dynamique Du Langage (UMR5596, CNRS & Université Lyon 2). Available at: https://doreco.huma-num.fr/languages/yura1255.

Gippert, Jost. 2024. Svan DoReCo dataset. In Frank Seifart, Ludger Paschen & Matthew Stave (eds.), Language documentation reference corpus (DoReCo) 2.0. Lyon: Laboratoire Dynamique Du Langage (UMR5596, CNRS & Université Lyon 2). Available at: https://doreco.huma-num.fr/languages/svan1243.

Gósy, Mária. 2023. Occurrences and durations of filled pauses in relation to words and silent pauses in spontaneous speech. Languages 8(1). 79. https://doi.org/10.3390/languages8010079.

Griscom, Richard. 2024. Asimjeeg Datooga DoReCo dataset. In Frank Seifart, Ludger Paschen & Matthew Stave (eds.), Language documentation reference corpus (DoReCo) 2.0. Lyon: Laboratoire Dynamique Du Langage (UMR5596, CNRS & Université Lyon 2). Available at: https://doreco.huma-num.fr/languages/tsim1256.

Gusev, Valentin, Tiina Klooster, Beáta Wagner-Nagy & Alexandre Arkhipov. 2024. Kamas DoReCo dataset. In Frank Seifart, Ludger Paschen & Matthew Stave (eds.), Language documentation reference corpus (DoReCo) 2.0. Lyon: Laboratoire Dynamique Du Langage (UMR5596, CNRS & Université Lyon 2). Available at: https://doreco.huma-num.fr/languages/kama1351.

Hale, Ken. 1995. An elemental Warlpiri dictionary. Alice Springs: IAD Press. https://doi.org/10.1515/9783110142631.2.21.1430.

Hall, Tracy Alan. 1999. The phonological word: A review. In Tracy Alan Hall & Ursula Kleinhenz (eds.), Studies on the phonological word, vol. 174 (Current Issues in Linguistic Theory), 1–22. Amsterdam & Philadelphia: John Benjamins. https://doi.org/10.1075/cilt.174.02hal.

Hammarström, Harald, Thom Castermans, Robert Forkel, Kevin Verbeek, Michel A. Westenberg & Bettina Speckmann. 2018. Simultaneous visualization of language endangerment and language description. Language Documentation & Conservation 12. 359–392.

Hammarström, Harald, Robert Forkel, Martin Haspelmath & Sebastian Bank. 2024. Glottolog v5.0. Leipzig: Max Planck Institute for Evolutionary Anthropology.

Harris, Alice. 2002. Endoclitics and the origins of Udi morphosyntax. Oxford: Oxford University Press. https://doi.org/10.1093/oso/9780199246335.001.0001.

Harris, Alice C. & Arthur G. Samuel. 2024. Processing and production of clitics in Udi and European Portuguese: Testing a processing account of an extension of the suffixing preference. Journal of Linguistics 60. 825–857. https://doi.org/10.1017/S0022226724000045.

Hartmann, Iren. 2024. Hoocąk DoReCo dataset. In Frank Seifart, Ludger Paschen & Matthew Stave (eds.), Language documentation reference corpus (DoReCo) 2.0. Lyon: Laboratoire Dynamique Du Langage (UMR5596, CNRS & Université Lyon 2). Available at: https://doreco.huma-num.fr/languages/hoch1243.

Harvey, Andrew. 2024. Gorwaa DoReCo dataset. In Frank Seifart, Ludger Paschen & Matthew Stave (eds.), Language documentation reference corpus (DoReCo) 2.0. Lyon: Laboratoire Dynamique Du Langage (UMR5596, CNRS & Université Lyon 2). Available at: https://doreco.huma-num.fr/languages/goro1270.

Haspelmath, Martin. 2011. The indeterminacy of word segmentation and the nature of morphology and syntax. Folia Linguistica 45(1). 31–80. https://doi.org/10.1515/flin.2011.002.

Haspelmath, Martin. 2023. Defining the word. WORD 69(3). 293–297. https://doi.org/10.1080/00437956.2023.2237272.

Haude, Katharina. 2006. A grammar of Movima. Radboud Universiteit Nijmegen dissertation.

Haude, Katharina. 2024. Movima DoReCo dataset. In Frank Seifart, Ludger Paschen & Matthew Stave (eds.), Language documentation reference corpus (DoReCo) 2.0. Lyon: Laboratoire Dynamique Du Langage (UMR5596, CNRS & Université Lyon 2). Available at: https://doreco.huma-num.fr/languages/movi1243.

Hellwig, Birgit, Gertrud Schneider-Blum & Khaleel Bakheet Khaleel Ismail. 2024. Tabaq (Karko) DoReCo dataset. In Frank Seifart, Ludger Paschen & Matthew Stave (eds.), Language documentation reference corpus (DoReCo) 2.0. Lyon: Laboratoire Dynamique Du Langage (UMR5596, CNRS & Université Lyon 2). Available at: https://doreco.huma-num.fr/languages/kark1256.

Hildebrandt, Kristine A. 2015. The prosodic word. In John R. Taylor (ed.), The Oxford handbook of the word, 221–245. Oxford: Oxford University Press. https://doi.org/10.1093/oxfordhb/9780199641604.013.035.

Himmelmann, Nikolaus P. 2014. Asymmetries in the prosodic phrasing of function words: Another look at the suffixing preference. Language 90(4). 927–960. https://doi.org/10.1353/lan.2014.0105.

Hunter, John D. 2007. Matplotlib: A 2D graphics environment. Computing in Science & Engineering 9(3). 90–95. https://doi.org/10.1109/MCSE.2007.55.

Jarkey, Nerida. 2020. Words in Japanese. In Alexandra Y. Aikhenvald, R. M. W. Dixon & Nathan M. White (eds.), Phonological word and grammatical word: A cross-linguistic typology, 39–77. Oxford: Oxford University Press. https://doi.org/10.1093/oso/9780198865681.003.0003.

Johanson, Lars. 1998. The structure of Turkic. In Lars Johanson & Éva Á. Csató (eds.), The Turkic languages, 30–66. London & New York: Routledge.

Kazakevich, Olga & Elena Klyachko. 2024. Evenki DoReCo dataset. In Frank Seifart, Ludger Paschen & Matthew Stave (eds.), Language documentation reference corpus (DoReCo) 2.0. Lyon: Laboratoire Dynamique Du Langage (UMR5596, CNRS & Université Lyon 2). Available at: https://doreco.huma-num.fr/languages/even1259.

Kim, Soung-U. 2024. Jejuan DoReCo dataset. In Frank Seifart, Ludger Paschen & Matthew Stave (eds.), Language documentation reference corpus (DoReCo) 2.0. Lyon: Laboratoire Dynamique Du Langage (UMR5596, CNRS & Université Lyon 2). Available at: https://doreco.huma-num.fr/languages/jeju1234.

Kisler, Thomas, Florian Schiel & Han Sloetjes. 2012. Signal processing via web services: The use case WebMAUS. In Proceedings of digital humanities 2012, 30–34.

Koshik, Irene. 2005. Beyond rhetorical questions: Assertive questions in everyday interaction. Amsterdam & Philadelphia: John Benjamins. https://doi.org/10.1075/sidag.16.

Krifka, Manfred. 2017. Daa ne Daakie kevene / Ol Wod blong Daakie / Daakie dictionary. with Abel Taho, Paul Tomo, Jip Abel Aeven, Lissing Boa, Simon Boa, Filip Bong, Adam Joshua, Ilson Magekon, Jemis Melip, Jack Paul, Jip Jack Samuel. CreateSpace Independent Publishing Platform.

Krifka, Manfred. 2024. Daakie DoReCo dataset. In Frank Seifart, Ludger Paschen & Matthew Stave (eds.), Language documentation reference corpus (DoReCo) 2.0. Lyon: Laboratoire Dynamique Du Langage (UMR5596, CNRS & Université Lyon 2). Available at: https://doreco.huma-num.fr/languages/port1286.

Krivokapić, Jelena. 2007. Prosodic planning: Effects of phrasal length and complexity on pause duration. Journal of Phonetics 35. 162–179. https://doi.org/10.1016/j.wocn.2006.04.001.

Lüpke, Friederike. 2011. Orthography development. In Peter K. Austin & Julia Sallabank (eds.), The Cambridge handbook of endangered languages, 312–336. Cambridge: Cambridge University Press. https://doi.org/10.1017/CBO9780511975981.016.

Mansfield, John. 2023. The prosodic structure of Australian polysynthetic verbs: Bininj Gun-wok, Murrinhpatha and Ngalakgan. In Ksenia Bogomolets & Harry van der Hulst (eds.), Word prominence in morphologically complex languages, 411–438. Oxford: Oxford University Press. https://doi.org/10.1093/oso/9780198840589.003.0013.

Matisoff, James A. 1973. Tonogenesis in Southeast Asia. In Larry Hyman (ed.), Consonant types and tone, vol. 1 (Southern California Occasional Papers in Linguistics), 71–95. Los Angeles, CA: University of Southern California.

Meakins, Felicity. 2024. Gurindji DoReCo dataset. In Frank Seifart, Ludger Paschen & Matthew Stave (eds.), Language documentation reference corpus (DoReCo) 2.0. Lyon: Laboratoire Dynamique Du Langage (UMR5596, CNRS & Université Lyon 2). Available at: https://doreco.huma-num.fr/languages/guri1247.

Meakins, Felicity & Patrick McConvell. 2021. A grammar of Gurindji, vol. 91 (Mouton Grammar Library). Berlin & Boston: Walter de Gruyter. https://doi.org/10.1515/9783110746884.

Meakins, Felicity, Xia Hua, Cassandra Algy & Lindell Bromham. 2019. Birth of a contact language did not favor simplification. Language 95(2). 294–332. https://doi.org/10.1353/lan.2019.0032.

Met Office. 2023. Cartopy: A cartographic python library with a matplotlib interface. Available at: https://scitools.org.uk/cartopy/docs/latest/.

Michaud, Alexis. 2024. Yongning Na DoReCo dataset. In Frank Seifart, Ludger Paschen & Matthew Stave (eds.), Language documentation reference corpus (DoReCo) 2.0. Lyon: Laboratoire Dynamique Du Langage (UMR5596, CNRS & Université Lyon 2). Available at: https://doreco.huma-num.fr/languages/yong1270.

Mosel, Ulrike. 2024. Teop DoReCo dataset. In Frank Seifart, Ludger Paschen & Matthew Stave (eds.), Language documentation reference corpus (DoReCo) 2.0. Lyon: Laboratoire Dynamique Du Langage (UMR5596, CNRS & Université Lyon 2). Available at: https://doreco.huma-num.fr/languages/teop1238.

Nakamoto, Shun. 2024. Constituency in Ayautla Mazatec. In Adam J. R. Tallman, Sandra Auderset & Hiroto Uchihara (eds.), Constituency and convergence in the Americas, 231–264. Berlin: Language Science Press.

Nespor, Marina & Irene Vogel. 2007. Prosodic phonology: With a new foreword. Berlin & Boston: De Gruyter Mouton. https://doi.org/10.1515/9783110977790.

Ong, Walter J. 1982. Orality and literacy: The technologizing of the word. London & New York: Routledge. https://doi.org/10.4324/9780203328064.

O’Shannessy, Carmel. 2024a. Light Warlpiri DoReCo dataset. In Frank Seifart, Ludger Paschen & Matthew Stave (eds.), Language documentation reference corpus (DoReCo) 2.0. Lyon: Laboratoire Dynamique Du Langage (UMR5596, CNRS & Université Lyon 2). Available at: https://doreco.huma-num.fr/languages/ligh1234.

O’Shannessy, Carmel. 2024b. Warlpiri DoReCo dataset. In Frank Seifart, Ludger Paschen & Matthew Stave (eds.), Language documentation reference corpus (DoReCo) 2.0. Lyon: Laboratoire Dynamique Du Langage (UMR5596, CNRS & Université Lyon 2). Available at: https://doreco.huma-num.fr/languages/warl1254.

Ozerov, Pavel. 2024. Anal DoReCo dataset. In Frank Seifart, Ludger Paschen & Matthew Stave (eds.), Language documentation reference corpus (DoReCo) 2.0. Lyon: Laboratoire Dynamique Du Langage (UMR5596, CNRS & Université Lyon 2). Available at: https://doreco.huma-num.fr/languages/anal1239.

Paschen, Ludger, François Delafontaine, Christoph Draxler, Susanne Fuchs, Matthew Stave & Frank Seifart. 2020. Building a time-aligned cross-linguistic reference corpus from language documentation data (DoReCo). In Nicoletta Calzolari, Frederic Bechet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk & Stelios Piperidis (eds.), Proceedings of the Twelfth Language Resources and Evaluation Conference, May 2020, 2657–2666. Marseille: European Language Resources Association. Available at: https://www.aclweb.org/anthology/2020.lrec-1.324.

Paschen, Ludger, Susanne Fuchs & Frank Seifart. 2022. Final lengthening and vowel length in 25 languages. Journal of Phonetics 94. 101179. https://doi.org/10.1016/j.wocn.2022.101179.

Payne, Thomas E. 2017. Morphological typology. In Alexandra Y. Aikhenvald & R. M. W. Dixon (eds.), The Cambridge handbook of linguistic typology (Cambridge Handbooks in Language and Linguistics), 78–94. Cambridge: Cambridge University Press. https://doi.org/10.1017/9781316135716.003.

Peck, Naomi & Laura Becker. 2024. Syntactic pausing? Re-examining the associations. Linguistics Vanguard 10(1). 223–237. https://doi.org/10.1515/lingvan-2022-0156.

Ponsonnet, Maïa. 2014. The language of emotions: The case of Dalabon (Australia), vol. 4 (Cognitive Linguistic Studies in Cultural Contexts). Amsterdam & Philadelphia: John Benjamins. https://doi.org/10.1075/clscc.4.

Ponsonnet, Maïa. 2024. Dalabon DoReCo dataset. In Frank Seifart, Ludger Paschen & Matthew Stave (eds.), Language documentation reference corpus (DoReCo) 2.0. Lyon: Laboratoire Dynamique Du Langage (UMR5596, CNRS & Université Lyon 2). Available at: https://doreco.huma-num.fr/languages/ngal1292.

Puggaard-Rode, Rasmus. 2024. Praatpicture. A library for making flexible Praat Picture-style figures in R. In Cécile Fougeron & Pascal Perrier (eds.), Proceedings of the 13th International Seminar on Speech Production, 115–118. https://doi.org/10.21437/issp.2024-34.

Quesada, Juan Diego, Stavros Skopeteas, Carolina Pasamonik, Carolin Brokmann & Florian Fischer. 2024. Cabécar DoReCo dataset. In Frank Seifart, Ludger Paschen & Matthew Stave (eds.), Language documentation reference corpus (DoReCo) 2.0. Lyon: Laboratoire Dynamique Du Langage (UMR5596, CNRS & Université Lyon 2). Available at: https://doreco.huma-num.fr/languages/cabe1245.

R Core Team. 2024. R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing. Available at: https://www.R-project.org/.

Reinöhl, Uta & Antje Casaretto. 2018. When grammaticalization does NOT occur: Prosody-syntax mismatches in Indo-Aryan. Diachronica 35(2). 238–276. https://doi.org/10.1075/dia.17013.rei.

Reiter, Sabine. 2024. Cashinahua DoReCo dataset. In Frank Seifart, Ludger Paschen & Matthew Stave (eds.), Language documentation reference corpus (DoReCo) 2.0. Lyon: Laboratoire Dynamique Du Langage (UMR5596, CNRS & Université Lyon 2). Available at: https://doreco.huma-num.fr/languages/cash1254.

Riesberg, Sonja. 2024. Yali (Apahapsili) DoReCo dataset. In Frank Seifart, Ludger Paschen & Matthew Stave (eds.), Language documentation reference corpus (DoReCo) 2.0. Lyon: Laboratoire Dynamique Du Langage (UMR5596, CNRS & Université Lyon 2). Available at: https://doreco.huma-num.fr/languages/apah1238.

Ring, Hiram. 2014. Pnar. In Mathias Jenny & Paul Sidwell (eds.), The handbook of Austroasiatic languages, 1186–1226. Leiden, NL: Brill. https://doi.org/10.1163/9789004283572_027.

Ring, Hiram. 2024. Pnar DoReCo dataset. In Frank Seifart, Ludger Paschen & Matthew Stave (eds.), Language documentation reference corpus (DoReCo) 2.0. Lyon: Laboratoire Dynamique Du Langage (UMR5596, CNRS & Université Lyon 2). Available at: https://doreco.huma-num.fr/languages/pnar1238.

Rose, Françoise. 2024. Mojeño Trinitario DoReCo dataset. In Frank Seifart, Ludger Paschen & Matthew Stave (eds.), Language documentation reference corpus (DoReCo) 2.0. Lyon: Laboratoire Dynamique Du Langage (UMR5596, CNRS & Université Lyon 2). Available at: https://doreco.huma-num.fr/languages/trin1278.

Ross, Belinda Britt. 2011. Prosody and grammar in Dalabon and Kayardild. University of Melbourne dissertation.

RStudio Team. 2024. RStudio: Integrated development environment for R. RStudio 2024.04.2 Build 764. https://www.rstudio.com/ (accessed 20 October 2024).

Russell, Kevin. 1999. The “word” in two polysynthetic languages. In Tracy Alan Hall & Ursula Kleinhenz (eds.), Studies on the phonological word, vol. 174 (Current Issues in Linguistic Theory), 203–222. Amsterdam & Philadelphia: John Benjamins. https://doi.org/10.1075/cilt.174.08rus.

Sacks, Harvey. 1989. An analysis of the course of a joke’s telling in conversation. In Richard Bauman & Joel Sherzer (eds.), Explorations in the ethnography of speaking. Studies in the social and cultural foundations of language, 337–353. Cambridge: Cambridge University Press.

Schiborr, Nils Norman. 2024. English (Southern England) DoReCo dataset. In Frank Seifart, Ludger Paschen & Matthew Stave (eds.), Language documentation reference corpus (DoReCo) 2.0. Lyon: Laboratoire Dynamique Du Langage (UMR5596, CNRS & Université Lyon 2). Available at: https://doreco.huma-num.fr/languages/sout3282.

Schnell, Stefan, Geoffrey Haig & Frank Seifart. 2022. The role of language documentation in corpus-based typology. In Geoffrey Haig, Stefan Schnell & Frank Seifart (eds.), Doing corpus-based typology with spoken language data: State of the art, 1–25. Honolulu, HI: University of Hawai’i Press. Available at: https://hdl.handle.net/10125/74656. https://doi.org/10.20378/irb-57602.

Seifart, Frank. 2024a. Bora DoReCo dataset. In Frank Seifart, Ludger Paschen & Matthew Stave (eds.), Language documentation reference corpus (DoReCo) 2.0. Lyon: Laboratoire Dynamique Du Langage (UMR5596, CNRS & Université Lyon 2). Available at: https://doreco.huma-num.fr/languages/bora1263.

Seifart, Frank. 2024b. Resígaro DoReCo dataset. In Frank Seifart, Ludger Paschen & Matthew Stave (eds.), Language documentation reference corpus (DoReCo) 2.0. Lyon: Laboratoire Dynamique Du Langage (UMR5596, CNRS & Université Lyon 2). Available at: https://doreco.huma-num.fr/languages/resi1247.

Seifart, Frank, Ludger Paschen & Matthew Stave (eds.). 2024. Language Documentation Reference Corpus (DoReCo). 2.0. Lyon: Laboratoire Dynamique Du Langage (UMR5596, CNRS & Université Lyon 2). https://doi.org/10.34847/nkl.7cbfq779.

Skopeteas, Stavros, Violeta Moisidi, Nutsa Tsetereli, Johanna Lorenz & Stefanie Schröter. 2024. Urum DoReCo dataset. In Frank Seifart, Ludger Paschen & Matthew Stave (eds.), Language documentation reference corpus (DoReCo) 2.0. Lyon: Laboratoire Dynamique Du Langage (UMR5596, CNRS & Université Lyon 2). Available at: https://doreco.huma-num.fr/languages/urum1249.

St.Clair, Michelle C., Padraic Monaghan & Michael Ramscar. 2009. Relationships between language structure and language learning: The suffixing preference and grammatical categorization. Cognitive Science 33(7). 1317–1329. https://doi.org/10.1111/j.1551-6709.2009.01065.x.

Tallman, Adam J. R. 2020. Beyond grammatical and phonological words. Language and Linguistics Compass 14(2). https://doi.org/10.1111/lnc3.12364.

Tallman, Adam J. R. & Sandra Auderset. 2023. Measuring and assessing indeterminacy and variation in the morphology-syntax distinction. Linguistic Typology 27(1). 113–156. https://doi.org/10.1515/lingty-2021-0041.

Testelets, Yakov G. & Yury Lander. 2017. Adyghe (Northwest Caucasian). In Michael Fortescue, Marianne Mithun & Nicholas Evans (eds.), The Oxford handbook of polysynthesis, 948–970. Oxford: Oxford University Press. https://doi.org/10.1093/oxfordhb/9780199683208.013.51.

Thieberger, Nick. 2024. Nafsan (South Efate) DoReCo dataset. In Frank Seifart, Ludger Paschen & Matthew Stave (eds.), Language documentation reference corpus (DoReCo) 2.0. Lyon: Laboratoire Dynamique Du Langage (UMR5596, CNRS & Université Lyon 2). Available at: https://doreco.huma-num.fr/languages/sout2856.

Tilsen, Sam & Mark Tiede. 2023. Parameters of unit-based measures of speech rate. Speech Communication 150. 73–97. https://doi.org/10.1016/j.specom.2023.05.006.

Tucker, G. Richard. 1999. A global perspective on bilingualism and bilingual education. In James E. Alatis & Ai-Hui Tan (eds.), Georgetown University round table on languages and linguistics, 332–340. Washington, DC: Georgetown University Press.

Tuite, Kevin. 1998. A short descriptive grammar of the Svan language. Université de Montréal dissertation.

Van Gijn, Rik. 2006. A grammar of Yurakaré. Radboud University Nijmegen, Faculty of Arts dissertation.

Vanhove, Martine. 2024. Beja DoReCo dataset. In Frank Seifart, Ludger Paschen & Matthew Stave (eds.), Language documentation reference corpus (DoReCo) 2.0. Lyon: Laboratoire Dynamique Du Langage (UMR5596, CNRS & Université Lyon 2). Available at: https://doreco.huma-num.fr/languages/beja1238.

Vydrina, Alexandra. 2024. Kakabe DoReCo dataset. In Frank Seifart, Ludger Paschen & Matthew Stave (eds.), Language documentation reference corpus (DoReCo) 2.0. Lyon: Laboratoire Dynamique Du Langage (UMR5596, CNRS & Université Lyon 2). Available at: https://doreco.huma-num.fr/languages/kaka1265.

Wegener, Claudia. 2024. Savosavo DoReCo dataset. In Frank Seifart, Ludger Paschen & Matthew Stave (eds.), Language documentation reference corpus (DoReCo) 2.0. Lyon: Laboratoire Dynamique Du Langage (UMR5596, CNRS & Université Lyon 2). Available at: https://doreco.huma-num.fr/languages/savo1255.

Werner, Raphael Johannes. 2023. The phonetics of speech breathing: Pauses, physiology, acoustics, and perception. Universität des Saarlandes dissertation.

White, Nathan M. 2020. Word in Hmong. In Alexandra Y. Aikhenvald, R. M. W. Dixon & Nathan M. White (eds.), Phonological word and grammatical word: A cross-linguistic typology, 213–259. Oxford: Oxford University Press. https://doi.org/10.1093/oso/9780198865681.003.0008.

Witzlack-Makarevich, Alena, Saudah Namyalo, Anatol Kiriggwajjo & Zarina Molochieva. 2024. Ruuli DoReCo dataset. In Frank Seifart, Ludger Paschen & Matthew Stave (eds.), Language documentation reference corpus (DoReCo) 2.0. Lyon: Laboratoire Dynamique Du Langage (UMR5596, CNRS & Université Lyon 2). Available at: https://doreco.huma-num.fr/languages/ruul1235.

Woodbury, Anthony C. 2024. Constituency in Cup’ik and the problem of holophrasis. In Adam J. R. Tallman, Sandra Auderset & Hiroto Uchihara (eds.), Constituency and convergence in the Americas, 85–138. Berlin: Language Science Press.

Wray, Alison. 2015. Why are we so sure we know what a word is? In John R. Taylor (ed.), The Oxford handbook of the word, 725–750. Oxford: Oxford University Press. https://doi.org/10.1093/oxfordhb/9780199641604.013.032.

Xu, Xianming & Bibo Bai. 2024. Sadu DoReCo dataset. In Frank Seifart, Ludger Paschen & Matthew Stave (eds.), Language documentation reference corpus (DoReCo) 2.0. Lyon: Laboratoire Dynamique Du Langage (UMR5596, CNRS & Université Lyon 2). Available at: https://doreco.huma-num.fr/languages/sadu1234.

Yu, Alan C. L. 2007. A natural history of infixation, vol. 15 (Oxford Studies in Theoretical Linguistics). New York: Oxford University Press.

Zingler, Tim. 2022. Clitics, anti-clitics, and weak words: Towards a typology of prosodic and syntagmatic dependence. Language and Linguistics Compass 16(5–6). e12453. https://doi.org/10.1111/lnc3.12453.

Received: 2024-12-12

Accepted: 2025-10-23

Published Online: 2025-12-19

This work is licensed under the Creative Commons Attribution 4.0 International License.

https://doi.org/10.1515/lingty-2024-0093
