Abstract
Sentence length, defined as the number of words in a sentence, has long been a concern in linguistic research, and many studies have examined its distribution in specific languages. To further explore the characteristics and patterns of sentence length in Spanish and their relationship with dependency distance, we conduct a quantitative analysis of the SUD syntactic treebank within the theoretical framework of dependency grammar. We find that the sentence length distribution of Spanish follows a positive negative binomial model. Sentence length distributions do not differ significantly across mean dependency distances (MDDs), but the number of sentences across sentence length intervals follows a normal distribution. In Spanish, sentence length and MDD are interrelated: the longer the sentence, the greater the MDD, and vice versa, which is consistent with previous research findings. In addition, as sentence length increases, short-distance dependencies decrease but remain within a certain range of fluctuation, which again supports the view that language is a human-driven complex adaptive system.
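To make the two quantities concrete, the following is a minimal sketch of how sentence length and MDD might be computed for a single sentence. It assumes the common definition of MDD as the average absolute difference between the linear positions of each dependent and its head, with the root excluded; the abstract does not spell out the formula, so this is an assumption. The function name `mean_dependency_distance` and the example head indices are hypothetical and are not taken from the treebank.

```python
from typing import List


def mean_dependency_distance(heads: List[int]) -> float:
    """Return the mean dependency distance (MDD) of one sentence.

    heads[i] is the 1-based position of the head of word i + 1;
    0 marks the root, which has no incoming dependency and is skipped.
    Assumed definition: mean of |head position - dependent position|.
    """
    distances = [
        abs(head - position)
        for position, head in enumerate(heads, start=1)
        if head != 0
    ]
    if not distances:
        raise ValueError("sentence has no non-root dependency")
    return sum(distances) / len(distances)


# Hypothetical four-word sentence "El perro come carne".
# Head positions are chosen for illustration only.
heads = [2, 3, 0, 3]

sentence_length = len(heads)           # number of words in the sentence
mdd = mean_dependency_distance(heads)  # every dependency here links adjacent words

print(sentence_length, mdd)
```

In this toy sentence every dependent is adjacent to its head, so the MDD is at its minimum of 1. Longer sentences leave room for longer dependencies, which is the relationship the study quantifies.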
Funding source: Graduate Research Training Program of Zhejiang University of Finance and Economics
Award Identifier / Grant number: 23XJKT062
Acknowledgment
We would like to thank the editors and anonymous reviewers for their insightful and valuable comments on this paper.
Research funding: This work is supported by the Postgraduate Training Program (PRTP) of Zhejiang University of Finance and Economics (23XJKT062).
© 2025 Walter de Gruyter GmbH, Berlin/Boston