How will text size influence the length of its linguistic constituents?

Huiyuan Jin and Haitao Liu
Published/Copyright: July 10, 2017

Abstract

As human language is a multi-level complex adaptive system, a text can be seen as emerging from the complex interactions between internal and external factors. Text types such as microblogs have a size restriction, an external variable that may affect the quantitative properties of the texts themselves. Such texts provide a good opportunity to investigate the interactions between the external and internal factors of human language from the perspective of complex adaptive systems. This study focuses on how the size restriction of Chinese microblog texts affects the length of their sentences and clauses. Quantitative properties of sentence and clause length in Chinese microblog texts are analyzed and compared with those of texts with no size restriction (i.e., prose, news reports and romantic fiction). Analysis of the sentence length distribution shows that the size restriction has an impact on sentence length measured in numbers of words and clauses. The correlation between sentence length and clause length is examined with the Menzerath-Altmann law. The satisfactory fit for microblog texts probably suggests that language units are able to re-organize and re-regulate their structures according to relevant external factors until they reach a new balanced system.
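As an illustration of what examining the relation with the Menzerath-Altmann law can look like in practice, the following is a minimal sketch. It assumes the commonly used simplified form of the law, y = a * x^b, where x is the length of a construct (here, clauses per sentence) and y is the mean length of its constituents (here, words per clause); the full form adds an exponential factor. The function names and the data are hypothetical: the data points are generated from the formula itself for demonstration and are not results from this study, and the fitting procedure shown is not necessarily the one the authors used.

```python
import math


def fit_power_law(xs, ys):
    """Fit y = a * x**b by ordinary least squares in log-log space.

    Returns the pair (a, b). All xs and ys must be positive.
    """
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(xs)
    mean_lx = sum(log_x) / n
    mean_ly = sum(log_y) / n
    sxx = sum((u - mean_lx) ** 2 for u in log_x)
    sxy = sum((u - mean_lx) * (v - mean_ly) for u, v in zip(log_x, log_y))
    b = sxy / sxx                          # slope in log-log space
    a = math.exp(mean_ly - b * mean_lx)    # intercept, back-transformed
    return a, b


def r_squared(xs, ys, a, b):
    """Coefficient of determination of y = a * x**b on the original scale."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - a * x ** b) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Synthetic data for illustration only (NOT from the study):
# sentence length in clauses, and mean clause length in words
# generated from assumed parameters a = 8.0, b = -0.3.
clauses_per_sentence = [1, 2, 3, 4, 5, 6]
mean_clause_length = [8.0 * x ** -0.3 for x in clauses_per_sentence]

a, b = fit_power_law(clauses_per_sentence, mean_clause_length)
fit_quality = r_squared(clauses_per_sentence, mean_clause_length, a, b)
```

A negative fitted b corresponds to the tendency the law describes: the longer the sentence in clauses, the shorter its clauses on average. With real corpus data the fit would not be exact, and the coefficient of determination would serve as one measure of how well the law holds.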


Haitao Liu Zhejiang University Ningbo Institute of Technology No. 1 Xuefu Road Ningbo, CN-315100 China

5 Acknowledgements

This study was partly supported by the Fundamental Research Funds for the Central Universities (Program of Big Data PLUS Language Universals and Cognition, Zhejiang University) and the MOE Project of the Center for Linguistics and Applied Linguistics, Guangdong University of Foreign Studies.

Appendix I

This appendix details how the four corpora in our study were selected. First, Sina Microblog is the most influential and most popular microblog operator in mainland China. We selected microblog texts from the Sina microblog roll of the hour, which collects texts from the most influential microbloggers, such as celebrities, media, governments, schools, and websites. These microblog texts, be they personal or official, are frequently forwarded and highly influential, and can therefore be taken to represent the language features of Chinese microblogs. Table 5 lists the sources of these Chinese microblog texts.

Table 5

Sample source of Chinese microblog texts.

| Source | No. | Time | Category | Examples |
|---|---|---|---|---|
| Most favorite general | 2740 | 8, 2015 | most favorite microblog | Oriental metropolis; Focus of thought |
| Most favorite media | 927 | 8, 2015 | media microblog | CCTV; music |
| Most favorite celebrity | 825 | 8, 2015 | celebrity microblog | Li Kaifu; He Jiong |
| Most favorite websites | 812 | 8, 2015 | websites microblog | Fengxing websites; Mogu Street |
| Most favorite schools | 815 | 8, 2015 | schools microblog | Peking University; Zhejiang University |
| Most favorite governments | 881 | 8, 2015 | governments microblog | Chinese government; China national radio |
| Total | 7000 | 8, 2015 | | |

Second, formal written language includes various text types, among which news texts show typical characteristics of formal language. In the Freiburg-Brown corpus of English (abbreviated as Frown), three text types belong to the news domain: press reportage, press editorial, and press reviews. The news report corpus was therefore drawn from News Co-broadcasting, an official news program broadcast on China Central Television 1. It is the most influential and comprehensive news program and covers the sub-types of press reportage, press editorial, and press reviews. News texts of News Co-broadcasting from 2 August 2015 to 18 August 2015 were thus chosen as the formal written samples in our study.

Third, a corpus of Chinese prose with diverse stylistic features was built for comparison with the microblog texts. Generally speaking, prose may be divided into three sub-genres, i.e. argumentative prose, narrative prose, and lyric prose, of which the last two are more popular than the argumentative type. Since microblog texts are restricted to 140 Chinese characters, short prose pieces rather than long ones were selected for our research. We selected altogether 51 short prose pieces by 10 different famous modern Chinese prose writers, as listed in Table 6.

Table 6

Sample source of short Chinese modern prose texts.

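| Writers of prose texts | Number of argumentative prose texts | Number of narrative prose texts | Number of lyric prose texts | Number of word tokens |
|---|---|---|---|---|
| Ai Qing | 0 | 1 | 3 | 5,320 |
| Ba Jin | 1 | 1 | 2 | 7,934 |
| Bing Xin | 0 | 1 | 3 | 10,213 |
| Feng Zikai | 0 | 3 | 2 | 9,982 |
| Lu Xun | 4 | 1 | 1 | 11,021 |
| Guo Moruo | 2 | 1 | 1 | 8,829 |
| Ji Xianlin | 2 | 2 | 2 | 11,347 |
| Qian Zhongshu | 2 | 2 | 1 | 9,846 |
| Shi Tiesheng | 2 | 3 | 2 | 14,982 |
| Yu Qiuyu | 2 | 2 | 2 | 11,090 |
| Total | 15 | 17 | 19 | 100,564 |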

Last, a corpus of Chinese fiction with colloquial stylistic features was built for comparison with the microblog texts. In Frown, the fiction domain has six sub-types: general fiction, mystery and detective fiction, science fiction, adventure fiction, romantic fiction, and humor. One feature that distinguishes fiction from other formal written types of language is that conversations play an important role in portraying characters. Among these sub-genres, romantic fiction is a popular type in the modern history of Chinese literature. More importantly, romantic fiction tends to contain a large amount of psychological and colloquial description when portraying characters. We therefore selected romantic fiction from famous modern Chinese short fiction as our target corpus. The sample sources are listed in Table 7.

Table 7

Sample source of short Chinese modern romantic fiction texts.

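| Writers of fiction | Romantic fiction texts | Number of word tokens |
|---|---|---|
| Zhou Guoping | 2 | 17,839 |
| Shen Congwen | 1 | 8,231 |
| Zhang Jie | 1 | 7,638 |
| Shi Tiesheng | 1 | 5,987 |
| Chi Zijian | 1 | 8,943 |
| Zhang Ailing | 3 | 23,899 |
| Yu Hua | 1 | 5,419 |
| Xiao Hong | 1 | 7,932 |
| Wang Zengqi | 1 | 8,361 |
| Wang Anyi | 1 | 5,775 |
| Total | 10 | 100,024 |
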
Published Online: 2017-7-10
Published in Print: 2017-6-27

© 2017 Faculty of English, Adam Mickiewicz University, Poznań, Poland

Downloaded on 13.12.2025 from https://www.degruyterbrill.com/document/doi/10.1515/psicl-2017-0008/html