
How will text size influence the length of its linguistic constituents?

Huiyuan Jin and Haitao Liu
Published/Copyright: 10 July 2017

Abstract

As human language is a multi-level complex adaptive system, a text can be seen as emerging from complex interactions between internal and external factors. Some text types, such as microblogs, are subject to a size restriction, an external variable that may affect relevant quantitative properties of the texts themselves. Such texts therefore offer a good opportunity to investigate, from the perspective of complex adaptive systems, how the external and internal factors of human language interact. This study focuses on how the size restriction of Chinese microblog texts affects the length of their sentences and clauses. Quantitative properties of sentence and clause length in Chinese microblog texts are analyzed and compared with those of texts with no size restriction (i.e., prose, news reports and romantic fiction). Analysis of sentence length distributions shows that the size restriction affects sentence length, whether measured in words or in clauses. The relationship between sentence length and clause length is examined with the Menzerath-Altmann law. The satisfactory fit obtained for microblog texts probably suggests that language units are able to reorganize and re-regulate their structures in response to relevant external factors until they reach a new balanced system.
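The Menzerath-Altmann law is commonly written as y = a * x^b * exp(-c * x), where x is the size of a construct (here, sentence length in clauses) and y is the mean size of its constituents (here, clause length in words). The Python sketch below shows one way such a fit could be carried out; it is not necessarily the procedure used in this study. It assumes that sentences have already been segmented into clauses and words (the segmentation criteria are not given in this section), and it estimates a, b and c by ordinary least squares on the log-linearized form ln y = ln a + b ln x - c x. The function names `mal_points`, `fit_mal` and `r_squared` are hypothetical.

```python
import math
from collections import defaultdict


def mal_points(sentences):
    """Return sorted (x, y) pairs: x = sentence length in clauses,
    y = mean clause length in words over all sentences with x clauses.

    Assumption: each sentence is a list of clauses and each clause a list
    of word tokens, i.e. segmentation has been done beforehand.
    """
    totals = defaultdict(lambda: [0, 0])  # x -> [word count, clause count]
    for sentence in sentences:
        x = len(sentence)
        if x == 0:
            continue  # skip empty sentences
        totals[x][0] += sum(len(clause) for clause in sentence)
        totals[x][1] += x
    return sorted((x, words / clauses) for x, (words, clauses) in totals.items())


def _solve3(m, v):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    aug = [row[:] + [vi] for row, vi in zip(m, v)]
    n = 3
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    sol = [0.0] * n
    for r in range(n - 1, -1, -1):
        rest = sum(aug[r][k] * sol[k] for k in range(r + 1, n))
        sol[r] = (aug[r][n] - rest) / aug[r][r]
    return sol


def fit_mal(points):
    """Fit y = a * x**b * exp(-c * x) by OLS on ln y = ln a + b ln x - c x.

    Needs at least three distinct x values, all with y > 0.
    """
    rows = [(1.0, math.log(x), -float(x)) for x, _ in points]
    z = [math.log(y) for _, y in points]
    # Normal equations (X^T X) beta = X^T z for beta = (ln a, b, c)
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xtz = [sum(r[i] * zi for r, zi in zip(rows, z)) for i in range(3)]
    ln_a, b, c = _solve3(xtx, xtz)
    return math.exp(ln_a), b, c


def r_squared(points, a, b, c):
    """Coefficient of determination of the fitted curve on the original y scale."""
    ys = [y for _, y in points]
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - a * x**b * math.exp(-c * x)) ** 2 for x, y in points)
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    if ss_tot == 0.0:
        raise ValueError("all y values are identical; R^2 is undefined")
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical toy data: each sentence is a list of clauses of word tokens.
    toy = [
        [["w"] * 6],
        [["w"] * 5, ["w"] * 4],
        [["w"] * 4, ["w"] * 3, ["w"] * 4],
        [["w"] * 3, ["w"] * 3, ["w"] * 4, ["w"] * 2],
    ]
    pts = mal_points(toy)
    a, b, c = fit_mal(pts)
    print(pts, (a, b, c), r_squared(pts, a, b, c))
```

Fitting on the log scale keeps the sketch dependent only on the standard library; a nonlinear least-squares fit on the original scale weights residuals differently and may give somewhat different parameter values and R².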


Haitao Liu, Zhejiang University Ningbo Institute of Technology, No. 1 Xuefu Road, Ningbo, CN-315100, China


5 Acknowledgements

This study was partly supported by the Fundamental Research Funds for the Central Universities (Program of Big Data PLUS Language Universals and Cognition, Zhejiang University) and the MOE Project of the Center for Linguistics and Applied Linguistics, Guangdong University of Foreign Studies.


Appendix I

This appendix describes in detail how the four corpora used in our study were selected. First, Sina Microblog is the most influential and most popular microblog platform in mainland China. We selected microblog texts from the Sina Microblog "roll of the hour", which features posts by the most influential microbloggers, such as celebrities, media outlets, government bodies, schools and websites. Whether personal or official, these texts are frequently reposted and highly influential, and can therefore be expected to represent the language features of Chinese microblogs well. Table 5 lists the sources of these texts.

Table 5

Sample source of Chinese microblog texts.

Source | No. of texts | Time | Category | Examples
Most favorite general | 2,740 | Aug. 2015 | most favorite microblog | Oriental metropolis; Focus of thought
Most favorite media | 927 | Aug. 2015 | media microblog | CCTV; music
Most favorite celebrity | 825 | Aug. 2015 | celebrity microblog | Li Kaifu; He Jiong
Most favorite websites | 812 | Aug. 2015 | websites microblog | Fengxing websites; Mogu Street
Most favorite schools | 815 | Aug. 2015 | schools microblog | Peking University; Zhejiang University
Most favorite governments | 881 | Aug. 2015 | governments microblog | Chinese government; China National Radio
Total | 7,000 | Aug. 2015 | |

Second, formal written language covers various text types, among which news texts typically display the characteristics of formal language. The English Freiburg-Brown corpus (Frown) contains three text types in the news domain: press reportage, press editorials and press reviews. Our news corpus was therefore drawn from News Co-broadcasting, the official news program broadcast on China Central Television channel 1 (CCTV-1). It is the most influential and comprehensive news program, covering all three sub-types: press reportage, press editorials and press reviews. News texts from News Co-broadcasting between 2 and 18 August 2015 were thus chosen as the formal written samples in our study.

Third, a corpus of Chinese prose with diverse stylistic features was built for comparison with microblog texts. Prose can generally be divided into three sub-genres, namely argumentative, narrative and lyric prose, of which the last two are more popular than the argumentative type. Since microblog texts are restricted to 140 Chinese characters, we selected short rather than long prose pieces. Altogether, 51 short prose pieces by 10 famous modern Chinese prose writers were selected, as listed in Table 6.

Table 6

Sample source of short Chinese modern prose texts.

Writer | Argumentative prose texts | Narrative prose texts | Lyric prose texts | Word tokens
Ai Qing | 0 | 1 | 3 | 5,320
Ba Jin | 1 | 1 | 2 | 7,934
Bing Xin | 0 | 1 | 3 | 10,213
Feng Zikai | 0 | 3 | 2 | 9,982
Lu Xun | 4 | 1 | 1 | 11,021
Guo Moruo | 2 | 1 | 1 | 8,829
Ji Xianlin | 2 | 2 | 2 | 11,347
Qian Zhongshu | 2 | 2 | 1 | 9,846
Shi Tiesheng | 2 | 3 | 2 | 14,982
Yu Qiuyu | 2 | 2 | 2 | 11,090
Total | 15 | 17 | 19 | 100,564
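As a quick arithmetic check, the per-writer counts in Table 6 can be summed to confirm the stated totals: 15 argumentative, 17 narrative and 19 lyric texts (51 texts in all, matching the 51 prose pieces mentioned above) and 100,564 word tokens. The sketch below simply transcribes the table; the names `TABLE_6` and `column_totals` are illustrative.

```python
# Rows transcribed from Table 6:
# (writer, argumentative texts, narrative texts, lyric texts, word tokens)
TABLE_6 = [
    ("Ai Qing", 0, 1, 3, 5320),
    ("Ba Jin", 1, 1, 2, 7934),
    ("Bing Xin", 0, 1, 3, 10213),
    ("Feng Zikai", 0, 3, 2, 9982),
    ("Lu Xun", 4, 1, 1, 11021),
    ("Guo Moruo", 2, 1, 1, 8829),
    ("Ji Xianlin", 2, 2, 2, 11347),
    ("Qian Zhongshu", 2, 2, 1, 9846),
    ("Shi Tiesheng", 2, 3, 2, 14982),
    ("Yu Qiuyu", 2, 2, 2, 11090),
]


def column_totals(rows):
    """Sum each numeric column: the three sub-genre counts and the word tokens."""
    return tuple(sum(row[i] for row in rows) for i in range(1, 5))


if __name__ == "__main__":
    argumentative, narrative, lyric, tokens = column_totals(TABLE_6)
    # prints: 15 17 19 51 100564
    print(argumentative, narrative, lyric, argumentative + narrative + lyric, tokens)
```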

Finally, a corpus of Chinese fiction with colloquial stylistic features was built for comparison with microblog texts. Frown contains six sub-types in the fiction domain: general fiction, mystery and detective fiction, science fiction, adventure fiction, romantic fiction and humor. One feature that distinguishes fiction from other formal written text types is that conversations play an important role in portraying characters. Among these sub-genres, romantic fiction is a popular type in the modern history of Chinese literature. More importantly, romantic fiction may contain a large amount of psychological and colloquial description when portraying characters. We therefore selected romantic fiction from famous modern Chinese short fiction as our target corpus. The sample sources are listed in Table 7.

Table 7

Sample source of short Chinese modern romantic fiction texts.

Writer | Romantic fiction texts | Word tokens
Zhou Guoping | 2 | 17,839
Shen Congwen | 1 | 8,231
Zhang Jie | 1 | 7,638
Shi Tiesheng | 1 | 5,987
Chi Zijian | 1 | 8,943
Zhang Ailing | 3 | 23,899
Yu Hua | 1 | 5,419
Xiao Hong | 1 | 7,932
Wang Zengqi | 1 | 8,361
Wang Anyi | 1 | 5,775
Total | 10 | 100,024
Published Online: 2017-7-10
Published in Print: 2017-6-27

© 2017 Faculty of English, Adam Mickiewicz University, Poznań, Poland

Downloaded on 13 December 2025 from https://www.degruyterbrill.com/document/doi/10.1515/psicl-2017-0008/html