Abstract
Because human language is a multi-level complex adaptive system, a text can be seen as emerging from complex interactions between internal and external factors. Some text types, such as microblogs, are subject to a size restriction, an external variable that may affect the quantitative properties of the texts themselves. Such texts offer a good opportunity to investigate the interaction between the external and internal factors of human language from the perspective of complex adaptive systems. This study focuses on how the size restriction on Chinese microblog texts affects the length of their sentences and clauses. Quantitative properties of sentence and clause length in Chinese microblog texts are analyzed and compared with those of texts with no size restriction (i.e., prose, news reports and romantic fiction). Analysis of the sentence length distributions shows that the size restriction has an impact on sentence length measured in numbers of words and clauses. The relationship between sentence length and clause length is examined with the Menzerath-Altmann law. The satisfactory fit for microblog texts probably suggests that language units are able to re-organize and re-regulate their structures in response to relevant external factors until they reach a new balanced system.
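To make the Menzerath-Altmann fit mentioned above more concrete, here is a minimal sketch. It assumes the commonly used truncated form of the law, y = a·x^b, where x is the size of a construct (for example, sentence length in clauses) and y is the mean size of its constituents (for example, mean clause length in words). The function name `fit_menzerath_altmann` and the data are hypothetical. The numbers are synthetic, are not taken from the corpora of this study, and the fitting procedure used in the study itself may differ.

```python
import math


def fit_menzerath_altmann(xs, ys):
    """Fit y = a * x**b by least squares on log-transformed data.

    xs: construct sizes (e.g. sentence length in clauses), all > 0
    ys: mean constituent sizes (e.g. mean clause length in words), all > 0
    Returns (a, b, r_squared); r_squared is computed in log-log space.
    """
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n

    # Slope and intercept of the regression line log y = log a + b * log x
    sxx = sum((u - mean_x) ** 2 for u in log_x)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(log_x, log_y))
    b = sxy / sxx
    log_a = mean_y - b * mean_x

    # Coefficient of determination of the log-log fit
    ss_res = sum((v - (log_a + b * u)) ** 2 for u, v in zip(log_x, log_y))
    ss_tot = sum((v - mean_y) ** 2 for v in log_y)
    r_squared = 1.0 - ss_res / ss_tot
    return math.exp(log_a), b, r_squared


# Synthetic illustration only: points generated from y = 12 * x**-0.3
clauses_per_sentence = [1, 2, 3, 4, 5, 6]
mean_clause_length = [12.0 * x ** -0.3 for x in clauses_per_sentence]

a, b, r2 = fit_menzerath_altmann(clauses_per_sentence, mean_clause_length)
print(f"a = {a:.2f}, b = {b:.2f}, R^2 = {r2:.3f}")
```

A negative b means that longer sentences tend to consist of shorter clauses, which is the relationship the law describes. The log-log regression is chosen here for simplicity; a nonlinear fit of the full form with an exponential term would be an alternative.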
5 Acknowledgements
This study was partly supported by the Fundamental Research Funds for the Central Universities (Program of Big Data PLUS Language Universals and Cognition, Zhejiang University) and the MOE Project of the Center for Linguistics and Applied Linguistics, Guangdong University of Foreign Studies.
Appendix I
Here we describe in detail how the four corpora in our study were selected. First, Sina Microblog is the most influential and most popular microblog operator in mainland China. We selected microblog texts from Sina Microblog's roll of the hour, which collects texts from the most influential microbloggers, such as celebrities, media, governments, schools, and websites. These microblog texts, whether personal or official, are frequently forwarded and highly influential, and can therefore represent the language features of Chinese microblogs accurately. Table 5 lists the sources of these Chinese microblog texts.
Table 5: Sample source of Chinese microblog texts.
| Source | No. | Time | Category | Examples |
|---|---|---|---|---|
| Most favorite general | 2740 | Aug. 2015 | most favorite microblog | Oriental metropolis; Focus of thought |
| Most favorite media | 927 | Aug. 2015 | media microblog | CCTV; music |
| Most favorite celebrity | 825 | Aug. 2015 | celebrity microblog | Li Kaifu; He Jiong |
| Most favorite websites | 812 | Aug. 2015 | websites microblog | Fengxing websites; Mogu Street |
| Most favorite schools | 815 | Aug. 2015 | schools microblog | Peking University; Zhejiang University |
| Most favorite governments | 881 | Aug. 2015 | governments microblog | Chinese government; China national radio |
| Total | 7000 | Aug. 2015 | | |
Second, formal written language includes various text types, among which news texts show typical characteristics of formal language. In the Freiburg-Brown corpus of English (abbreviated as Frown), three text types belong to the news domain, i.e. press reportage, press editorial, and press reviews. We therefore drew the news report corpus from News Co-broadcasting, an official news program broadcast on China Central Television 1. It is the most influential and comprehensive news program, and it covers the sub-types of press reportage, press editorial, and press reviews. News texts of News Co-broadcasting from 2 August 2015 to 18 August 2015 were thus chosen as the formal written samples in our study.
Third, a corpus of Chinese prose with diverse stylistic features was built for comparison with the microblog texts. Generally speaking, prose may be divided into three sub-genres, i.e. argumentative prose, narrative prose, and lyric prose, of which the last two are more popular than the argumentative type. Since microblog texts are restricted to 140 Chinese characters, short prose pieces rather than long ones were selected for our research. Altogether we selected 51 short prose pieces by 10 different famous modern Chinese prose writers, as listed in Table 6.
Table 6: Sample source of short Chinese modern prose texts.
| Writers of prose texts | Number of argumentative prose texts | Number of narrative prose texts | Number of lyric prose texts | Number of word tokens |
|---|---|---|---|---|
| Ai Qing | 0 | 1 | 3 | 5,320 |
| Ba Jin | 1 | 1 | 2 | 7,934 |
| Bing Xin | 0 | 1 | 3 | 10,213 |
| Feng Zikai | 0 | 3 | 2 | 9,982 |
| Lu Xun | 4 | 1 | 1 | 11,021 |
| Guo Moruo | 2 | 1 | 1 | 8,829 |
| Ji Xianlin | 2 | 2 | 2 | 11,347 |
| Qian Zhongshu | 2 | 2 | 1 | 9,846 |
| Shi Tiesheng | 2 | 3 | 2 | 14,982 |
| Yu Qiuyu | 2 | 2 | 2 | 11,090 |
| Total | 15 | 17 | 19 | 100,564 |
Last, a corpus of Chinese fiction with colloquial stylistic features was built for comparison with the microblog texts. In Frown, the fiction domain has six sub-types, i.e. general fiction, mystery and detective fiction, science fiction, adventure fiction, romantic fiction, and humor. One feature that distinguishes fiction from other formal written types of language is that conversations play an important role in portraying characters. Among these sub-genres, romantic fiction is a popular type in the modern history of Chinese literature. More importantly, romantic fiction may contain a large amount of psychological and colloquial description when portraying characters. We therefore selected romantic fiction from famous modern Chinese short fiction as our target corpus. The sample sources are listed in Table 7.
Table 7: Sample source of short Chinese modern romantic fiction texts.
| Writers of fiction | Romantic fiction texts | Number of word tokens |
|---|---|---|
| Zhou Guoping | 2 | 17,839 |
| Shen Congwen | 1 | 8,231 |
| Zhang Jie | 1 | 7,638 |
| Shi Tiesheng | 1 | 5,987 |
| Chi Zijian | 1 | 8,943 |
| Zhang Ailing | 3 | 23,899 |
| Yu Hua | 1 | 5,419 |
| Xiao Hong | 1 | 7,932 |
| Wang Zengqi | 1 | 8,361 |
| Wang Anyi | 1 | 5,775 |
| Total | 10 | 100,024 |
© 2017 Faculty of English, Adam Mickiewicz University, Poznań, Poland