From extra- to intratextual characteristics: Charting the space of variation in Czech through MDA
Václav Cvrček, Zuzana Komrsková, David Lukeš, Petra Poukarová, Anna Řehořková and Adrian Jan Zasina
Abstract
This paper is part of a larger research effort on language variability aimed at uncovering the relations between extra- and intratextual characteristics of Czech texts by means of multi-dimensional analysis. The palpable lack of prior work on quantitative register analysis of Czech led to several distinctive methodological decisions, namely concerning corpus design, feature selection and the parameters of factor analysis, especially the number of dimensions to extract. We report on these decisions for their potential relevance to other researchers embarking on a similar journey. To demonstrate the viability of the model, we also present a brief interpretation of the resulting dimensions.
About the authors
Václav Cvrček (PhD Charles University Prague, 2008) is an Associate Professor of corpus linguistics at Charles University, Prague. His main research interests are corpus linguistics methodology, quantitative linguistics, and corpus-assisted discourse studies.
Zuzana Komrsková holds an MA in Czech language and she is currently a PhD student at the Institute of Czech Language and Theory of Communication, Charles University, Prague. Her special areas of interest are Internet communication, corpus linguistics, and corpora of spoken language.
David Lukeš holds an MA in Phonetics and English & American Studies; he is currently a PhD student at the Institute of the Czech National Corpus, Charles University, Prague. His research interests include Czech phonetics, corpora of spoken language, and quantitative methods in linguistics.
Petra Poukarová holds an MA in Czech Language and Literature; she is currently a PhD student at the Institute of the Czech National Corpus, Charles University, Prague. Her research focuses on Czech phonetics, corpus linguistics, and corpora of spoken language.
Anna Řehořková holds an MA in Czech language and she is currently a PhD student at the Institute of the Czech National Corpus, Charles University, Prague. Her academic areas of interest are diachronic linguistics, Czech subordinate clauses, and corpus linguistics.
Adrian Jan Zasina holds an MA in Slavic Studies and he is currently a PhD student at the Institute of the Czech National Corpus, Charles University, Prague. His main research interests include data-driven learning, gender & language, and corpus linguistics.
Acknowledgements
This study was supported by the ERDF project “Language Variation in the CNC” no. CZ.02.1.01/0.0/0.0/16_013/0001758. The authors would like to thank both reviewers for valuable input.
© 2018 Walter de Gruyter GmbH, Berlin/Boston
Articles in this issue
- Frontmatter
- Attractors of variation in Hungarian inflectional morphology
- A diachronic perspective on near-synonymy: The concept of sweet-smelling in American English
- From extra- to intratextual characteristics: Charting the space of variation in Czech through MDA
- An information-theoretic view on language complexity and register variation: Compressing naturalistic corpus data
- A corpus-based study of the Chinese synonymous approximatives shangxia, qianhou and zuoyou
- The lexical context in a style analysis: A word embeddings approach
- Linguistic Proficiency: A Quantitative Approach to Immigrant and Heritage Speakers of Danish
- DISCOver: DIStributional approach based on syntactic dependencies for discovering COnstructions