From extra- to intratextual characteristics: Charting the space of variation in Czech through MDA

Václav Cvrček; Zuzana Komrsková; David Lukeš; Petra Poukarová; Anna Řehořková; Adrian Jan Zasina

doi:10.1515/cllt-2018-0020


Published/Copyright: October 23, 2018


From the journal Corpus Linguistics and Linguistic Theory, Volume 17, Issue 2

Abstract

This paper is part of a larger research effort on language variability aimed at uncovering the relations between extra- and intratextual characteristics of Czech texts by means of multi-dimensional analysis. The palpable lack of prior work on quantitative register analysis of Czech led to several distinctive methodological decisions, namely those concerning corpus design, feature selection, and the parameters of factor analysis, especially the number of dimensions to extract. We report on these decisions for their potential relevance to other researchers embarking on a similar journey. To demonstrate the viability of the model, we also present a brief interpretation of the resulting dimensions.

Keywords: multi-dimensional analysis; register variation; methodology; Czech

About the authors

Václav Cvrček

Václav Cvrček (PhD Charles University Prague, 2008) is an Associate Professor of corpus linguistics at Charles University, Prague. His main research interests are corpus linguistics methodology, quantitative linguistics, and corpus-assisted discourse studies.

Zuzana Komrsková

Zuzana Komrsková holds an MA in Czech language and she is currently a PhD student at the Institute of Czech Language and Theory of Communication, Charles University, Prague. Her special areas of interest are Internet communication, corpus linguistics, and corpora of spoken language.

David Lukeš

David Lukeš holds an MA in Phonetics and English & American Studies; he is currently a PhD student at the Institute of the Czech National Corpus, Charles University, Prague. His research interests include Czech phonetics, corpora of spoken language, and quantitative methods in linguistics.

Petra Poukarová

Petra Poukarová holds an MA in Czech Language and Literature; she is currently a PhD student at the Institute of the Czech National Corpus, Charles University, Prague. Her research focuses on Czech phonetics, corpus linguistics, and corpora of spoken language.

Anna Řehořková

Anna Řehořková holds an MA in Czech language and she is currently a PhD student at the Institute of the Czech National Corpus, Charles University, Prague. Her academic areas of interest are diachronic linguistics, Czech subordinate clauses, and corpus linguistics.

Adrian Jan Zasina

Adrian Jan Zasina holds an MA in Slavic Studies and he is currently a PhD student at the Institute of the Czech National Corpus, Charles University, Prague. His main research interests include data-driven learning, gender & language, and corpus linguistics.

Acknowledgements

This study was supported by the ERDF project “Language Variation in the CNC” no. CZ.02.1.01/0.0/0.0/16_013/0001758. The authors would like to thank both reviewers for valuable input.


Published Online: 2018-10-23

Published in Print: 2021-10-26
