From extra- to intratextual characteristics: Charting the space of variation in Czech through MDA

Václav Cvrček, Zuzana Komrsková, David Lukeš, Petra Poukarová, Anna Řehořková and Adrian Jan Zasina
Published/Copyright: October 23, 2018

Abstract

This paper is part of a larger research effort on language variability aimed at uncovering the relations between extra- and intratextual characteristics of Czech texts by means of multi-dimensional analysis. The palpable lack of prior art on quantitative register analysis of Czech led to several distinctive methodological decisions, notably concerning corpus design, feature selection and the parameters of factor analysis, especially the number of dimensions to extract. We report on these decisions for their potential relevance to other researchers embarking on a similar journey. To demonstrate the viability of the model, we also present a brief interpretation of the resulting dimensions.

About the authors

Václav Cvrček

Václav Cvrček (PhD Charles University Prague, 2008) is an Associate Professor of corpus linguistics at Charles University, Prague. His main research interests are corpus linguistics methodology, quantitative linguistics, and corpus-assisted discourse studies.

Zuzana Komrsková

Zuzana Komrsková holds an MA in Czech language and she is currently a PhD student at the Institute of Czech Language and Theory of Communication, Charles University, Prague. Her special areas of interest are Internet communication, corpus linguistics, and corpora of spoken language.

David Lukeš

David Lukeš holds an MA in Phonetics and English & American Studies; he is currently a PhD student at the Institute of the Czech National Corpus, Charles University, Prague. His research interests include Czech phonetics, corpora of spoken language, and quantitative methods in linguistics.

Petra Poukarová

Petra Poukarová holds an MA in Czech Language and Literature; she is currently a PhD student at the Institute of the Czech National Corpus, Charles University, Prague. Her research focuses on Czech phonetics, corpus linguistics, and corpora of spoken language.

Anna Řehořková

Anna Řehořková holds an MA in Czech language and she is currently a PhD student at the Institute of the Czech National Corpus, Charles University, Prague. Her academic areas of interest are diachronic linguistics, Czech subordinate clauses, and corpus linguistics.

Adrian Jan Zasina

Adrian Jan Zasina holds an MA in Slavic Studies and he is currently a PhD student at the Institute of the Czech National Corpus, Charles University, Prague. His main research interests include data-driven learning, gender & language, and corpus linguistics.

Acknowledgements

This study was supported by the ERDF project “Language Variation in the CNC” no. CZ.02.1.01/0.0/0.0/16_013/0001758. The authors would like to thank both reviewers for valuable input.

Published Online: 2018-10-23
Published in Print: 2021-10-26

© 2018 Walter de Gruyter GmbH, Berlin/Boston

Downloaded on 30 September 2025 from https://www.degruyterbrill.com/document/doi/10.1515/cllt-2018-0020/html