CLLT ‘versus’ Corpora and IJCL: a (half serious) keyness analysis

Stefanie Wulff; Stefan Th. Gries

doi:10.1515/cllt-2024-0050

Published/Copyright: May 27, 2024

From the journal Corpus Linguistics and Linguistic Theory Volume 20 Issue 3

Abstract

In this introduction to the special issue celebrating CLLT’s 20th anniversary, we look back and forward in time. To look back, we present the results of a (tongue-in-cheek) corpus-linguistic analysis of about 10 years’ worth of research published in CLLT, IJCL, and Corpora in order to distill the “essence” of CLLT for the reader. As a bonus, we use the opportunity to discuss how established approaches to keyness analysis can be improved. To look forward, we asked six (teams of) researchers who have all shaped corpus linguistics, and thus the journal, to give us their take on what the most significant developments in the field have been and where they see the most impactful opportunities and challenges arising. This introduction briefly summarizes their contributions.

Keywords: CLLT; association-based keyness; dispersion-based keyness; Kullback-Leibler divergence; log-likelihood ratio

Corresponding author: Stefanie Wulff, Department of Linguistics, University of Florida, Gainesville, FL 32611, USA; and UiT The Arctic University of Norway, Tromsø, Norway, E-mail: swulff@ufl.edu

Received: 2024-05-02

Accepted: 2024-05-07

Published Online: 2024-05-27

Published in Print: 2024-10-28
