Abstract
Keyword analysis is used across sub-disciplines of applied linguistics, from genre analysis to critically oriented studies, for purposes that range from producing a general characterization of a genre to identifying text-specific ideological issues. This study compares the use of log-likelihood (LL), a probability statistic, and odds ratio (OR), an effect size statistic, for keyword identification and argues that the two methods produce different keywords suited to research with different purposes. Through two case studies (a keyword analysis of advance fee scams against the British National Corpus, and one of research articles in applied linguistics against research articles from other academic disciplines), we show that both LL and OR keywords concern the aboutness of the corpus but differ in their specificity and their pervasiveness through the corpus. LL highlights words that are relatively common in general use and serve genre purposes, whereas OR highlights more specialized words that serve critically oriented purposes. Methodological and practical contributions to keyword analysis are discussed.
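As a rough illustration of the difference between a probability statistic and an effect size statistic, the sketch below computes both measures for a single word from its frequency in a study corpus and in a reference corpus. It uses the widely used two-cell log-likelihood formulation and the contingency-table odds ratio. The abstract does not state which formulation or zero-frequency handling the study applied, so the function names, the 0.5 continuity correction, and all counts are assumptions made for illustration only.

```python
import math


def log_likelihood(a, b, c, d):
    """Two-cell log-likelihood (G2) keyness score.

    a: frequency of the word in the study corpus
    b: frequency of the word in the reference corpus
    c: total tokens in the study corpus
    d: total tokens in the reference corpus
    """
    # Expected frequencies if the word were equally likely in both corpora.
    e1 = c * (a + b) / (c + d)
    e2 = d * (a + b) / (c + d)
    ll = 0.0
    # A zero observed frequency contributes nothing (0 * ln 0 is taken as 0).
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll


def odds_ratio(a, b, c, d, correction=0.5):
    """Odds ratio of the word occurring in the study vs. reference corpus.

    The correction (an assumption here, not taken from the study) is added
    to every cell so that zero frequencies do not cause division by zero.
    """
    study_odds = (a + correction) / (c - a + correction)
    reference_odds = (b + correction) / (d - b + correction)
    return study_odds / reference_odds


# Hypothetical counts: a word seen 50 times in a 1,000-token study corpus
# and 10 times in a 10,000-token reference corpus.
ll_small = log_likelihood(50, 10, 1_000, 10_000)
or_small = odds_ratio(50, 10, 1_000, 10_000, correction=0)

# The same relative frequencies in corpora ten times larger.
ll_large = log_likelihood(500, 100, 10_000, 100_000)
or_large = odds_ratio(500, 100, 10_000, 100_000, correction=0)

print(f"LL: {ll_small:.2f} -> {ll_large:.2f}")
print(f"OR: {or_small:.2f} -> {or_large:.2f}")
```

With the correction set to zero, scaling both corpora by the same factor leaves the odds ratio unchanged while the log-likelihood grows in proportion. This is one reason the two statistics can rank the same candidate keywords differently.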
© 2018 Walter de Gruyter GmbH, Berlin/Boston
Articles in this issue
- Frontmatter
- Using the parameters of the Zipf–Mandelbrot law to measure diachronic lexical, syntactical and stylistic changes – a large-scale corpus analysis
- Register variation by Spanish users of English: The Nijmegen Corpus of Spanish English
- Recent change in the productivity and schematicity of the way-construction: A distributional semantic analysis
- A multivariate analysis of the partitive genitive in Dutch. Bringing quantitative data into a theoretical discussion
- Log-likelihood and odds ratio: Keyness statistics for different purposes of keyword analysis
- Prosodic modeling and position analysis of pragmatic markers in English conversation