Log-likelihood and odds ratio: Keyness statistics for different purposes of keyword analysis

Punjaporn Pojanapunya and Richard Watson Todd
Published/Copyright: 6 April 2018

Abstract

Keyword analysis is used in a range of sub-disciplines of applied linguistics from genre analyses to critically-oriented studies for different purposes ranging from producing a general characterization of a genre to identifying text-specific ideological issues. This study compares the use of log-likelihood (LL), a probability statistic, and odds ratio (OR), an effect size statistic, for keyword identification and argues that the two methods produce different keywords applicable to research focusing on different purposes. Through two case studies, keyword analyses of advance fee scams against the British National Corpus and research articles in applied linguistics against research articles from other academic disciplines, we show that both the LL and OR keywords concern the aboutness of the corpus, but differ in their specificity and pervasiveness through the corpus. LL highlights words which are relatively common in general use serving genre purposes, whereas OR highlights more specialized words serving critically-oriented purposes. Methodological and practical contributions to keyword analysis are discussed.
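The contrast between the two statistics can be illustrated with a minimal sketch. The abstract does not give formulas, so the code below assumes the widely used two-corpus formulations: the two-term log-likelihood (observed versus expected frequencies) and the odds ratio from a 2x2 contingency table. The function names, corpus sizes, and word frequencies are hypothetical and chosen only for illustration; they are not taken from the study's data or from any particular tool.

```python
import math


def log_likelihood(a, b, c, d):
    """Two-term log-likelihood (assumed formulation, not confirmed by the source).

    a: frequency of the word in the target corpus
    b: frequency of the word in the reference corpus
    c: total tokens in the target corpus
    d: total tokens in the reference corpus
    """
    # Expected frequencies if the word were equally likely in both corpora.
    e1 = c * (a + b) / (c + d)
    e2 = d * (a + b) / (c + d)
    total = 0.0
    # A zero observed frequency contributes nothing (0 * ln 0 is taken as 0).
    if a > 0:
        total += a * math.log(a / e1)
    if b > 0:
        total += b * math.log(b / e2)
    return 2 * total


def odds_ratio(a, b, c, d):
    """Odds of the word in the target corpus divided by its odds in the reference.

    Returns infinity when the word is absent from the reference corpus.
    Assumes the word does not make up the entire target corpus (a < c).
    """
    if b == 0:
        return math.inf
    return (a * (d - b)) / (b * (c - a))


# Hypothetical corpus sizes (tokens).
TARGET_SIZE = 100_000
REFERENCE_SIZE = 1_000_000

# Hypothetical words: one frequent with a modest relative difference,
# one rare with an extreme relative difference.
words = {
    "common_word": (2000, 10_000),
    "rare_word": (30, 3),
}

for word, (a, b) in words.items():
    ll = log_likelihood(a, b, TARGET_SIZE, REFERENCE_SIZE)
    ratio = odds_ratio(a, b, TARGET_SIZE, REFERENCE_SIZE)
    print(f"{word}: LL={ll:.1f}, OR={ratio:.1f}")
```

With these invented counts, the frequent word ranks higher on LL while the rare word ranks higher on OR, which is consistent with the abstract's claim that LL favours relatively common words and OR favours more specialized ones.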

Published Online: 2018-4-6
Published in Print: 2018-4-25

© 2018 Walter de Gruyter GmbH, Berlin/Boston

Downloaded on 30 December 2025 from https://www.degruyterbrill.com/document/doi/10.1515/cllt-2015-0030/html