The Ultraviolet Bleach corpus

Natalia Knoblock and Ryan Malkin
Published/Copyright: April 12, 2022
From the journal Linguistics Vanguard

Abstract

This paper presents a new corpus of computer-mediated communication on the topic of Trump’s comments about household disinfectants and ultraviolet light as cures for COVID-19. The corpus, named the Ultraviolet Bleach (UVB) corpus, contains message board comments devoted to Trump’s suggestions. It was collected in May 2020 and consists of twenty-six files, with a total size of 2,344,164 word tokens. The paper includes frequency lists of words and clusters, keywords, and keywords in context, identified with the help of the corpus management software Sketch Engine. The corpus highlights the “receiving end” of political communication since it involves reactions to Trump’s televised briefing, and it will be of interest to researchers of political communication, populist and conspiratorial discourse, public opinion, and language aggression.


Corresponding author: Natalia Knoblock, English, Saginaw Valley State University, University Center, USA

Received: 2020-12-21
Accepted: 2021-07-26
Published Online: 2022-04-12

© 2021 Walter de Gruyter GmbH, Berlin/Boston

Downloaded on 24.9.2025 from https://www.degruyterbrill.com/document/doi/10.1515/lingvan-2020-0145/html