Veracity and register in fake news analysis

Bashayer Baissa, Matteo Fuoli and Jack Grieve

Published/Copyright: April 14, 2025

From the journal Linguistics Vanguard

Abstract

This article addresses two fundamental methodological challenges in linguistic fake news research: how to reliably classify news texts as real versus fake and how to control for register variation when building datasets for comparative analysis. We call for a rethinking of the veracity labels produced by fact-checking services. While fact-checkers remain a valuable resource for identifying fake news texts, their labels are more productively seen as a proxy for the communicative intent of the author, rather than as an absolute measure of veracity. This perspective emphasizes communicative intent as the only viable explanation for systematic linguistic differences between real and fake news and sidesteps the politically charged notion of truth. To address potential confounds caused by register and topic variation, we propose a multipronged comparative approach. This method analyzes fake news alongside real news on the same topic from a variety of news registers, allowing us to isolate linguistic differences driven by intent. We exemplify this approach by building a comparative corpus focused on climate change, vaccinations, and COVID-19, which we make available upon request for other researchers to use.

1 Introduction

Research on fake news has experienced significant growth in recent years, driven by mounting concerns over its detrimental impact on society. At the heart of this field lies a crucial challenge: how to build corpora that allow for the identification of reliable linguistic markers that distinguish fake from real news. This article addresses two main aspects of this challenge. First, we reexamine the widespread use of veracity labels from fact-checking services to identify fake news articles. We problematize these labels and stress the importance of fully understanding what they represent to accurately interpret the results of comparative analyses of fake versus real news. Second, we highlight the need to consider register variation as a potential confounding factor in fake news research. Often, studies do not match the register and content between real and fake news stories (Shu et al. 2019; Torabi Asr and Taboada 2019), leading to results that might highlight spurious topic or stylistic differences rather than isolating genuinely distinctive features of fake news.

After discussing these issues, we suggest actionable solutions for improving how we build and analyze corpora of fake news texts. We call for a rethinking of the role of fact-checkers and the meaning and implications of the veracity labels they assign. We argue that fact-checkers are a useful resource for identifying potentially fake news texts, but their labels should be interpreted as reflecting where a piece of news stands in relation to the mainstream consensus at the time on a particular event, rather than as an absolute measure of truth. In line with this idea and the principle that communicative intent is key to understanding systematic stylistic differences in language (Grieve and Woodfield 2023), we propose viewing veracity labels as indirect evidence of the author’s intent to challenge mainstream news narratives. We believe this more nuanced perspective not only represents a more productive and accurate way to understand the process of fact-checking, but also allows us to interpret and explain variation observed in fake news corpora compiled through fact-checking. Moreover, this approach redresses the prevailing binary view that fake news is solely intended to deceive – a notion often difficult or impossible to verify.

To address potential confounds caused by register and topic variation, we propose a multipronged comparative approach. Rather than comparing fake news with a single set of real news, we evaluate it against topic-matched texts from four different news registers: broadsheets, tabloids, web-based outlets, and blogs. We exemplify this approach to data collection by providing a detailed account of the procedure we followed to compile a comparative corpus of fake and real news articles. In addition to fake news, this corpus includes real news drawn from four news registers (broadsheets, tabloids, web-based outlets, blogs) and across three topic areas that have become associated with relatively high levels of misinformation (climate change, vaccinations, COVID-19). We make the corpus available upon request to promote further empirical work in the field.

2 Two major challenges in the linguistic study of fake news

Linguistic and natural language processing (NLP) research aims to discover stylistic patterns that distinguish fake news from real news, often with the goal of developing software that automatically detects and stops the spread of harmful disinformation. For this to work, research must focus on the underlying linguistic structure of fake news discourse, not just the topics discussed (Grieve and Woodfield 2023: 5). Achieving this goal requires building a balanced corpus of both fake and real news texts for comparative analysis. However, this process faces two significant challenges.

First, accurately classifying news as real or fake is difficult. Traditionally, researchers have ostensibly used veracity – the factual accuracy of the content – as the main criterion for categorizing news texts (e.g., Horne and Adali 2017; Shu et al. 2019; Torabi Asr and Taboada 2019; Wang 2017). Increasingly, researchers assess veracity using the ratings of reputable fact-checking services like Snopes and PolitiFact, which independently verify claims made in news texts against sources deemed trustworthy (e.g., Põldvere et al. 2023; Rashkin et al. 2017; Torabi Asr and Taboada 2019; Torabi Asr et al. 2024). Fact-checkers are an invaluable resource for identifying potentially false news, providing a more accurate approach than methods relying solely on the reputation of the source (Torabi Asr and Taboada 2019).

However, this approach has limitations and important implications that must be considered and factored into any comparative analyses based on fact-checked fake news corpora. First, concerns have been raised about the potential for political bias in fact-checking services, which may in some cases compromise their ability to distinguish between genuine fake news and news that simply challenges their own viewpoints (Graves 2017; Grieve and Woodfield 2023). More importantly, truth itself can be a moving target. For example, the narrative about Iraq possessing weapons of mass destruction, which heavily influenced the 2003 war and was endorsed by major mainstream media organizations, was later revealed to be based on faulty intelligence (Chilcot 2016). Similarly, the possibility of a lab leak origin for COVID-19, initially dismissed as a conspiracy theory, is now being more seriously investigated (e.g., Bloom et al. 2021).

In light of this, it is crucial to acknowledge that the veracity labels provided by fact-checking services are not absolute measures of factual accuracy. Instead, these labels indicate how a piece of news aligns with the mainstream consensus at the time regarding a particular event. This interpretation more accurately represents the way the fact-checking process actually works in practice. Fact-checkers do not perform independent journalistic investigations “on the ground”. Rather, they generally evaluate news items against mainstream sources like other news articles from reputable outlets, as well as governmental, academic, and scientific reports that can be found online (Graves 2017). This type of validation functions as fact-checking, but crucially, it assumes the mainstream narrative is true and compares the questioned news article to this narrative. It does not independently verify the veracity of the questioned news article or, for that matter, the mainstream sources themselves.

The other problem with treating fact-checkers’ labels as a proxy for veracity is that there is no reason to believe that factual accuracy alone automatically leads to distinct linguistic styles. An honest mistake, for example, would not necessarily result in a stylistically different text compared to a fully accurate one written by the same person (Grieve and Woodfield 2023: 20). Grieve and Woodfield (2023) argue that a more reliable way to distinguish fake from real news is by looking at the writer’s intent. They define fake news as intentionally dishonest information designed to mislead the audience (Grieve and Woodfield 2023: 5). Their analysis of fabricated stories by former New York Times journalist Jayson Blair highlights systematic differences in writing style between his genuine and fake news pieces. Crucially, Grieve and Woodfield (2023) propose that these differences are not accidental; they are a direct result of the pressure Blair faced to produce fake content quickly and his knowledge that he was fabricating information. This pressure led to lower information density in his fake articles, as he lacked the time for the careful editing typical of his real news writing. Additionally, his awareness of lying likely made his writing tone less assertive, subtly revealing the uncertainty behind the fabricated stories.

Recognizing the centrality of communicative intent as the only viable explanatory criterion for systematic differences between fake and real news, we propose that veracity labels produced by fact-checking organizations are more productively seen as a proxy for the communicative intent of the author, rather than as a measure of veracity. Grieve and Woodfield’s (2023) emphasis on deception, however, is overly restrictive. It aptly fits the case of Jayson Blair but fails to capture many forms of journalism generally considered by society and fact-checkers to be fake news. Research across disciplines shows that fake news can be motivated by various factors, including ideological, financial, and social/psychological goals (Baptista and Gradim 2020; Osmundsen et al. 2021; Rini 2017; Zannettou et al. 2019). This suggests that some fake news writers, especially on partisan platforms like Mediaite (left leaning) or The Gateway Pundit (right leaning), genuinely believe their content is true, seeing their work as a form of political activism, not deceit. For example, despite official reviews and court rulings confirming that the 2020 US election was legitimate, a significant portion of Republican voters believe it was undermined by widespread voter fraud (The Guardian 2021). This suggests at least some authors promoting this conspiracy theory through fake news texts might genuinely subscribe to it. Thus, while some fake news is certainly driven by the intent to deceive, this does not fully capture the entire spectrum of motivations behind it.

To reconcile these complex and seemingly divergent perspectives, we suggest that veracity labels by fact-checkers generally reflect the communicative intent of fake news authors to challenge mainstream views on a given topic. This conceptualization better represents how fact-checkers operate. It also moves beyond the problematic and politically charged notion of truth as previously discussed. Most crucially, this perspective can enhance our understanding of the phenomenon of fake news. By viewing veracity labels as a proxy for the intent to challenge mainstream narratives, we can reinterpret the results of previous studies, uncovering new insights into the linguistic patterns and persuasive mechanisms in fake news texts. For example, in a recent study, Baissa et al. (2024) found that fake news stories are discursively constructed as revelatory and disruptive and that this is a key part of their rhetorical appeal. We are not claiming that fake news lacks an intent to deceive – much of it does, of course. Rather, we are claiming that when we collect fake news data using fact-checked veracity labels, we are effectively gathering news that falls outside the mainstream consensus. This variation in the status of news necessarily influences the communicative intent behind news texts and, consequently, shapes the linguistic patterns we observe. In other words, authors of fake news texts are likely aware that their content is fringe and potentially controversial. This awareness drives them to proactively challenge mainstream consensus to create space for their news, and this intent is reflected in their linguistic choices.

The second major challenge in building a corpus for identifying linguistic markers of fake news is compiling a suitable set of real news for comparison. To isolate genuine differences between real and fake news, reference texts must be as closely matched as possible. Crucially, studying the language of fake news requires controlling for register variation – linguistic differences stemming from communicative purpose and context (Biber and Conrad 2009). Without this control, any analysis risks misattributing differences. For instance, if real news comes from traditional newspapers and fake news from blogs, differences in language use could either be due to news authenticity or to inherent stylistic differences between formal newspaper writing and the informal style of blogs (Biber 1988; Grieve et al. 2010). We are also likely to find significant topical differences between these registers, especially when comparing fringe fake news outlets to broadsheet press, which may also confound the results of our analysis.

Register imbalance is a common pitfall in fake news datasets used for NLP research. For instance, the LIAR dataset (Wang 2017), which has been used in many studies (e.g., Aslam et al. 2021; Shu et al. 2019), includes fact-checked news statements from diverse sources like news reports, speeches, and social media posts. The dataset lacks information on how registers were chosen or how the distribution of true/false statements is balanced across them. Similarly, the BuzzFeed fake news dataset (Silverman et al. 2016), also used in many studies (e.g., Mangal and Sharma 2021; Shu et al. 2017; Torabi Asr et al. 2024), compares mainstream news outlets with hyper-partisan Facebook pages. As a result, fake news identification systems trained on such datasets might achieve high accuracy, but this could primarily be driven by broad register differences between the “real” and “fake” news sources, rather than by differences in communicative intent, as defined above.
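
A simple diagnostic for this problem is to cross-tabulate veracity labels against source registers before any training or analysis. The Python sketch below illustrates this check; the CSV layout and the column names "label" and "source_type" are hypothetical placeholders rather than the format of any particular dataset.

```python
# Minimal sketch: diagnosing register/label confounding in a fake news dataset.
# Column names ("label", "source_type") are hypothetical placeholders.
import csv
from collections import Counter

def label_by_register(path):
    """Count label/register combinations in a CSV metadata file."""
    counts = Counter()
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            counts[(row["label"], row["source_type"])] += 1
    return counts

if __name__ == "__main__":
    for (label, register), n in sorted(label_by_register("dataset.csv").items()):
        print(f"{label:<15} {register:<25} {n}")
    # If "fake" items come almost exclusively from one register (e.g.,
    # hyper-partisan pages) and "real" items from another (e.g., broadsheets),
    # a classifier can reach high accuracy by learning register style alone.
```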

3 Building a multi-register comparative corpus of real and fake news

We have compiled a comparative corpus of fake and real news articles as part of a larger project that investigates the structural, stylistic, and discursive features of fake news from multiple methodological perspectives (Baissa 2024). The corpus focuses on three topics that have been significant targets of misinformation: climate change, vaccinations, and COVID-19. The global reach of these issues means that misinformation can quickly spread with serious consequences. For example, vaccine hesitancy or ineffective measures to combat COVID-19 have created significant health risks, while false narratives about climate change can hinder efforts to tackle this problem.

To address the methodological challenges discussed above, we adopted a multipronged data collection and comparison strategy. In line with previous work, we identify fake news texts based on assessments from well-established fact-checking services. Using fact-checkers is preferable to simply relying on sources known for publishing false information, like conspiracy-pseudoscience websites such as Natural News or Infowars (e.g., Horne and Adali 2017; Volkova et al. 2017). These sources may also publish real news, making such an approach less reliable. With fact-checkers, we can have a reasonable degree of confidence that most of the content of flagged news texts is plausibly factually incorrect at the time of assessment. Crucially, however, instead of treating fact-checking scores as absolute measures of veracity, we view them as a proxy for communicative intent, defined as the author’s aim to either support or challenge mainstream news narratives.

In addition, to control for register variation and isolate the distinctive linguistic features of fake news, our comparative or “reference” corpus includes a careful selection of topic-matched real news articles from four registers: broadsheets, tabloids, web-based publications, and news blogs. By conducting multiple comparisons against these diverse registers, we can distinguish linguistic variation driven purely by stylistic register differences from variation likely motivated by news writers’ communicative intent.
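
To make the logic of these multiple comparisons concrete, the following Python sketch flags a linguistic feature as a candidate marker of fake news only when the focus corpus differs from every reference register in the same direction. The corpus representation, the regex-based feature counting, and the toy data are illustrative assumptions, not the analysis pipeline used in our project.

```python
# Illustration of the multi-register comparison: a feature counts as a
# candidate marker of fake news only if the focus corpus differs from ALL
# reference registers in the same direction. Toy data and regex features
# are simplifications for demonstration purposes.
import re
from statistics import mean

def rel_freq(text, pattern):
    """Occurrences of a regex feature per 1,000 words."""
    words = re.findall(r"\w+", text.lower())
    hits = re.findall(pattern, text.lower())
    return 1000 * len(hits) / len(words) if words else 0.0

def consistent_difference(fake_texts, reference_registers, pattern):
    """True if the focus corpus mean differs from every register in the same direction."""
    fake_mean = mean(rel_freq(t, pattern) for t in fake_texts)
    diffs = [fake_mean - mean(rel_freq(t, pattern) for t in texts)
             for texts in reference_registers.values()]
    return all(d > 0 for d in diffs) or all(d < 0 for d in diffs)

# Toy usage: is "never" consistently more frequent in the fake subcorpus?
fake = ["They will never tell you what this study really found."]
refs = {
    "broadsheets": ["The study was published in a peer-reviewed journal on Monday."],
    "tabloids": ["Scientists say the shock study changes everything."],
    "web": ["A new study released this week examines the question."],
    "blogs": ["I read the study over the weekend and here is my take."],
}
print(consistent_difference(fake, refs, r"\bnever\b"))  # True in this toy case
```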

To select articles for our “focus” corpus of fake news, we consulted five reputable fact-checking websites: Snopes, PolitiFact, FactCheck.org, AFP Fact Check, and Reuters. We used several strategies to identify suitable texts:

  1. Collections and archives: PolitiFact and FactCheck.org maintain collections of false news related to vaccination, climate change, and COVID-19. These served as our primary sources.

  2. Keyword search: We conducted searches on all five websites using terms like “vaccine(s)”, “vaccination”, “climate change”, “global warming”, “covid-19”, “covid”, “coronavirus”, and so on. We then filtered results by date and veracity label.

  3. Source search: On PolitiFact, we specifically searched for false articles by bloggers (excluding social media posts).

Our selection criteria for fake news articles were:

  1. Register: Texts had to have a news article format and be published by news websites or blogs (not social media, WhatsApp messages, memes, etc.).

  2. Veracity: Articles had to be rated “Pants on Fire”, “False”, or “Mostly False” by the fact-checking websites to ensure that the majority of their content could plausibly be considered inaccurate at the time of assessment.

Using these search methods and criteria, we carefully examined fact-checked articles related to our three topics. We followed hyperlinks provided by the fact-checking services to locate the original news items, verified their suitability, and archived them if appropriate. Collected articles were saved as text files (title and body text only), excluding extraneous elements like author names, dates, images, advertisements, comments, and links. We included 137 articles in our focus corpus. A metadata table was created to record the following for each article: claim, fact-checking URL, original article URL, veracity label, publication date, topic, and source type. Table 1 provides an overview of the corpus, including the number of articles collected per topic, average text length, and text source types. As the table shows, the number of suitable articles that met our criteria and were retrievable across different topics varies. This variation likely reflects the relative prevalence of fake news on each topic.

Table 1:

Overview of the focus corpus.

False news topic | No. of articles | Average no. of words | SD of no. of words | Total no. of words | Sources | Publication dates
Climate change | 20 | 887 | 708 | 17,736 | Online tabloids, news websites, blogs | 2011–2020
Vaccination | 27 | 903 | 568 | 24,372 | News websites, blogs | 2013–2019
COVID-19 | 90 | 717 | 724 | 64,565 | Online tabloids, news websites, blogs | January 2020–December 2020
Total | 137 | 779 | 693 | 106,673 | – | –
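
For illustration, the Python sketch below shows how the two selection criteria and the metadata fields listed above could be applied programmatically once fact-checked items have been gathered. The field names, accepted label set, and register values mirror the description above but constitute a hypothetical implementation; our actual compilation was carried out largely by hand.

```python
# Hypothetical implementation of the selection criteria and metadata record
# for the focus corpus. Field names and accepted values mirror the criteria
# described above; the actual compilation was carried out manually.
import csv
from dataclasses import dataclass, asdict, fields

ACCEPTED_LABELS = {"Pants on Fire", "False", "Mostly False"}
ACCEPTED_REGISTERS = {"online tabloid", "news website", "blog"}

@dataclass
class FocusItem:
    claim: str
    fact_check_url: str
    article_url: str
    veracity_label: str
    publication_date: str
    topic: str
    source_type: str

def keep(item):
    """Criterion 1 (register) and criterion 2 (veracity label)."""
    return (item.source_type in ACCEPTED_REGISTERS
            and item.veracity_label in ACCEPTED_LABELS)

def write_metadata(items, path):
    """Write one metadata row per retained article."""
    rows = [asdict(i) for i in items if keep(i)]
    if not rows:
        return
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=[fld.name for fld in fields(FocusItem)])
        writer.writeheader()
        writer.writerows(rows)
```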

The reference corpus of real news is made up of four equally sized parts (137 articles each), representing broadsheets, tabloids, web-based publications, and news blogs. These texts were collected from a range of reputable mainstream news sources, as shown in Table 2. Drawing on mainstream news outlets was necessary to obtain enough articles representing diverse news types because verified real news available from fact-checking websites is limited. However, it is important to acknowledge that even reputable news sources occasionally publish factually inaccurate content (e.g., Grieve and Woodfield 2023). To identify and collect suitable articles, we searched the news database Nexis Advance UK using the following keywords within headlines and lead sections: “vaccine(s)”, “vaccination”, “climate change”, “global warming”, “coronavirus”, “covid-19”, and “covid”. For broadsheets and tabloids, only the printed versions were included. Web-based publications were restricted to well-established online news sources from three English-speaking countries (US, UK, and Australia). We selected five to seven reputable sources for each register. The news blog corpus, on the other hand, came from Newstex LLC. This company gathers full-text content from high-quality blogs on various topics and delivers them through the Nexis Advance UK database.

Table 2:

News sources used to compile the reference subcorpus.

Broadsheet sources | New York Times; USA Today; The Times (UK); The Independent (UK); The Australian; Sydney Morning Herald (Australia)
Tabloid sources | Daily News (New York); The Sun (UK); The Daily Mail and Mail on Sunday (UK); The Daily Telegraph (Australia); The Age (Australia)
Web-based publication sources | CNN.com; Business Insider US; Politics Home (UK); Progressive Media (UK) (a); BBC World (UK); WA Today (Australia); ABC Premium News (Australia)
Blog sources | Newstex blogs (b)

(a) This news source does not have articles on COVID-19.
(b) The blogs delivered by Newstex LLC are listed at http://www.lexisnexis.com/documents/academic/academic_migration/Newstex%20Blogs%20on%20LexisNexis.doc (accessed 31 October 2024).

Table 3 shows the composition of the reference subcorpus. News texts were randomly selected from Nexis Advance UK using a random number generator.[1] To ensure alignment with the focus corpus of fake news, article counts, publication dates, and word limits (300–4,000 words) were matched. Additionally, the reference corpus includes both news and opinion articles to mirror the composition of the fake news dataset. The articles downloaded from Nexis Advance UK were manually checked for relevance, converted to text files, and thoroughly cleaned using the search and replace function within SarAnt (Anthony 2016). This cleaning process removed all metadata (news source, author, dates, markers) and extraneous elements (contact information, notes, links). Duplicate articles, which are often automatically generated by the database, were manually eliminated. The goal was to isolate the title and body text of each article, ensuring a structure comparable to the fake news subcorpus.

Table 3:

Breakdown of the reference corpus.

Reference subcorpus | No. of articles on climate change | No. of articles on vaccination | No. of articles on COVID-19 | Total no. of articles | Average no. of words | SD of no. of words | Total no. of words
Broadsheets | 20 | 27 | 90 | 137 | 683 | 273 | 93,533
Tabloids | 20 | 27 | 90 | 137 | 600 | 260 | 82,183
Web-based publications | 20 | 27 | 90 | 137 | 700 | 282 | 95,884
Blogs | 20 | 27 | 90 | 137 | 578 | 203 | 79,198
Total | 80 | 108 | 360 | 548 | 640 | 261 | 350,798
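
The sampling and cleaning steps described above can be approximated in code. The Python sketch below randomly samples topic-matched articles within the 300–4,000-word limits and strips database boilerplate with regular expressions; the per-topic target counts mirror Table 3, but the candidate-pool structure and the cleaning patterns are simplified assumptions rather than a reproduction of the SarAnt-based procedure we used.

```python
# Sketch of matched random sampling and boilerplate cleaning for one reference
# register. Target counts per topic mirror Table 3; candidate pools and the
# cleaning patterns are simplified assumptions (the actual cleaning used
# SarAnt's search-and-replace function).
import random
import re

TARGET_COUNTS = {"climate change": 20, "vaccination": 27, "covid-19": 90}
MIN_WORDS, MAX_WORDS = 300, 4000

def within_limits(text):
    """Keep only articles between 300 and 4,000 words."""
    n = len(re.findall(r"\w+", text))
    return MIN_WORDS <= n <= MAX_WORDS

def sample_register(candidates, seed=1):
    """Randomly sample topic-matched articles from candidate pools.

    `candidates` maps each topic to a list of article strings.
    """
    rng = random.Random(seed)
    sampled = []
    for topic, n in TARGET_COUNTS.items():
        pool = [t for t in candidates.get(topic, []) if within_limits(t)]
        sampled.extend(rng.sample(pool, n))  # raises if the pool is too small
    return sampled

def clean(text):
    """Strip typical database metadata lines, keeping title and body text."""
    text = re.sub(r"(?im)^(copyright|load-date|byline|length|section):.*$", "", text)
    return re.sub(r"\n{3,}", "\n\n", text).strip()
```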

It is important to acknowledge that the focus corpus, while meticulously constructed, is relatively small. However, as Egbert (2019) notes, even smaller corpora can yield valid results if they are carefully designed with a specific target domain in mind, particularly when the research questions do not involve infrequent linguistic features. Another limitation of the focus corpus is its exclusive coverage of health and environmental topics, which may restrict the generalizability of the findings to other domains. A valuable direction for future research would be to expand the corpus to include a broader range of topics beyond those covered in the current version. Nonetheless, the corpus benefits from including articles by diverse authors from various news outlets over nearly a decade (2011–2021). This diversity enhances both its applicability and the validity of results derived from it. Although the distribution of articles across the three topics is uneven, this should not be a major concern given that the primary purpose of the corpus is to facilitate systematic comparisons between fake and mainstream news, and the texts in the two subcorpora are matched in terms of number and size.

In order to support empirical research in linguistics and NLP on the language of fake news and fake news identification, we make this corpus available upon request. We encourage researchers from any field interested in better understanding fake news to make use of this corpus as they see fit, citing this paper to acknowledge its source.

4 Conclusions

In this paper, we have discussed and sought to offer workable methodological solutions to two key challenges in collecting data for linguistic fake news research: how to reliably distinguish fake from real news and how to minimize the confounding impact of register variation. We have critically reexamined the meaning and implications of the veracity labels produced by fact-checking services, which are commonly used as the criterion for identifying fake news texts. We argue that while fact-checkers remain vital tools for identifying suitable texts for analysis, their labels should be considered as a proxy for the communicative intent of fake news writers rather than absolute pronouncements of truth. Specifically, we propose that veracity labels by fact-checkers are more appropriately and productively interpreted as indirect evidence for the communicative intent of fake news authors to challenge mainstream views on a given topic. This conceptualization represents how fact-checkers actually operate, overcomes the problematic and politically charged notion of truth, and accounts for fake news written by authors who genuinely believe their content is true, seeing their work as a form of political activism, not deceit. In line with this revised definition, we argue that systematic linguistic differences between real and fake news can stem not only from a conscious desire to deceive other people, but also from the author’s awareness of their position relative to the established narrative and their desire to either establish/support or challenge prevailing narratives. This idea finds empirical corroboration in recent discourse analytical work, which shows that “subversiveness”, defined as “the discursive construction of a story as revelatory and disruptive, challenging standard assumptions and debunking accepted wisdom”, is a distinctive “news value” of fake news (Baissa et al. 2024: 16). We argue that this conceptual shift is necessary and useful for achieving a better understanding of the linguistic mechanisms and persuasive appeal of fake news.

To address the methodological pitfalls of register and topic mismatches in existing fake news datasets, we propose a multipronged comparative approach. This strategy analyzes fake news against topic-matched texts from multiple news registers, allowing us to distinguish more precisely between linguistic variation driven by context and text type and variation driven by communicative intent. We believe that the data collection and comparison strategy presented here strikes a good balance between practicality and rigor. It provides a concrete, workable method for building sizable comparative corpora of real and fake news by leveraging well-established data collection practices.

In particular, we make the corpus described in this paper available to other researchers. We also encourage them to compile additional corpora following the methodological procedures and theoretical considerations introduced here, so as to produce larger and more comparable collections of real and fake news, both to expand our descriptive and theoretical understanding of this important societal phenomenon and to provide the larger datasets needed for developing NLP systems for fake news detection.

Overall, we believe this research has significant implications for the linguistic study of fake news. By rethinking the role of fact-checkers and their veracity labels, we can gain a more realistic and nuanced understanding of what drives fake news discourse and its underlying linguistic patterns. Future studies should expand this work by considering the complexities of partisan misinformation, where the intent may be more focused on influencing opinion than outright deception and where the identity of the authors is often difficult to establish. Our detailed account of our corpus compilation process provides a replicable and transparent method that can be readily adopted by future researchers, fostering greater rigor and systematicity in fake news analysis.


Corresponding author: Matteo Fuoli, University of Birmingham, Birmingham, England.

Funding source: Taif University, Ministry of Education, Saudi Arabia

Research funding: This work is supported by Taif University, Ministry of Education, Saudi Arabia.

References

Anthony, Laurence. 2016. SarAnt, version 1.1.0. Available at: https://www.laurenceanthony.net/software/sarant/.

Aslam, Nida, Irfan Ullah Khan, Farah Salem Alotaibi, Lama Abdulaziz Aldaej & Asma Khaled Aldubaikil. 2021. Fake detect: A deep learning ensemble model for fake news detection. Complexity 2021. 1–14. https://doi.org/10.1155/2021/5557784.

Baissa, Bashayer. 2024. False news discourse online: Corpus-assisted discourse analyses. Birmingham: University of Birmingham PhD dissertation.

Baissa, Bashayer, Matteo Fuoli & Jack Grieve. 2024. The news values of fake news. Discourse & Communication. Advance online publication. https://doi.org/10.1177/17504813241280489.

Baptista, João Pedro & Anabela Gradim. 2020. Understanding fake news consumption: A review. Social Sciences 9(185). 1–22. https://doi.org/10.3390/socsci9100185.

Biber, Douglas. 1988. Variation across speech and writing. Cambridge: Cambridge University Press. https://doi.org/10.1017/CBO9780511621024.

Biber, Douglas & Susan Conrad. 2009. Register, genre, and style. Cambridge: Cambridge University Press. https://doi.org/10.1017/CBO9780511814358.

Bloom, Jesse D., Yujia Alina Chan, Ralph S. Baric, Pamela J. Bjorkman, Sarah Cobey, Benjamin E. Deverman, David N. Fisman, Ravindra Gupta, Akiko Iwasaki, Marc Lipsitch, Ruslan Medzhitov, Richard A. Neher, Rasmus Nielsen, Nick Patterson, Tim Stearns, Erik van Nimwegen, Michael Worobey & David A. Relman. 2021. Investigate the origins of COVID-19. Science 372. 694. https://doi.org/10.1126/science.abj0016.

Chilcot, John. 2016. The report of the Iraq inquiry: Executive summary. London: Stationery Office.

Egbert, Jesse. 2019. Corpus design and representativeness. In Tony Berber Sardinha & Marcia Veirano Pinto (eds.), Multi-dimensional analysis: Research methods and current issues, 27–42. London: Bloomsbury Academic.

Graves, Lucas. 2017. Anatomy of a fact check: Objective practice and the contested epistemology of fact checking. Communication, Culture and Critique 10. 518–537. https://doi.org/10.1111/cccr.12163.

Grieve, Jack & Helena Woodfield. 2023. The language of fake news (Elements in Forensic Linguistics). Cambridge: Cambridge University Press. https://doi.org/10.1017/9781009349161.

Grieve, Jack, Douglas Biber, Eric Friginal & Tatiana Nekrasova. 2010. Variation among blogs: A multi-dimensional analysis. In Alexander Mehler, Serge Sharoff & Marina Santini (eds.), Genres on the web, 303–322. New York: Springer. https://doi.org/10.1007/978-90-481-9178-9_14.

Horne, Benjamin & Sibel Adali. 2017. This just in: Fake news packs a lot in title, uses simpler, repetitive content in text body, more similar to satire than real news. In Proceedings of the international AAAI conference on web and social media, vol. 11, 759–766. https://doi.org/10.1609/icwsm.v11i1.14976.

Mangal, Deepak & Dilip Kumar Sharma. 2021. A framework for detection and validation of fake news via authorize source matching. In Devendra Kumar Sharma, Le Hoang Son, Rohit Sharma & Korhan Cengiz (eds.), Micro-electronics and telecommunication engineering, 577–586. Singapore: Springer. https://doi.org/10.1007/978-981-33-4687-1_54.

Osmundsen, Mathias, Alexander Bor, Peter Bjerregaard Vahlstrup, Anja Bechmann & Michael Bang Petersen. 2021. Partisan polarization is the primary psychological motivation behind political fake news sharing on Twitter. American Political Science Review 115(3). 999–1015. https://doi.org/10.1017/s0003055421000290.

Põldvere, Nele, Zia Uddin & Aleena Thomas. 2023. The PolitiFact-Oslo corpus: A new dataset for fake news analysis and detection. Information 14(12). https://doi.org/10.3390/info14120627.

Rashkin, Hannah, Eunsol Choi, Jin Yea Jang, Svitlana Volkova & Yejin Choi. 2017. Truth of varying shades: Analyzing language in fake news and political fact-checking. In Martha Palmer, Rebecca Hwa & Sebastian Riedel (eds.), Proceedings of the 2017 conference on empirical methods in natural language processing, 2931–2937. Copenhagen: Association for Computational Linguistics. https://doi.org/10.18653/v1/D17-1317.

Rini, Regina. 2017. Fake news and partisan epistemology. Kennedy Institute of Ethics Journal 27(2). 43–64. https://doi.org/10.1353/ken.2017.0025.

Shu, Kai, Amy Sliva, Suhang Wang, Jiliang Tang & Huan Liu. 2017. Fake news detection on social media: A data mining perspective. SIGKDD Explorations Newsletter 19. 22–36. https://doi.org/10.1145/3137597.3137600.

Shu, Kai, Suhang Wang & Huan Liu. 2019. Beyond news contents: The role of social context for fake news detection. In Proceedings of the twelfth ACM international conference on web search and data mining, 312–320. New York: Association for Computing Machinery. https://doi.org/10.1145/3289600.3290994.

Silverman, Craig, Lauren Strapagiel, Hamza Shaban, Ellie Hall & Jeremy Singer-Vine. 2016. Hyperpartisan Facebook pages are publishing false and misleading information at an alarming rate. BuzzFeed News, 20 October 2016. https://www.buzzfeednews.com/article/craigsilverman/partisan-fb-pages-analysis (accessed 8 April 2024).

The Guardian. 2021. Most Republicans still believe 2020 election was stolen from Trump – poll. The Guardian, 24 May 2021. Available at: https://www.theguardian.com/us-news/2021/may/24/republicans-2020-election-poll-trump-biden.

Torabi Asr, Fatemeh & Maite Taboada. 2019. Big data and quality data for fake news and misinformation detection. Big Data & Society 6. 1–14. https://doi.org/10.1177/2053951719843310.

Torabi Asr, Fatemeh, Mehrdad Mokhtari & Maite Taboada. 2024. Misinformation detection in news text: Automatic methods and data limitations. In Stefania Maci, Massimiliano Demata, Mark McGlashan & Philip Seargeant (eds.), The Routledge handbook of discourse and disinformation, 79–102. London: Routledge. https://doi.org/10.4324/9781003224495-7.

Volkova, Svitlana, Kyle Shaffer, Jin Yea Jang & Nathan Hodas. 2017. Separating facts from fiction: Linguistic models to classify suspicious and trusted news posts on Twitter. In Regina Barzilay & Min-Yen Kan (eds.), Proceedings of the 55th annual meeting of the Association for Computational Linguistics, vol. 2, Short papers, 647–653. Vancouver: Association for Computational Linguistics. https://doi.org/10.18653/v1/P17-2102.

Wang, William Yang. 2017. "Liar, liar pants on fire": A new benchmark dataset for fake news detection. In Regina Barzilay & Min-Yen Kan (eds.), Proceedings of the 55th annual meeting of the Association for Computational Linguistics, vol. 2, Short papers, 422–426. Vancouver: Association for Computational Linguistics. https://doi.org/10.18653/v1/P17-2067.

Zannettou, Savvas, Michael Sirivianos, Jeremy Blackburn & Nicolas Kourtellis. 2019. The web of false information: Rumors, fake news, hoaxes, clickbait, and various other shenanigans. Journal of Data and Information Quality 11(3). 1–37. https://doi.org/10.1145/3309699.

Received: 2024-04-16
Accepted: 2025-03-05
Published Online: 2025-04-14

© 2025 the author(s), published by De Gruyter, Berlin/Boston

This work is licensed under the Creative Commons Attribution 4.0 International License.
