Detection of fake papers in the era of artificial intelligence

Mehdi Dadkhah; Marilyn H. Oermann; Mihály Hegedüs; Raghu Raman; Lóránt Dénes Dávid

doi:10.1515/dx-2023-0090

Published/Copyright: August 17, 2023

From the journal Diagnosis Volume 10 Issue 4

Abstract

Objectives

Paper mills are companies that write scientific papers, secure their acceptance, and then sell authorship of them. They present a key challenge in medicine and other healthcare fields, and the challenge is becoming more acute with artificial intelligence (AI): AI writes the manuscripts, and the paper mills then sell authorship of the resulting papers. The aim of the current research is to provide a method for detecting fake papers.

Methods

The method reported in this article uses a machine learning approach to create decision trees to identify fake papers. The data were collected from Web of Science and multiple journals in various fields.
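As a rough illustration of how a decision tree can separate fake from genuine papers, the following is a minimal sketch in plain Python. It is not the authors' model: the feature names, the toy rows, and the choice of a CART-style Gini split are all assumptions made for the example, since the abstract does not state which features or which tree algorithm were used.

```python
from collections import Counter

# Hypothetical binary features (NOT taken from the article), one value per paper.
FEATURES = ["journal_indexed", "review_dates_present", "references_resolvable"]

# Made-up toy rows for illustration only: (feature values, label).
TOY_DATA = [
    ((1, 1, 1), "genuine"),
    ((1, 1, 0), "genuine"),
    ((1, 0, 1), "genuine"),
    ((0, 1, 1), "genuine"),
    ((0, 0, 1), "fake"),
    ((0, 0, 0), "fake"),
    ((1, 0, 0), "fake"),
    ((0, 1, 0), "fake"),
]


def gini(rows):
    """Gini impurity of the labels in rows."""
    counts = Counter(label for _, label in rows)
    total = len(rows)
    return 1.0 - sum((n / total) ** 2 for n in counts.values())


def best_split(rows, n_features):
    """Return the feature index that most reduces weighted Gini, or None."""
    best_index, best_score = None, gini(rows)
    for i in range(n_features):
        zeros = [r for r in rows if r[0][i] == 0]
        ones = [r for r in rows if r[0][i] == 1]
        if not zeros or not ones:
            continue  # this feature does not separate the rows
        score = (len(zeros) * gini(zeros) + len(ones) * gini(ones)) / len(rows)
        if score < best_score - 1e-12:
            best_index, best_score = i, score
    return best_index


def build_tree(rows, n_features):
    """Build a tree: a leaf is a label string, a node is (index, if_0, if_1)."""
    index = best_split(rows, n_features)
    if index is None:
        # Leaf: predict the most common label among the remaining rows.
        return Counter(label for _, label in rows).most_common(1)[0][0]
    zeros = [r for r in rows if r[0][index] == 0]
    ones = [r for r in rows if r[0][index] == 1]
    return (index, build_tree(zeros, n_features), build_tree(ones, n_features))


def predict(tree, values):
    """Walk from the root to a leaf and return its label."""
    while not isinstance(tree, str):
        index, if_zero, if_one = tree
        tree = if_one if values[index] == 1 else if_zero
    return tree


tree = build_tree(TOY_DATA, len(FEATURES))
predictions = [predict(tree, values) for values, _ in TOY_DATA]
```

Each internal node tests one feature and each leaf carries a label, so the path taken for a given paper can be read as a sequence of checks. A real application would need features and training data drawn from actual journals rather than the toy rows used here.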

Results

The article presents a method to identify fake papers based on the results of decision trees. Use of this method in a case study indicated its effectiveness in identifying a fake paper.

Conclusions

This method of identifying fake papers can be used by authors, editors, and publishers across fields, either to investigate a single paper or to analyze a group of manuscripts. Clinicians and others can use it to evaluate articles they find in a search and to confirm that those articles are not fake but report actual research that was peer reviewed before publication in a journal.

Keywords: fake paper; hijacked journals; machine learning; paper mills; predatory journals

Corresponding author: Mehdi Dadkhah, Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Amritapuri, Kerala, India; and Technology Forecasting Department, SnowaTec Technology Center and Innovation Factory, Entekhab Industrial Group, Isfahan, Iran, E-mail: d_mehdi@av.amrita.edu

Research ethics: Not applicable.
Informed consent: Not applicable.
Author contributions: All the authors have accepted responsibility for the entire content of this submitted manuscript and approved submission. There is no AI-generated content in this article.
Competing interests: The authors declare no conflicts of interest.
Research funding: None declared.
Data availability: The raw data can be obtained on request from the corresponding author.

Received: 2023-07-18

Accepted: 2023-08-02

Published Online: 2023-08-17
