Abstract
Objectives
Paper mills, which are companies that write scientific papers, secure their acceptance in journals, and then sell authorship of them, present a key challenge in medicine and other healthcare fields. The challenge is becoming more acute with artificial intelligence (AI): AI can write the manuscripts, and paper mills then sell authorship of the resulting papers. The aim of this research was to develop a method for detecting fake papers.
Methods
The method reported in this article uses machine learning to build decision trees that identify fake papers. Data were collected from Web of Science and from multiple journals across various fields.
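The abstract does not state which paper attributes the trees split on or which induction algorithm was used. As a rough illustration only, the following is a minimal sketch of ID3-style decision tree induction (entropy and information gain over categorical features) in plain Python. The feature names (hijacked_journal, unverifiable_refs, affiliation_mismatch), the toy records, and the helper functions are hypothetical and were invented for this example. They are not the authors' features, data, or model.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, feature):
    """Reduction in entropy obtained by splitting on one categorical feature."""
    total = len(labels)
    remainder = 0.0
    for value in {row[feature] for row in rows}:
        subset = [lab for row, lab in zip(rows, labels) if row[feature] == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder


def build_tree(rows, labels, features):
    """ID3-style induction: returns a leaf label or (feature, {value: subtree})."""
    if len(set(labels)) == 1:          # pure node -> leaf
        return labels[0]
    if not features:                   # no features left -> majority label
        return Counter(labels).most_common(1)[0][0]
    best = max(features, key=lambda f: information_gain(rows, labels, f))
    remaining = [f for f in features if f != best]
    branches = {}
    for value in {row[best] for row in rows}:
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        branches[value] = build_tree([rows[i] for i in idx], [labels[i] for i in idx], remaining)
    return (best, branches)


def classify(tree, row, default="genuine"):
    """Walk the tree for one paper; unseen feature values fall back to `default`."""
    while isinstance(tree, tuple):
        feature, branches = tree
        if row[feature] not in branches:
            return default
        tree = branches[row[feature]]
    return tree


# Hypothetical toy records, invented for illustration (NOT data from the study).
FEATURES = ["hijacked_journal", "unverifiable_refs", "affiliation_mismatch"]
rows = [
    {"hijacked_journal": "yes", "unverifiable_refs": "no", "affiliation_mismatch": "yes"},
    {"hijacked_journal": "no", "unverifiable_refs": "yes", "affiliation_mismatch": "yes"},
    {"hijacked_journal": "no", "unverifiable_refs": "no", "affiliation_mismatch": "no"},
    {"hijacked_journal": "no", "unverifiable_refs": "no", "affiliation_mismatch": "yes"},
    {"hijacked_journal": "yes", "unverifiable_refs": "yes", "affiliation_mismatch": "no"},
    {"hijacked_journal": "no", "unverifiable_refs": "yes", "affiliation_mismatch": "no"},
]
labels = ["fake", "fake", "genuine", "genuine", "fake", "fake"]

tree = build_tree(rows, labels, FEATURES)
paper = {"hijacked_journal": "no", "unverifiable_refs": "yes", "affiliation_mismatch": "no"}
print(classify(tree, paper))
```

On these toy records the learned tree splits first on the hypothetical unverifiable_refs attribute, because that split gives the largest information gain; a real application would need features and labels drawn from verified cases, and a pruned algorithm such as C4.5 would normally be preferred over plain ID3.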
Results
The article presents a method for identifying fake papers derived from the decision tree results. In a case study, the method identified a fake paper, indicating its effectiveness.
Conclusions
The method can be used by authors, editors, and publishers in any field to investigate a single paper or to analyze a group of manuscripts. Clinicians and other readers can also use it to check that articles retrieved in a search are not fake and instead report actual research that was peer reviewed before publication in a journal.
- Research ethics: Not applicable.
- Informed consent: Not applicable.
- Author contributions: All the authors have accepted responsibility for the entire content of this submitted manuscript and approved its submission. There is no AI-generated content in this article.
- Competing interests: The authors declare no conflicts of interest.
- Research funding: None declared.
- Data availability: The raw data can be obtained on request from the corresponding author.
© 2023 Walter de Gruyter GmbH, Berlin/Boston
Articles in the same Issue
- Frontmatter
- Reviews
- Diagnostic errors in uncommon conditions: a systematic review of case reports of diagnostic errors
- Routine blood test markers for predicting liver disease post HBV infection: precision pathology and pattern recognition
- Opinion Papers
- The challenge of clinical reasoning in chronic multimorbidity: time and interactions in the Health Issues Network model
- The first diagnostic excellence conference in Japan
- Clouds across the new dawn for clinical, diagnostic and biological data: accelerating the development, delivery and uptake of personalized medicine
- Original Articles
- Towards diagnostic excellence on academic ward teams: building a conceptual model of team dynamics in the diagnostic process
- Error codes at autopsy to study potential biases in diagnostic error
- Multicenter evaluation of a method to identify delayed diagnosis of diabetic ketoacidosis and sepsis in administrative data
- Detection of fake papers in the era of artificial intelligence
- Is language an issue? Accuracy of the German computerized diagnostic decision support system ISABEL and cross-validation with the English counterpart
- The feasibility of a mystery case curriculum to enhance diagnostic reasoning skills among medical students: a process evaluation
- Internal medicine intern performance on the gastrointestinal physical exam
- Scaling up a diagnostic pause at the ICU-to-ward transition: an exploration of barriers and facilitators to implementation of the ICU-PAUSE handoff tool
- Learned cautions regarding antibody testing in mast cell activation syndrome
- Diagnostic properties of natriuretic peptides and opportunities for personalized thresholds for detecting heart failure in primary care
- Incomplete filling of spray-dried K2EDTA evacuated blood tubes: impact on measuring routine hematological parameters on Sysmex XN-10
- Letters to the Editor
- The diagnostic accuracy of AI-based predatory journal detectors: an analogy to diagnosis
- Explainable AI for gut microbiome-based diagnostics: colorectal cancer as a case study
- Restless X syndrome: a new diagnostic family of nocturnal, restless, abnormal sensations of various body parts
- Erratum
- Retraction of: Establishing a stable platform for the measurement of blood endotoxin levels in the dialysis population