Article Open Access

The Detection of ChatGPT’s Textual Crumb Trails is an Unsustainable Solution to Imperfect Detection Methods

  • Jaime A. Teixeira da Silva
Published/Copyright: August 19, 2024

Abstract

A recent disruptive innovation in scientific publishing is OpenAI’s ChatGPT, a large language model. The International Committee of Medical Journal Editors, COPE, and COPE member journals or publishers have set limitations on ChatGPT’s involvement in academic writing, requesting that authors declare its use. Those guidelines are practically useless because they ignore two fundamentals: first, academics who cheat to achieve success will not declare the use of ChatGPT; second, the guidelines fail to explicitly assign the responsibility of detection to editors, journals, and publishers. Using two primers, i.e., fragments of residual text that may reflect traces of ChatGPT’s output and that authors may have forgotten to remove from their articles, this commentary draws readers’ attention to 46 open-access examples sourced from PubPeer. Even though editors should be obliged to investigate such cases, primer-based detection of ChatGPT’s textual crumb trails is only a temporary measure, not a sustainable solution, because it relies on the detection of carelessness.

1 Ethical Limitations of Using ChatGPT, and Associated Risks in Academic Publishing

OpenAI’s large language model (LLM) ChatGPT (Chat Generative Pre-trained Transformer) received a very positive welcome in academia, aspects that are not covered herein because they are numerous and self-explanatory, but it has also been the subject of some criticism, which is the focus of this article. ChatGPT is a disruptive technology because it upends established norms regarding human-based innovation (Sardana, Fagan, & Wright, 2023), for four reasons. First, ChatGPT is an artificial intelligence (AI)-driven tool that is, through training, only able to recreate newly phrased ideas from a finite body of already-existing information that humans initially created; i.e., it is not a source of originality or intelligence per se, in the humanistic interpretation of these terms (Floridi, 2023). Second, older versions such as ChatGPT-3 might not perform remarkably well, introducing errors, for example in references (Walters & Wilder, 2023), known colloquially as “hallucinations” (Alkaissi & McFarlane, 2023; Azamfirei, Kudchadkar, & Fackler, 2023). Third, some consider these errors to be “fabrications and falsifications” (Emsley, 2023). Finally, humans with innovative thinking may feel threatened by ChatGPT’s “originality,” specifically fearing that their human skills will be substituted by AI. In an academic setting, scholars who decide to rely exclusively on their own human endeavor may feel threatened by competing scholars who rely on this technology, or who fail to declare its use transparently and thus feign originality.

Such concerns and threats were recognized quite rapidly by the academic community following the use of ChatGPT-3, which began in late 2022, leading publishers and globalist organizations dedicated to ethics policy, such as the Committee on Publication Ethics (COPE) (COPE, 2023) and the International Committee of Medical Journal Editors (ICMJE) (ICMJE, 2024), to quickly implement policies and guidelines that did not ban the use of ChatGPT in academic articles but instead placed limitations on it, such as declaring illegitimate the authorship of ChatGPT (Nazarovets & Teixeira da Silva, 2024) or of those who claimed to be the authors of text “created” by this LLM. Instead, COPE and the ICMJE opted for policies that require authors to declare and acknowledge the use of AI/LLMs, indicate the exact text derived from them, and specify in the methodology how they were used (Teixeira da Silva & Tsigaris, 2023; Teixeira da Silva, 2023a).

However, close scrutiny of the COPE “position” statement and the ICMJE recommendations reveals statements that carry limited practical weight. Despite the ethical guidance and advice given to over 14,100 member journals and publishers in the case of COPE (COPE, 2024), there is a distinct flaw in those guidelines: they fail to indicate who bears the responsibility for detecting the use of AI/LLMs or text derived from their use (e.g., in the case of ChatGPT). It has been fervently argued that the responsibility of detection lies squarely on the shoulders of journal editors (Kaebnick et al., 2023; Teixeira da Silva, 2023b). Even if there is currently no effective method to detect ChatGPT-derived text, this does not absolve editors (or publishers) of their responsibility to detect it. Lacking efficient detection methods, yet aware of the risks of the undeclared use of AI/LLMs, the publishing industry remains unclear on how to deal with this threat to originality, or how to detect ChatGPT-derived text. In the meantime, it is likely that a body of unscrupulous authors who do not respect publishing ethics, or who may be willing to go to any length to cheat as a way to advance their careers, might employ ChatGPT opaquely to write or edit text in their articles that they are unable to produce effectively on their own (Kendall & Teixeira da Silva, 2024). Absent the ability to effectively detect such text, it may be premature to claim that the use of AI/LLMs will liberalize academia and diversify participation. Consequently, caution is advised regarding its use in diversity, equity, and inclusion policies (Abdelhalim, Anazodo, Gali, & Robson, 2024).

Since scientific writing is a fundamental aspect of academic publishing, the latter is experiencing a unique crisis of trust caused by AI/LLMs, because the globalist ethical leadership (ICMJE, COPE, editors, etc.) is acting reactively rather than preemptively. Thus, by the time a solution is found, LLM-derived text may have already populated and tainted hundreds or even thousands of articles that passed peer review and editorial screening, because peer reviewers and editors were not provided with any tools by publishers to detect LLM-derived text. When efficient detection methods are eventually found, authors who violated established codes of conduct (e.g., in journals claiming to follow the ethical guiding principles of COPE and the ICMJE) will need to be held ethically accountable.

2 Two Primers to Detect the Potential Undeclared Use of ChatGPT

This article highlights an unsustainable detection method that compensates for gaps in the detection of LLM-derived text, specifically the detection of ChatGPT’s textual “crumbs”, an issue that has received some media attention (Conroy, 2023; Retraction Watch, 2023). This method is minimalistic because it only captures instances in which authors were careless and failed to remove traces of ChatGPT-derived textual output from their article’s text. In other words, this method does not provide a reliable, replicable, or even long-term solution; it is merely one way to detect possible ChatGPT-derived text, and it thus serves to identify authors who might not have been sufficiently careful, honest, or transparent about their use of ChatGPT or other AI/LLMs. However, since evidence must be irrefutable to avoid the false accusation of scholars (Gorichanaz, 2023), this places a heavy burden of responsibility on the management agents of the publishing industry (journal editors, copyeditors, and publishers) to devise or use effective detection tools.

Authors who are not satisfied with ChatGPT’s output and would like it to generate an alternative response might press the “Regenerate response” button (Baumgartner, 2023), which may result in different outputs (Beutel, Geerits, & Kielstein, 2023). The presence of this text (i.e., “regenerate response”) serves as the first primer – in the context of this article, a primer is residual text used as a possible detection method – to detect ChatGPT-derived text in articles. The second primer, “as an AI language model”, is typically part of ChatGPT’s response when it explains the limitations of what information it can offer. In the case of both primers, the authors’ failure to delete such ChatGPT-derived text leaves this residual trace in their articles. However, as noted above, this is not an absolute certainty; only a confession by the authors would serve as absolute confirmation. Using PubPeer as the source of information, Table 1 lists examples of open-access articles in which the two primers were detected.

Table 1

Instances of text that is likely derived from the use of ChatGPT in indexed, open-access peer-reviewed articles or preprints1

Primer 1: “Regenerate response”
DOI Location in article (page)
http://dx.doi.org/10.4108/eai.26-5-2023.2337335 5
https://doi.org/10.3390/toxins15030199 5 (corrected: https://doi.org/10.3390/toxins15060399)
https://doi.org/10.34087/cbusbed.1284455 131
https://doi.org/10.1016/j.nlp.2023.100027 12 (text corrected or modified)2
https://doi.org/10.48550/arXiv.2307.01931* 5
http://dx.doi.org/10.2139/ssrn.4476426* 14
https://doi.org/10.26434/chemrxiv-2023-6d7r9* 14 (primer removed from version 2: https://doi.org/10.26434/chemrxiv-2023-6d7r9-v2)
https://doi.org/10.33545/2707661X.2023.v4.i1a.60 3
https://www.doi.org/10.56726/IRJMETS39138 3,613
https://doi.org/10.47191/ijmra/v6-i5-26 2,054
https://doi.org/10.31219/osf.io/ksfet* 13
http://dx.doi.org/10.2139/ssrn.4439049* 18
https://doi.org/10.32996/ijllt.2023.6.5.3 21
https://doi.org/10.47750/jptcp.2023.30.09.032 e317
https://doi.org/10.21203/rs.3.rs-2755735/v1* Unknown2
https://doi.org/10.3934/era.2023144 2,863
https://doi.org/10.58712/jerel.v2i1.11 6
https://zenodo.org/doi/10.5281/zenodo.7984688 301
https://doi.org/10.20372/ajec.2023.v3.i1.810 22
https://doi.org/10.48550/arXiv.2310.18839* 2
https://doi.org/10.1371/journal.pone.0298220 17 (retracted)
https://doi.org/10.3126/rcab.v2i1.57649 162
https://doi.org/10.48550/arXiv.2307.01931* 5
https://doi.org/10.3390/su15064920 Peer report (deleted: https://www.mdpi.com/2071-1050/15/6/4920/review_report)
https://doi.org/10.33222/jaladri.v9i1.2794 17 (disguised as white font)
https://doi.org/10.58712/jerel.v2i1.11 6 (apparently silently/opaquely corrected)
https://doi.org/10.36227/techrxiv.21691934.v1* 1
https://doi.org/10.3390/ijms241411691 Peer 3 report: https://www.mdpi.com/1422-0067/24/14/11691/review_report
https://doi.org/10.31219/osf.io/szwqt* Preprint apparently silently retracted
https://doi.org/10.1088/1402-4896/aceb40 3 (retracted)
Primer 2: “as an AI language model”
DOI Location in article (page + quote)
https://doi.org/10.21608/ijhlr.2023.215933.1013 20 “As an AI language model, I don’t have direct access to the most recent studies.”
https://doi.org/10.21608/ijelr.2023.215525.1006 89 “As an AI language model, I don’t have real-time access to current studies.”
https://doi.org/10.21608/ijaecr.2023.216401.1019 83 “As an AI language model, I don’t have direct access to current research articles or studies.”
https://doi.org/10.21608/ijmae.2023.215953.1013 (inaccessible at time of sampling) 45 “As an AI language model, I don’t have direct access to current research articles or studies.”
https://doi.org/10.31219/osf.io/u7jen* (retracted/withdrawn: https://osf.io/preprints/osf/u7jen) 5 “As an AI language model, I do not have personal beliefs or emotions, but I can provide evidence-based information about…”
https://doi.org/10.36948/ijfmr.2023.v05i02.2503 8 “As an AI language model, I cannot predict the future.”
https://doi.org/10.17605/OSF.IO/W5GBF* (retracted/withdrawn: https://osf.io/w5gbf/) 3 “As an AI language model, I am unable to provide an article that promotes or markets a product.”
https://doi.org/10.32897/ajib.2023.2.1.2061 2 “Please note that I am an AI language model, and the statements above reflect general information.”
https://doi.org/10.17762/jaz.v44iS6.2594 1,707 “I apologize for the confusion, but as an AI language model, I don’t have access to specific articles or their sections”
https://doi.org/10.1016/j.radcr.2024.02.037 (removed) 2,110 “I’m very sorry, but I don’t have access to real-time information or patient-specific data, as I am an AI language model.”
https://doi.org/10.21608/ijaecr.2023.216221.1017 (inaccessible at time of sampling) Unclear “As an AI language model, I don’t have real-time access to the internet or the ability to browse recent studies.”
https://doi.org/10.26577/JAPJ.2023.v106.i2.06 58 “As an AI language model, I can provide you with an outline of the key areas to consider when conducting research on”
https://doi.org/10.21608/ijaecr.2023.216400.1018 (inaccessible at time of sampling) Unclear “As an AI language model, I don’t have real-time access to current research or studies.”
https://doi.org/10.46632/cellrm/2/2/1 3 “As an AI language model, I don’t have personal experiences or emotions, so I don’t have the ability to feel satisfaction or dissatisfaction with.”; “As an AI language model, I don’t have personal opinions or the ability to provide subjective ratings.”; 4 “As an AI language model, I don’t attend online classes or have personal experiences.”
https://doi.org/10.31219/osf.io/g42uv* (retracted/withdrawn: https://osf.io/g42uv) 4 “As an AI language model, I do not have access to specific research or data to provide a comprehensive result and discussion section for your research paper on”
https://doi.org/10.21608/ijaecr.2023.216401.1019 (inaccessible at time of sampling) Unclear “As an AI language model, I don’t have direct access to current research articles or studies.”

1Examples drawn from PubPeer using these verbatim primers (searches 7–12 October 2023; updated search: 6 May 2024).

2Although there is evidence, in the form of a screenshot, at PubPeer, the published article no longer carries this primer; Elsevier appears to have made a stealth (undeclared) correction.

3An apparent duplicate of another preprint with the exact same title but a different author, and now “silently” retracted: https://osf.io/szwqt/.

*Preprint.

The majority of the 46 examples in Table 1 (33/46; 72%) are in journals that claim to be peer-reviewed, while 13/46 (28%) are in preprints. To hold authors to equal ethical standards, preprints need to be treated in the same manner as peer-reviewed articles (Teixeira da Silva, 2022).

What is the likelihood that such text was generated by ChatGPT? Since the wording of both primers is highly specific to ChatGPT’s interface and responses, it is unlikely that an author writing about a thematically specific topic (e.g., agriculture) would use these terms or expressions spontaneously. A second clue that such text may represent vestigial ChatGPT-derived output is its seemingly random placement in odd locations within the text.
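To illustrate how this primer-based screening could, in principle, be partially automated, the following minimal sketch searches plain-text article files for the two primer phrases. It is an illustration only: the cases reported here were found through manual PubPeer searches, and the directory name, file format, and exact primer list below are assumptions rather than part of the original method.

# Minimal sketch (illustrative only): flag occurrences of the two primer phrases
# in plain-text article files. Directory name and file format are assumptions.
import re
from pathlib import Path

PRIMERS = [
    "regenerate response",        # Primer 1: residue of ChatGPT's interface button
    "as an ai language model",    # Primer 2: typical ChatGPT disclaimer phrasing
]

def find_primers(text: str) -> list[tuple[str, int]]:
    """Return (primer, character offset) for every case-insensitive match."""
    hits = []
    for primer in PRIMERS:
        for match in re.finditer(re.escape(primer), text, flags=re.IGNORECASE):
            hits.append((primer, match.start()))
    return hits

if __name__ == "__main__":
    # Assumes article full texts have been extracted to .txt files in ./articles/
    for path in Path("articles").glob("*.txt"):
        text = path.read_text(encoding="utf-8", errors="ignore")
        for primer, offset in find_primers(text):
            print(f"{path.name}: possible ChatGPT residue '{primer}' at offset {offset}")

Any match produced by such a script would only flag an article for human scrutiny; as argued above, it would not, by itself, constitute proof of the undeclared use of ChatGPT.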

3 Ethical Repercussions and Epistemic Limitations

Editors of journals whose articles carry these primers need to question the authors regarding the use of ChatGPT, in much the same way that any ethical investigation regarding the integrity of an article would take place. Where authors confess to their undeclared use of AI/LLMs, confirming that the primers served their purpose in detecting the undeclared use of ChatGPT, is this an ethical breach? At a basic level, there is the issue of a false claim, i.e., claiming originality when, in fact, the “originality” was derived from the assistance of an LLM. The second ethical argument is that such authors have failed to declare the use of an LLM (ChatGPT in this case); i.e., they have hidden a methodological truth, namely their reliance on an LLM to help generate text. Based on these arguments, journals claiming to follow COPE guidelines or ICMJE recommendations would then need to consider, depending on the level of reliance on ChatGPT, whether a correction or retraction is merited.

In the absence of a confession by authors of the undeclared use of AI/LLMs, do journal editors currently have at their disposal robust tools to verify whether text is de facto LLM/ChatGPT-derived? How will editors determine whether the cases in Table 1 were due to the undeclared use of ChatGPT? To avoid false accusations, and thus potentially reputation-damaging litigious outcomes, the for-profit publishing industry should be tasked with seeking a reliable tool to detect LLM-derived text.

4 Limitations and Future Research Hypotheses

This article, as well as the primer-based method of detection, has some limitations. Despite the presence of either primer in the articles listed in Table 1, in the absence of a confession by authors that their undeclared use of ChatGPT resulted in text in their articles that was not written or edited by humans, and in the absence of irrefutable proof of the volume of text that was derived from ChatGPT, it cannot be stated with absolute certainty that the authors are guilty of unethical behavior or that they are in violation of established codes of conduct. Consequently, until reliable and robust methods are devised that can effectively detect LLM-derived text, such cases may remain ethically unresolved indefinitely. The cases in Table 1 relied exclusively on searches of PubPeer, focusing exclusively on open-access articles to allow for public and open verification. However, those cases likely represent only a fraction of a larger pool of articles whose authors may have employed LLMs, so a more extensive analysis using scientific databases (Scopus, Web of Science, and PubMed) and Google Scholar would be needed to determine whether these two ChatGPT textual primers exist in the indexed and grey literature. Additional primers that might represent residual ChatGPT-derived text also need to be investigated.

  1. Author contribution: The author confirms sole responsibility for the conception of the study, the presented results, and manuscript preparation.

  2. Conflict of interest: The author declares no conflicts of interest. No text was written by ChatGPT or any other AI/LLM.

References

Abdelhalim, E., Anazodo, K. S., Gali, N., & Robson, K. (2024). A framework of diversity, equity, and inclusion safeguards for chatbots. Business Horizons, in press. doi: 10.1016/j.bushor.2024.03.003.

Alkaissi, H., & McFarlane, S. I. (2023). Artificial hallucinations in ChatGPT: Implications in scientific writing. Cureus, 15(2), e35179. doi: 10.7759/cureus.35179.

Azamfirei, R., Kudchadkar, S. R., & Fackler, J. (2023). Large language models and the perils of their hallucinations. Critical Care, 27, 120. doi: 10.1186/s13054-023-04393-x.

Baumgartner, C. (2023). The potential impact of ChatGPT in clinical and translational medicine. Clinical and Translational Medicine, 13(3), e1206. doi: 10.1002/ctm2.1206.

Beutel, G., Geerits, E., & Kielstein, J. T. (2023). Artificial hallucination: GPT on LSD? Critical Care, 27, 148. doi: 10.1186/s13054-023-04425-6.

Conroy, G. (2023). Scientific sleuths spot dishonest ChatGPT use in papers. Nature, news. doi: 10.1038/d41586-023-02477-w.

COPE. (2023). COPE position statement. https://publicationethics.org/cope-position-statements/ai-author (13 February 2023; last accessed: 22 May 2024).

COPE. (2024). Members. https://publicationethics.org/members (last accessed: 22 May 2024).

Emsley, R. (2023). ChatGPT: These are not hallucinations – they’re fabrications and falsifications. Schizophrenia, 9, 52. doi: 10.1038/s41537-023-00379-4.

Floridi, L. (2023). AI as agency without intelligence: On ChatGPT, large language models, and other generative models. Philosophy & Technology, 36, 15. doi: 10.1007/s13347-023-00621-y.

Gorichanaz, T. (2023). Accused: How students respond to allegations of using ChatGPT on assessments. Learning: Research and Practice, 9(2), 183–196. doi: 10.1080/23735082.2023.2254787.

ICMJE. (2024). Recommendations. https://www.icmje.org/recommendations/ (January 2024; last accessed: 22 May 2024).

Kaebnick, G. E., Magnus, D. C., Kao, A., Hosseini, M., Resnik, D., Dubljević, V., … Cherry, M. J. (2023). Editors’ statement on the responsible use of generative AI technologies in scholarly journal publishing. The Hastings Center Report, 53(5), 3–6. doi: 10.1002/hast.1507.

Kendall, G., & Teixeira da Silva, J. A. (2024). Risks of abuse of large language models, like ChatGPT, in scientific publishing: Authorship, predatory publishing, and paper mills. Learned Publishing, 37(1), 55–62. doi: 10.1002/leap.1578.

Nazarovets, S., & Teixeira da Silva, J. A. (2024). ChatGPT as an “author”: Bibliometric analysis to assess invalid authorship. Accountability in Research, in press. doi: 10.1080/08989621.2024.2345713.

Retraction Watch. (2023). Signs of undeclared ChatGPT use in papers mounting. https://retractionwatch.com/2023/10/06/signs-of-undeclared-chatgpt-use-in-papers-mounting/ (6 October 2023; last accessed: 22 May 2024).

Sardana, D., Fagan, T. R., & Wright, J. T. (2023). ChatGPT: A disruptive innovation or disrupting innovation in academia? Journal of the American Dental Association, 154(5), 361–364. doi: 10.1016/j.adaj.2023.02.008.

Teixeira da Silva, J. A. (2022). Should preprints and peer-reviewed papers be assigned equal status? Journal of Visceral Surgery, 159(5), 444–445. doi: 10.1016/j.jviscsurg.2022.08.002.

Teixeira da Silva, J. A. (2023a). Is ChatGPT a valid author? Nurse Education in Practice, 68, 103600. doi: 10.1016/j.nepr.2023.103600.

Teixeira da Silva, J. A. (2023b). ChatGPT: Detection in academic journals is editors’ and publishers’ responsibilities. Annals of Biomedical Engineering, 51(10), 2103–2104. doi: 10.1007/s10439-023-03247-5.

Teixeira da Silva, J. A., & Tsigaris, P. (2023). Human- and AI-based authorship: Principles and ethics. Learned Publishing, 36(3), 453–462. doi: 10.1002/leap.1547.

Walters, W. H., & Wilder, E. I. (2023). Fabrication and errors in the bibliographic citations generated by ChatGPT. Scientific Reports, 13, 14045. doi: 10.1038/s41598-023-41032-5.

Received: 2023-11-06
Revised: 2024-05-22
Accepted: 2024-07-31
Published Online: 2024-08-19

© 2024 the author(s), published by De Gruyter

This work is licensed under the Creative Commons Attribution 4.0 International License.
