
A tailored fit that doesn’t fit all: the problem of threshold overfitting in diagnostic studies

  • Javier Arredondo Montero
Published/Copyright: September 16, 2025
From the journal Diagnosis

Abstract

Objectives

To critically examine the phenomenon of threshold overfitting in diagnostic accuracy research and evaluate its methodological implications through a structured review of relevant literature.

Methods

This article presents a narrative and critical review of methodological studies and reporting guidelines related to threshold selection in diagnostic test accuracy. It focuses on the misuse of post hoc thresholds, the misapplication of bias assessment tools such as QUADAS-2, and the frequent absence of independent validation. In addition to identifying these structural flaws, the article proposes a set of five concrete safeguards – ranging from transparent reporting to rigorous risk of bias classification – designed to mitigate threshold-related bias in future diagnostic studies.

Results

Thresholds are frequently derived and evaluated within the same dataset, inflating sensitivity and specificity estimates. This overfitting is seldom acknowledged and is often misclassified as low risk of bias. QUADAS-2 is frequently misapplied, with reviewers mistaking the mere presence of a threshold for proper pre-specification. The article identifies five key safeguards to mitigate this bias: (1) clear declaration of pre-specification, (2) justification of threshold choice, (3) independent validation, (4) full performance reporting across thresholds, and (5) rigorous application of bias assessment tools.
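To make the inflation mechanism concrete, below is a minimal simulation sketch in Python. It is illustrative only and not drawn from the article: the biomarker distributions, sample sizes, and the youden_threshold helper are hypothetical choices for demonstration. A cutoff is chosen by maximizing Youden's J on a development sample, then evaluated both on that same sample (the "apparent" estimate) and on an independent sample (the "validated" estimate). Averaged over many replications, the apparent sensitivity and specificity exceed the validated ones, which is precisely the optimism that same-dataset threshold selection produces.

# Illustrative simulation (not from the article): deriving an "optimal"
# threshold and evaluating it on the same data inflates accuracy estimates.
# Assumes a single continuous biomarker, higher values indicating disease.
import numpy as np

rng = np.random.default_rng(42)

def sample(n_diseased=50, n_healthy=50):
    # Simulate biomarker values with modest true separation between groups.
    diseased = rng.normal(1.0, 1.0, n_diseased)
    healthy = rng.normal(0.0, 1.0, n_healthy)
    return diseased, healthy

def youden_threshold(diseased, healthy):
    # Pick the cutoff maximizing Youden's J = sensitivity + specificity - 1.
    candidates = np.concatenate([diseased, healthy])
    best_t, best_j = None, -np.inf
    for t in candidates:
        sens = np.mean(diseased >= t)
        spec = np.mean(healthy < t)
        if sens + spec - 1 > best_j:
            best_j, best_t = sens + spec - 1, t
    return best_t

def accuracy(diseased, healthy, t):
    # Sensitivity and specificity of the rule "positive if value >= t".
    return np.mean(diseased >= t), np.mean(healthy < t)

apparent, validated = [], []
for _ in range(1000):
    d_dev, h_dev = sample()                     # development set
    t = youden_threshold(d_dev, h_dev)          # threshold derived here...
    apparent.append(accuracy(d_dev, h_dev, t))  # ...and evaluated on same data
    d_val, h_val = sample()                     # independent validation set
    validated.append(accuracy(d_val, h_val, t))

for label, results in [("apparent (same data)", apparent),
                       ("validated (new data)", validated)]:
    sens, spec = np.mean(results, axis=0)
    print(f"{label}: sensitivity={sens:.3f}, specificity={spec:.3f}")

Running this sketch shows the apparent estimates sitting systematically above the validated ones; the gap is this article's "threshold overfitting", and safeguard (3), independent validation, is exactly what the second evaluation step supplies.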

Conclusions

Threshold overfitting remains an underrecognized but methodologically critical source of bias in diagnostic accuracy studies. Addressing it requires more than awareness – it demands transparent reporting, proper validation, and stricter adherence to methodological standards.


Corresponding author: Javier Arredondo Montero, MD, PhD, Department of Pediatric Surgery, Complejo Asistencial Universitario de León, c/Altos de Nava s/n, 24008, Castilla y León, León, Spain, E-mail:

  1. Research ethics: Not applicable. This study did not involve human or animal subjects, and therefore, IRB approval was not sought.

  2. Informed consent: Not applicable. This study did not involve the participation of human subjects, and therefore, informed consent was not required.

  3. Author contributions: JAM: Conceptualization and study design; literature search and selection; investigation; methodology; project administration; resources; validation; visualization; writing – original draft; writing – review and editing. The author has accepted responsibility for the entire content of this manuscript and approved its submission.

  4. Use of Large Language Models, AI and Machine Learning Tools: The author used ChatGPT (OpenAI, San Francisco, CA) to assist with linguistic refinement and stylistic editing of the manuscript. No content generation, data analysis, or substantive intellectual contribution was performed by the tool. All scientific ideas, methodological content, and critical arguments were developed independently by the author.

  5. Conflict of interest: The author states no conflict of interest.

  6. Research funding: None declared.

  7. Data availability: Not applicable.

References

1. Cohen, JF, Korevaar, DA, Altman, DG, Bruns, DE, Gatsonis, CA, Hooft, L, et al. STARD 2015 guidelines for reporting diagnostic accuracy studies: explanation and elaboration. BMJ Open 2016;6:e012799. https://doi.org/10.1136/bmjopen-2016-012799.

2. Leeflang, MM, Moons, KG, Reitsma, JB, Zwinderman, AH. Bias in sensitivity and specificity caused by data-driven selection of optimal cutoff values: mechanisms, magnitude, and solutions. Clin Chem 2008;54:729–37. https://doi.org/10.1373/clinchem.2007.096032.

3. Ewald, B. Post hoc choice of cut points introduced bias to diagnostic research. J Clin Epidemiol 2006;59:798–801. https://doi.org/10.1016/j.jclinepi.2005.11.025. Erratum in: J Clin Epidemiol 2007;60:756.

4. Reitsma, JB, Rutjes, AWS, Whiting, P, Westwood, M, Leeflang, MMG, Deeks, JJ, et al. Chapter 8: assessing risk of bias and applicability. In: Bossuyt, PMM, McGowan, J, Korevaar, DA, editors. Cochrane; 2023. Available from: https://www.cochrane.org/authors/handbooks-and-manuals/style-manual/references/reference-types/cochrane-publications#handbook-chapter. https://doi.org/10.1002/9781119756194.ch8.

5. Whiting, PF, Rutjes, AWS, Westwood, ME, Mallett, S, Deeks, JJ, Reitsma, JB, et al. QUADAS-2: a revised tool for the quality assessment of diagnostic accuracy studies. Ann Intern Med 2011;155:529–36. https://doi.org/10.7326/0003-4819-155-8-201110180-00009.

6. Altman, DG, Royston, P. What do we mean by validating a prognostic model? Stat Med 2000;19:453–73. https://doi.org/10.1002/(SICI)1097-0258(20000229)19:4<453::AID-SIM350>3.0.CO;2-5.

Received: 2025-07-12
Accepted: 2025-07-31
Published Online: 2025-09-16

© 2025 Walter de Gruyter GmbH, Berlin/Boston
