
How to reduce scientific irreproducibility: the 5-year reflection

  • Clare Fiala and Eleftherios P. Diamandis
Published/Copyright: October 14, 2017

Abstract

We discuss in depth six causes of scientific irreproducibility and their ramifications for the clinical sciences: fraud, unfounded papers published by prominent authorities, bias, technical deficiencies, fragmented science and problems with big data. Some proposed methods to combat this problem are briefly described, including an effort to replicate results from several high-impact papers and a proposal that authors include detailed preclinical data in papers with supposedly high translational value. Here we advocate a 5-year reflection on papers with seemingly high clinical/translational potential: a short commentary, published alongside the original paper, in which the authors reflect on the quality, reproducibility and impact of their findings. These reflections can serve as a benchmark for credibility and begin a virtuous cycle of improving the quality of published findings in the literature.

It seems that these days everybody is talking about irreproducibility in science [1]. But is this a new problem? Certainly not! The increased awareness is due to the fact that the number of irreproducible papers published in high-impact journals is on the rise [2]. In 2006, one of us (EPD) wrote a mentorship paper addressing this problem [3]. John Ioannidis [4] and others have also published repeatedly on the same issue, including one paper with the attention-catching title "Why most published research findings are false". Can this menace to science be eliminated? Likely not, but it could be significantly reduced with some measures (see below).

We emphasize here that this issue is of great interest to laboratory professionals and other readers of CCLM because a significant number of papers describing disease biomarkers fall into the category of irreproducible science.

In this paper, we identify six types of irreproducible results or papers in science (Table 1).

Table 1:

Reasons for/types of scientific irreproducibility.

Reason for irreproducibility | Description | References
Fraud | Outright fakery of results is uncommon; some cases are highly publicized | [5], [6], [7], [8]
Overconfident investigators publish unfounded papers | Prominent authorities, such as Nobel Prize winners, publish in an area unrelated to their specialty; these authors overestimate their abilities and publish poorly founded papers | [9], [10], [11]
Bias | Bias can interfere with accurate results throughout the scientific process, leading to statistically significant differences between the comparison groups that are not due to what is being studied | [12], [13], [14]
Technical deficiencies | Unawareness of the limitations of the techniques used, or unfamiliarity with equipment, can lead to erroneous results; fake reagents ("counterfeit science") are a newer cause | [15], [16]
Fragmented science | Mega projects are now often executed in pieces across the world and the results are then knitted together; it is virtually impossible to know exactly what is happening at other sites, so external results are usually accepted on faith | [17]
Problems with bioinformatics/big data analysis | It is next to impossible to be sure that bioinformatic tools are analyzing huge amounts of information with great robustness; even small glitches can lead to false conclusions | [18]

Type 1 irreproducibility is due to fraud, which is presumed to be rare. Some highly publicized cases can be found in the cited literature [5], [6], [7], [8].

Type 2 irreproducible papers are usually written by prominent authorities, such as Nobel Prize winners, in an area unrelated to their Nobel-winning discovery. Some of these authors may suffer from a disease that one of us recently named "Nobelitis", which is related to another megalomania-related malady known as "Hubris syndrome" [9], [10]. In such cases, the authors overestimate their abilities because of their fame and publish papers that are not well founded. An example is the brilliant chemist Linus Pauling (a double Nobel Prize winner), who claimed in the 1970s that mega-doses of vitamin C could prevent or cure cancer. Specialists in the field spent years examining this suggestion and eventually discredited it. As expected, Pauling never accepted the verdict. Even Einstein published blunders: in his work on relativity, he added an (incorrect) cosmological constant to balance his equations [11].

A third type of irreproducibility is false discovery due to bias (pre-analytical, analytical or post-analytical) [12]. Bias can lead to statistically significant differences between comparison groups (e.g. control subjects and cancer patients) that are due not to the disease at hand but to something else. Ideally, scientists should control for every factor in the comparison groups, but this is easier said than done. There are numerous examples of papers published in top-tier journals with obvious bias between the comparison groups. For example, evaluating a test for prostate cancer diagnosis by comparing a control group of 20–30-year-old men with prostate cancer patients (who are usually 60–70 years old) is very problematic, as differences between the groups could be due to the cancer or to the large age difference. Although this appears to be an obvious bias, we previously identified a study in a prestigious journal that used this troublesome comparison [13]. Other biases can be very subtle. For example, the biotechnology company "Atairgin", formed in 1996 with millions of dollars of investment, aimed to commercialize a test for early ovarian cancer diagnosis [14]. The company folded a few years later after realizing that the observed differences between their controls and ovarian cancer patients, in the study used to develop the test, were merely due to minor differences in the centrifugation speed of blood samples collected at two different locations. This is a prime example of a very subtle bias that was very difficult to spot but had a massive impact on the results. Other examples are cited in our previous review [14].
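To make the age-confounding example concrete, consider the minimal simulation sketch below. It is our own illustration, not taken from any of the cited studies, and every number in it (the marker, its age dependence, the noise level, the sample sizes) is invented. A biomarker that depends only on age, with no disease effect whatsoever, still shows a highly "significant" difference when young controls are compared with older patients.

```python
# Minimal sketch (our illustration; all numbers invented): a biomarker that
# rises with age but has NO disease effect still separates young controls
# from older patients with a tiny p-value.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def biomarker(ages):
    # Hypothetical marker: depends on age only, plus random assay noise.
    return 2.0 * ages + rng.normal(0, 15, size=ages.shape)

controls = biomarker(rng.uniform(20, 30, 200))  # young control group
patients = biomarker(rng.uniform(60, 70, 200))  # older cancer patients

t, p = stats.ttest_ind(controls, patients)
print(f"t = {t:.1f}, p = {p:.1e}")  # 'significant', yet driven purely by age
```

Age-matching the two groups in this toy setting abolishes the difference, which is precisely why matched (or statistically adjusted) comparison groups matter.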

Irreproducibility type 4 is usually due to technical deficiencies of those who do the experiments, especially unawareness of the limitations of the techniques used. We have plenty of experience with our own graduate students working with mass spectrometry instrumentation. These trainees are familiar with the routine operation of the machines but they generally lack in-depth knowledge of the technology and its limitations. For example, we have seen students using the wrong parameters to search their results in public databases while others used the wrong Zip-Tips (a solid-phase extraction device) for sample preparation. These small technical details can lead to significant errors in data collection and interpretation. As we often tell our students “a small hole can sink a big ship”. A newly surfaced reason for paper irreproducibility, included in the technical deficiencies category, is what is known as “counterfeit science”. In this case, manufacturers sell fake reagents, which generate misleading (and irreproducible) results [15], [16].
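As an illustration of how a single mis-set search parameter can corrupt results, consider the hypothetical sketch below; the peptide sequences and masses are invented for the example and do not come from the paper. An overly wide mass tolerance lets a near-isobaric decoy peptide match the observed mass alongside the intended one, producing false identifications.

```python
# Hypothetical sketch of a mis-set database search parameter (peptides and
# masses invented): a too-wide mass tolerance admits spurious matches.

DATABASE = {                 # peptide -> monoisotopic mass (Da)
    "LVNELTEFAK": 1148.61,
    "LVNEVTEFAK": 1148.60,   # near-isobaric decoy, only 0.01 Da away
    "YLYEIAR": 926.49,
    "AEFVEVTK": 921.48,
}

def search(observed_mass, tolerance_ppm):
    """Return all peptides whose mass lies within the given ppm tolerance."""
    tol_da = observed_mass * tolerance_ppm / 1e6
    return [p for p, m in DATABASE.items() if abs(m - observed_mass) <= tol_da]

print(search(1148.61, tolerance_ppm=5))    # tight: only the intended peptide
print(search(1148.61, tolerance_ppm=100))  # too wide: the decoy matches too
```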

Type 5 irreproducibility is on the rise and is a byproduct of fragmented science. Mega projects are now executed in pieces in various laboratories, sometimes on an international scale, and the results are then knitted together, usually by a single principal investigator. As it is virtually impossible to know exactly what is happening at other sites, the PI accepts external results on faith. This practice can lead to disasters, as in the 2014 pluripotent stem cell (STAP) case, where prominent scientists participated in a fraudulent (and later retracted) publication in a top-tier journal [7], [17]. As fragmented science becomes more common, this problem will likely become more frequent.

Irreproducibility type 6 is associated with the recent trend towards big data generation and analysis by bioinformatics. With big data, it is next to impossible to be sure that the bioinformatics tools are analyzing the information with great robustness. Small glitches in the tools may lead to false conclusions. An example of this type of error is the case of a well-respected crystallographer who was forced to retract five high-impact papers in 2007 (three in Science, one in PNAS and one in J. Mol. Biol.) due to a very small computer program glitch [18].
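In that case, the glitch was reported to be an in-house program that flipped two columns of data, producing mirror-image (wrong-handed) structures [18]. The toy sketch below, our own illustration and not the actual program, shows how innocuous such a bug can look in code: swapping two coordinate columns silently inverts handedness, flipping the sign of the determinant that encodes chirality.

```python
# Toy illustration (ours, not the retracted software): a one-line column
# swap in a conversion step silently mirror-images a 3D structure.
import numpy as np

# Three toy basis vectors describing a right-handed local frame.
frame = np.array([[1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0],
                  [0.0, 0.0, 1.0]])

def buggy_convert(xyz):
    # BUG: columns 0 and 1 are swapped during a format conversion.
    return xyz[:, [1, 0, 2]]

print(np.linalg.det(frame))                 #  1.0: right-handed
print(np.linalg.det(buggy_convert(frame)))  # -1.0: mirror image
```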

Is there a radical solution to these irreproducibility issues in science? Likely not, but some recent efforts to minimize the problem are worth mentioning.

The Reproducibility Project was launched in 2013 with the goal of reproducing data from 50 cancer papers published in high-impact journals [19]. Due to financial constraints, the project was subsequently downsized, and data on five papers have now been published [20]. The results were not clear-cut, precluding firm conclusions. In these authors' opinion, this project is not viable, for four major reasons: the process is costly, frequently inconclusive, technically demanding and slow.

Recently, Mogil and Macleod [21] proposed the inclusion of very strong preclinical data before publishing a paper with perceived impact. Though this suggestion could reduce scientific irreproducibility, we doubt that it will be effective, as generating such preclinical data will be expensive and time-consuming and will likely delay publication of critical information. Also recently, top-tier journals have adopted new policies for submitting manuscripts, requiring more explicit details of the experimental and statistical procedures [22].

We propose an alternative solution to address the issue of irreproducibility of papers with translational or commercial importance (perceived as "breakthroughs"). We suggest that authors of such papers be invited to provide a 5-year (and perhaps a 10-year) reflection on their papers. Under this system, authors will be obliged to sign a declaration form at publication (like the commonly used conflict-of-interest forms) in which they agree to publish a reflection on their work, including information such as whether the invention was translated or commercialized, the state of its development and any other insights (especially identification of errors, misinterpretations or other roadblocks). We proposed this idea to some authors of such "high-impact" papers a few years back, but none of them agreed to write such a reflection. Nonetheless, with the advent of electronic publishing, it is now easy for journals to publish such a 5–10-year reflection as an addendum to the original manuscript. This is a no-cost procedure that will allow readers to quickly assess the impact and the successes or failures of these papers.

We believe that the obligation to write a reflection on their high-impact papers will make authors more careful when executing their experiments and more conservative when publishing their work; for example, they will likely avoid overselling their results. Furthermore, we believe that this obligation will likely improve other stages of any research project, such as idea conception and experimental planning.

Work by psychologists Jennifer Lerner and Philip Tetlock [23] suggests that simple steps meant to hold people accountable for their judgment can actually improve that judgment. In this spirit, Figure 1 describes the thought process of a scientist executing an experiment who anticipates that he/she may, or may not, have to write a reflection on the outcome of the work in the near future. We postulate that the experiment will be planned, executed and published with a different state of mind in the two scenarios. The future accountability represented by the reflection will likely lead to more sound and reproducible results.

Figure 1: Behavioral modification of investigators who are expected to write (lower panel) or not write (upper panel) a 5-year reflection. Note that all steps of the investigation, from idea conception to publishing, will likely be improved with the 5-year reflection. This thought modification will likely lead to better and more reproducible science. Such behavioral modification was proposed earlier by Lerner and Tetlock [23].

We hope that high-impact journals, especially in the clinical sciences, will adopt this proposal as a mandatory publication requirement for papers with seemingly large translational potential. We also suggest that funding agencies adopt similar requirements for their grantees and include such reflections in the reporting process for awarded grants. Our proposal to make scientists reflect is a no-cost measure whose effectiveness in reducing scientific irreproducibility can be evaluated in the future.


Corresponding author: Eleftherios P. Diamandis, MD, PhD, FRCP(C), FRSC, Head of Clinical Biochemistry, Mount Sinai Hospital and University Health Network, 60 Murray St. Box 32, Floor 6, Rm L6-201, Toronto, ON, M5T 3L9, Canada, Phone: (416) 586-8443

  1. Author contributions: All the authors have accepted responsibility for the entire content of this submitted manuscript and approved submission.

  2. Research funding: None declared.

  3. Employment or leadership: None declared.

  4. Honorarium: None declared.

  5. Competing interests: The funding organization(s) played no role in the study design; in the collection, analysis, and interpretation of data; in the writing of the report; or in the decision to submit the report for publication.

References

1. Schulz JB, Cookson MR, Hausmann L. The impact of fraudulent and irreproducible data to the translational research crisis – solutions and implementation. J Neurochem 2016;139:253–70. doi:10.1111/jnc.13844.

2. Baker M. 1,500 scientists lift the lid on reproducibility. Nature 2016;533:452–4. doi:10.1038/533452a.

3. Diamandis EP. Quality of the scientific literature: all that glitters is not gold. Clin Biochem 2006;39:1109–11. doi:10.1016/j.clinbiochem.2006.08.015.

4. Ioannidis JP. Why most published research findings are false. PLoS Med 2005;2:e124. doi:10.1371/journal.pmed.0020124.

5. Vogel G. Jan Hendrik Schön loses his Ph.D. Science 2011. Available at: http://www.sciencemag.org/news/2011/09/jan-hendrik-sch-n-loses-his-phd.

6. Cyranoski D. Verdict: Hwang’s human stem cells were all fake. Nature 2006;439:122–3. doi:10.1038/439122a.

7. Obokata H, Wakayama T, Sasai Y, Kojima K, Vacanti MP, Niwa H, et al. Retraction: stimulus-triggered fate conversion of somatic cells into pluripotency. Nature 2014;511:112. doi:10.1038/nature13598.

8. Odling-Smee L, Giles J, Fuyuno I, Cyranoski D, Marris E. Misconduct special: where are they now? Nature 2007;445:244–5. doi:10.1038/445244a.

9. Diamandis EP. Nobelitis: a common disease among Nobel laureates? Clin Chem Lab Med 2013;51:1573–4. doi:10.1515/cclm-2013-0273.

10. Owen D, Davidson J. Hubris syndrome: an acquired personality disorder? A study of US Presidents and UK Prime Ministers over the last 100 years. Brain 2009;132:1396–406. doi:10.1093/brain/awp008.

11. Ohanian HC. Einstein’s mistakes: the human failings of genius. New York, NY, USA: W.W. Norton and Company, 2008.

12. Ransohoff DF. Bias as a threat to the validity of cancer molecular-marker research. Nat Rev Cancer 2005;5:142–9. doi:10.1038/nrc1550.

13. Villanueva J, Shaffer DR, Philip J, Chaparro CA, Erdjument-Bromage H, Olshen AB, et al. Differential exoprotease activities confer tumor-specific serum peptidome patterns. J Clin Invest 2006;116:271–84. doi:10.1172/JCI26022.

14. Diamandis EP. Cancer biomarkers: can we turn recent failures into success? J Natl Cancer Inst 2010;102:1462–7. doi:10.1093/jnci/djq306.

15. Cyranoski D. The secret war against counterfeit science. Nature 2017;545:148–50. doi:10.1038/545148a.

16. Prassas I, Brinc D, Farkona S, Leung F, Dimitromanolakis A, Chrystoja CC, et al. False biomarker discovery due to reactivity of a commercial ELISA for CUZD1 with cancer antigen CA125. Clin Chem 2014;60:381–8. doi:10.1373/clinchem.2013.215236.

17. Sugawara Y, Tanimoto T, Miyagawa S, Murakami M, Tsuya A, Tanaka A, et al. Scientific misconduct and social media: role of Twitter in the stimulus triggered acquisition of pluripotency cells scandal. J Med Internet Res 2017;19:e57. doi:10.2196/jmir.6706.

18. Miller G. Scientific publishing. A scientist’s nightmare: software problem leads to five retractions. Science 2006;314:1856–7. doi:10.1126/science.314.5807.1856.

19. Baker M, Dolgin E. Cancer reproducibility project releases first results. Nature 2017;541:269–70. doi:10.1038/541269a.

20. Anonymous. The challenges of replication. eLife 2017;6:e23693. doi:10.7554/eLife.23693.

21. Mogil JS, Macleod MR. No publication without confirmation. Nature 2017;542:409–11. doi:10.1038/542409a.

22. Anonymous. Towards greater reproducibility. Nature 2017;546:8. doi:10.1038/546008a.

23. Lerner JS, Tetlock PE. Accounting for the effects of accountability. Psychol Bull 1999;125:255–75. doi:10.1037/0033-2909.125.2.255.

Received: 2017-8-25
Accepted: 2017-9-5
Published Online: 2017-10-14
Published in Print: 2017-10-26

©2017 Walter de Gruyter GmbH, Berlin/Boston
