Method comparison – a practical approach based on error identification

  • Jacobus Petrus Johannes Ungerer and Carel Jacobus Pretorius
Published/Copyright: November 2, 2017

In 1965 Roy Barnett, in consultation with William Youden, proposed a systematic scheme for comparing methods that is still remarkably relevant after more than five decades [1]. In the interim, numerous papers and guidelines on the topic have appeared. Broadly speaking, there are two schools of thought: one advocates scatter plots, regression and correlation, and the other promotes difference plots and analysis of differences [2], [3], [4], [5], [6], [7], [8]. The CLSI recommends both [9], as in fact Barnett did.

Our comments are directed towards the comparison of field methods, with the assumption that they have been validated and have known performance characteristics. We advocate a simple and consistent approach based on fundamental aspects and principles. The focus should primarily be on common sense, on being “fit-for-purpose”, on judicious use of statistical analysis and, to some extent, on “value judgements”. Results should be presented in a way that enables a third party to critically evaluate the data and to draw their own conclusions. Wherever possible, the data and statistical parameters should be presented in the actual units of measurement. Imprecision is often presented in relative units (coefficient of variation), which has led to the misconception that the precision of methods is worse at low concentrations; the opposite is in fact true when one considers the variability in actual units (standard deviation [SD]). Presenting information as ratios (percentages or logarithms) can potentially obscure the relationship, or the absence thereof, between methods. Transformation of data and elaborate statistical procedures should be used as a last resort, as they create difficult-to-grasp complexity.
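The point about relative versus actual units can be illustrated with a minimal sketch. The imprecision profile used here, SD² = a² + (b × concentration)², and the values of `a` and `b` are assumptions chosen purely for illustration; they are not taken from any method discussed in this article.

```python
import math

def sd_at(conc, a=1.0, b=0.05):
    """Hypothetical imprecision profile: SD^2 = a^2 + (b * conc)^2 (assumed model)."""
    return math.sqrt(a ** 2 + (b * conc) ** 2)

def cv_percent(conc, a=1.0, b=0.05):
    """Coefficient of variation in percent for the same hypothetical profile."""
    return 100.0 * sd_at(conc, a, b) / conc

# Print SD (measurement units) and CV (relative units) at three concentrations
for conc in (2.0, 20.0, 200.0):
    print(f"conc={conc:6.1f}  SD={sd_at(conc):6.2f}  CV={cv_percent(conc):5.1f}%")
```

Under these assumed parameters the SD is smallest at the lowest concentration while the CV is largest there, which is how the same data can appear to tell opposite stories depending on the units chosen.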

It is important to realise that in a chemical measurement procedure (method), a response signal is generated that is proportional to the concentration of the analyte of interest. This signal is used to estimate the concentration via a calibration procedure. The evaluated field methods are assumed to have no systematic bias but are always accompanied by imprecision (“noise”) [10]. Thus, the starting point is an expectation that bias is absent and that differences in responses can be explained by the imprecision of both methods. This can be assessed from a scatter plot, which is intuitively easy to interpret. The same information can be presented in a difference plot, which is simply a derivative of the former. The main advantage of difference plots is that they present the variances between the methods in a way that may be easier to relate to analytical imprecision than a plot of the residuals or the original scatter plot.

Differences between methods are due to analytical errors and therefore an appreciation of the error components is essential. Errors can be classified as systematic or random. Systematic error includes constant and proportional bias and is the result of errors in calibration. Random error consists of imprecision and sample-method error or bias [10]. Sample-method bias, also called patient/method interaction, method-related factors, between-method analytical variation and aberrant sample bias (matrix effect), is often neglected and is the result of an error (bias) that is unique to a sample and is caused by non-specificity [3], [11], [12], [13]. This error may substantially contribute to the apparent random differences between methods, and it may also affect bias, especially if the interference is predominantly one-sided. Outliers are another type of error: large errors that are distinct from the distribution of the majority of results. IUPAC refers to these errors as blunders, whereas Barnett called them “large discrepancies” [1], [10]. Barnett recommended that these results be removed and investigated separately.

Method comparison can be viewed as a process by which the error components are characterised. Errors within acceptable limits indicate adequate agreement and that the test method is fit-for-purpose. These limits should be determined a priori and based on what is deemed clinically acceptable [12]. Unexpectedly large differences indicate an error in one or both methods and this requires further scrutiny and may mandate more complicated statistical analysis. Hypothesis testing, where the null hypothesis states that there is no systematic difference and that the distribution of differences is due to imprecision, is straightforward.

We recommend a stepwise approach to method comparison that will identify all components of error in the majority of cases (Figure 1). The acceptable error limits that will identify the test method as fit-for-purpose should be stated beforehand (a priori). First, characterise the imprecision of each method accurately across an appropriate measuring range (Figure 1A). This information may be presented via a characteristic function of SD vs. concentration. The uncertainty is invariably heteroscedastic, whereas homoscedasticity is often assumed with statistical analysis. By limiting the data range, imprecision may approximate homoscedasticity.
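A characteristic function of SD vs. concentration could be estimated along the following lines. This is only a sketch: the functional form SD² = a² + (b × concentration)², the least-squares fit on the squared values, and the helper names are all assumptions made for illustration, and other forms may suit a given method better.

```python
import math
from statistics import mean, stdev

def level_summary(replicates):
    """Return (mean, SD) in measurement units for replicates at one level."""
    return mean(replicates), stdev(replicates)

def fit_characteristic_function(levels):
    """Fit SD^2 = a^2 + (b * c)^2 to (mean, SD) pairs by ordinary least squares.

    The fit is a straight line of SD^2 against c^2 (assumed model).
    Returns (a, b).
    """
    xs = [m * m for m, _ in levels]
    ys = [s * s for _, s in levels]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Clamp at zero so a noisy fit cannot yield a negative variance term
    return math.sqrt(max(intercept, 0.0)), math.sqrt(max(slope, 0.0))

def predicted_sd(conc, a, b):
    """SD in measurement units predicted by the fitted function."""
    return math.sqrt(a * a + (b * conc) ** 2)

# Hypothetical (mean, SD) pairs generated from a = 1.0, b = 0.05
levels = [(c, predicted_sd(c, 1.0, 0.05)) for c in (5.0, 20.0, 50.0, 100.0, 200.0)]
a_fit, b_fit = fit_characteristic_function(levels)
```

In practice each (mean, SD) pair would come from `level_summary` applied to replicate measurements at one concentration, and the fitted function would be plotted over the intended measuring range for each method.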

Figure 1: Stepwise approach to method comparison. (A) Characteristic function: standard deviation vs. concentration of the methods. (B) Scatter plot with the comparative method on the x-axis and test method on the y-axis, including all data points with outliers identified. (C) Scatter plot after trimming the data, with regression analysis to determine systematic bias. (D) A difference plot using data and imprecision characteristics after correction for systematic bias. The broken line represents the predicted SDD and the solid line the observed SDD. The difference between observed and predicted SDD indicates the contribution of sample-method bias to the distribution of differences.

Next, use a scatter plot for viewing the comparative results (Figure 1B). Singleton results should be plotted, and transformation of data should be a last resort. Use first-line statistical analysis and interpret statistical parameters in context. Trim the data by excluding extreme data points, and limit the range to suit the intended purpose of the method. Should any outliers be identified, they must be removed and investigated separately [1].

Determine the slope and intercept on the trimmed data by using a regression procedure that takes the imprecision of both methods into account (Figure 1C). Any significant deviation of the slope and intercept from the line of identity (slope 1, intercept 0) indicates a systematic bias in one or both of the methods. Correct the bias and use the adjusted results for further interrogation with a difference plot and analysis of differences (Figure 1D). It should be emphasised that the imprecision data (SD) should also be rescaled to reflect this correction before proceeding with further analysis.
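One regression procedure that allows for imprecision in both methods is Deming regression; the sketch below uses it as an example and is not necessarily the procedure used for Figure 1C. It assumes a constant ratio of error variances (`delta`), which may not hold for heteroscedastic data, where a weighted variant may be more appropriate. The function names, the interpretation of "rescaling" the SD as division by the slope, and the data are all hypothetical.

```python
import math

def deming_fit(x, y, delta=1.0):
    """Deming regression of test method y on comparative method x.

    delta is the assumed ratio var(errors in y) / var(errors in x);
    delta = 1.0 corresponds to orthogonal regression.
    Returns (slope, intercept). Fails if x and y have zero covariance.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    diff = syy - delta * sxx
    slope = (diff + math.sqrt(diff ** 2 + 4.0 * delta * sxy ** 2)) / (2.0 * sxy)
    return slope, y_bar - slope * x_bar

def correct_bias(y, sd_y, slope, intercept):
    """Map test-method results and their SDs onto the comparative method's scale."""
    y_corr = [(yi - intercept) / slope for yi in y]
    sd_corr = [s / abs(slope) for s in sd_y]  # SD is rescaled by the slope only
    return y_corr, sd_corr

# Hypothetical noise-free data lying exactly on y = 2 + 1.1 * x
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.1, 4.2, 5.3, 6.4, 7.5]
sd_y = [0.22] * 5
slope, intercept = deming_fit(x, y)
y_corr, sd_corr = correct_bias(y, sd_y, slope, intercept)
```

With these noise-free points the fit recovers the generating slope and intercept, and the corrected test results coincide with the comparative results. With real data, whether the deviation from slope 1 and intercept 0 is significant would still need to be judged, for example from confidence intervals, before any correction is applied.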

The distribution of the unbiased differences now reflects the imprecision and sample-method bias of both methods. Plotting the differences in the presence of a significant constant or proportional error between the methods and reporting a 95% limit of agreement is as meaningless as omitting an a priori statement on what the acceptable agreement limits should be. The SD of the differences (SDD) can be predicted from the imprecision parameters. The discrepancy between the observed SDD and the predicted SDD reflects the sample-method bias and can be quantified. When the distribution of differences approximates homoscedasticity, constant limits will do. Otherwise, alternative approaches are required, i.e. bins, proportional 95% limits or transformation of the data. Identification of significant errors will require more in-depth analysis.
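The prediction of the SDD can be written out under the usual variance-addition assumption. The sketch below assumes singleton results, independent random errors in the two methods, and a sample-method bias that is independent of imprecision; the symbols SD_x, SD_y (imprecision of the comparative and test methods at the concentration in question) and SD_SMB (sample-method bias component) are notation introduced here for illustration.

```latex
\[
\mathrm{SDD}_{\text{pred}} = \sqrt{\mathrm{SD}_x^{2} + \mathrm{SD}_y^{2}},
\qquad
\mathrm{SD}_{\text{SMB}} = \sqrt{\mathrm{SDD}_{\text{obs}}^{2} - \mathrm{SDD}_{\text{pred}}^{2}}
\]
```

The second expression is only meaningful when the observed SDD exceeds the predicted SDD; if it does not, the differences are already explained by imprecision alone.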

An example of this approach is the comparison of two entirely different total CO2 methods, one directly measured vs. one calculated with the Henderson-Hasselbalch equation [11]. The controversy about their agreement was settled by showing that the sample-method bias was insignificant. By contrast, when cardiac troponin I methods were compared, a sample-method error was found that was an order of magnitude larger than the error caused by imprecision [13]. Interestingly, sample-method bias was ignored, whereas precision was targeted with zeal in an attempt to improve method performance, thereby popularising misnomers such as “highly sensitive” to describe troponin. Cardiac troponin provides an example of heteroscedasticity in which bins were used and, probably not ideally, differences were transformed into percentages. The effect of sample-method bias is accentuated when measuring troponin in healthy individuals [14]. When comparing two cardiac troponin I methods using samples from healthy individuals, correlation was lacking (R² = 0.18) and the distribution of differences could not be explained by imprecision. In fact, the sample-method bias was the major contributor to the differences between the methods, with imprecision playing a minor role (SDD predicted 1.9 ng/L and observed 11.1 ng/L). This mostly ignored inaccuracy of cardiac troponin assays has significant clinical implications.
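As a worked illustration, the two troponin SDD values quoted above can be combined as follows. This is a sketch that assumes the imprecision and sample-method components are independent, so that their variances add; the function name and the derived quantities are illustrative and do not appear in the cited study.

```python
import math

# Values quoted in the text for two cardiac troponin I methods (ng/L)
sdd_predicted = 1.9   # expected from imprecision alone
sdd_observed = 11.1   # actually observed

def sample_method_sd(observed, predicted):
    """SD attributable to sample-method bias, assuming independent variances."""
    excess = observed ** 2 - predicted ** 2
    return math.sqrt(excess) if excess > 0 else 0.0

smb_sd = sample_method_sd(sdd_observed, sdd_predicted)
# Fraction of the observed variance of differences explained by imprecision
imprecision_share = sdd_predicted ** 2 / sdd_observed ** 2

print(f"Sample-method bias SD: {smb_sd:.1f} ng/L")
print(f"Variance explained by imprecision: {100 * imprecision_share:.1f}%")
```

Under this assumption the sample-method component is about 10.9 ng/L, and imprecision accounts for roughly 3% of the variance of the differences, consistent with the statement that imprecision played a minor role.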

To summarise, the approach that involves a scatter and difference plot is in line with CLSI recommendations – with some common-sense adjustments. The major points are that methods are initially assumed to have no calibration bias and that all differences can be explained by their imprecision. We recommend that difference plots and analysis of differences are performed after correction of constant and proportional error. Any significant (not fit-for-purpose) observed deviation from these assumptions is the result of one or more components of error, all of which can be characterised. The decision whether these errors and methods are acceptable will depend on a priori performance limits or judgement calls. Errors that can be fixed should be fixed; at worst, they should be managed by changing reference intervals and decision limits.


Dr. Jacobus Petrus Johannes Ungerer, Director of Chemical Pathology, Pathology Queensland, Queensland Health, Block 7, Royal Brisbane Hospital, Brisbane, 4029 QLD, Australia, Phone: +61 7 3646 8420, Fax: +61 7 3646 1392

  1. Author contributions: All the authors have accepted responsibility for the entire content of this submitted manuscript and approved submission.

  2. Research funding: None declared.

  3. Employment or leadership: None declared.

  4. Honorarium: None declared.

References

1. Barnett RN. A scheme for the comparison of quantitative methods. Am J Clin Pathol 1965;43:562–9. doi:10.1093/ajcp/43.6.562

2. Westgard JO, Hunt MR. Use and interpretation of common statistical tests in method-comparison studies. Clin Chem 1973;19:49–57. doi:10.1093/clinchem/19.1.49

3. Altman DG, Bland JM. Measurement in medicine: the analysis of method comparison studies. Statistician 1983;32:307–17. doi:10.2307/2987937

4. Hollis S. Analysis of method comparison studies. Ann Clin Biochem 1996;33:1–4. doi:10.1177/000456329603300101

5. Stöckl D. Beyond the myths of difference plots. Ann Clin Biochem 1996;33:575–7. doi:10.1177/000456329603300618

6. Twomey PJ, Kroll MH. How to use linear regression and correlation in quantitative method comparison studies. Int J Clin Pract 2008;62:529–38. doi:10.1111/j.1742-1241.2008.01709.x

7. Stöckl D, Dewitte K, Thienpont LM. Validity of linear regression in method comparison studies: is it limited by the statistical model or the quality of the analytical input data? Clin Chem 1998;44:2340–6. doi:10.1093/clinchem/44.11.2340

8. Dewitte K, Fierens C, Stöckl D, Thienpont LM. Application of the Bland-Altman plot for interpretation of method-comparison studies: a critical investigation of its practice. Clin Chem 2002;48:799–801. doi:10.1093/clinchem/48.5.799

9. EP09-A3. Measurement procedure comparison and bias estimation using patient samples; Approved Guideline – Third Edition. CLSI 2013.

10. Currie LA. Nomenclature in evaluation of analytical methods including detection and quantification capabilities (IUPAC recommendations 1995). Pure Appl Chem 1995;67:1699–723. doi:10.1351/pac199567101699

11. Ungerer JP, Ungerer MJ, Vermaak WJ. Discordance between measured and calculated total carbon dioxide. Clin Chem 1990;36:2093–6. doi:10.1093/clinchem/36.12.2093

12. Petersen PH, Stöckl D, Blaabjerg O, Pedersen B, Birkemose E, Thienpont L, et al. Graphical interpretation of analytical data from comparison of a field method with a reference method by use of difference plots. Clin Chem 1997;43:2039–46. doi:10.1093/clinchem/43.11.2039

13. Ungerer JP, Marquart L, O’Rourke PK, Wilgen U, Pretorius CJ. Concordance, variance, and outliers in 4 contemporary cardiac troponin assays: implications for harmonisation. Clin Chem 2012;58:274–83. doi:10.1373/clinchem.2011.175059

14. Ungerer JP, Tate JR, Pretorius CJ. Discordance between 3 cardiac troponin I and T assays: implications for the 99th percentile cutoff. Clin Chem 2016;62:1106–14. doi:10.1373/clinchem.2016.255281

Published Online: 2017-11-2
Published in Print: 2017-11-27

©2018 Walter de Gruyter GmbH, Berlin/Boston

Downloaded on 20.9.2025 from https://www.degruyterbrill.com/document/doi/10.1515/cclm-2017-0842/html