
Total error vs. measurement uncertainty: revolution or evolution?

Wytze P. Oosterhuis and Elvar Theodorsson
Published/Copyright: November 5, 2015

Abstract

The first strategic EFLM conference, “Defining analytical performance goals, 15 years after the Stockholm Conference”, was held in the autumn of 2014 in Milan. It maintained the Stockholm 1999 hierarchy of performance goals but rearranged them, and established five task and finish groups to work on topics related to analytical performance goals, including one on the “total error” theory. Jim Westgard recently wrote a comprehensive overview of performance goals and of the total error theory that is critical of the results and intentions of the Milan 2014 conference. The “total error” theory originated by Jim Westgard and co-workers has a dominating influence on the theory and practice of clinical chemistry but is not accepted in other fields of metrology. The generally accepted uncertainty theory, however, suffers from complex mathematics and perceived impracticability in clinical chemistry. The pros and cons of the total error theory need to be debated, making way for methods that can incorporate all relevant causes of uncertainty when making medical diagnoses and monitoring treatment effects. This development should preferably proceed not as a revolution but as an evolution.

Introduction

All scientific paradigms and methods need to be honestly challenged in order to develop and prevail. This was the purpose of the first strategic EFLM conference, “Defining analytical performance goals, 15 years after the Stockholm Conference”, held in the autumn of 2014 in Milan. The 1999 Stockholm conference constituted an influential milestone in clinical chemistry, and the Milan 2014 conference honored its pioneers. Other recent developments also contributed to the incentives for organizing the Milan 2014 conference: the ISO 17025 and 15189 accreditation standards, which require that laboratories routinely provide the measurement uncertainty (MU) of their results; the harmonization of the evaluation of external quality assessment procedures; and recent challenges to the total error (TE) theory, including the calculation of allowable total error (ATE).

The Stockholm quality criteria were maintained by the Milan 2014 conference, but they were arranged in parallel instead of in the Stockholm 1999 hierarchy [1]. Five EFLM Task and Finish Groups (TFG) were initiated by the Milan 2014 conference, commissioned with exploring and developing the main topics discussed during the conference, amongst them one group tasked with work on TE, of which the current authors are members. Because the ongoing work of the TFGs is not finished, the opinions expressed in the current paper represent only the opinions of its authors and not the consensus of the group.

The Westgard website (www.westgard.com) early on criticized the Milan 2014 conference, comparing it to the edict of Milan of 313 AD. Despite the ominous comparison, the edict of Milan was actually an agreement between the Western Roman Emperor Constantine I and Licinius, who controlled the Balkans, to treat Christians benevolently within the Roman Empire. In contrast to the freedom embodied in the Stockholm 1999 consensus, the Westgard website expressed fear of persecution: that those who persist in adhering to the original Stockholm 1999 criteria or the TE paradigm will be deemed unclean and face ostracism.

In fact, the Milan 2014 conference encouraged openness and freedom of thought on all the topics of interest. The Westgards thus have “nothing to fear but fear itself”, having for decades represented the most influential development in the area of quality control in clinical chemistry. Sten Westgard has been, from the outset, a welcome member of the EFLM task and finish group on TE.

In an opinion paper in the current issue [2], Jim Westgard calls attention to several subjects that were discussed during the Milan 2014 conference. While generally supportive of further work on these issues, the paper strongly emphasizes the advantages of the original Stockholm 1999 criteria and of the conventional TE paradigm. The paper is laudable in its comprehensive coverage of the TE and ATE principles. However, it lacks perspective on the limitations of TE and ATE in accommodating all relevant sources of uncertainty encountered in clinical chemistry. Our purpose here is to point out areas related to TE and ATE where we find that new thinking is needed.

The error and uncertainty paradigms

The error paradigm was initiated in the early 19th century, inter alia by C.F. Gauss, in order to elucidate the true information contained in a measurement result. Errors were divided into the categories of systematic and random errors. Systematic errors can be corrected, and sometimes practically eliminated, by proper calibration, whereas the effects of random errors on the estimation of the mean can only be reduced – apart from improving the method – by repeated measurements. However, it became evident that a multitude of factors can influence the result of a measurement, and the true value – the cornerstone of the error paradigm – can never be known exactly. The error paradigm, including TE and ATE, therefore struggles to represent the numerous and complex factors influencing measurement results and commonly resorts to the simplest models of reality, e.g. the common misconception amongst clinical chemists that the repeatability and reproducibility of a measurement result – as measures of short- and long-term imprecision – equal its overall MU.

The MU paradigm, which originated in the early 20th century [3], holds that uncertainty is a property of a measurement result that expresses lack of knowledge of the true value and incorporates the factors known to influence it. To estimate the combined uncertainty of a measurement, all sources of random and systematic influence should be elucidated and the contribution of each major component to the MU estimated by proper methods. MU has been generally accepted in all fields of metrology except clinical chemistry [4], despite the fact that the International Federation of Clinical Chemistry and Laboratory Medicine (IFCC) is amongst the founders of uncertainty methods in chemistry [4, 5].

Total error and total allowable error

The TE/ATE paradigm aspires to be a complete model, but suffers from a number of shortcomings that have become particularly evident lately.

In the conventional TE paradigm, bias and imprecision (multiplied by a z-factor) are added linearly, resulting in a value for the total error: TE = bias + z·CV_a. This value represents an estimate of the limit of an interval around the (unknown) “true” value within which measured analytical results can be found with a defined probability (usually 95%).
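
For illustration, a minimal sketch of this calculation in Python (the analyte, bias and imprecision values are hypothetical, chosen by us for the example):

```python
# Minimal sketch of the conventional total error model: TE = bias + z * CV_a.
# TE is the half-width (in %) of the interval around the true value within
# which results are expected with the chosen probability (z = 1.65, one-sided 95%).

def total_error(bias_pct: float, cv_a_pct: float, z: float = 1.65) -> float:
    """Total error as a percentage of the true value."""
    return bias_pct + z * cv_a_pct

true_value = 5.0        # hypothetical analyte concentration, mmol/L
bias, cv_a = 2.0, 3.0   # hypothetical bias and analytical imprecision, %

te = total_error(bias, cv_a)
print(f"TE = {te:.1f}% -> results expected up to "
      f"{true_value * (1 + te / 100):.2f} mmol/L around {true_value} mmol/L")
```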

The same model can be used to derive maximum (allowable) error limits. Here we encounter a remarkable property of the Gaussian distribution: when a limit is defined with a corresponding proportion of test results outside this limit (e.g. 5% at z = 1.65, one-sided), the combinations of bias and imprecision (CV_a) fulfilling this condition lie on a straight line with a slope of −z. For this reason the model can be called linear. At one extreme of the line we have bias = 0, with ATE = 1.65·CV_a; at the other extreme we have CV_a = 0 (a hypothetical value) and bias = ATE.
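
The linear trade-off can be made concrete with a short sketch (the ATE value is hypothetical, chosen for the example):

```python
# Sketch of the linear ATE trade-off: for a fixed allowable total error,
# admissible (CV_a, bias) pairs satisfy bias = ATE - z * CV_a,
# i.e. a straight line with slope -z in the (CV_a, bias) plane.

Z = 1.65    # one-sided 95%
ATE = 10.0  # hypothetical allowable total error, %

for cv_a in (0.0, 2.0, 4.0, ATE / Z):
    max_bias = ATE - Z * cv_a
    print(f"CV_a = {cv_a:5.2f}% -> maximum allowable bias = {max_bias:5.2f}%")
# At CV_a = 0 (hypothetical) the full ATE may be bias; at bias = 0, CV_a = ATE / z.
```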

The TE model further assumes that the difference TE between the patient’s result and the true value is known, and that if the patient’s result differs less than ATE from the true value (TE < ATE), all the clinically intended quality needs are satisfied.

None of these assumptions is, however, comprehensive enough or fit for purpose. Usually we do not know whether ATE satisfies the intended quality for the clinical use of patient results, despite the fact that “acceptance limits” in many proficiency testing/external quality assurance programs are expressed in the form of ATE. These criteria are in many cases technical, established to satisfy minimum regulatory requirements, such as a certain percentage of participants fulfilling the criteria. In some cases, such as the US CLIA limits for acceptable performance, the limits are written into law, with no documentation of their origin or their fitness for clinical use. Furthermore, certain criteria may not be fit for use for both diagnosis and monitoring purposes.

According to the linear TE/ATE model, the sum of a maximum bias and z times a maximum imprecision equals a constant, the ATE. The fact that maximum bias and imprecision combine to a constant number is very convenient in the model propagated by Westgard. According to this model, this characteristic is valid over the total range of imprecision and bias. This allows for the development of quality rules for internal quality control procedures independent of the ratio of bias to imprecision, which is of crucial and fundamental importance in the methods advocated by both J. Westgard and S. Westgard (www.westgard.com).
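
One familiar embodiment of this constant-sum property – not spelled out above, but central to the QC-design methods published on www.westgard.com – is the sigma metric, which collapses bias and imprecision into a single number used to select QC rules. A minimal sketch with hypothetical values:

```python
# Sketch of the sigma metric used in QC design: sigma = (ATE - |bias|) / CV_a.
# Because bias and z*CV_a trade off linearly against a constant ATE, one
# number can summarize how much analytical "room" remains for random error.

def sigma_metric(ate_pct: float, bias_pct: float, cv_a_pct: float) -> float:
    return (ate_pct - abs(bias_pct)) / cv_a_pct

print(f"sigma = {sigma_metric(10.0, 2.0, 2.0):.1f}")  # hypothetical values -> 4.0
```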

This model is, however, only valid when imprecision and bias are the only variables involved – in other words, when the assumed Gaussian distribution is completely described by the measurement bias and imprecision alone. The model does not, for example, cater for biological variation and other evident additional causes of variation. This is a serious drawback going forward, as the model needs to incorporate ways of dealing with variation in additional factors. Biological variation is not relevant when monitoring variation in control samples, but it certainly is when dealing with patient samples.

Developments in this area are therefore essential if ATE limits are to sustain their current widespread use.

The conventional model assumes that ATE can generally be derived from biological variation. The following values are commonly used when calculating ATE limits:

Bias < 0.25·√(CV_i² + CV_g²); imprecision: CV_a < 0.5·CV_i

where CV_i = within-subject biological variation, CV_g = between-subject biological variation, and CV_a = analytical variation.
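
For concreteness, a minimal sketch computing these conventional maxima (the CV values below are placeholders, not data from any study):

```python
import math

# Conventional performance goals derived from biological variation:
#   allowable bias        < 0.25 * sqrt(CV_i^2 + CV_g^2)
#   allowable imprecision <  0.5 * CV_i

def allowable_bias(cv_i: float, cv_g: float) -> float:
    return 0.25 * math.sqrt(cv_i**2 + cv_g**2)

def allowable_imprecision(cv_i: float) -> float:
    return 0.5 * cv_i

cv_i, cv_g = 5.0, 10.0  # placeholder within- and between-subject CVs, %
print(f"max bias = {allowable_bias(cv_i, cv_g):.2f}%, "
      f"max CV_a = {allowable_imprecision(cv_i):.2f}%")
```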

Westgard advocates these equations and constants in his paper. It is important to note that these values represent maxima derived from theories based on the combined influence of bias and imprecision on reference values [6]. They were derived under mutually exclusive conditions: the imprecision term is valid only under the condition of zero bias and, conversely, the bias term is valid only with zero imprecision (a hypothetical condition). It has been shown that the use of these equations and constants can lead to a serious overestimation of ATE [7].

Mixed models and performance specifications

Westgard refers to alternative models as mixed models, because both state-of-the-art analytical variation and biological variation are mixed when calculating ATE limits.

When applying a model based on reference values and reference change values, the “state-of-the-art” analytical variation should be included in the model, because the reference (change) value is in part determined by that analytical variation. Quality control procedures will then be aimed at keeping analytical methods stable and reference (change) values valid.

Two phases in the lifecycle of a test should be clearly distinguished. During the validation phase, the different performance measures of a test are established and a decision is taken on whether it is fit for purpose. In the second phase, when the test is in routine use, quality control procedures are established to maintain quality, including maintaining the validity of reference values and, in the case of monitoring applications, the validity of reference change values. The models we are discussing here are appropriate for the second phase: the performance of the methods is known (and presumably state-of-the-art), and quality measures are aimed at maintaining this level. Including state-of-the-art performance in the model is not a problem in this case.

The ratio of bias to imprecision should be known for the estimation of ATE in those mixed models [8], and the effects of these factors on the total error of the measurement in theory need to be separated. However, this can only be done in retrospect, not at the moment the latest internal quality control results are interpreted. This can be solved by assuming the imprecision to be constant, with additional error ascribed to bias [8]. This assumption in part harmonizes with the Westgard model, where the power curves presuppose a stable initial imprecision.
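
A minimal sketch of this constant-imprecision assumption, under our own illustrative framing (the QC target, SD and control values are hypothetical): the validated imprecision is held fixed, and any shift of the running QC mean away from its target is ascribed to bias.

```python
import statistics

# Sketch of the 'constant imprecision' assumption: treat the validated
# analytical SD as fixed and ascribe any extra deviation of the running
# QC mean from its target to bias.

target, sd_a = 5.00, 0.10                     # hypothetical QC target and stable SD
qc_results = [5.08, 5.12, 5.05, 5.15, 5.10]   # hypothetical recent QC values

bias_estimate = statistics.mean(qc_results) - target
print(f"estimated bias = {bias_estimate:+.3f} "
      f"({bias_estimate / sd_a:+.1f} SDs, imprecision assumed constant)")
```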

The total error model and measurement uncertainty

The two main arguments favoring the MU paradigm are that most if not all other fields of metrology use it, and that it encourages estimation of the major components of uncertainty and favors actions for their minimization.

GUM defines uncertainty as “a parameter, associated with the result of a measurement, that characterizes the dispersion of the values that could reasonably be attributed to the measurand”. This parameter is usually a standard deviation or the width of a confidence interval. The uncertainty can be expanded by a coverage factor (e.g. 1.96), giving an interval in which the value of the measurement result resides with a defined probability. With a coverage factor of 1.96, there is a 95% probability (two-sided) that the measurement result lies within this interval, assuming a normal distribution of the results [9].
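
A minimal sketch of this expansion (the result and its standard uncertainty are hypothetical values of our own choosing):

```python
# Sketch of GUM-style expanded uncertainty: U = k * u_c, where u_c is the
# combined standard uncertainty and k the coverage factor (k = 1.96 gives a
# ~95% two-sided coverage interval under a normal distribution).

def expanded_uncertainty(u_c: float, k: float = 1.96) -> float:
    return k * u_c

result, u_c = 7.2, 0.15   # hypothetical result (mmol/L) and standard uncertainty
U = expanded_uncertainty(u_c)
print(f"{result:.2f} ± {U:.2f} mmol/L (~95% coverage interval)")
```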

Although the bottom-up approach of the GUM is widely considered not to be well applicable to the clinical laboratory, and the expected new CLSI guideline is assumed not to follow GUM, several concepts of GUM are valuable. GUM states that all bias should be corrected where possible. This points to one of the important contradictions within the TE model: why maintain the error (bias) when its value is known and can be used in the calculation of TE? In routine practice, however, bias is often not known exactly, as imprecision and bias cannot be separated easily. For example, when we try to estimate bias from internal quality control results, the result will depend on the time frame: the wider the time frame, the more bias will be averaged out and included in the calculation as imprecision.
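
This time-frame effect is easy to demonstrate in a toy simulation (entirely our own construction, not taken from GUM or Westgard): with a slowly drifting bias, the SD computed over a wide window absorbs the drift, while a short window largely keeps it out.

```python
import random, statistics

# Toy simulation of the time-frame effect: a slow drift (bias) inflates the
# SD computed over a wide window, so drift is 'averaged in' as imprecision.
random.seed(1)
n = 200
results = [random.gauss(0, 1) + 0.02 * i for i in range(n)]  # drift: 0.02 per run

sd_short = statistics.stdev(results[:20])  # narrow window: little drift included
sd_long = statistics.stdev(results)        # wide window: drift absorbed as 'imprecision'
print(f"SD over 20 runs: {sd_short:.2f}; SD over {n} runs: {sd_long:.2f}")
```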

MU is useful for a number of reasons. It gives information about the quality of measurements; it is useful for comparing the metrological quality of clinical laboratories (among accredited clinical laboratories, provided that it is calculated in the same way); and it helps in the interpretation of measurement results, especially those close to critical values (e.g. disease-defining values, ethanol concentration in blood for drivers, etc.). In fact, when comparing a result with a decision limit (e.g. 7.0 mmol/L glucose in the diagnosis of diabetes), we can give clear information to the clinician only if the limit is not included within the uncertainty interval around the result. So there is no doubt that the concept is valuable. Outside the laboratory we encounter the MU model when speed measurements by traffic police are compared with speed limits only after a correction based upon the uncertainty of the measurement.
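
A minimal sketch of such an interpretation rule (the expanded uncertainty value is hypothetical):

```python
# Sketch of interpreting a result against a clinical decision limit: a clear
# statement is only possible when the expanded uncertainty interval around
# the result does not contain the limit (values below are hypothetical).

def clearly_above(result: float, limit: float, u_exp: float) -> bool:
    return result - u_exp > limit

def clearly_below(result: float, limit: float, u_exp: float) -> bool:
    return result + u_exp < limit

limit, U = 7.0, 0.25  # glucose cut-off (mmol/L) and hypothetical expanded uncertainty
for glucose in (6.6, 6.9, 7.4):
    verdict = ("above limit" if clearly_above(glucose, limit, U)
               else "below limit" if clearly_below(glucose, limit, U)
               else "indeterminate: limit within uncertainty interval")
    print(f"{glucose:.1f} mmol/L -> {verdict}")
```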

Are the differences between TE and MU reconcilable?

Differences and similarities of MU and TE have been reviewed [9–11]. The models have their differences, but these should not be over-emphasized. TE, for example, defines a region around the reference (“true”) value where measured analytical results can be found with a defined probability, while MU defines a region around the measured analytical result where the “true” value can be found with a defined probability. The similarity is that both models express the reliability of the test result, albeit from different perspectives.

When the MU model is made as simple as possible – eliminating bias and dealing with only a few independent causes of variation – the two models become mathematically identical, e.g. in internal quality control of a single method, where many sources of variation encountered when measuring patient samples are excluded. In practice, the causes of variation are much more complex, including variation within methods, between methods, within and between measurement systems, within and between laboratories, within and between reagent lots and calibrations, within and between individuals, and preanalytical variation.
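
A minimal sketch of how the MU paradigm combines such independent components by root-sum-of-squares (the component names and values are illustrative only, not from any study):

```python
import math

# Sketch of combining independent uncertainty components (GUM approach):
# u_c = sqrt(sum of squared component uncertainties).

components = {
    "within-run imprecision": 1.0,        # all values in % CV, illustrative
    "between-lot variation": 0.8,
    "between-calibration variation": 0.6,
    "calibrator value assignment": 0.5,
}

u_c = math.sqrt(sum(u**2 for u in components.values()))
print(f"combined standard uncertainty u_c = {u_c:.2f}%")
```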

In his article [2], Westgard commendably attempts to reconcile total analytical error and MU. He considers “total analytical error [TAE] as a measure of accuracy and MU as a measure of traceability” and believes that “TAE is essential to manage quality within a medical laboratory and MU and trueness are essential to achieve comparability of results across laboratories”. We find this solution all too simplistic. TE/ATE methods need to take into account the complex realities catered for by MU methods, and the latter need to be developed into the practical reality that TE/TAE methods have provided in clinical chemistry for decades. MU models have been used primarily to estimate the reliability of measurement results; their application to goal setting is not well developed. In internal quality control, however, the TE and MU models become essentially the same, and the goal setting developed in the TE model is applicable to both. It should be noted, however, that the models developed to estimate the maximum allowable bias and imprecision based on reference (change) values are neither theoretically nor practically based on either of the two paradigms and can be applied to both.

Measurement uncertainty is inherently included in the measurement result, and as such is expressed in reference (change) values and in estimates of sensitivity, specificity and other measures of diagnostic accuracy. Quality control procedures aimed at controlling analytical variation based on internal quality control will not capture all sources of variation. Only procedures that include all sources of variation will enable us to get to grips with the real total MU when measuring patient samples [12].

Conclusions

The TE and ATE concepts originated by Jim Westgard and co-workers have served clinical chemistry well for decades and still represent a dominating influence on the theory and practice of clinical chemistry. The uncertainty paradigm is widely accepted in other fields of metrology but has suffered from complex mathematics and perceived impracticability. TE is recognized neither in the International Vocabulary of Metrology [4] nor in the GUM [5], which favor MU. The TE and MU paradigms may, however, be seen as two sides of the same coin: their equations describing variation become the same when bias is eliminated and few independent causes of variation are at play. Yet their fundamental philosophies differ, with different consequences for practical priorities in clinical chemistry.

Neither the separation proposed by Westgard nor the replacement of TAE with MU will resolve the tensions between TAE and MU within clinical chemistry. In our opinion, the TE/ATE models implemented in clinical chemistry can be substantially improved by adopting the uncertainty-component estimations and methods for uncertainty calculation already applied in other disciplines of metrology. Initial developments in this direction are currently being implemented in clinical chemistry [10, 11, 13].

The pros and cons of the TE and ATE concepts need to be debated, making way for methods that can incorporate all relevant causes of uncertainty when making medical diagnoses and monitoring treatment effects. The calculation of the uncertainty of our results represents an important new opportunity to improve the quality of the services of clinical chemistry for the diagnosis of disease and the monitoring of treatment effects in patients. To be maximally productive, this development should preferably proceed not as a revolution but as an evolution of the “total error” concept. Such evolution, however, depends on freedom from allegiance to those aspects of paradigms that have already served their purpose. This is a fitting lesson to be learnt from the edict of Milan of 313 AD.

Author contributions: All the authors have accepted responsibility for the entire content of this submitted manuscript and approved submission.

Research funding: None declared.

Employment or leadership: None declared.

Honorarium: None declared.

Competing interests: The funding organization(s) played no role in the study design; in the collection, analysis, and interpretation of data; in the writing of the report; or in the decision to submit the report for publication.


Corresponding author: Wytze P. Oosterhuis, Department of Clinical Chemistry and Hematology, Zuyderland Medical Center, Henri Dunantstraat 5, 6419 PC Heerlen, The Netherlands, Phone: +31 45 5766341, E-mail:

References

1. Sandberg S, Fraser CG, Horvath AR, Jansen R, Jones G, Oosterhuis W, et al. Defining analytical performance specifications: Consensus Statement from the 1st Strategic Conference of the European Federation of Clinical Chemistry and Laboratory Medicine. Clin Chem Lab Med 2015;53:833–5. doi:10.1515/cclm-2015-0067.

2. Westgard JO. Useful measures and models for analytical quality management in medical laboratories. Clin Chem Lab Med 2016;54:223–33. doi:10.1515/cclm-2015-0710.

3. Peat FD. From certainty to uncertainty: the story of science and ideas in the twentieth century. Washington, DC: Joseph Henry Press; 2002.

4. JCGM. Evaluation of measurement data – Guide to the expression of uncertainty in measurement. JCGM 100:2008 (GUM 1995 with minor corrections). Joint Committee for Guides in Metrology; 2008. Available at: http://www.bipm.org/utils/common/documents/jcgm/JCGM_100_2008_E.pdf. Accessed 24 October 2015.

5. JCGM. Evaluation of measurement data – Supplement 1 to the “Guide to the expression of uncertainty in measurement” – Propagation of distributions using a Monte Carlo method. Joint Committee for Guides in Metrology; 2008. Available at: http://www.bipm.org/en/publications/guides/. Accessed 24 October 2015.

6. Gowans EM, Hyltoft Petersen P, Blaabjerg O, Hörder M. Analytical goals for the acceptance of common reference intervals for laboratories throughout a geographical area. Scand J Clin Lab Invest 1988;48:757–64. doi:10.3109/00365518809088757.

7. Oosterhuis WP. Gross overestimation of total allowable error based on biological variation. Clin Chem 2011;57:1334–6. doi:10.1373/clinchem.2011.165308.

8. Oosterhuis WP, Sandberg S. Proposal for the modification of the conventional model for establishing performance specifications. Clin Chem Lab Med 2015;53:925–37.

9. Rozet E, Marini RD, Ziemons E, Dewé W, Rudaz S, Boulton AJ, et al. Total error and uncertainty: friends or foes? Trends Anal Chem 2011;30:797–806. doi:10.1016/j.trac.2010.12.009.

10. Farrance I, Frenkel R. Uncertainty of measurement: a review of the rules for calculating uncertainty components through functional relationships. Clin Biochem Rev 2012;33:49–75.

11. Farrance I, Frenkel R. Uncertainty in measurement: a review of Monte Carlo simulation using Microsoft Excel for the calculation of uncertainties through functional relationships, including uncertainties in empirically derived constants. Clin Biochem Rev 2014;35:37–61.

12. O’Donnell GE, Hibbert DB. Treatment of bias in estimating measurement uncertainty. Analyst 2005;130:721–9. doi:10.1039/b414843f.

13. White GH, Farrance I; AACB Uncertainty of Measurement Working Group. Uncertainty of measurement in quantitative medical testing: a laboratory implementation guide. Clin Biochem Rev 2004;25:S1–24.

Received: 2015-10-8
Accepted: 2015-10-12
Published Online: 2015-11-5
Published in Print: 2016-2-1

©2016 by De Gruyter
