Refining within-subject biological variation estimation using routine laboratory data: practical applications of the refineR algorithm
Eirik Åsen Røys, Kristin Viste
To the Editor,
Within-subject biological variation (CVI) refers to the natural fluctuations in an individual’s biological markers over time not accounted for by known physiological effects. Accurate estimation of CVI is pivotal for setting analytical performance specifications and identifying clinically significant changes in repeated laboratory tests. Recent research suggests that using data from laboratory information systems (LIS) offers a practical way to estimate the CVI [1, 2].
One indirect method for estimating CVI involves characterizing a central Gaussian peak from the frequency distribution of ‘result ratios’, the ratios of consecutive results from serial patient measurements [1, 2]. The width of this peak reflects the total variation (CVTotal), which includes the random biological (CVI) and relevant analytical variation (CVA) of the two measurements. A challenge when estimating CVI from LIS data is to distinguish this peak from pathological or other non-random changes [3]. A recent study showed that the refineR algorithm could determine the result ratio peak and further estimate reference change values (RCV) [4]. RefineR identifies a central ‘Box-Cox normal distribution’ within the data [5]. The Box-Cox transformation is robust and can make a range of both left- and right-skewed distributions resemble Gaussian distributions [5], which is valuable because ratios for some analytes have skewed distributions. In the current study, we applied the refineR algorithm to Gaussian and skewed distributions to derive indirect-CVI estimates and compared these to CVI values derived from a state-of-the-art biological variation (BV) study [1].
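As a minimal sketch of how result ratios might be built from serial measurements, the snippet below pools the ratios of consecutive results per patient. The function name `result_ratios`, the input layout, and the choice of dividing the later result by the earlier one are assumptions made for illustration; the letter does not specify them.

```python
from typing import Dict, List


def result_ratios(series_by_patient: Dict[str, List[float]]) -> List[float]:
    """Pool ratios of consecutive results (later / earlier) across patients.

    Assumes each list is already sorted by sampling time. Patients with a
    single result contribute no ratio; non-positive results are skipped.
    """
    ratios = []
    for results in series_by_patient.values():
        for earlier, later in zip(results, results[1:]):
            if earlier > 0 and later > 0:
                ratios.append(later / earlier)
    return ratios


# Hypothetical serial results, in arbitrary units
example = {
    "patient_a": [70.0, 77.0, 70.0],
    "patient_b": [100.0, 95.0],
    "patient_c": [42.0],
}
ratios = result_ratios(example)
print(ratios)
```

The pooled ratios form the frequency distribution whose central peak is then characterized.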
Several aspects must be considered when estimating the CVI from ratios [6]. First, consider the case where two independent measurements, X1 and X2, follow the same positive Gaussian distribution, X ∼ N(μx, σx), with mean μx and standard deviation σx, and form the ratio Z = X1/X2. The distribution of Z does not have a simple mathematical description and can be nearly Gaussian or markedly skewed [2, 4, 6, 7]. In the case of nearly Gaussian ratios, i.e. Z ∼ N(μz, σz), μz = 1 and σz = √2 × σx/μx are good approximations [1, 2]. Rearranging for σx gives σx = σz × μx/√2, and as CVx = σx/μx we get:

$$\mathrm{CV}_x\,(\%) = \frac{\sigma_z}{\sqrt{2}} \times 100 \tag{1}$$
In the case of a skewed Z, the approximation is no longer valid, as the skew increases the mean and standard deviation of the ratios. To mitigate this, we can apply the Box-Cox transformation to make Z approximately Gaussian. From the standard deviation of the transformed ratios Z′, the CV% of X can be estimated using formula (1).
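The textbook one-parameter Box-Cox transformation can be sketched as follows. This illustrates the transformation only, not refineR's internal implementation, and the function name `box_cox` is hypothetical. The transform maps z = 1 to 0 with unit slope for any λ, so for ratios clustered near 1 the spread of the transformed values stays on roughly the same scale as the spread of the ratios themselves.

```python
import math


def box_cox(z: float, lam: float) -> float:
    """One-parameter Box-Cox transform; lam = 0 is the natural-log limit."""
    if z <= 0:
        raise ValueError("Box-Cox requires positive values")
    if lam == 0:
        return math.log(z)
    return (z ** lam - 1.0) / lam


# Ratios 5 % below and above 1 stay close to -0.05 and +0.05 for each lambda
for lam in (-0.5, 0.0, 0.5, 1.0):
    low, high = box_cox(0.95, lam), box_cox(1.05, lam)
    print(f"lambda={lam:+.1f}  z=0.95 -> {low:+.4f}  z=1.05 -> {high:+.4f}")
```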
Secondly, consider the case where the two independent measurements, X1 and X2, are drawn from the same lognormal distribution, ln(X) ∼ N(μx, σx). By the properties of logarithms, the ratio Z = X1/X2 corresponds to ln(Z) = ln(X1) – ln(X2), and ln(Z) will be normally distributed with mean μz = μx – μx = 0 and standard deviation σz = √(σx² + σx²) = √2 × σx. Rearranging for σx gives σx = σz/√2. σx can be used to calculate the CV% of a lognormal distribution (ln-CV%) [8]:

$$\text{ln-CV}\,(\%) = \sqrt{e^{\sigma_x^{2}} - 1} \times 100 \tag{2}$$
We used Monte Carlo-simulated data to validate the assumption that the standard deviation of a Box-Cox transformed ratio distribution could estimate the CV% and ln-CV%. We simulated two Gaussian or two lognormal distributions with one million observations each and applied refineR to their ratio to estimate the standard deviation of the Box-Cox transformed ratio distribution. The Box-Cox(σ) was divided by √2 to estimate the CV% or ln-CV% as described above. We compared the input CV% and ln-CV% in the simulation with the estimates from refineR (Table 1). These results showed that refineR could accurately estimate the CV from a ratio of two Gaussian distributions for CVs ≤15 %, and with less accuracy for CVs up to 20 %, reflecting the error propagation in ratios for larger Gaussian CVs [9]. The ln-CV could be estimated accurately up to at least 50 %.
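The idea behind this simulation can be sketched in a few lines of Python. This is not the authors' code: it replaces refineR with a plain sample standard deviation (of the ratios in the Gaussian case, of the log-ratios in the lognormal case), uses 100,000 rather than one million ratios, and assumes the standard lognormal relation CV = √(exp(σ²) − 1). All function names are hypothetical.

```python
import math
import random


def sample_sd(values):
    """Sample standard deviation (n - 1 denominator)."""
    mean = math.fsum(values) / len(values)
    return math.sqrt(math.fsum((v - mean) ** 2 for v in values) / (len(values) - 1))


def cv_from_gaussian_ratio(cv_percent, n=100_000, seed=1):
    """Estimate CV% of X from the SD of Z = X1/X2 with X ~ N(1, cv)."""
    rng = random.Random(seed)
    sigma = cv_percent / 100.0  # mean is fixed at 1, so the SD equals the CV
    ratios = [rng.gauss(1.0, sigma) / rng.gauss(1.0, sigma) for _ in range(n)]
    return sample_sd(ratios) / math.sqrt(2) * 100.0


def ln_cv_from_lognormal_ratio(ln_cv_percent, n=100_000, seed=1):
    """Estimate ln-CV% of X from the SD of ln(X1/X2) with lognormal X."""
    rng = random.Random(seed)
    # Invert CV = sqrt(exp(sigma^2) - 1) to get the log-scale SD
    sigma_x = math.sqrt(math.log(1.0 + (ln_cv_percent / 100.0) ** 2))
    log_ratios = [
        math.log(rng.lognormvariate(0.0, sigma_x) / rng.lognormvariate(0.0, sigma_x))
        for _ in range(n)
    ]
    sigma_est = sample_sd(log_ratios) / math.sqrt(2)
    return math.sqrt(math.exp(sigma_est ** 2) - 1.0) * 100.0


print(f"Gaussian input 5.0 %   -> {cv_from_gaussian_ratio(5.0):.2f} %")
print(f"Lognormal input 30.0 % -> {ln_cv_from_lognormal_ratio(30.0):.2f} %")
```

A raw standard deviation is only usable here because the simulated data contain no pathological ratios and the Gaussian CV is small. With larger Gaussian CVs the ratio distribution develops heavy tails, which is where an algorithm that isolates the central peak becomes necessary.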
Table 1: Monte Carlo simulation results comparing the input CV% values of Gaussian or lognormal distributions against the estimated CV% from their respective ratio distributions using refineR. Each simulated ratio distribution contained 1 million ratios, with results averaged across 100 iterations.

Ratio distribution | Input CV% | Estimated CV% |
---|---|---|
Ratio of Gaussian | 2.5 | 2.5 |
Ratio of Gaussian | 5.0 | 5.0 |
Ratio of Gaussian | 7.5 | 7.5 |
Ratio of Gaussian | 10.0 | 10.0 |
Ratio of Gaussian | 12.5 | 12.4 |
Ratio of Gaussian | 15.0 | 14.8 |
Ratio of Gaussian | 17.5 | 17.1 |
Ratio of Gaussian | 20.0 | 19.4 |
Ratio of lognormal | 5.0 | 5.0 |
Ratio of lognormal | 10.0 | 10.0 |
Ratio of lognormal | 15.0 | 15.0 |
Ratio of lognormal | 20.0 | 20.0 |
Ratio of lognormal | 25.0 | 25.0 |
Ratio of lognormal | 30.0 | 30.0 |
Ratio of lognormal | 35.0 | 35.0 |
Ratio of lognormal | 40.0 | 40.0 |
Ratio of lognormal | 45.0 | 45.0 |
Ratio of lognormal | 50.0 | 50.0 |
With this proof of concept validated, we applied our approach to result ratios from LIS data. Here, we used the same dataset as our previous study [4], applying refineR to ratios of repeated patient measurements to estimate the total variability of a single measurement (CVTotal) using either formula (1) or (2). Briefly, this dataset contained adult patient results (18–110 years) with biomarkers selected to represent a wide range of CVI levels (∼2–50 %): albumin, creatinine, phosphate, cortisone, cortisol, testosterone, androstenedione, 17-hydroxyprogesterone and 11-deoxycortisol, extracted from local databases at Haukeland University Hospital from July 1, 2015, to July 1, 2023. The indirect-CVI was then estimated by subtracting the long-term CVA, obtained from quality controls, from the CVTotal as:

$$\mathrm{CV_I} = \sqrt{\mathrm{CV_{Total}}^{2} - \mathrm{CV_A}^{2}} \tag{3}$$
The results are shown in Table 2. The 95 % confidence interval for the Box-Cox(σ) and the indirect-CVI was derived using refineR’s bootstrapping capabilities (1,000 repetitions). As in our previous paper [4], we compared these results with the estimated CVI from a local state-of-the-art BV study combining CV-ANOVA with outlier exclusion [1]. This study included 10 weekly measurements from 30 healthy subjects for the same analytes and instrumentation as for the LIS data.
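The calculation chain from a Box-Cox(σ) value to an indirect-CVI can be sketched as below. The sketch assumes that formula (1) is CV% = 100 × σz/√2, that formula (2) is the standard lognormal relation ln-CV% = 100 × √(exp(σx²) − 1) with σx = σz/√2, and that the analytical variation is removed in quadrature, CVI = √(CVTotal² − CVA²). The function names are hypothetical and the inputs are taken from Table 2.

```python
import math


def cv_total_percent(box_cox_sigma: float, lognormal: bool = False) -> float:
    """Total CV% of a single measurement from the SD of the ratio distribution."""
    sigma_x = box_cox_sigma / math.sqrt(2)  # a ratio carries two measurements
    if lognormal:
        return math.sqrt(math.exp(sigma_x ** 2) - 1.0) * 100.0  # assumed formula (2)
    return sigma_x * 100.0  # assumed formula (1)


def indirect_cvi_percent(box_cox_sigma: float, cva_percent: float,
                         lognormal: bool = False) -> float:
    """Remove analytical variation from the total variation in quadrature."""
    cv_total = cv_total_percent(box_cox_sigma, lognormal)
    if cv_total <= cva_percent:
        raise ValueError("CVTotal must exceed CVA to estimate CVI")
    return math.sqrt(cv_total ** 2 - cva_percent ** 2)


# Inputs from Table 2: albumin (Gaussian formula), testosterone (ln-CV formula)
albumin = indirect_cvi_percent(0.052, 1.8)
testosterone = indirect_cvi_percent(0.244, 4.0, lognormal=True)
print(f"albumin indirect CVI:      {albumin:.1f} %")
print(f"testosterone indirect CVI: {testosterone:.1f} %")
```

Because the published Box-Cox(σ) values are rounded to three decimals, recomputed estimates for other measurands may differ from Table 2 in the last digit.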
Table 2: Indirect CVI estimates obtained by applying the refineR algorithm to the result ratio distribution of data from the laboratory information system, compared with CVI estimates from a biological variation study [1] using nested ANOVA.
Measurand | Subgroup | Number of ratios | Long-term CVA, % | Box-Cox(σ) [95 % CI] | Indirect CVI, % [95 % CI] | CVI from BV study, % [95 % CI] |
---|---|---|---|---|---|---|
Albumin | All | 949,295 | 1.8 | 0.052 [0.052, 0.053] | 3.2 [3.2, 3.3] | 2.3 [2.1, 2.6] |
Creatinine | All | 400,152 | 3.0 | 0.076 [0.074, 0.077] | 4.4 [4.3, 4.6] | 4.4 [4.0, 4.8] |
Phosphate | All | 25,901 | 1.7 | 0.141 [0.121, 0.148] | 9.8 [8.4, 10.3] | 9.5 [8.7, 10.4] |
Phosphate | Female | 15,426 | 1.7 | 0.132 [0.124, 0.137] | 9.2 [8.6, 9.6] | 8.2 [7.6, 9.8] |
Phosphate | Male | 10,475 | 1.7 | 0.143 [0.125, 0.158] | 10.0 [8.6, 11.0] | 9.4 [8.3, 11.0] |
Cortisone | Male | 6,882 | 7.0 | 0.202 [0.191, 0.209] | 12.5 [11.6, 13.0] | 12.6 [11.0, 15.0] |
Cortisol | Male | 4,983 | 4.4 | 0.271 [0.247, 0.310] | 18.6 [16.9, 21.4] | 15.1 [12.5, 18.6] |
Testosterone | Male | 1,261 | 4.0 | 0.244 [0.178, 0.269] | 16.9ᵃ [12.0, 18.8] | 15.3ᵃ [13.3, 17.6] |
Androstenedione | Male | 6,023 | 5.3 | 0.334 [0.295, 0.357] | 23.0ᵃ [20.2, 24.7] | 19.2ᵃ [17.1, 22.6] |
17-Hydroxyprogesterone | Male | 6,839 | 4.8 | 0.351 [0.304, 0.378] | 24.7ᵃ [21.2, 26.7] | 23.0ᵃ [20.1, 27.0] |
11-Deoxycortisol | Male | 6,469 | 5.3 | 0.584 [0.546, 0.620] | 42.6ᵃ [39.6, 45.6] | 50.0ᵃ [44.9, 61.1] |

ᵃ Calculated as ln-CV.
Although the indirect-CVI estimates were generally higher than those derived from the BV study, they were comparable, with overlapping confidence intervals for all measurands except albumin (Table 2). The differences between the two estimates are likely due to larger preanalytical variability in the LIS data, as these were produced under daily routine conditions, while the BV study was performed in accordance with recommendations from BIVAC [10], minimizing preanalytical variability. Cortisol was the analyte with the greatest relative difference between the estimates, and this probably relates to its pronounced diurnal variation and the effect of even small changes in sample collection timing.
One of the main benefits of the refineR algorithm is that it can effectively separate healthy and pathological ratios [4]. A strength of this study is that the indirect-CVI ratio approach was validated by Monte Carlo simulation and by comparing it with a local BV study. However, the approach has limitations that may affect the accuracy of the CVI estimates. As previously shown, some heavily skewed ratio distributions cannot be Box-Cox transformed to Gaussian [4]. The ratios also do not reveal the shape of the underlying distribution, as Gaussian and lognormal distributions can produce similarly distributed ratios. The errors introduced by choosing a non-applicable formula are minimal for small CVIs (<20 %) but will increase if the CVI is large.
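The size of the error from applying the wrong formula can be illustrated by converting the same per-measurement standard deviation σx with both relations. The sketch assumes the Gaussian relation CV% = 100 × σx and the standard lognormal relation ln-CV% = 100 × √(exp(σx²) − 1); the function names are hypothetical.

```python
import math


def gaussian_cv_percent(sigma_x: float) -> float:
    """CV% when sigma_x is read as a relative SD on the original scale."""
    return sigma_x * 100.0


def lognormal_cv_percent(sigma_x: float) -> float:
    """CV% when sigma_x is read as the SD on the natural-log scale."""
    return math.sqrt(math.exp(sigma_x ** 2) - 1.0) * 100.0


def formula_gap(sigma_x: float) -> float:
    """Difference in percentage points between the two readings."""
    return lognormal_cv_percent(sigma_x) - gaussian_cv_percent(sigma_x)


for sigma_x in (0.05, 0.10, 0.20, 0.35, 0.50):
    print(f"sigma_x={sigma_x:.2f}  Gaussian={gaussian_cv_percent(sigma_x):5.1f} %  "
          f"lognormal={lognormal_cv_percent(sigma_x):5.1f} %  "
          f"gap={formula_gap(sigma_x):4.2f}")
```

Under these assumptions the gap stays below about 0.3 percentage points up to σx = 0.20 and exceeds 3 percentage points at σx = 0.50.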
A final limitation is that using non-standardized LIS data can produce inaccurate CVI estimates. Care must be taken to minimize known effects such as time of day, effect of dietary intake, and physiological events such as pregnancy or the menstrual cycle. The lack of standardized sampling procedures associated with LIS data may bias the CVI estimates compared with a direct sampling approach, where the preanalytical factors are controlled. Based on the indirect-RCV study [4], we recommend ≥5,000 ratios and a pathological fraction of <30 % for robust indirect-CVI estimates. Datasets commonly used for indirect reference intervals such as outpatient data, wellness checkups, repeated testing in blood donors or unrequested results should be considered.
Despite its limitations, the refineR algorithm provides a valuable, cost-effective tool for laboratories to estimate CVI using readily available data, especially for subgroups with limited direct studies, such as children. To make the refineR approach available to laboratories, we have developed an application, available at https://gocrunch.shinyapps.io/CVi_app/, that can estimate the indirect-CVI, RCV, and reference intervals from LIS datasets.
Research ethics: We conducted the study according to the Declaration of Helsinki Ethical Principles and Good Clinical Practice. The extraction of patient results from the Laboratory Information System (LIS) without patient consent was approved by the Regional Committee for Medical and Health Research Ethics in Bergen (ID number 252125). The biological variation study was approved by the respective regional ethics committees at the inclusion sites, the Regional Committees for Medical and Health Research Ethics in Bergen and Oslo (ID number 2018/92), and the South-Central Berkshire Research Ethics Committee (London).
Informed consent: All included volunteers in the biological variation study gave informed written consent before participating.
Author contributions: Eirik Åsen Røys led conceptualization, data analysis, investigation, methodology, validation, and the original draft preparation. Kristin Viste supported conceptualization, data curation, formal analysis, funding acquisition and methodology. She led supervision and contributed equally to the review and editing of the manuscript. Christopher-John Farrell contributed to data analysis, resources, and software, with additional support in validation and manuscript review and editing. Ralf Kellmann played an equal role in data curation, resources, and software, and supported the review and editing of the manuscript. Nora Alicia Guldhaug contributed equally to data curation and participated equally in the review and editing process. Elvar Theodorsson contributed equally to conceptualization, methodology, supervision, validation, and manuscript review and editing. Graham Ross Dallas Jones shared equal contributions in conceptualization, methodology, supervision, validation, and manuscript review and editing. Kristin Moberg Aakre contributed equally to conceptualization, methodology, supervision, and validation and led project administration. She also participated equally in the review and editing of the manuscript.
Use of Large Language Models, AI and Machine Learning Tools: None declared.
Conflict of interest: E. Theodorsson received a consulting fee from NordicBiomarker. K.M. Aakre has served on advisory boards for Roche Diagnostics, Siemens Healthineers, Radiometer and SpinChip; received consulting honoraria from CardiNor, lecture honorarium from Mindray, Siemens Healthineers, and Snibe Diagnostics, and research grants from Siemens Healthineers and Roche Diagnostics; is Associate Editor of Clinical Biochemistry and Chair of the IFCC Committee of Clinical Application of Cardiac Biomarkers.
Research funding: The study was funded by research grants from Haukeland University Hospital, the University of Oslo, the British Heart Foundation (FS/18/78/33902) and Guy’s and St Thomas’ Charity (London, UK; R060701, R100404). The Western Norway Regional Health Authority also funded this work through a PhD scholarship for E.Å. Røys (Grant number: F-12821-D10484/ 4800007771). E. Theodorsson received support from the County Council of Ostergotland.
Data availability: The datasets generated and/or analyzed during the current study are available from the corresponding author on reasonable request.
References
1. Røys, EÅ, Guldhaug, NA, Viste, K, Jones, GD, Alaour, B, Sylte, MS, et al. Sex hormones and adrenal steroids: biological variation estimated using direct and indirect methods. Clin Chem 2022;69:100–9. https://doi.org/10.1093/clinchem/hvac175.
2. Jones, GRD. Estimates of within-subject biological variation derived from pathology databases: an approach to allow assessment of the effects of age, sex, time between sample collections, and analyte concentration on reference change values. Clin Chem 2019;65:579–88. https://doi.org/10.1373/clinchem.2018.290841.
3. Tan, RZ, Markus, C, Vasikaran, S, Loh, TP, APFCB Harmonization of Reference Intervals Working Group. Comparison of four indirect (data mining) approaches to derive within-subject biological variation. Clin Chem Lab Med 2022;60:636–44. https://doi.org/10.1515/cclm-2021-0442.
4. Røys, EÅ, Viste, K, Kellmann, R, Guldhaug, NA, Alaour, B, Sylte, MS, et al. Estimating reference change values using routine patient data: a novel pathology database approach. Clin Chem 2024:hvae166. https://doi.org/10.1093/clinchem/hvae166.
5. Ammer, T, Schützenmeister, A, Prokosch, HU, Rauh, M, Rank, CM, Zierk, J. refineR: a novel algorithm for reference interval estimation from real-world data. Sci Rep 2021;11:16023. https://doi.org/10.1038/s41598-021-95301-2.
6. Springer, MD. The algebra of random variables. Wiley 1979;22:522. https://doi.org/10.1137/1022108.
7. Díaz-Francés, E, Rubio, F. On the existence of a normal approximation to the distribution of the ratio of two independent normal random variables. Stat Pap 2013;54. https://doi.org/10.1007/s00362-012-0429-2.
8. Fokkema, MR, Herrmann, Z, Muskiet, FA, Moecks, J. Reference change values for brain natriuretic peptides revisited. Clin Chem 2006;52:1602–3. https://doi.org/10.1373/clinchem.2006.069369.
9. Holmes, DT, Buhr, KA. Error propagation in calculated ratios. Clin Biochem 2007;40:728–34. https://doi.org/10.1016/j.clinbiochem.2006.12.014.
10. Aarsand, AK, Røraas, T, Fernandez-Calle, P, Ricos, C, Díaz-Garzón, J, Jonker, N, et al. The biological variation data critical appraisal checklist: a standard for evaluating studies on biological variation. Clin Chem 2018;64:501–14. https://doi.org/10.1373/clinchem.2017.281808.
© 2024 the author(s), published by De Gruyter, Berlin/Boston
This work is licensed under the Creative Commons Attribution 4.0 International License.