Article Open Access

Refining within-subject biological variation estimation using routine laboratory data: practical applications of the refineR algorithm

  • Eirik Åsen Røys, Kristin Viste, Christopher-John Farrell, Ralf Kellmann, Nora Alicia Guldhaug, Elvar Theodorsson, Graham Ross Dallas Jones and Kristin Moberg Aakre
Published/Copyright: December 30, 2024

To the Editor,

Within-subject biological variation (CVI) refers to the natural fluctuations in an individual’s biological markers over time not accounted for by known physiological effects. Accurate estimation of CVI is pivotal for setting analytical performance specifications and identifying clinically significant changes in repeated laboratory tests. Recent research suggests that using data from laboratory information systems (LIS) offers a practical way to estimate the CVI [1, 2].

One indirect method for estimating CVI involves characterizing a central Gaussian peak from the frequency distribution of ‘result ratios’, the ratios of consecutive results from serial patient measurements [1, 2]. The variation described by this peak, the total variation (CVTotal), includes the random biological (CVI) and relevant analytical variation (CVA) of the two measurements. A challenge when estimating CVI from LIS data is to distinguish this peak from pathological or other non-random changes [3]. A recent study showed that the refineR algorithm could determine the result ratio peak and further estimate reference change values (RCV) [4]. RefineR identifies a central ‘Box-Cox normal distribution’ within the data [5]. The Box-Cox transformation is robust and can make a range of left- or right-skewed distributions resemble Gaussian distributions [5], which is valuable because ratios for some analytes have skewed distributions. In the current study, we applied the refineR algorithm to Gaussian and skewed distributions to derive indirect-CVI estimates and compared these with CVI values derived from a state-of-the-art biological variation (BV) study [1].
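The result ratios described above could be derived from an LIS extract roughly as follows. This is a minimal Python sketch, not the authors' pipeline: the record layout `(patient_id, timestamp, value)`, the function name `result_ratios`, and the choice to divide each result by the one immediately preceding it are assumptions made for illustration.

```python
from collections import defaultdict


def result_ratios(records):
    """Return ratios of consecutive results within each patient.

    `records` is an iterable of (patient_id, timestamp, value) tuples;
    this layout is hypothetical and stands in for an LIS extract.
    """
    by_patient = defaultdict(list)
    for patient_id, timestamp, value in records:
        by_patient[patient_id].append((timestamp, value))

    ratios = []
    for series in by_patient.values():
        series.sort()  # order each patient's results by time
        for (_, previous), (_, current) in zip(series, series[1:]):
            if previous > 0 and current > 0:  # ratios need positive results
                ratios.append(current / previous)
    return ratios


example = [
    ("a", 1, 10.0),
    ("a", 2, 12.0),
    ("a", 3, 9.0),
    ("b", 1, 5.0),  # a single result yields no ratio
]
print(result_ratios(example))
```

Patients with only one result contribute nothing, and the resulting list of ratios is what an algorithm such as refineR would then be applied to.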

Several aspects must be considered when estimating the CVI from ratios [6]. First, consider the case where two independent measurements, X1 and X2, follow the same positive Gaussian distribution: X ∼ N (μx, σx), with mean μx and standard deviation σx, forming the ratio Z = X1/X2. The distribution of Z does not have a simple mathematical description and can be nearly Gaussian or markedly skewed [2, 4, 6, 7]. In the case of nearly Gaussian ratios, i.e. Z ∼ N (μz, σz), μz = 1 and σz = √2 × σx/μx are good approximations [1, 2]. Rearranging for σx gives σx = σz × μx/√2 and as CVx = σx/μx we get:

(1) $\mathrm{CV}_x\,\% = \dfrac{\sigma_z}{\sqrt{2}} \times 100\,\%$
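As a quick numerical check of formula (1), the sketch below simulates the ratio of two independent Gaussian measurements and recovers the input CV from the standard deviation of the ratios. It uses the plain standard deviation rather than refineR, and the function name, sample size, and seed are illustrative assumptions.

```python
import math
import random
import statistics


def cv_from_gaussian_ratio(mu, cv, n=200_000, seed=1):
    """Estimate the CV of X (as a fraction) from the SD of Z = X1/X2."""
    rng = random.Random(seed)
    sigma_x = mu * cv
    # X1 and X2 are drawn independently from the same Gaussian N(mu, sigma_x)
    ratios = [rng.gauss(mu, sigma_x) / rng.gauss(mu, sigma_x) for _ in range(n)]
    sigma_z = statistics.pstdev(ratios)
    return sigma_z / math.sqrt(2)  # formula (1), before multiplying by 100 %


estimate = cv_from_gaussian_ratio(mu=100.0, cv=0.05)
print(f"input CV 5.0 %, estimated CV {estimate * 100:.2f} %")
```

For small input CVs the estimate sits close to the input. The approximation can be expected to degrade as the CV grows and the ratio distribution becomes skewed.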

In the case of a skewed Z, the approximation is no longer valid, as the skew increases the mean and standard deviation of the ratios. To mitigate this, we can apply the Box-Cox transformation to make Z approximately Gaussian. From the standard deviation of the transformed ratios Z′, the CV% of X can be estimated using formula (1).

Secondly, consider the case where the two independent measurements, X1 and X2, are drawn from the same lognormal distribution ln(X) ∼ N (μx, σx). By the properties of logarithms, the ratio Z = X1/X2 corresponds to ln(Z) = ln(X1) – ln(X2), and ln(Z) will be normally distributed with mean μz = μx – μx = 0 and standard deviation σz = √(σx² + σx²) = √2 × σx. Rearranging for σx gives σx = σz/√2. σx can be used to calculate the CV% of a lognormal distribution (ln-CV%) [8]:

(2) $\mathrm{lnCV}\,\% = \sqrt{\exp(\sigma_x^2) - 1} \times 100\,\%$
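The lognormal case can be checked numerically in the same spirit. The sketch below draws two independent lognormal measurements, takes the standard deviation of ln(Z), divides by √2, and applies formula (2). It is an illustration under assumptions (plain standard deviation instead of refineR; hypothetical function names, sample size, and seed).

```python
import math
import random
import statistics


def ln_cv_percent(sigma_x):
    """Formula (2): CV% of a lognormal distribution with log-scale SD sigma_x."""
    return math.sqrt(math.exp(sigma_x ** 2) - 1.0) * 100.0


def sigma_from_ln_cv(cv_percent):
    """Inverse of formula (2): log-scale SD that yields the given ln-CV%."""
    return math.sqrt(math.log(1.0 + (cv_percent / 100.0) ** 2))


def estimate_ln_cv(input_cv_percent, n=200_000, seed=1):
    """Recover the ln-CV% from simulated ratios of two lognormal measurements."""
    rng = random.Random(seed)
    sigma_x = sigma_from_ln_cv(input_cv_percent)
    log_ratios = [
        math.log(rng.lognormvariate(0.0, sigma_x) / rng.lognormvariate(0.0, sigma_x))
        for _ in range(n)
    ]
    sigma_z = statistics.pstdev(log_ratios)       # SD of ln(Z)
    return ln_cv_percent(sigma_z / math.sqrt(2))  # sigma_x = sigma_z / sqrt(2)


print(f"input ln-CV 30.0 %, estimated ln-CV {estimate_ln_cv(30.0):.1f} %")
```

Because ln(Z) is exactly Gaussian here, the estimate depends only on sampling noise, not on the size of the CV.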

We used Monte Carlo-simulated data to validate the assumption that the standard deviation of a Box-Cox transformed ratio distribution could estimate the CV% and ln-CV%. We simulated pairs of Gaussian or lognormal distributions with one million observations each and applied refineR to estimate the standard deviation of the Box-Cox transformed ratio distribution. This Box-Cox(σ) was divided by √2 to estimate the CV% or ln-CV% as described above. We then compared the input CV% and ln-CV% in the simulation with the estimates from refineR (Table 1). RefineR accurately estimated the CV from a ratio of two Gaussian distributions for CVs ≤15 %, and with less accuracy for CVs up to 20 %, reflecting the error propagation in ratios for larger Gaussian CVs [9]. The ln-CV could be estimated accurately up to at least 50 %.

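Table 1:

Monte Carlo simulation results comparing the input CV% values of Gaussian or lognormal distributions against the estimated CV% from their respective ratio distributions using refineR. Each simulated ratio distribution contained 1 million ratios, with results averaged across 100 iterations.

Ratio distribution | Input CV% | Estimated CV%
Ratio of Gaussian | 2.5 | 2.5
Ratio of Gaussian | 5.0 | 5.0
Ratio of Gaussian | 7.5 | 7.5
Ratio of Gaussian | 10.0 | 10.0
Ratio of Gaussian | 12.5 | 12.4
Ratio of Gaussian | 15.0 | 14.8
Ratio of Gaussian | 17.5 | 17.1
Ratio of Gaussian | 20.0 | 19.4
Ratio of lognormal | 5.0 | 5.0
Ratio of lognormal | 10.0 | 10.0
Ratio of lognormal | 15.0 | 15.0
Ratio of lognormal | 20.0 | 20.0
Ratio of lognormal | 25.0 | 25.0
Ratio of lognormal | 30.0 | 30.0
Ratio of lognormal | 35.0 | 35.0
Ratio of lognormal | 40.0 | 40.0
Ratio of lognormal | 45.0 | 45.0
Ratio of lognormal | 50.0 | 50.0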

With this proof of concept validated, we applied our approach to result ratios from LIS data. Here, we used the same dataset as our previous study [4], applying refineR to ratios of repeated patient measurements to estimate the total variability of a single measurement (CVTotal) using either formula (1) or (2). Briefly, this dataset contained adult patient results (18–110 years) with biomarkers selected to represent a wide range of CVI levels (∼2–50 %): albumin, creatinine, phosphate, cortisone, cortisol, testosterone, androstenedione, 17-hydroxyprogesterone and 11-deoxycortisol extracted from local databases at Haukeland University Hospital from July 1, 2015, to July 1, 2023. The indirect-CVI was then estimated by subtracting long-term CVA from quality controls from the CVTotal as:

(3) $\text{Indirect CV}_I = \sqrt{\mathrm{CV}_{\mathrm{Total}}^2 - \mathrm{CV}_A^2}$
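To make the chain from Box-Cox(σ) to indirect-CVI concrete, the following sketch combines formula (1) or (2) with formula (3). The function name and the `lognormal` flag are hypothetical. The inputs are the rounded albumin and testosterone values reported in Table 2, so small rounding differences from the published estimates are possible for other rows.

```python
import math


def indirect_cvi(boxcox_sigma, cva_percent, lognormal=False):
    """Indirect CVI (%) from the Box-Cox SD of result ratios and long-term CVA (%)."""
    sigma_x = boxcox_sigma / math.sqrt(2)
    if lognormal:
        # formula (2): ln-CV% of a single measurement
        cv_total = math.sqrt(math.exp(sigma_x ** 2) - 1.0) * 100.0
    else:
        # formula (1): Gaussian CV% of a single measurement
        cv_total = sigma_x * 100.0
    difference = cv_total ** 2 - cva_percent ** 2
    if difference < 0:
        raise ValueError("CVA exceeds CVTotal; indirect CVI is undefined")
    return math.sqrt(difference)  # formula (3)


# Albumin: Box-Cox(sigma) = 0.052, long-term CVA = 1.8 %
print(round(indirect_cvi(0.052, 1.8), 1))
# Testosterone (ln-CV): Box-Cox(sigma) = 0.244, long-term CVA = 4.0 %
print(round(indirect_cvi(0.244, 4.0, lognormal=True), 1))
```

The explicit error for a negative difference is a design choice of this sketch: when the analytical variation is larger than the total variation, formula (3) has no real solution and the estimate should not be reported.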

The results are shown in Table 2. The 95 % confidence interval for the Box-Cox(σ) and the indirect-CVI was derived using refineR’s bootstrapping capabilities (1,000 repetitions). As in our previous paper [4], we compared these results with the estimated CVI from a local state-of-the-art BV study combining CV-ANOVA with outlier exclusion [1]. This study included 10 weekly measurements from 30 healthy subjects for the same analytes and instrumentation as for the LIS data.

Table 2:

Indirect CVI estimates obtained from applying the refineR algorithm on a result ratio distribution of data from the laboratory information system and from CVI estimates from a biological variation study [1] using nested-ANOVA.

Measurand | Subgroup | Number of ratios | Long-term CVA | Box-Cox(σ) [95 % CI] | Indirect CVI [95 % CI] | CVI [95 % CI]
Albumin | All | 949,295 | 1.8 | 0.052 [0.052, 0.053] | 3.2 [3.2, 3.3] | 2.3 [2.1, 2.6]
Creatinine | All | 400,152 | 3.0 | 0.076 [0.074, 0.077] | 4.4 [4.3, 4.6] | 4.4 [4.0, 4.8]
Phosphate | All | 25,901 | 1.7 | 0.141 [0.121, 0.148] | 9.8 [8.4, 10.3] | 9.5 [8.7, 10.4]
Phosphate | Female | 15,426 | 1.7 | 0.132 [0.124, 0.137] | 9.2 [8.6, 9.6] | 8.2 [7.6, 9.8]
Phosphate | Male | 10,475 | 1.7 | 0.143 [0.125, 0.158] | 10.0 [8.6, 11.0] | 9.4 [8.3, 11.0]
Cortisone | Male | 6,882 | 7.0 | 0.202 [0.191, 0.209] | 12.5 [11.6, 13.0] | 12.6 [11.0, 15.0]
Cortisol | Male | 4,983 | 4.4 | 0.271 [0.247, 0.310] | 18.6 [16.9, 21.4] | 15.1 [12.5, 18.6]
Testosterone | Male | 1,261 | 4.0 | 0.244 [0.178, 0.269] | 16.9a [12.0, 18.8] | 15.3a [13.3, 17.6]
Androstenedione | Male | 6,023 | 5.3 | 0.334 [0.295, 0.357] | 23.0a [20.2, 24.7] | 19.2a [17.1, 22.6]
17-Hydroxyprogesterone | Male | 6,839 | 4.8 | 0.351 [0.304, 0.378] | 24.7a [21.2, 26.7] | 23.0a [20.1, 27.0]
11-Deoxycortisol | Male | 6,469 | 5.3 | 0.584 [0.546, 0.620] | 42.6a [39.6, 45.6] | 50.0a [44.9, 61.1]

a Calculated as ln-CV.

Although the indirect-CVI estimates were generally higher than those derived from the BV study, they were comparable, with overlapping confidence intervals (Table 2). The differences between the two estimates are likely due to larger preanalytical variability in the LIS data, as these were produced under daily routine conditions, while the BV study was performed in accordance with recommendations from BIVAC [10], minimizing preanalytical variability. Cortisol was the analyte with the greatest relative difference between the estimates, and this probably relates to its pronounced diurnal variation and the effect of even small changes in sample collection timing.

One of the main benefits of the refineR algorithm is that it can effectively separate healthy and pathological ratios [4]. A strength of this study is that the indirect-CVI ratio approach was validated by Monte Carlo simulation and by comparing it with a local BV study. However, the approach has limitations that may affect the accuracy of the CVI estimates. As previously shown, some heavily skewed ratio distributions cannot be Box-Cox transformed to Gaussian [4]. The ratios also do not reveal the shape of the underlying distribution, as Gaussian and lognormal distributions can produce similarly distributed ratios. The errors introduced by choosing a non-applicable formula are minimal for small CVIs (<20 %) but will increase if the CVI is large.
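One way to get a feel for how much the choice of formula matters is to apply formulas (1) and (2) to the same Box-Cox(σ) and compare the results. The sketch below does only that; it is an illustration with hypothetical function names and arbitrary example σ values, and it does not model the additional error from a ratio distribution that is not well described by either assumption.

```python
import math


def cv_formula_1(boxcox_sigma):
    """Gaussian assumption, formula (1): CV% = sigma_z / sqrt(2) x 100."""
    return boxcox_sigma / math.sqrt(2) * 100.0


def cv_formula_2(boxcox_sigma):
    """Lognormal assumption, formula (2): ln-CV% from sigma_x = sigma_z / sqrt(2)."""
    sigma_x = boxcox_sigma / math.sqrt(2)
    return math.sqrt(math.exp(sigma_x ** 2) - 1.0) * 100.0


def formula_gap(boxcox_sigma):
    """Difference in percentage points between the two formulas."""
    return cv_formula_2(boxcox_sigma) - cv_formula_1(boxcox_sigma)


for sigma in (0.10, 0.28, 0.58):
    print(
        f"Box-Cox(sigma) {sigma:.2f}: "
        f"formula (1) {cv_formula_1(sigma):.1f} %, "
        f"formula (2) {cv_formula_2(sigma):.1f} %, "
        f"gap {formula_gap(sigma):.2f} points"
    )
```

The gap is a small fraction of a percentage point when σ is small and grows to more than one percentage point for the largest σ shown.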

A final limitation is that using non-standardized LIS data can produce inaccurate CVI estimates. Care must be taken to minimize known effects such as time of day, effect of dietary intake, and physiological events such as pregnancy or the menstrual cycle. The lack of standardized sampling procedures associated with LIS data may bias the CVI estimates compared with a direct sampling approach, where the preanalytical factors are controlled. Based on the indirect-RCV study [4], we recommend ≥5,000 ratios and a pathological fraction of <30 % for robust indirect-CVI estimates. Datasets commonly used for indirect reference intervals such as outpatient data, wellness checkups, repeated testing in blood donors or unrequested results should be considered.

Despite its limitations, the refineR algorithm provides a valuable, cost-effective tool for laboratories to estimate CVI using readily available data, especially for subgroups with few direct studies, such as children. To make the refineR approach available to laboratories, we have developed an application (https://gocrunch.shinyapps.io/CVi_app/) that can estimate the indirect-CVI, RCV, and reference intervals from LIS datasets.


Corresponding author: Kristin Moberg Aakre, Hormone Laboratory, Department of Medical Biochemistry and Pharmacology, Haukeland University Hospital, 5021 Bergen, Norway; Department of Clinical Science, University of Bergen, Bergen, Norway; and Department of Heart Disease, Haukeland University Hospital, Bergen, Norway, E-mail:

Funding source: The Western Norway Regional Health Authority

Award Identifier / Grant number: F-12821-D10484/ 4800007771

Funding source: British Heart Foundation

Award Identifier / Grant number: FS/18/78/33902

Funding source: St Thomas’ Charity, London, UK

Award Identifier / Grant number: R060701

Award Identifier / Grant number: R100404

Funding source: County Council of Ostergotland

Funding source: Haukeland University Hospital

Funding source: University of Oslo

  1. Research ethics: We conducted the study according to the Declaration of Helsinki Ethical Principles and Good Clinical Practice. The extraction of patient results from the Laboratory Information System (LIS) without patient consent was approved by the Regional Committee for Medical and Health Research Ethics in Bergen (ID number 252125). The biological variation study was approved by the respective regional ethics committees at the inclusion sites, the Regional Committees for Medical and Health Research Ethics in Bergen and Oslo (ID number 2018/92), and the South-Central Berkshire Research Ethics Committee (London).

  2. Informed consent: All included volunteers in the biological variation study gave informed written consent before participating.

  3. Author contributions: Eirik Åsen Røys: led conceptualization, data analysis, investigation, methodology, validation, and the original draft preparation. Kristin Viste supported conceptualization, data curation, formal analysis, funding acquisition and methodology. She led supervision and contributed equally to the review and editing of the manuscript. Christopher-John Farrell contributed to data analysis, resources, and software, with additional support in validation and manuscript review and editing. Ralf Kellmann played an equal role in data curation, resources, and software, and supported the review and editing of the manuscript. Nora Alicia Guldhaug contributed equally to data curation and participated equally in the review and editing process. Elvar Theodorsson contributed equally to conceptualization, methodology, supervision, validation, and manuscript review and editing. Graham Ross Dallas Jones shared equal contributions in conceptualization, methodology, supervision, validation, and manuscript review and editing. Kristin Moberg Aakre contributed equally to conceptualization, methodology, supervision, and validation and led project administration. She also participated equally in the review and editing of the manuscript.

  4. Use of Large Language Models, AI and Machine Learning Tools: None declared.

  5. Conflict of interest: E. Theodorsson received a consulting fee from NordicBiomarker. K.M. Aakre has served on advisory boards for Roche Diagnostics, Siemens Healthineers, Radiometer and SpinChip; received consulting honoraria from CardiNor, lecture honorarium from Mindray, Siemens Healthineers, and Snibe Diagnostics, and research grants from Siemens Healthineers and Roche Diagnostics; is Associate Editor of Clinical Biochemistry and Chair of the IFCC Committee of Clinical Application of Cardiac Biomarkers.

  6. Research funding: The study was funded by research grants from Haukeland University Hospital, the University of Oslo, the British Heart Foundation (FS/18/78/33902) and Guy’s and St Thomas’ Charity (London, UK; R060701, R100404). This work was also funded by The Western Norway Regional Health Authority through a PhD scholarship for E.Å. Røys (Grant number: F-12821-D10484/ 4800007771). E. Theodorsson received support from the County Council of Ostergotland.

  7. Data availability: The datasets generated and/or analyzed during the current study are available from the corresponding author on reasonable request.

References

1. Røys, EÅ, Guldhaug, NA, Viste, K, Jones, GD, Alaour, B, Sylte, MS, et al. Sex hormones and adrenal steroids: biological variation estimated using direct and indirect methods. Clin Chem 2022;69:100–9. https://doi.org/10.1093/clinchem/hvac175.

2. Jones, GRD. Estimates of within-subject biological variation derived from pathology databases: an approach to allow assessment of the effects of age, sex, time between sample collections, and analyte concentration on reference change values. Clin Chem 2019;65:579–88. https://doi.org/10.1373/clinchem.2018.290841.

3. Tan, RZ, Markus, C, Vasikaran, S, Loh, TP, APFCB Harmonization of Reference Intervals Working Group. Comparison of four indirect (data mining) approaches to derive within-subject biological variation. CCLM 2022;60:636–44. https://doi.org/10.1515/cclm-2021-0442.

4. Røys, EÅ, Viste, K, Kellmann, R, Guldhaug, NA, Alaour, B, Sylte, MS, et al. Estimating reference change values using routine patient data: a novel pathology database approach. Clin Chem 2024:hvae166. https://doi.org/10.1093/clinchem/hvae166.

5. Ammer, T, Schützenmeister, A, Prokosch, HU, Rauh, M, Rank, CM, Zierk, J. refineR: a novel algorithm for reference interval estimation from real-world data. Sci Rep 2021;11:16023. https://doi.org/10.1038/s41598-021-95301-2.

6. Springer, MD. The algebra of random variables. Wiley 1979;22:522. https://doi.org/10.1137/1022108.

7. Díaz-Francés, E, Rubio, F. On the existence of a normal approximation to the distribution of the ratio of two independent normal random variables. Stat Pap 2013;54. https://doi.org/10.1007/s00362-012-0429-2.

8. Fokkema, MR, Herrmann, Z, Muskiet, FA, Moecks, J. Reference change values for brain natriuretic peptides revisited. Clin Chem 2006;52:1602–3. https://doi.org/10.1373/clinchem.2006.069369.

9. Holmes, DT, Buhr, KA. Error propagation in calculated ratios. Clin Biochem 2007;40:728–34. https://doi.org/10.1016/j.clinbiochem.2006.12.014.

10. Aarsand, AK, Røraas, T, Fernandez-Calle, P, Ricos, C, Díaz-Garzón, J, Jonker, N, et al. The biological variation data critical appraisal checklist: a standard for evaluating studies on biological variation. Clin Chem 2018;64:501–14. https://doi.org/10.1373/clinchem.2017.281808.

Received: 2024-11-26
Accepted: 2024-12-19
Published Online: 2024-12-30
Published in Print: 2025-05-26

© 2024 the author(s), published by De Gruyter, Berlin/Boston

This work is licensed under the Creative Commons Attribution 4.0 International License.

Downloaded on 7.9.2025 from https://www.degruyterbrill.com/document/doi/10.1515/cclm-2024-1386/html