Refining within-subject biological variation estimation using routine laboratory data: practical applications of the refineR algorithm

Eirik Åsen Røys; Kristin Viste; Christopher-John Farrell; Ralf Kellmann; Nora Alicia Guldhaug; Elvar Theodorsson; Graham Ross Dallas Jones; Kristin Moberg Aakre

doi:10.1515/cclm-2024-1386

Published/Copyright: December 30, 2024

From the journal Clinical Chemistry and Laboratory Medicine (CCLM) Volume 63 Issue 6

Keywords: indirect methods; laboratory information systems; biological variation

To the Editor,

Within-subject biological variation (CV_I) refers to the natural fluctuations in an individual’s biological markers over time that are not accounted for by known physiological effects. Accurate estimation of CV_I is pivotal for setting analytical performance specifications and identifying clinically significant changes in repeated laboratory tests. Recent research suggests that using data from laboratory information systems (LIS) offers a practical way to estimate the CV_I [1, 2].

One indirect method for estimating CV_I involves characterizing a central Gaussian peak from the frequency distribution of ‘result ratios’, the ratios of consecutive results from serial patient measurements [1, 2]. The width of this peak reflects the total variation (CV_Total), which includes the random biological (CV_I) and relevant analytical variation (CV_A) of the two measurements. A challenge when estimating CV_I from LIS data is to distinguish this peak from pathological or other non-random changes [3]. A recent study showed that the refineR algorithm could determine the result ratio peak and further estimate reference change values (RCV) [4]. RefineR identifies a central ‘Box-Cox normal distribution’ within the data [5]. The Box-Cox transformation is robust and can make a range of both left- and right-skewed distributions resemble Gaussian distributions [5], which is valuable because ratios for some analytes have skewed distributions. In the current study, we applied the refineR algorithm to Gaussian and skewed distributions to derive indirect-CV_I estimates and compared these to CV_I values derived from a state-of-the-art biological variation (BV) study [1].
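As a rough illustration of how result ratios might be assembled from an LIS extract, the sketch below pairs each result with the previous result from the same patient. The record layout `(patient_id, collected_at, value)`, the function name `result_ratios`, and the direction of the ratio (later result divided by earlier result) are assumptions made for this example; the letter does not specify them.

```python
from collections import defaultdict
from datetime import datetime


def result_ratios(records):
    """Return ratios of consecutive results for each patient.

    `records` is an iterable of (patient_id, collected_at, value) tuples.
    This layout is hypothetical; a real LIS extract will differ.
    """
    by_patient = defaultdict(list)
    for patient_id, collected_at, value in records:
        by_patient[patient_id].append((collected_at, value))

    ratios = []
    for results in by_patient.values():
        results.sort(key=lambda item: item[0])  # chronological order
        for (_, earlier), (_, later) in zip(results, results[1:]):
            if earlier > 0 and later > 0:  # a ratio needs positive values
                ratios.append(later / earlier)  # assumed direction
    return ratios


example = [
    ("A", datetime(2020, 1, 1), 40.0),
    ("A", datetime(2020, 2, 1), 42.0),
    ("A", datetime(2020, 3, 1), 39.9),
    ("B", datetime(2020, 1, 5), 50.0),  # single result: contributes no ratio
]
ratios = result_ratios(example)
print(ratios)
```

A patient with n serial results contributes n − 1 ratios, so patients with a single result drop out of the ratio distribution.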

Several aspects must be considered when estimating the CV_I from ratios [6]. First, consider the case where two independent measurements, X₁ and X₂, follow the same positive Gaussian distribution: X ∼ N (μ_x, σ_x), with mean μ_x and standard deviation σ_x, forming the ratio Z = X₁/X₂. The distribution of Z does not have a simple mathematical description and can be nearly Gaussian or markedly skewed [2, 4, 6, 7]. In the case of nearly Gaussian ratios, i.e. Z ∼ N (μ_z, σ_z), μ_z= 1 and σ_z= √2 × σ_x/μ_x are good approximations [1, 2]. Rearranging for σ_x gives σ_x= σ_z × μ_x/√2 and as CV_x= σ_x/μ_x we get:

(1) $\mathrm{CV}_x\% = \frac{\sigma_z}{\sqrt{2}} \times 100\%$

In the case of a skewed Z, the approximation is no longer valid, as the skew increases the mean and standard deviation of the ratios. To mitigate this, we can apply the Box-Cox transformation to make Z approximately Gaussian. From the standard deviation of the transformed ratios Z′, the CV% of X can be estimated using formula (1).
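The near-Gaussian case of formula (1) can be checked with a small simulation. The sketch below is a minimal illustration only: it uses the plain sample standard deviation of uncontaminated simulated ratios, not refineR, and applies no Box-Cox step because the ratio is close to Gaussian at a CV of 5 %. The function names, sample size, mean, and seed are assumptions chosen for the example.

```python
import math
import random
import statistics


def cv_from_ratio_sd(ratio_sd):
    """Formula (1): CV% of one measurement from the SD of a near-Gaussian ratio."""
    return ratio_sd / math.sqrt(2) * 100.0


def simulate_gaussian_ratio_cv(input_cv_percent, n=100_000, seed=1):
    """Simulate Z = X1/X2 for Gaussian X and estimate the CV% from SD(Z)."""
    rng = random.Random(seed)
    mean = 100.0  # arbitrary; only the CV matters for the ratio
    sd = mean * input_cv_percent / 100.0
    ratios = [rng.gauss(mean, sd) / rng.gauss(mean, sd) for _ in range(n)]
    return cv_from_ratio_sd(statistics.stdev(ratios))


estimated_cv = simulate_gaussian_ratio_cv(5.0)
print(f"input CV 5.0 %, estimated CV {estimated_cv:.2f} %")
```

At larger input CVs the untransformed ratio becomes right-skewed and its standard deviation overstates √2 × CV, which is the situation the Box-Cox transformation is meant to handle.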

Secondly, consider the case where the two independent measurements, X₁ and X₂, are drawn from the same lognormal distribution ln(X) ∼ N (μ_x, σ_x). By the properties of logarithms, the ratio Z = X₁/X₂ corresponds to ln(Z) = ln(X₁) – ln(X₂), and ln(Z) will be normally distributed with mean, μ_z= μ_x – μ_x= 0, and sd, σ_z= √(σ_x² + σ_x²) = √2σ_x. Rearranging for σ_x gives σ_x= σ_z/√2. σ_x can be used to calculate the CV% of a lognormal distribution (ln-CV%) [8]:

(2) $\mathrm{lnCV}\% = \sqrt{\exp(\sigma_x^2) - 1} \times 100\%$
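The lognormal case can be illustrated in the same way. In this sketch, which again uses a plain sample standard deviation on simulated data and not refineR, the log-scale standard deviation is chosen by inverting formula (2), the ratios are log-transformed, and formula (2) is applied to σ_z/√2. The function names, sample size, and seed are assumptions for the example.

```python
import math
import random
import statistics


def ln_cv_percent(sigma_x):
    """Formula (2): CV% of a lognormal variable whose logarithm has SD sigma_x."""
    return math.sqrt(math.exp(sigma_x ** 2) - 1.0) * 100.0


def simulate_lognormal_ratio_cv(input_ln_cv_percent, n=100_000, seed=1):
    """Simulate Z = X1/X2 for lognormal X and recover the ln-CV% from SD(ln Z)."""
    rng = random.Random(seed)
    # Invert formula (2) to get the log-scale SD for the requested ln-CV%.
    sigma_x = math.sqrt(math.log(1.0 + (input_ln_cv_percent / 100.0) ** 2))
    log_ratios = [
        math.log(rng.lognormvariate(0.0, sigma_x) / rng.lognormvariate(0.0, sigma_x))
        for _ in range(n)
    ]
    sigma_z = statistics.stdev(log_ratios)  # SD of ln(X1) - ln(X2)
    return ln_cv_percent(sigma_z / math.sqrt(2))


estimated_ln_cv = simulate_lognormal_ratio_cv(30.0)
print(f"input ln-CV 30.0 %, estimated ln-CV {estimated_ln_cv:.2f} %")
```

Because the log of a lognormal ratio is exactly Gaussian, this route does not rely on a small-CV approximation.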

We used Monte Carlo-simulated data to validate the assumption that the standard deviation of a Box-Cox transformed ratio distribution could estimate the CV% and ln-CV%. Ratios of two Gaussian or two lognormal distributions were simulated using one million observations, and refineR was applied to estimate the standard deviation of the Box-Cox transformed ratio distribution. This Box-Cox(σ) was divided by √2 to estimate the CV% or ln-CV% as described above. We compared the input CV% and ln-CV% in the simulation with the estimates from refineR (Table 1). These results showed that refineR could accurately estimate the CV from a ratio of two Gaussian distributions for CVs ≤15 %, with decreasing accuracy at CVs up to 20 %, reflecting the error propagation in ratios for larger Gaussian CVs [9]. The ln-CV could be estimated accurately up to at least 50 %.

Table 1: Monte Carlo simulation results comparing the input CV% values of Gaussian or lognormal distributions against the estimated CV% from their respective ratio distributions using refineR. Each simulated ratio distribution contained 1 million ratios, with results averaged across 100 iterations.

Ratio distribution	Input CV%	Estimated CV%
Ratio of Gaussian	2.5	2.5
	5.0	5.0
	7.5	7.5
	10.0	10.0
	12.5	12.4
	15.0	14.8
	17.5	17.1
	20.0	19.4
Ratio of lognormal	5.0	5.0
	10.0	10.0
	15.0	15.0
	20.0	20.0
	25.0	25.0
	30.0	30.0
	35.0	35.0
	40.0	40.0
	45.0	45.0
	50.0	50.0

With this proof of concept validated, we applied our approach to result ratios from LIS data. Here, we used the same dataset as our previous study [4], applying refineR to ratios of repeated patient measurements to estimate the total variability of a single measurement (CV_Total) using either formula (1) or (2). Briefly, this dataset contained adult patient results (18–110 years) with biomarkers selected to represent a wide range of CV_I levels (∼2–50 %): albumin, creatinine, phosphate, cortisone, cortisol, testosterone, androstenedione, 17-hydroxyprogesterone and 11-deoxycortisol extracted from local databases at Haukeland University Hospital from July 1, 2015, to July 1, 2023. The indirect-CV_I was then estimated by subtracting long-term CV_A from quality controls from the CV_Total as:

(3) $\text{Indirect } \mathrm{CV}_I = \sqrt{\mathrm{CV}_{\mathrm{Total}}^2 - \mathrm{CV}_A^2}$

The results are shown in Table 2. The 95 % confidence interval for the Box-Cox(σ) and the indirect-CV_I was derived using refineR’s bootstrapping capabilities (1,000 repetitions). As in our previous paper [4], we compared these results with the estimated CV_I from a local state-of-the-art BV study combining CV-ANOVA with outlier exclusion [1]. This study included 10 weekly measurements from 30 healthy subjects for the same analytes and instrumentation as for the LIS data.

Table 2: Indirect CV_I estimates obtained by applying the refineR algorithm to result ratio distributions from the laboratory information system, compared with CV_I estimates from a biological variation study [1] using nested ANOVA.

Measurand	Subgroup	Number of ratios	Long-term CV_A (%)	Box-Cox(σ) [95 % CI]	Indirect CV_I (%) [95 % CI]	CV_I (%) [95 % CI]
Albumin	All	949,295	1.8	0.052 [0.052, 0.053]	3.2 [3.2, 3.3]	2.3 [2.1, 2.6]
Creatinine	All	400,152	3.0	0.076 [0.074, 0.077]	4.4 [4.3, 4.6]	4.4 [4.0, 4.8]
Phosphate	All	25,901	1.7	0.141 [0.121, 0.148]	9.8 [8.4, 10.3]	9.5 [8.7, 10.4]
Phosphate	Female	15,426	1.7	0.132 [0.124, 0.137]	9.2 [8.6, 9.6]	8.2 [7.6, 9.8]
Phosphate	Male	10,475	1.7	0.143 [0.125, 0.158]	10.0 [8.6, 11.0]	9.4 [8.3, 11.0]
Cortisone	Male	6,882	7.0	0.202 [0.191, 0.209]	12.5 [11.6, 13.0]	12.6 [11.0, 15.0]
Cortisol	Male	4,983	4.4	0.271 [0.247, 0.310]	18.6 [16.9, 21.4]	15.1 [12.5, 18.6]
Testosterone	Male	1,261	4.0	0.244 [0.178, 0.269]	16.9^a [12.0, 18.8]	15.3^a [13.3, 17.6]
Androstenedione	Male	6,023	5.3	0.334 [0.295, 0.357]	23.0^a [20.2, 24.7]	19.2^a [17.1, 22.6]
17-Hydroxyprogesterone	Male	6,839	4.8	0.351 [0.304, 0.378]	24.7^a [21.2, 26.7]	23.0^a [20.1, 27.0]
11-Deoxycortisol	Male	6,469	5.3	0.584 [0.546, 0.620]	42.6^a [39.6, 45.6]	50.0^a [44.9, 61.1]

^aCalculated as ln-CV.
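The arithmetic behind the indirect CV_I column can be traced with a short worked example. The sketch below chains formula (1) or (2) with formula (3), using the Box-Cox(σ) and long-term CV_A values listed in Table 2 for albumin and testosterone. The function names are illustrative, and the calculation works from the rounded table values, so small rounding differences from the published figures are possible for other rows.

```python
import math


def cv_total_percent(box_cox_sigma, lognormal=False):
    """CV_Total (%) of a single measurement from the SD of the ratio distribution."""
    sigma_x = box_cox_sigma / math.sqrt(2)
    if lognormal:
        return math.sqrt(math.exp(sigma_x ** 2) - 1.0) * 100.0  # formula (2)
    return sigma_x * 100.0  # formula (1)


def indirect_cv_i(box_cox_sigma, cv_a_percent, lognormal=False):
    """Formula (3): subtract analytical variation from CV_Total in quadrature."""
    cv_total = cv_total_percent(box_cox_sigma, lognormal)
    return math.sqrt(cv_total ** 2 - cv_a_percent ** 2)


albumin = indirect_cv_i(0.052, 1.8)                       # Gaussian formula
testosterone = indirect_cv_i(0.244, 4.0, lognormal=True)  # ln-CV formula
print(f"albumin: {albumin:.1f} %, testosterone: {testosterone:.1f} %")
```

For these two rows the sketch returns 3.2 % and 16.9 %, matching the tabulated indirect CV_I values.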

Although the indirect-CV_I estimates were generally higher than those derived from the BV study, they were comparable, with overlapping confidence intervals (Table 2). The differences between the two estimates are likely due to larger preanalytical variability in the LIS data, as these were produced under daily routine conditions, while the BV study was performed in accordance with recommendations from BIVAC [10], minimizing preanalytical variability. Cortisol was the analyte with the greatest relative difference between the estimates, and this probably relates to its pronounced diurnal variation and the effect of even small changes in sample collection timing.

One of the main benefits of the refineR algorithm is that it can effectively separate healthy and pathological ratios [4]. A strength of this study is that the indirect-CV_I ratio approach was validated by Monte Carlo simulation and by comparing it with a local BV study. However, the approach has limitations that may affect the accuracy of the CV_I estimates. As previously shown, some heavily skewed ratio distributions cannot be Box-Cox transformed to Gaussian [4]. The ratios also do not reveal the shape of the underlying distribution, as Gaussian and lognormal distributions can produce similarly distributed ratios. The errors introduced by choosing a non-applicable formula are minimal for small CV_Is (<20 %) but will increase if the CV_I is large.
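To give a sense of scale for the last point, the sketch below applies formula (1) and formula (2) to the same single-measurement standard deviation σ_x and reports the difference. It is a simplified comparison of the two formulas only; it does not model refineR or the Box-Cox step, and the chosen σ_x values are arbitrary examples.

```python
import math


def gaussian_cv_percent(sigma_x):
    """Formula (1) expressed for sigma_x = sigma_z / sqrt(2)."""
    return sigma_x * 100.0


def lognormal_cv_percent(sigma_x):
    """Formula (2): ln-CV% for the same sigma_x."""
    return math.sqrt(math.exp(sigma_x ** 2) - 1.0) * 100.0


differences = {}
for sigma_x in (0.05, 0.10, 0.20, 0.50):
    gaussian = gaussian_cv_percent(sigma_x)
    lognormal = lognormal_cv_percent(sigma_x)
    differences[sigma_x] = lognormal - gaussian
    print(f"sigma_x={sigma_x:.2f}: formula (1) {gaussian:.2f} %, "
          f"formula (2) {lognormal:.2f} %, difference {differences[sigma_x]:.2f}")
```

The difference between the two formulas stays below about 0.2 percentage points up to σ_x = 0.20 and grows to more than 3 percentage points at σ_x = 0.50.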

A final limitation is that using non-standardized LIS data can produce inaccurate CV_I estimates. Care must be taken to minimize known effects such as time of day, effect of dietary intake, and physiological events such as pregnancy or the menstrual cycle. The lack of standardized sampling procedures associated with LIS data may bias the CV_I estimates compared with a direct sampling approach, where the preanalytical factors are controlled. Based on the indirect-RCV study [4], we recommend ≥5,000 ratios and a pathological fraction of <30 % for robust indirect-CV_I estimates. Datasets commonly used for indirect reference intervals such as outpatient data, wellness checkups, repeated testing in blood donors or unrequested results should be considered.

Despite its limitations, the refineR algorithm provides a valuable, cost-effective tool for laboratories to estimate CV_I using readily available data, especially for subgroups with limited direct studies, such as children. To make the refineR approach available to laboratories, we have developed an application (https://gocrunch.shinyapps.io/CVi_app/) that can estimate the indirect-CV_I, RCV, and reference intervals from LIS datasets.

Corresponding author: Kristin Moberg Aakre, Hormone Laboratory, Department of Medical Biochemistry and Pharmacology, Haukeland University Hospital, 5021 Bergen, Norway; Department of Clinical Science, University of Bergen, Bergen, Norway; and Department of Heart Disease, Haukeland University Hospital, Bergen, Norway, E-mail: kristin.moberg.aakre@helse-bergen.no

Research ethics: We conducted the study according to the Declaration of Helsinki Ethical Principles and Good Clinical Practice. The extraction of patient results from the Laboratory Information System (LIS) without patient consent was approved by the Regional Committee for Medical and Health Research Ethics in Bergen (ID number 252125). The biological variation study was approved by the respective regional ethics committees at the inclusion sites, the Regional Committees for Medical and Health Research Ethics in Bergen and Oslo (ID number 2018/92), and the South-Central Berkshire Research Ethics Committee (London).
Informed consent: All included volunteers in the biological variation study gave informed written consent before participating.
Author contributions: Eirik Åsen Røys: led conceptualization, data analysis, investigation, methodology, validation, and the original draft preparation. Kristin Viste supported conceptualization, data curation, formal analysis, funding acquisition and methodology. She led supervision and contributed equally to the review and editing of the manuscript. Christopher-John Farrell contributed to data analysis, resources, and software, with additional support in validation and manuscript review and editing. Ralf Kellmann played an equal role in data curation, resources, and software, and supported the review and editing of the manuscript. Nora Alicia Guldhaug contributed equally to data curation and participated equally in the review and editing process. Elvar Theodorsson contributed equally to conceptualization, methodology, supervision, validation, and manuscript review and editing. Graham Ross Dallas Jones shared equal contributions in conceptualization, methodology, supervision, validation, and manuscript review and editing. Kristin Moberg Aakre contributed equally to conceptualization, methodology, supervision, and validation and led project administration. She also participated equally in the review and editing of the manuscript.
Use of Large Language Models, AI and Machine Learning Tools: None declared.
Conflict of interest: E. Theodorsson received a consulting fee from NordicBiomarker. K.M. Aakre has served on advisory boards for Roche Diagnostics, Siemens Healthineers, Radiometer and SpinChip; received consulting honoraria from CardiNor, lecture honorarium from Mindray, Siemens Healthineers, and Snibe Diagnostics, and research grants from Siemens Healthineers and Roche Diagnostics; is Associate Editor of Clinical Biochemistry and Chair of the IFCC Committee of Clinical Application of Cardiac Biomarkers.
Research funding: The study was funded by research grants from Haukeland University Hospital, the University of Oslo, the British Heart Foundation (FS/18/78/33902) and Guy’s and St Thomas’ Charity (London, UK; R060701, R100404). This work was funded by The Western Norway Regional Health Authority through providing PhD scholarship for E.Å. Røys (Grant number: F-12821-D10484/ 4800007771). E. Theodorsson received support from the County Council of Ostergotland.
Data availability: The datasets generated and/or analyzed during the current study are available from the corresponding author on reasonable request.

References

1. Røys, EÅ, Guldhaug, NA, Viste, K, Jones, GD, Alaour, B, Sylte, MS, et al. Sex hormones and adrenal steroids: biological variation estimated using direct and indirect methods. Clin Chem 2022;69:100–9. https://doi.org/10.1093/clinchem/hvac175.

2. Jones, GRD. Estimates of within-subject biological variation derived from pathology databases: an approach to allow assessment of the effects of age, sex, time between sample collections, and analyte concentration on reference change values. Clin Chem 2019;65:579–88. https://doi.org/10.1373/clinchem.2018.290841.

3. Tan, RZ, Markus, C, Vasikaran, S, Loh, TP, APFCB Harmonization of Reference Intervals Working Group. Comparison of four indirect (data mining) approaches to derive within-subject biological variation. CCLM 2022;60:636–44. https://doi.org/10.1515/cclm-2021-0442.

4. Røys, EÅ, Viste, K, Kellmann, R, Guldhaug, NA, Alaour, B, Sylte, MS, et al. Estimating reference change values using routine patient data: a novel pathology database approach. Clin Chem 2024:hvae166. https://doi.org/10.1093/clinchem/hvae166.

5. Ammer, T, Schützenmeister, A, Prokosch, HU, Rauh, M, Rank, CM, Zierk, J. refineR: a novel algorithm for reference interval estimation from real-world data. Sci Rep 2021;11:16023. https://doi.org/10.1038/s41598-021-95301-2.

6. Springer, MD. The algebra of random variables. Wiley 1979;22:522. https://doi.org/10.1137/1022108.

7. Díaz-Francés, E, Rubio, F. On the existence of a normal approximation to the distribution of the ratio of two independent normal random variables. Stat Pap 2013;54. https://doi.org/10.1007/s00362-012-0429-2.

8. Fokkema, MR, Herrmann, Z, Muskiet, FA, Moecks, J. Reference change values for brain natriuretic peptides revisited. Clin Chem 2006;52:1602–3. https://doi.org/10.1373/clinchem.2006.069369.

9. Holmes, DT, Buhr, KA. Error propagation in calculated ratios. Clin Biochem 2007;40:728–34. https://doi.org/10.1016/j.clinbiochem.2006.12.014.

10. Aarsand, AK, Røraas, T, Fernandez-Calle, P, Ricos, C, Díaz-Garzón, J, Jonker, N, et al. The biological variation data critical appraisal checklist: a standard for evaluating studies on biological variation. Clin Chem 2018;64:501–14. https://doi.org/10.1373/clinchem.2017.281808.

Received: 2024-11-26

Accepted: 2024-12-19

Published Online: 2024-12-30

Published in Print: 2025-05-26

This work is licensed under the Creative Commons Attribution 4.0 International License.
