A visualization tool for continuous reference intervals based on GAMLSS

Sandra Klawitter; Tim Kacprowski

doi:10.1515/labmed-2023-0033

Published/Copyright: May 31, 2023

Journal of Laboratory Medicine, Volume 47, Issue 4

Abstract

Reference intervals are an important component in the interpretation of medical laboratory findings. Especially in children and adolescents, their limits can sometimes change very rapidly with age. We suggest continuous methods to better represent the age-dependent progression of reference intervals. Generalized additive models for location, scale, and shape (GAMLSS), as implemented in the R package gamlss, generate continuous percentile plots of laboratory values. For this purpose, we developed a user-friendly Shiny application called AdRI_GAMLSS (Age-dependent Reference Intervals), available at github.com/SandraKla/AdRI_GAMLSS. Using alkaline phosphatase (ALP) as an example, we obtain different smoothed percentile curves depending on the model used. We demonstrate the superiority of continuously modeled reference intervals over fixed age groups and provide the Shiny application AdRI_GAMLSS to make the technique easily accessible to clinicians and other experts.

Keywords: generalized additive models for location, scale, and shape parameters (GAMLSS); reference intervals; statistical learning methods

Introduction

Reference intervals are used to support medical diagnosis. By definition, they represent the central 95 % range of laboratory values measured in a healthy population [1]. Laboratory values change with age, especially in newborns and during puberty, owing to anatomical and physiological changes [2]. In pediatrics in particular, the effects of age can be so pronounced that laboratory values measured consecutively over a period of time are difficult to interpret correctly [3]. In laboratory practice, reference intervals are usually partitioned into age groups, which hardly reflect the true age dependence of a laboratory value [3]. At the boundary between two age groups, the lower or upper limits may jump considerably, which in the worst case can lead to an incorrect medical diagnosis [1, 4, 5].
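To make the age-group problem concrete, here is a minimal base-R sketch of conventional partitioned reference intervals: the empirical 2.5th, 50th and 97.5th percentiles are computed separately within fixed age bins. The helper name age_group_ri and the toy values are hypothetical and are not CALIPER data.

```r
# Hypothetical helper: partitioned reference intervals (central 95 % range
# plus median) computed separately for each fixed age group.
age_group_ri <- function(age, value, breaks) {
  # Assign each measurement to a left-closed age interval, e.g. [0,5)
  group <- cut(age, breaks = breaks, right = FALSE)
  # Empirical percentiles per group (R's default quantile type 7);
  # an empty group yields NA
  limits <- lapply(split(value, group), function(v) {
    quantile(v, probs = c(0.025, 0.5, 0.975), names = FALSE)
  })
  res <- do.call(rbind, limits)
  colnames(res) <- c("lower", "median", "upper")
  res
}

# Toy data (made up for illustration): two age groups with different levels
age   <- c(0.5, 1, 2, 3, 10, 11, 12, 13)
value <- c(10, 20, 30, 40, 100, 200, 300, 400)
age_group_ri(age, value, breaks = c(0, 5, 18))
```

Two children on either side of the 5-year boundary receive the limits of different rows, however close their ages are; this step is the discontinuity that continuous models are meant to remove.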

Continuous models of reference limits over age can solve this problem but are rarely available because of ethical and practical difficulties [6]. They represent the variation of reference intervals better than predefined age groups, especially for laboratory values that change strongly with age [5–7]. Age-dependent reference intervals for different laboratory values can be obtained with statistical methods such as generalized additive models for location, scale, and shape (GAMLSS) [7].

To generate continuous models, we developed the Shiny application AdRI_GAMLSS, which is based on GAMLSS with different additive terms. Alkaline phosphatase (ALP) data from Colantonio et al. [1] were used as an illustrative example to build the models and to generate percentile plots. According to the database of the Canadian Laboratory Initiative on Pediatric Reference Intervals (CALIPER) [1], ALP is strongly sex- and age-dependent, and its reference limits show large jumps between age groups [4].

Materials and methods

Generalized additive models for location, scale, and shape (GAMLSS) were developed by Rigby and Stasinopoulos in 2005 and are implemented in the gamlss R package, which provides a wide range of features for univariate statistical regression modeling and statistical learning [8, 9]. GAMLSS were designed to overcome some of the limitations of generalized linear models and generalized additive models [2, 8]. The LMS (lambda, mu, sigma) method introduced by Cole is commonly used to compute age-dependent percentiles of laboratory values [5, 10, 11]. Although both GAMLSS and LMS-based modeling use smoothing terms, they are considered semiparametric methods because a parametric distribution must be assumed for the response variable [2].
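Cole's LMS method summarizes the age-specific distribution by three smooth curves, the Box-Cox power L (lambda), the median M (mu) and the coefficient of variation S (sigma), from which any percentile follows in closed form: C_p = M(1 + L·S·z_p)^(1/L) for L ≠ 0 and C_p = M·exp(S·z_p) for L = 0, where z_p is the standard normal quantile. The base-R sketch below implements only this textbook formula, and the function name lms_centile is hypothetical. The BCPE distribution used in this article adds a fourth (kurtosis) parameter, so its percentiles are not covered by this formula; in gamlss they are obtained from the fitted model, for example via centiles().

```r
# Percentile of Cole's LMS method for given L (Box-Cox power), M (median)
# and S (coefficient of variation). All arguments are recycled to a common
# length, so L, M and S may be vectors of age-specific values.
lms_centile <- function(p, L, M, S) {
  n <- max(length(p), length(L), length(M), length(S))
  z <- rep_len(qnorm(p), n)          # standard normal quantile z_p
  L <- rep_len(L, n)
  M <- rep_len(M, n)
  S <- rep_len(S, n)
  out <- M * exp(S * z)              # limiting case L = 0
  nz <- abs(L) > 1e-8                # general case L != 0
  out[nz] <- M[nz] * (1 + L[nz] * S[nz] * z[nz])^(1 / L[nz])
  out
}

# 2.5th, 50th and 97.5th percentiles at one age (illustrative L, M, S)
lms_centile(c(0.025, 0.5, 0.975), L = -0.5, M = 200, S = 0.3)
```

With L = 1 the formula reduces to a normal distribution with mean M and standard deviation M·S; a negative L, as in the usage line, produces the right-skewed limits typical of enzyme activities such as ALP.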

Different additive terms are used for smoothing; they are fitted with a backfitting algorithm [8]. As parametric additive terms, we used third- and fourth-degree polynomials; as non-parametric additive terms, we used splines (P-splines and cubic splines) and machine learning methods (Neural Networks and Decision Trees). The GAMLSS are fitted to the data by penalized maximum likelihood estimation [8]; for this purpose, we used the Rigby & Stasinopoulos (RS) algorithm with 50 epochs to maximize the penalized likelihood. For all models, the Box-Cox Power Exponential (BCPE) distribution was assumed, and the Generalized Akaike Information Criterion (GAIC) was used for model comparison.
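The general model behind these choices can be summarized in the notation of Rigby and Stasinopoulos [9]. This is a sketch of the standard GAMLSS formulation, not a transcript of the application's code; which of the four parameters AdRI_GAMLSS actually models as functions of age is not stated here. Each BCPE parameter (μ roughly the median, σ roughly the coefficient of variation, ν the skewness and τ the kurtosis) has its own link function g_k and additive predictor, and the terms s_jk would be the polynomial, spline, neural network or tree terms of age:

```latex
\begin{aligned}
y_i &\sim \mathrm{BCPE}(\mu_i, \sigma_i, \nu_i, \tau_i), \qquad i = 1, \dots, n,\\
g_k(\theta_k) &= \eta_k = X_k \beta_k + \sum_{j=1}^{J_k} s_{jk}(x_{jk}), \qquad (\theta_1, \theta_2, \theta_3, \theta_4) = (\mu, \sigma, \nu, \tau),\\
\ell_p &= \ell - \frac{1}{2} \sum_{k=1}^{4} \sum_{j=1}^{J_k} \lambda_{jk}\, \gamma_{jk}^{\top} G_{jk}\, \gamma_{jk},\\
\mathrm{GAIC}(\kappa) &= -2\,\hat{\ell} + \kappa\, \mathrm{df}.
\end{aligned}
```

For penalized smoothers such as P-splines and cubic splines, s_jk = Z_jk γ_jk with penalty matrix G_jk and smoothing parameter λ_jk; ℓ is the log-likelihood, ℓ̂ its value at the fitted model, and df the effective degrees of freedom. Setting κ = 2 gives the ordinary AIC and κ = log n the Schwarz Bayesian criterion; in either case the model with the smallest GAIC is preferred. The value of κ used for the comparisons reported below is not specified in the text.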

The ALP data from Colantonio et al. [1] were taken from the Supplemental Table and converted into a format readable by the Shiny application AdRI_GAMLSS (available at github.com/SandraKla/AdRI_GAMLSS). The open-source application also allows users to analyze their own data sets. These must be provided as CSV files with the following columns: ID (patient number), SEX (sex), AGE_DAYS (age in days), AGE_YEARS (age in years), VALUE (laboratory value), and ANALYTE (analyte name). Rows with missing laboratory values are removed automatically. If the ID is missing, each value is assumed to come from a different patient, and the values are numbered in ascending order. Once the shiny package [12] is installed, the application can be started in R with the following commands:

library(shiny)  # load the shiny package

runGitHub("AdRI_GAMLSS", "SandraKla")  # download and start the app (repository, GitHub user)
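The input rules described above for uploaded data sets can be mirrored in a few lines of base R, for instance to check a file before loading it into the application. This is a hypothetical sketch (prepare_ri_data is not a function of AdRI_GAMLSS), and it assumes that "ID is missing" means an absent or completely empty ID column.

```r
# Hypothetical pre-check mirroring the input rules described in the text;
# not the application's own code.
required_cols <- c("ID", "SEX", "AGE_DAYS", "AGE_YEARS", "VALUE", "ANALYTE")

prepare_ri_data <- function(df) {
  # Assumption: an absent ID column is handled like an empty one
  if (!"ID" %in% names(df)) df$ID <- rep(NA, nrow(df))
  missing_cols <- setdiff(required_cols, names(df))
  if (length(missing_cols) > 0) {
    stop("missing column(s): ", paste(missing_cols, collapse = ", "))
  }
  # Remove rows without a laboratory value
  df <- df[!is.na(df$VALUE), , drop = FALSE]
  # Without IDs, treat every value as a separate patient, numbered 1..n
  if (all(is.na(df$ID))) df$ID <- seq_len(nrow(df))
  df
}

# Made-up example: empty ID column and one missing laboratory value
csv <- "ID,SEX,AGE_DAYS,AGE_YEARS,VALUE,ANALYTE
,F,10,0.03,250,ALP
,M,400,1.1,,ALP
,M,3650,10,310,ALP"
prepare_ri_data(read.csv(text = csv))
```

read.csv() treats blank fields in numeric columns as NA, so the second row is dropped and the two remaining rows receive the IDs 1 and 2.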

Figure 1 shows an image of the Shiny application AdRI_GAMLSS. Data analysis was performed in R (Version: 4.2.0, “Vigorous Calisthenics”) [13] using the Shiny application AdRI_GAMLSS, and several packages were downloaded from the Comprehensive R Archive Network (CRAN): boot (1.3-28), dplyr (1.0.9), DT (0.22), gamlss (5.4-3), gamlss.add (5.1-6), plotly (4.10.0), rpart (4.1.16), rpart.plot (3.1.0), shiny (1.7.1) and zoo (1.8-10) [12, 14–21].
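When reporting or reproducing such an analysis, the package versions actually installed can be listed programmatically. A minimal sketch using only base R (system.file() and utils::packageVersion()); report_versions is a hypothetical helper name, and the package list simply repeats the one given above.

```r
# List installed versions of the packages used (NA if not installed),
# without attaching them.
report_versions <- function(pkgs) {
  vapply(pkgs, function(p) {
    if (nzchar(system.file(package = p))) {
      as.character(utils::packageVersion(p))
    } else {
      NA_character_
    }
  }, character(1))
}

pkgs <- c("boot", "dplyr", "DT", "gamlss", "gamlss.add", "plotly",
          "rpart", "rpart.plot", "shiny", "zoo")
report_versions(pkgs)
R.version.string  # R version of the current session
```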

Figure 1:

Presentation of the Shiny application AdRI_GAMLSS.

Results

Figure 2 shows the age- and sex-dependent course of ALP up to the age of 18 years; marked changes can be seen in the first years of life and during puberty. We analyzed the data stratified by sex using GAMLSS from the gamlss R package.

Figure 2:

CALIPER measurements of ALP presented with the Shiny application AdRI_GAMLSS. Females are represented by red circles and males by blue triangles.

Continuous percentile plots from GAMLSS fitted with third- and fourth-degree polynomials, P-splines, and cubic splines are shown in Figures 3 and 4; the GAMLSS using Neural Networks and Decision Trees are shown in Figure 5. The figures show the age-dependent course of the reference limits (2.5th and 97.5th percentiles) and of the median (50th percentile) up to the age of 18 years.

Figure 3:

GAMLSS with third- and fourth-degree polynomials fitted to the ALP data with the Shiny application AdRI_GAMLSS. The 2.5th (red) and 97.5th (blue) percentiles are dashed. The upper panel shows the trend for males, the lower panel for females.

Figure 4:

GAMLSS with splines (P-splines and cubic splines) fitted to the ALP data with the Shiny application AdRI_GAMLSS. The 2.5th (red) and 97.5th (blue) percentiles are dashed. The upper panel shows the trend for males, the lower panel for females.

Figure 5:

GAMLSS with Neural Network and Decision Tree terms fitted to the ALP data with the Shiny application AdRI_GAMLSS. The 2.5th (red) and 97.5th (blue) percentiles are dashed. The upper panel shows the trend for males, the lower panel for females.

Although the BCPE distribution was assumed for all models, the fitted curves vary with the additive term used. For females, the GAMLSS with P-splines had the smallest GAIC (6,658.16); for males, the GAMLSS with Decision Trees did (6,500.24). All models capture that the reference range is wide after birth and then narrows, following a relatively constant course over a long period of childhood. With the onset of puberty, the range widens again in both sexes and then decreases strongly as adulthood approaches. The polynomials and cubic splines pool the data after birth and yield a wide percentile range at the start, whereas the P-splines produce narrower but quickly widening reference ranges. In females, the GAMLSS with the Neural Network does not detect the decrease and subsequent increase of ALP concentration from childhood to puberty and assumes an almost constant value; the stepwise Decision Tree model shows the same pattern in females. In males, the Decision Tree forms more age groups, and the Neural Network adapts better to the measurements.

Discussion

The calculation of continuous reference intervals was introduced about 20 years ago by Lindberg et al. [22] and has become a valuable addition to traditional reference intervals for discrete age groups over the last decade [23]. Particularly in pediatrics, large jumps in reference limits may lead to diagnostic misjudgments [4], which can be avoided by using mathematical functions instead of rigid age intervals [2]. In a few cases, a single formula can describe the entire age range from birth to adulthood [24], but usually the complex physiological changes at different stages of life require more sophisticated mathematical approaches to represent their dynamics correctly.

Generalized non-linear regression models for arbitrary distributions were developed more recently in the context of statistical learning but reached the laboratory community only in the 2020s [2, 5]. The gamlss R package combines a wide range of such advanced methods under a common operating philosophy [8, 9]. It is certainly not the only R package suitable for this purpose, but it has recently been used successfully [5]. To make the wealth of GAMLSS features available to domain experts without a strong programming or statistics background, we developed the Shiny application AdRI_GAMLSS. The tool is not intended to replace expert knowledge of curve fitting to complex data with push-button automation, but to show laboratory practitioners how continuous reference intervals are constructed, how the various techniques differ, and what their limitations are.

Our graphs show that the choice of technique has a major influence on the result of the statistical analysis. Figure 2 shows the complex time course of the ALP measurements: strongly scattering values immediately after birth, an only slightly curved course during childhood that widens again during puberty, and finally a rapid decline to adult values with decreasing scatter. All techniques reproduce this progression quite plausibly, but with varying resolution. The greatest differences concern the inhomogeneous distribution of values measured in newborns. While the median curves of the GAMLSS with Neural Networks particularly emphasize the dense point cloud in the low value range (Figure 5), the third- and fourth-degree polynomials and the spline functions reflect this inhomogeneity much less (Figures 3 and 4). The differences in the 2.5th and 97.5th percentiles are less pronounced but still clearly visible. The GAMLSS with the Decision Tree is unique in that it does not model continuous reference limits but introduces statistically derived decision points corresponding to traditional age groups. In this respect, it can be seen as an interesting complement to the classical partitioning methods of laboratory medicine [25], which have been in use since the 1990s and are supported by all laboratory information systems (LIS) [26]. The GAMLSS in our Shiny application are currently used as a direct method, i.e. one that relies on laboratory results obtained from non-diseased reference individuals. Their suitability for other applications, especially for the assessment of routine laboratory data, as well as the influence of pathological values, of hyperparameters, and of the distribution of the data over age, should be investigated further. Such work is particularly important because the database for laboratory values in newborns and children is generally small [27, 28].

In any case, the expertise of a laboratory physician is still required to select the appropriate GAMLSS for the age-dependent course of a laboratory value.

Corresponding author: Sandra Klawitter, Trillium GmbH Medizinischer Fachverlag, Jesenwanger Str. 42b, 82284 Grafrath, Germany; and Department of Computer Science, Ostfalia University of Applied Sciences, Salzdahlumer Str. 46/48, 38302 Wolfenbüttel, Germany, Phone: (+49) 5331 939 31390, E-mail: s.klawitter@ostfalia.de

Research funding: None declared.
Author contributions: All authors have accepted responsibility for the entire content of this manuscript and approved its submission.
Competing interests: Authors state no conflict of interest.
Informed consent: Not applicable.
Ethical approval: Not applicable.

References

1. Colantonio, DA, Kyriakopoulou, L, Chan, MK, Daly, CH, Brinc, D, Venner, AA, et al. Closing the gaps in pediatric laboratory reference intervals: a CALIPER database of 40 biochemical markers in a healthy and multiethnic population of children. Clin Chem 2012;58:854–68. https://doi.org/10.1373/clinchem.2011.177741.

2. Li, K, Hu, L, Peng, Y, Yan, R, Li, Q, Peng, X, et al. Comparison of four algorithms on establishing continuous reference intervals for pediatric analytes with age-dependent trend. BMC Med Res Methodol 2020;20:136. https://doi.org/10.1186/s12874-020-01021-y.

3. Zierk, J, Arzideh, F, Haeckel, R, Cario, H, Fruhwald, MC, Gross, HJ, et al. Pediatric reference intervals for alkaline phosphatase. Clin Chem Lab Med 2017;55:102–10. https://doi.org/10.1515/cclm-2016-0318.

4. Klawitter, S, Hoffmann, G, Holdenrieder, S, Kacprowski, T, Klawonn, F. A zlog-based algorithm and tool for plausibility checks of reference intervals. Clin Chem Lab Med 2022;61:260–5. https://doi.org/10.1515/cclm-2022-0688.

5. Wilson, SM, Bohn, MK, Madsen, A, Hundhausen, T, Adeli, K. LMS-based continuous reference percentiles for 14 laboratory parameters in the CALIPER cohort of healthy children and adolescents. Clin Chem Lab Med 2023;61:1105–15. https://doi.org/10.1515/cclm-2022-1077.

6. Zierk, J, Arzideh, F, Rechenauer, T, Haeckel, R, Rascher, W, Metzler, M, et al. Age- and sex-specific dynamics in 22 hematologic and biochemical analytes from birth to adolescence. Clin Chem 2015;61:964–73. https://doi.org/10.1373/clinchem.2015.239731.

7. Kiess, A, Green, J, Willenberg, A, Ceglarek, U, Dahnert, I, Jurkutat, A, et al. Age-dependent reference values for hs-troponin T and NT-proBNP and determining factors in a cohort of healthy children (The LIFE Child Study). Pediatr Cardiol 2022;43:1071–83. https://doi.org/10.1007/s00246-022-02827-x.

8. Stasinopoulos, MD, Rigby, RA, Heller, GZ, Voudouris, V, De Bastiani, F. Flexible regression and smoothing: using GAMLSS in R, 1st ed. New York: Chapman and Hall/CRC; 2017. https://doi.org/10.1201/b21973.

9. Rigby, RA, Stasinopoulos, MD. Generalized additive models for location, scale and shape. Appl Stat 2005;54:507–54. https://doi.org/10.1111/j.1467-9876.2005.00510.x.

10. Asgari, S, Higgins, V, McCudden, C, Adeli, K. Continuous reference intervals for 38 biochemical markers in healthy children and adolescents: comparisons to traditionally partitioned reference intervals. Clin Biochem 2019;73:82–9. https://doi.org/10.1016/j.clinbiochem.2019.08.010.

11. Cole, TJ. The LMS method for constructing normalized growth standards. Eur J Clin Nutr 1990;44:45–60.

12. Chang, W, Cheng, J, Allaire, JJ, Sievert, C, Schloerke, B, Xie, Y, et al. shiny: web application framework for R. R package version 1.7.1; 2021. Available from: https://CRAN.R-project.org/package=shiny.

13. R Core Team. R: a language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing; 2021. Available from: https://www.R-project.org/.

14. Canty, A, Ripley, B. boot: bootstrap R (S-Plus) functions. R package version 1.3-28; 2021. Available from: https://cran.r-project.org/web/packages/boot/citation.html.

15. Wickham, H, François, R, Henry, L, Müller, K. dplyr: a grammar of data manipulation. R package version 1.0.8; 2022. Available from: https://CRAN.R-project.org/package=dplyr.

16. Xie, Y, Cheng, J, Tan, X. DT: a wrapper of the JavaScript library ‘DataTables’. R package version 0.21; 2022. Available from: https://CRAN.R-project.org/package=DT.

17. Stasinopoulos, M, Rigby, B, Voudouris, V, Kiose, D. gamlss.add: extra additive terms for generalized additive models for location scale and shape. R package version 5.1-6; 2020. Available from: https://CRAN.R-project.org/package=gamlss.add.

18. Sievert, C. Interactive web-based data visualization with R, plotly, and shiny. New York: Chapman and Hall/CRC Florida; 2020. https://doi.org/10.1201/9780429447273.

19. Therneau, T, Atkinson, B. rpart: recursive partitioning and regression trees. R package version 4.1-15; 2019. Available from: https://CRAN.R-project.org/package=rpart.

20. Milborrow, S. rpart.plot: plot ‘rpart’ models: an enhanced version of ‘plot.rpart’. R package version 3.1.0; 2021. Available from: https://CRAN.R-project.org/package=rpart.plot.

21. Zeileis, A, Grothendieck, G. zoo: S3 infrastructure for regular and irregular time series. J Stat Software 2005;14:1–27. https://doi.org/10.18637/jss.v014.i06.

22. Lindberg, M, Hole, A, Johnsen, H, Asberg, A, Rydning, A, Myrvold, HE, et al. Reference intervals for procalcitonin and C-reactive protein after major abdominal surgery. Scand J Clin Lab Invest 2002;62:189–94. https://doi.org/10.1080/003655102317475443.

23. Zierk, J, Arzideh, F, Haeckel, R, Rascher, W, Rauh, M, Metzler, M. Indirect determination of pediatric blood count reference intervals. Clin Chem Lab Med 2013;51:863–72. https://doi.org/10.1515/cclm-2012-0684.

24. Palm, J, Hoffmann, G, Klawonn, F, Tutarel, O, Palm, H, Holdenrieder, S, et al. Continuous, complete and comparable NT-proBNP reference ranges in healthy children. Clin Chem Lab Med 2020;58:1509–16. https://doi.org/10.1515/cclm-2019-1185.

25. Sikaris, KA. Physiology and its importance for reference intervals. Clin Biochem Rev 2014;35:3–14.

26. Harris, EH, Boyd, JC. On dividing reference data into subgroups to produce separate reference ranges. Clin Chem 1990;36:265–70. https://doi.org/10.1093/clinchem/36.2.265.

27. Zierk, J, Metzler, M, Rauh, M. Data mining of pediatric reference intervals. J Lab Med 2021;45:311–7. https://doi.org/10.1515/labmed-2021-0120.

28. Zierk, J, Hirschmann, J, Toddenroth, D, Arzideh, F, Haeckel, R, Bertram, A, et al. Next-generation reference intervals for pediatric hematology. Clin Chem Lab Med 2019;57:1595–607. https://doi.org/10.1515/cclm-2018-1236.

Received: 2023-03-24

Accepted: 2023-05-12

Published Online: 2023-05-31

Published in Print: 2023-08-28

This work is licensed under the Creative Commons Attribution 4.0 International License.
