Statistical models for assessing agreement for quantitative data with heterogeneous random raters and replicate measurements

Claus Thorn Ekstrøm; Bendix Carstensen

doi:10.1515/ijb-2023-0037

Article

Published/Copyright: February 22, 2024

From the journal The International Journal of Biostatistics Volume 20 Issue 2

Abstract

Agreement between methods for quantitative measurements is typically assessed by computing limits of agreement between pairs of methods and/or by illustration through Bland–Altman plots. We consider the situation where the observed measurement methods are regarded as a random sample from a population of possible methods, and discuss how the underlying linear mixed effects model can be extended to this situation. This is relevant when, for example, the methods represent raters/judges that are used to score specific individuals or items. In the case of random methods, we are not interested in estimates pertaining to the specific methods, but in quantifying the variation between the methods actually involved in making measurements, and in accommodating this as an extra source of variation when generalizing to the clinical performance of a method. In the model we allow raters to have individual precision/skill and permit linked replicates (i.e., when the numbering, labeling or ordering of the replicates within items is important). Applications involving estimation of the limits of agreement for two datasets are shown: a dataset of spatial perception among a group of students and a dataset on consumer preference of French chocolate. The models are implemented in the MethComp package for R [Carstensen B, Gurrin L, Ekstrøm CT, Figurski M. MethComp: functions for analysis of agreement in method comparison studies; 2013. R package version 1.22; R Core Team. R: a language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing; 2012].
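
To make the notion of limits of agreement concrete, the following is a minimal Python sketch and not the MethComp implementation. It first computes classical Bland–Altman limits for two fixed methods. It then computes limits for the difference between two randomly drawn raters under a simplified model that is assumed here for illustration only: a measurement is the sum of an item effect, a random rater effect with variance `tau2`, a random rater-by-item interaction with variance `omega2`, and a residual with a common variance `sigma2`. The function names, the variance symbols, the homogeneous residual variance and the example numbers are all hypothetical; the article's model additionally allows rater-specific precision and linked replicates.

```python
import math
import statistics


def bland_altman_limits(x, y, z=1.96):
    """Classical limits of agreement for paired measurements by two fixed methods."""
    diffs = [a - b for a, b in zip(x, y)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD (n - 1 denominator)
    return mean_diff - z * sd_diff, mean_diff + z * sd_diff


def random_rater_limits(tau2, omega2, sigma2, z=1.96):
    """Limits for the difference between two random raters scoring the same item.

    Assumed simplified model (not taken from the article):
        y = item effect + rater effect + rater-by-item effect + residual,
    with independent, zero-mean random effects and one common residual variance.
    The item effect cancels in the difference, and each remaining component
    contributes twice, so Var(difference) = 2 * (tau2 + omega2 + sigma2).
    """
    sd_diff = math.sqrt(2.0 * (tau2 + omega2 + sigma2))
    return -z * sd_diff, z * sd_diff


# Made-up measurements for illustration only.
method_a = [10.1, 11.4, 9.8, 12.0, 10.7]
method_b = [9.9, 11.0, 10.2, 11.5, 10.9]
lower, upper = bland_altman_limits(method_a, method_b)

# Made-up variance components for illustration only.
rr_lower, rr_upper = random_rater_limits(tau2=0.5, omega2=0.25, sigma2=0.25)
```

In the random-rater case the limits are centred at zero because no particular rater is singled out, and the between-rater variance widens the interval relative to limits based on residual variation alone.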

Keywords: agreement; limits of agreement; mixed models; random raters; method comparison

Corresponding author: Claus Thorn Ekstrøm, Section of Biostatistics, Department of Public Health, University of Copenhagen, Copenhagen, Denmark, E-mail: ekstrom@sund.ku.dk

Research ethics: None applicable.
Author contributions: The authors have accepted responsibility for the entire content of this manuscript and approved its submission.
Competing interests: The authors state no conflict of interest.
Research funding: None declared.
Data availability: The raw data can be obtained on request from the corresponding author or from the link provided in the manuscript.

References

1. Bland, JM, Altman, DG. A note on the use of the intraclass correlation coefficient in the evaluation of agreement between two methods of measurement. Comput Biol Med 1990;20:337–40. https://doi.org/10.1016/0010-4825(90)90013-f.

2. Shoukri, MM. Measures of interobserver agreement and reliability, 2nd ed. Boca Raton, FL, USA: Chapman & Hall; 2010. https://doi.org/10.1201/b10433.

3. Gwet, KL. Handbook of inter-rater reliability: the definitive guide to measuring the extent of agreement among multiple raters, 3rd ed. Gaithersburg, MD, USA: Advanced Analytics; 2012.

4. ISO. Accuracy (trueness and precision) of measurement methods and results - part 1: general principles and definitions. Technical report. International Organization for Standardization; 1994.

5. Altman, DG, Bland, JM. Measurement in medicine: the analysis of method comparison studies. J R Stat Soc Ser A Stat 1983;32:307–17. https://doi.org/10.2307/2987937.

6. Barnhart, HX, Haber, MJ, Lin, L. An overview of assessing agreement with continuous measurements. J Biopharm Stat 2007;17:529–69. https://doi.org/10.1080/10543400701376480.

7. Rousson, V, Gasser, T, Seifert, B. Assessing intrarater, interrater and test-retest reliability of continuous measurements. Stat Med 2002;21:3431–46. https://doi.org/10.1002/sim.1253.

8. Bland, JM, Altman, DG. Agreement between methods of measurement with multiple observations per individual. J Biopharm Stat 2007;17:571–82. https://doi.org/10.1080/10543400701329422.

9. Carstensen, B, Simpson, J, Gurrin, LC. Statistical models for assessing agreement in method comparison studies with replicate measurements. Int J Biostat 2008;4:16. https://doi.org/10.2202/1557-4679.1107.

10. Choudhary, PK, Nagaraja, HN. Measuring agreement: models, methods, and applications. Hoboken, NJ, USA: John Wiley & Sons; 2018. https://doi.org/10.1002/9781118553282.

11. Bland, JM, Altman, DG. Measuring agreement in method comparison studies. Stat Methods Med Res 1999;8:135–60. https://doi.org/10.1177/096228029900800204.

12. Parker, RA, Scott, C, Inácio, V, Stevens, NT. Using multiple agreement methods for continuous repeated measures data: a tutorial for practitioners. BMC Med Res Methodol 2020;20:154. https://doi.org/10.1186/s12874-020-01022-x.

13. Husson, F, Le, S, Cadoret, M. SensoMineR: sensory data analysis with R; 2014. R package version 1.20.

14. Carstensen, B, Gurrin, L, Ekstrøm, CT, Figurski, M. MethComp: functions for analysis of agreement in method comparison studies; 2013. R package version 1.22.

15. R Core Team. R: a language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing; 2012.

16. Carstensen, B. Comparing clinical measurement methods: a practical guide. Chichester, UK: John Wiley & Sons; 2010. https://doi.org/10.1002/9780470683019.

17. Fleiss, JL, Levin, B, Paik, MC. Statistical methods for rates and proportions, chapter the measurement of interrater agreement, 3rd ed. Hoboken, NJ, USA: John Wiley & Sons; 2004. https://doi.org/10.1002/0471445428.

18. Parker, RA, Weir, CJ, Rubio, N, Rabinovich, R, Pinnock, H, Hanley, J, et al. Application of mixed effects limits of agreement in the presence of multiple sources of variability: exemplar from the comparison of several devices to measure respiratory rate in COPD patients. PLoS One 2016;11:e0168321. https://doi.org/10.1371/journal.pone.0168321.

19. Choudhary, PK. A tolerance interval approach for assessment of agreement in method comparison studies with repeated measurements. J Stat Plan Inference 2008;138:1102–15. https://doi.org/10.1016/j.jspi.2007.03.056.

Supplementary Material

The online version of this article offers supplementary material (https://doi.org/10.1515/ijb-2023-0037).

Received: 2023-03-11

Accepted: 2023-12-28

Published Online: 2024-02-22
