
Statistical models for assessing agreement for quantitative data with heterogeneous random raters and replicate measurements

  • Claus Thorn Ekstrøm and Bendix Carstensen
Published/Copyright: February 22, 2024

Abstract

Agreement between methods for quantitative measurements is typically assessed by computing limits of agreement between pairs of methods and/or by illustration through Bland–Altman plots. We consider the situation where the observed measurement methods are considered a random sample from a population of possible methods, and discuss how the underlying linear mixed effects model can be extended to this situation. This is relevant when, for example, the methods represent raters/judges that are used to score specific individuals or items. In the case of random methods, we are not interested in estimates pertaining to the specific methods, but are instead interested in quantifying the variation between the methods actually involved in making measurements, and in accommodating this as an extra source of variation when generalizing to the clinical performance of a method. In the model we allow raters to have individual precision/skill and permit linked replicates (i.e., when the numbering, labeling or ordering of the replicates within items is important). Applications involving estimation of the limits of agreement are shown for two datasets: a dataset of spatial perception among a group of students and a dataset on consumer preference of French chocolate. The models are implemented in the MethComp package for R [Carstensen B, Gurrin L, Ekstrøm CT, Figurski M. MethComp: functions for analysis of agreement in method comparison studies; 2013. R package version 1.22; R Core Team. R: a language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing; 2012].
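
As a point of reference for the limits of agreement mentioned in the abstract, the classical two-method Bland–Altman computation can be sketched as follows. This is a minimal illustration only, not the mixed effects model of the article and not the MethComp implementation: the function name `limits_of_agreement`, the normal multiplier 1.96, and the example values are all assumptions introduced here.

```python
import statistics


def limits_of_agreement(x, y, z=1.96):
    """Classical Bland-Altman limits of agreement for paired measurements.

    x, y: measurements of the same items by two methods (equal length).
    z:    normal quantile; 1.96 gives approximate 95% limits (assumption).
    Returns (lower limit, mean difference, upper limit).
    """
    if len(x) != len(y):
        raise ValueError("x and y must contain the same number of items")
    if len(x) < 2:
        raise ValueError("at least two paired measurements are required")

    # Item-wise differences between the two methods.
    diffs = [a - b for a, b in zip(x, y)]

    # Mean difference (bias) and sample standard deviation of the differences.
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)

    return bias - z * sd, bias, bias + z * sd


# Hypothetical example values, chosen purely for illustration.
method_a = [10.0, 12.0, 11.0, 14.0]
method_b = [9.0, 12.0, 12.0, 12.0]
lower, bias, upper = limits_of_agreement(method_a, method_b)
print(f"bias = {bias:.3f}, limits of agreement = ({lower:.3f}, {upper:.3f})")
```

The sketch treats the two methods as fixed and uses a single standard deviation of the differences. The random-rater setting described in the abstract adds the variation between raters as an extra source of variation, which this simple calculation deliberately ignores.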


Corresponding author: Claus Thorn Ekstrøm, Section of Biostatistics, Department of Public Health, University of Copenhagen, Copenhagen, Denmark, E-mail:

  1. Research ethics: None applicable.

  2. Author contributions: The authors have accepted responsibility for the entire content of this manuscript and approved its submission.

  3. Competing interests: The authors state no conflict of interest.

  4. Research funding: None declared.

  5. Data availability: The raw data can be obtained on request from the corresponding author or from the link provided in the manuscript.

References

1. Bland, JM, Altman, DG. A note on the use of the intraclass correlation coefficient in the evaluation of agreement between two methods of measurement. Comput Biol Med 1990;20:337–40. https://doi.org/10.1016/0010-4825(90)90013-f.

2. Shoukri, MM. Measures of interobserver agreement and reliability, 2nd ed. Boca Raton, FL, USA: Chapman & Hall; 2010. https://doi.org/10.1201/b10433.

3. Gwet, KL. Handbook of inter-rater reliability: the definitive guide to measuring the extent of agreement among multiple raters, 3rd ed. Gaithersburg, MD, USA: Advanced Analytics; 2012.

4. ISO. Accuracy (trueness and precision) of measurement methods and results - part 1: general principles and definitions. Technical Report. International Organization for Standardization; 1994.

5. Altman, DG, Bland, JM. Measurement in medicine: the analysis of method comparison studies. J R Stat Soc Ser A Stat 1983;32:307–17. https://doi.org/10.2307/2987937.

6. Barnhart, HX, Haber, MJ, Lin, L. An overview of assessing agreement with continuous measurements. J Biopharm Stat 2007;17:529–69. https://doi.org/10.1080/10543400701376480.

7. Rousson, V, Gasser, T, Seifert, B. Assessing intrarater, interrater and test-retest reliability of continuous measurements. Stat Med 2002;21:3431–46. https://doi.org/10.1002/sim.1253.

8. Bland, JM, Altman, DG. Agreement between methods of measurement with multiple observations per individual. J Biopharm Stat 2007;17:571–82. https://doi.org/10.1080/10543400701329422.

9. Carstensen, B, Simpson, J, Gurrin, LC. Statistical models for assessing agreement in method comparison studies with replicate measurements. Int J Biostat 2008;4:16. https://doi.org/10.2202/1557-4679.1107.

10. Choudhary, PK, Nagaraja, HN. Measuring agreement: models, methods, and applications. Hoboken, NJ, USA: John Wiley & Sons; 2018. https://doi.org/10.1002/9781118553282.

11. Bland, JM, Altman, DG. Measuring agreement in method comparison studies. Stat Methods Med Res 1999;8:135–60. https://doi.org/10.1177/096228029900800204.

12. Parker, RA, Scott, C, Inácio, V, Stevens, NT. Using multiple agreement methods for continuous repeated measures data: a tutorial for practitioners. BMC Med Res Methodol 2020;20:154. https://doi.org/10.1186/s12874-020-01022-x.

13. Husson, F, Le, S, Cadoret, M. SensoMineR: sensory data analysis with R; 2014. R package version 1.20.

14. Carstensen, B, Gurrin, L, Ekstrøm, CT, Figurski, M. MethComp: functions for analysis of agreement in method comparison studies; 2013. R package version 1.22.

15. R Core Team. R: a language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing; 2012.

16. Carstensen, B. Comparing clinical measurement methods: a practical guide. Chichester, UK: John Wiley & Sons; 2010. https://doi.org/10.1002/9780470683019.

17. Fleiss, JL, Levin, B, Paik, MC. Statistical methods for rates and proportions, chapter the measurement of interrater agreement, 3rd ed. Hoboken, NJ, USA: John Wiley & Sons; 2004. https://doi.org/10.1002/0471445428.

18. Parker, RA, Weir, CJ, Rubio, N, Rabinovich, R, Pinnock, H, Hanley, J, et al. Application of mixed effects limits of agreement in the presence of multiple sources of variability: exemplar from the comparison of several devices to measure respiratory rate in COPD patients. PLoS One 2016;11:e0168321. https://doi.org/10.1371/journal.pone.0168321.

19. Choudhary, PK. A tolerance interval approach for assessment of agreement in method comparison studies with repeated measurements. J Stat Plan Inference 2008;138:1102–15. https://doi.org/10.1016/j.jspi.2007.03.056.


Supplementary Material

The online version of this article offers supplementary material (https://doi.org/10.1515/ijb-2023-0037).


Received: 2023-03-11
Accepted: 2023-12-28
Published Online: 2024-02-22

© 2024 Walter de Gruyter GmbH, Berlin/Boston

Downloaded on 14.9.2025 from https://www.degruyterbrill.com/document/doi/10.1515/ijb-2023-0037/html