Abstract
Agreement between methods for quantitative measurements is typically assessed by computing limits of agreement between pairs of methods and/or by illustration through Bland–Altman plots. We consider the situation where the observed measurement methods are regarded as a random sample from a population of possible methods, and discuss how the underlying linear mixed effects model can be extended to this situation. This is relevant when, for example, the methods represent raters/judges that are used to score specific individuals or items. In the case of random methods, we are not interested in estimates pertaining to the specific methods. Instead, we are interested in quantifying the variation between the methods actually involved in making measurements, and in accommodating this as an extra source of variation when generalizing to the clinical performance of a method. In the model we allow raters to have individual precision/skill and permit linked replicates (i.e., when the numbering, labeling or ordering of the replicates within items is important). Applications involving estimation of the limits of agreement are shown for two datasets: a dataset of spatial perception among a group of students and a dataset on consumer preference of French chocolate. The models are implemented in the MethComp package for R [Carstensen B, Gurrin L, Ekstrøm CT, Figurski M. MethComp: functions for analysis of agreement in method comparison studies; 2013. R package version 1.22; R Core Team. R: a language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing; 2012].
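For orientation, the classical pairwise limits of agreement that the abstract takes as its starting point can be sketched as follows. This is a minimal illustration only: it computes the standard Bland–Altman limits (mean difference plus or minus 1.96 standard deviations of the differences) for two fixed raters, and it does not implement the mixed effects model with random raters, rater-specific precision, or linked replicates. The function name `limits_of_agreement` and the scores are hypothetical and are not taken from the paper or from MethComp.

```python
import statistics


def limits_of_agreement(x, y, z=1.96):
    """Classical Bland-Altman limits of agreement for paired measurements.

    Returns (lower, bias, upper), where bias is the mean of the paired
    differences x - y and the limits are bias -/+ z * SD of the differences.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    if len(x) < 2:
        raise ValueError("at least two paired measurements are required")
    diffs = [a - b for a, b in zip(x, y)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    return bias - z * sd, bias, bias + z * sd


# Hypothetical scores given by two raters to the same five items.
rater_a = [10.0, 12.0, 11.0, 14.0, 13.0]
rater_b = [9.0, 12.0, 12.0, 13.0, 12.0]

lower, bias, upper = limits_of_agreement(rater_a, rater_b)
print(f"bias = {bias:.3f}, limits of agreement = ({lower:.3f}, {upper:.3f})")
```

With random raters, the variation between raters is an additional variance component that would widen such limits; estimating it requires a mixed model fit rather than this pairwise calculation.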
- Research ethics: Not applicable.
- Author contributions: The authors have accepted responsibility for the entire content of this manuscript and approved its submission.
- Competing interests: The authors state no conflict of interest.
- Research funding: None declared.
- Data availability: The raw data can be obtained on request from the corresponding author or from the link provided in the manuscript.
References
1. Bland, JM, Altman, DG. A note on the use of the intraclass correlation coefficient in the evaluation of agreement between two methods of measurement. Comput Biol Med 1990;20:337–40. https://doi.org/10.1016/0010-4825(90)90013-f.
2. Shoukri, MM. Measures of interobserver agreement and reliability, 2nd ed. Boca Raton, FL, USA: Chapman & Hall; 2010. https://doi.org/10.1201/b10433.
3. Gwet, KL. Handbook of inter-rater reliability: the definitive guide to measuring the extent of agreement among multiple raters, 3rd ed. Gaithersburg, MD, USA: Advanced Analytics; 2012.
4. ISO. Accuracy (trueness and precision) of measurement methods and results – part 1: general principles and definitions. Technical Report. International Organization for Standardization; 1994.
5. Altman, DG, Bland, JM. Measurement in medicine: the analysis of method comparison studies. The Statistician 1983;32:307–17. https://doi.org/10.2307/2987937.
6. Barnhart, HX, Haber, MJ, Lin, L. An overview of assessing agreement with continuous measurements. J Biopharm Stat 2007;17:529–69. https://doi.org/10.1080/10543400701376480.
7. Rousson, V, Gasser, T, Seifert, B. Assessing intrarater, interrater and test-retest reliability of continuous measurements. Stat Med 2002;21:3431–46. https://doi.org/10.1002/sim.1253.
8. Bland, JM, Altman, DG. Agreement between methods of measurement with multiple observations per individual. J Biopharm Stat 2007;17:571–82. https://doi.org/10.1080/10543400701329422.
9. Carstensen, B, Simpson, J, Gurrin, LC. Statistical models for assessing agreement in method comparison studies with replicate measurements. Int J Biostat 2008;4:16. https://doi.org/10.2202/1557-4679.1107.
10. Choudhary, PK, Nagaraja, HN. Measuring agreement: models, methods, and applications. Hoboken, NJ, USA: John Wiley & Sons; 2018. https://doi.org/10.1002/9781118553282.
11. Bland, JM, Altman, DG. Measuring agreement in method comparison studies. Stat Methods Med Res 1999;8:135–60. https://doi.org/10.1177/096228029900800204.
12. Parker, RA, Scott, C, Inácio, V, Stevens, NT. Using multiple agreement methods for continuous repeated measures data: a tutorial for practitioners. BMC Med Res Methodol 2020;20:154. https://doi.org/10.1186/s12874-020-01022-x.
13. Husson, F, Le, S, Cadoret, M. SensoMineR: sensory data analysis with R; 2014. R package version 1.20.
14. Carstensen, B, Gurrin, L, Ekstrøm, CT, Figurski, M. MethComp: functions for analysis of agreement in method comparison studies; 2013. R package version 1.22.
15. R Core Team. R: a language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing; 2012.
16. Carstensen, B. Comparing clinical measurement methods: a practical guide. Chichester, UK: John Wiley & Sons; 2010. https://doi.org/10.1002/9780470683019.
17. Fleiss, JL, Levin, B, Paik, MC. Statistical methods for rates and proportions, chapter the measurement of interrater agreement, 3rd ed. Hoboken, NJ, USA: John Wiley & Sons; 2004. https://doi.org/10.1002/0471445428.
18. Parker, RA, Weir, CJ, Rubio, N, Rabinovich, R, Pinnock, H, Hanley, J, et al. Application of mixed effects limits of agreement in the presence of multiple sources of variability: exemplar from the comparison of several devices to measure respiratory rate in COPD patients. PLoS One 2016;11:e0168321. https://doi.org/10.1371/journal.pone.0168321.
19. Choudhary, PK. A tolerance interval approach for assessment of agreement in method comparison studies with repeated measurements. J Stat Plan Inference 2008;138:1102–15. https://doi.org/10.1016/j.jspi.2007.03.056.
Supplementary Material
The online version of this article offers supplementary material (https://doi.org/10.1515/ijb-2023-0037).
© 2024 Walter de Gruyter GmbH, Berlin/Boston
Articles in the same Issue
- Frontmatter
- Research Articles
- Random forests for survival data: which methods work best and under what conditions?
- Flexible variable selection in the presence of missing data
- An interpretable cluster-based logistic regression model, with application to the characterization of response to therapy in severe eosinophilic asthma
- MBPCA-OS: an exploratory multiblock method for variables of different measurement levels. Application to study the immune response to SARS-CoV-2 infection and vaccination
- Detecting differentially expressed genes from RNA-seq data using fuzzy clustering
- Hypothesis testing for detecting outlier evaluators
- Response to comments on ‘sensitivity of estimands in clinical trials with imperfect compliance’
- Commentary
- Comments on “sensitivity of estimands in clinical trials with imperfect compliance” by Chen and Heitjan
- Research Articles
- Optimizing personalized treatments for targeted patient populations across multiple domains
- Statistical models for assessing agreement for quantitative data with heterogeneous random raters and replicate measurements
- History-restricted marginal structural model and latent class growth analysis of treatment trajectories for a time-dependent outcome
- Revisiting incidence rates comparison under right censorship
- Ensemble learning methods of inference for spatially stratified infectious disease systems
- The survival function NPMLE for combined right-censored and length-biased right-censored failure time data: properties and applications
- Hybrid classical-Bayesian approach to sample size determination for two-arm superiority clinical trials
- Estimation of a decreasing mean residual life based on ranked set sampling with an application to survival analysis
- Improving the mixed model for repeated measures to robustly increase precision in randomized trials
- Bayesian second-order sensitivity of longitudinal inferences to non-ignorability: an application to antidepressant clinical trial data
- A modified rule of three for the one-sided binomial confidence interval
- Kalman filter with impulse noised outliers: a robust sequential algorithm to filter data with a large number of outliers
- Bayesian estimation and prediction for network meta-analysis with contrast-based approach
- Testing for association between ordinal traits and genetic variants in pedigree-structured samples by collapsing and kernel methods