Abstract
In epidemiological studies, disease outcomes are measured by different evaluators. In this paper, we propose a two-stage procedure for detecting outlier evaluators, that is, evaluators whose effects on the measurements differ from those of normal evaluators. In the first stage, a regression model is fitted to obtain the evaluators' effects. In the second stage, stepwise hypothesis tests are performed on these effects to detect outlier evaluators. The true positive rate and true negative rate of the proposed procedure are assessed in a simulation study. We apply the proposed method to detect potential outlier audiologists among the audiologists who measured hearing threshold levels of participants in the Audiology Assessment Arm of the Conservation of Hearing Study, an epidemiological study of risk factors for hearing loss.
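As a rough illustration of the second stage, here is a minimal sketch of the classical generalized extreme studentized deviate (ESD) procedure applied to a vector of already-estimated evaluator effects. It assumes the effects are exchangeable and roughly normal for non-outlier evaluators, and that the critical values λ_1, …, λ_k are supplied by the caller (for example, as derived in the appendix). The function names `esd_statistics` and `detect_outlier_evaluators` are hypothetical, and the paper's own test statistic may standardize or weight the effects differently.

```python
import math
from typing import List, Sequence, Tuple


def esd_statistics(effects: Sequence[float], k: int) -> List[Tuple[float, int]]:
    """Return [(R_t, index), ...] for t = 1..k (generalized ESD, Rosner style).

    R_t is the largest |effect - mean| / sd among the evaluators that remain
    after the t - 1 most extreme evaluators have been removed; `index` is the
    original position of the evaluator attaining it.
    """
    if len(effects) < k + 2:
        raise ValueError("need at least k + 2 evaluators")
    remaining = list(enumerate(effects))  # (original index, estimated effect)
    out = []
    for _ in range(k):
        values = [v for _, v in remaining]
        n = len(values)
        mean = sum(values) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
        pos = max(range(n), key=lambda i: abs(values[i] - mean))
        idx, val = remaining.pop(pos)  # remove the most extreme evaluator
        out.append((abs(val - mean) / sd, idx))
    return out


def detect_outlier_evaluators(effects: Sequence[float],
                              critical_values: Sequence[float]) -> List[int]:
    """Flag the evaluators removed in the first r steps, where r is the largest
    t with R_t > lambda_t (r = 0 if no step exceeds its critical value)."""
    stats = esd_statistics(effects, len(critical_values))
    r = 0
    for t, ((r_t, _), lam) in enumerate(zip(stats, critical_values), start=1):
        if r_t > lam:
            r = t
    return sorted(idx for _, idx in stats[:r])


# Hypothetical estimated effects for M = 10 evaluators; evaluator 9 is shifted.
effects = [0.1, -0.2, 0.05, 0.3, -0.1, 0.0, 0.2, -0.25, 0.15, 5.0]
# Illustrative critical values (roughly Rosner's approximation for M = 10, alpha = 0.05).
print(detect_outlier_evaluators(effects, [2.29, 2.22]))  # -> [9]
```

Taking the largest t with R_t > λ_t, rather than stopping at the first non-rejection, is what lets the generalized ESD procedure avoid masking when several outliers pull the mean and standard deviation toward themselves.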
Funding source: National Institutes of Health
Award Identifier / Grant number: R01DC017717
Research Ethics: Not applicable.
Informed consent: Not applicable.
Author contributions: All authors have accepted responsibility for the entire content of this manuscript and approved its submission.
Use of Large Language Models, AI and Machine Learning Tools: None declared.
Conflict of interest: All other authors state no conflict of interest.
Research funding: This work is supported by NIH grant R01DC017717.
Data availability: The code is available at https://github.com/tgh1122334/ESDGen.
Appendix: Technical details on deriving the critical values
To simulate the critical values from
$H_{t-1}$ is the hypothesis that, after removing $t-1$ outliers, the remaining $M-t+1$ evaluators are not outliers. The pseudo-code to obtain $\lambda_1, \dots, \lambda_k$ through simulation is given in Algorithm 2 below.
Algorithm 2.
Pseudo-code for finding the critical values using simulation.
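A minimal Monte Carlo sketch in the spirit of this simulation (not the authors' Algorithm 2) is shown below. It assumes, for illustration only, that under $H_{t-1}$ the $M-t+1$ remaining evaluator effect estimates behave like i.i.d. normal draws with a common variance, and it takes $\lambda_t$ as the empirical $1-\alpha$ quantile of the maximum studentized deviation so that $P(R_t > \lambda_t \mid H_{t-1}) \approx \alpha$. The names `simulate_critical_values`, `n_sim`, and `seed` are hypothetical; the paper's simulation may instead generate effects from the fitted regression design.

```python
import math
import random
from typing import List, Sequence


def max_studentized_deviation(sample: Sequence[float]) -> float:
    """R = max_i |x_i - mean| / s, with s the sample SD (n - 1 denominator)."""
    n = len(sample)
    mean = sum(sample) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
    return max(abs(x - mean) for x in sample) / sd


def simulate_critical_values(M: int, k: int, alpha: float = 0.05,
                             n_sim: int = 10_000, seed: int = 1) -> List[float]:
    """Monte Carlo lambda_1, ..., lambda_k for M evaluators.

    Assumption (for illustration): under H_{t-1} the M - t + 1 remaining
    effect estimates are i.i.d. N(0, 1). R is location- and scale-invariant,
    so the mean and variance of the normal do not matter.
    """
    if M - k + 1 < 3:
        raise ValueError("need at least 3 evaluators left at step k")
    rng = random.Random(seed)  # seeded for reproducibility
    lambdas = []
    for t in range(1, k + 1):
        n_t = M - t + 1  # evaluators left after removing t - 1 outliers
        draws = sorted(
            max_studentized_deviation([rng.gauss(0.0, 1.0) for _ in range(n_t)])
            for _ in range(n_sim)
        )
        # Empirical (1 - alpha) quantile of R_t under H_{t-1}.
        q = min(n_sim - 1, math.ceil((1 - alpha) * n_sim) - 1)
        lambdas.append(draws[q])
    return lambdas


lam = simulate_critical_values(M=10, k=2)
print([round(x, 2) for x in lam])
```

Increasing `n_sim` reduces the Monte Carlo error in each $\lambda_t$ at a linear cost in run time.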

We now derive the approximate critical values $\lambda_t$, $t = 1, \dots, k$. From Equation (2), we have
Since
Define
To determine $\lambda_t$,
so,
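The derivation above builds on the paper's Equation (2). As a point of comparison only, Rosner's (1983, Technometrics 25:165–172) classical approximation for generalized ESD critical values with $M$ evaluators is $\lambda_t = \frac{(M-t)\, t_{p,\nu}}{\sqrt{(\nu + t_{p,\nu}^2)(M-t+1)}}$, with $\nu = M-t-1$ and $p = 1 - \alpha / \{2(M-t+1)\}$, where $t_{p,\nu}$ is the $p$ quantile of Student's t distribution with $\nu$ degrees of freedom. The sketch below computes it with the Python standard library only (regularized incomplete beta via a modified Lentz continued fraction, then bisection for the quantile); `scipy.stats.t.ppf(p, nu)` could replace the hand-written `t_quantile`. All helper names are hypothetical, and this is not necessarily the approximation derived in this appendix.

```python
import math


def _betacf(a: float, b: float, x: float, max_iter: int = 300, eps: float = 1e-14) -> float:
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))  # even step
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))  # odd step
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(x: float, a: float, b: float) -> float:
    """Regularized incomplete beta I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    bt = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                  + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b


def t_cdf(t: float, nu: float) -> float:
    """Student's t CDF via P(T > |t|) = 0.5 * I_{nu / (nu + t^2)}(nu / 2, 1 / 2)."""
    tail = 0.5 * reg_inc_beta(nu / (nu + t * t), nu / 2.0, 0.5)
    return 1.0 - tail if t >= 0 else tail


def t_quantile(p: float, nu: float) -> float:
    """Upper quantile (p >= 0.5) of Student's t by bracketing and bisection."""
    if not 0.5 <= p < 1.0:
        raise ValueError("p must be in [0.5, 1)")
    lo, hi = 0.0, 1.0
    while t_cdf(hi, nu) < p:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if t_cdf(mid, nu) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def rosner_lambda(M: int, t: int, alpha: float = 0.05) -> float:
    """Rosner (1983) approximate critical value lambda_t for M evaluators."""
    nu = M - t - 1
    p = 1.0 - alpha / (2.0 * (M - t + 1))
    tq = t_quantile(p, nu)
    return (M - t) * tq / math.sqrt((nu + tq ** 2) * (M - t + 1))


print([round(rosner_lambda(10, t), 3) for t in (1, 2)])
```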
Supplementary Material
This article contains supplementary material (https://doi.org/10.1515/ijb-2023-0004).
© 2024 Walter de Gruyter GmbH, Berlin/Boston