Hypothesis testing for detecting outlier evaluators

Li Xu; David M. Zucker; Molin Wang

doi:10.1515/ijb-2023-0004

Published/Copyright: November 4, 2024

From the journal The International Journal of Biostatistics Volume 20 Issue 2

Abstract

In epidemiological studies, disease outcomes are measured by different evaluators. In this paper, we propose a two-stage procedure for detecting outlier evaluators. In the first stage, a regression model is fitted to estimate the evaluators' effects; outlier evaluators are those whose effects differ from those of normal evaluators. In the second stage, stepwise hypothesis tests are performed to detect outlier evaluators. The true positive rate and true negative rate of the proposed procedure are assessed in a simulation study. We apply the proposed method to detect potential outlier audiologists among the audiologists who measured the hearing threshold levels of participants in the Audiology Assessment Arm of the Conservation of Hearing Study, an epidemiological study of risk factors for hearing loss.

Keywords: outlier detection; evaluator outliers; audiometric data; quality control

Corresponding author: Molin Wang, Department of Epidemiology, Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA; Channing Division of Network Medicine, Harvard Medical School, Brigham and Women’s Hospital, Boston, MA, USA; and Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA, E-mail: stmow@channing.harvard.edu

Funding source: National Institutes of Health

Award Identifier / Grant number: R01DC017717

Research Ethics: Not applicable.
Informed consent: Not applicable.
Author contributions: All authors have accepted responsibility for the entire content of this manuscript and approved its submission.
Use of Large Language Models, AI and Machine Learning Tools: None declared.
Conflict of interest: All other authors state no conflict of interest.
Research funding: This work is supported by NIH grant R01DC017717.
Data availability: The code is available at https://github.com/tgh1122334/ESDGen.

Appendix: Technical details on deriving the critical values

The critical values $\lambda_1, \ldots, \lambda_k$ are defined through

$$1-\alpha = \Pr\Big(\bigcap_{l=t}^{k}\{R_l \le \lambda_l\} \,\Big|\, H_{t-1}\Big), \qquad t = 1, \ldots, k,$$

where $H_{t-1}$ is the hypothesis that, after removing $t-1$ outliers, the remaining $M-t+1$ evaluators are not outliers. The pseudo-code to obtain $\lambda_1, \ldots, \lambda_k$ through simulation is given in Algorithm 2 below.
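The probability statement above bounds, under $H_{t-1}$, the chance that any of $R_t, \ldots, R_k$ exceeds its critical value. One decision rule consistent with it, in the style of Rosner's generalized ESD procedure, declares as outliers the evaluators removed in the first $\hat t$ steps, where $\hat t$ is the largest $t$ with $R_t > \lambda_t$. The sketch below assumes $R_1, \ldots, R_k$ and $\lambda_1, \ldots, \lambda_k$ have already been computed; the function name `count_outliers` is hypothetical and is not taken from the ESDGen code.

```python
from typing import Sequence


def count_outliers(stats: Sequence[float], crit: Sequence[float]) -> int:
    """Generalized-ESD-style decision: largest t with R_t > lambda_t (0 if none).

    Hypothetical helper. stats[t-1] holds R_t, the statistic at step t (after the
    t-1 most extreme evaluators have been removed); crit[t-1] holds lambda_t.
    """
    if len(stats) != len(crit):
        raise ValueError("stats and crit must both have length k")
    n_out = 0
    for t, (r_t, lam_t) in enumerate(zip(stats, crit), start=1):
        if r_t > lam_t:
            n_out = t  # a later exceedance overrides earlier non-exceedances
    return n_out


# Step 2 does not exceed its threshold, but step 3 does, so 3 evaluators are flagged.
print(count_outliers([12.1, 3.0, 9.5, 2.0], [8.0, 7.5, 7.0, 6.5]))  # prints 3
```

Taking the largest exceeding step, rather than stopping at the first non-exceedance, is how generalized ESD guards against masking, where one outlier hides another.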

Algorithm 2.

Pseudo-code for finding the critical values using simulation.

To derive approximate critical values $\lambda_t$, $t = 1, \ldots, k$, note that Equation (2) gives

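$$1-\alpha = \Pr\Big(\bigcap_{l=t}^{k}\{R_l \le \lambda_l\} \,\Big|\, H_{t-1}\Big) \approx \Pr\big(R_t \le \lambda_t \mid H_{t-1}\big) = \Pr\Big(\max_{m\in I_t} \frac{(L_{m,t}^{\top}\hat\beta_{I_t})^2}{L_{m,t}^{\top}\Omega_{\beta_{I_t}} L_{m,t}} \le \lambda_t \,\Big|\, H_{t-1}\Big) = \Pr\Big(\bigcap_{m\in I_t}\Big\{\frac{(L_{m,t}^{\top}\hat\beta_{I_t})^2}{L_{m,t}^{\top}\Omega_{\beta_{I_t}} L_{m,t}} \le \lambda_t\Big\} \,\Big|\, H_{t-1}\Big), \qquad t = 1, \ldots, k.$$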
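The statistic $R_t$ in the display above is the largest squared standardized contrast over the evaluators still in $I_t$. A minimal NumPy sketch, assuming $\hat\beta_{I_t}$, its covariance $\Omega_{\beta_{I_t}}$, and the contrast vectors $L_{m,t}$ are supplied as arrays; how the paper constructs $L_{m,t}$ is not shown in this appendix, and the name `max_wald_statistic` is hypothetical.

```python
import numpy as np


def max_wald_statistic(beta_hat, omega, contrasts):
    """R_t = max over m of (L_m^T beta_hat)^2 / (L_m^T Omega L_m).

    beta_hat : (p,) estimated coefficients (stand-in for beta-hat_{I_t})
    omega    : (p, p) covariance of beta_hat (stand-in for Omega_{beta_{I_t}})
    contrasts: (n, p) array whose m-th row is L_{m,t}^T (assumed given)
    Returns (R_t, row index of the maximizing evaluator).
    """
    beta_hat = np.asarray(beta_hat, dtype=float)
    omega = np.asarray(omega, dtype=float)
    L = np.atleast_2d(np.asarray(contrasts, dtype=float))
    num = (L @ beta_hat) ** 2                    # (L_m^T beta_hat)^2 for each m
    den = np.einsum("ij,jk,ik->i", L, omega, L)  # L_m^T Omega L_m for each m
    wald = num / den
    m_star = int(np.argmax(wald))
    return float(wald[m_star]), m_star


# Identity contrasts: each coefficient is tested on its own.
R_t, m_star = max_wald_statistic([1.0, -0.5, 3.0], np.eye(3), np.eye(3))
print(R_t, m_star)  # prints 9.0 2
```

The evaluator attaining the maximum is the candidate removed before step $t+1$.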

Since $\hat\beta_{I_t}$ approximately follows a multivariate normal distribution when $N$ is large,

$$\hat\beta_{I_t} \sim N\big(\beta_{I_t}, \Omega_{\beta_{I_t}}\big).$$

Define $Z = (Z_1, \ldots, Z_{M-t+1})^{\top}$ as $Z = A_t\hat\beta_{I_t}$, where $A_t$ is the matrix whose rows are $L_{m,t}^{\top} / \big(L_{m,t}^{\top}\Omega_{\beta_{I_t}} L_{m,t}\big)^{1/2}$ for $m \in I_t$. Then

$$Z \mid H_{t-1} \sim N\big(0, A_t\Omega_{\beta_{I_t}}A_t^{\top}\big), \qquad R_t = \max_{m\in I_t}\frac{(L_{m,t}^{\top}\hat\beta_{I_t})^2}{L_{m,t}^{\top}\Omega_{\beta_{I_t}} L_{m,t}} = \max_{b=1,\ldots,M-t+1} Z_b^2.$$
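Dividing each row of $A_t$ by its standard error makes every $Z_b$ have unit variance, so $A_t\Omega_{\beta_{I_t}}A_t^{\top}$ is a correlation matrix and $Z_b^2$ reproduces the Wald ratio. The sketch below builds $A_t$ from assumed inputs and checks both facts numerically; `standardize_contrasts` and the example numbers are hypothetical.

```python
import numpy as np


def standardize_contrasts(contrasts, omega):
    """Build A_t (rows L_m^T / sqrt(L_m^T Omega L_m)) and Cov(Z) = A_t Omega A_t^T."""
    L = np.atleast_2d(np.asarray(contrasts, dtype=float))
    omega = np.asarray(omega, dtype=float)
    scale = np.sqrt(np.einsum("ij,jk,ik->i", L, omega, L))  # standard error of each contrast
    A = L / scale[:, None]        # divide each row by its standard error
    sigma_z = A @ omega @ A.T     # null covariance of Z; unit diagonal by construction
    return A, sigma_z


# Hypothetical example with two coefficients and three contrasts.
A, sigma_z = standardize_contrasts([[1.0, -1.0], [1.0, 0.0], [0.0, 1.0]],
                                   [[2.0, 0.5], [0.5, 1.0]])
z = A @ np.array([3.0, 1.0])
print(np.allclose(np.diag(sigma_z), 1.0))   # True: Z has unit variances
print(np.allclose(z ** 2, [2.0, 4.5, 1.0]))  # True: Z_b^2 equals each Wald ratio
```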

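To determine $\lambda_t$, note that

$$1-\alpha \approx \Pr\big(R_t \le \lambda_t \mid H_{t-1}\big) = \Pr\Big(\max_{b=1,\ldots,M-t+1} Z_b^2 \le \lambda_t \,\Big|\, H_{t-1}\Big) = \Pr\Big(\bigcap_{b=1}^{M-t+1}\big\{|Z_b| \le \sqrt{\lambda_t}\big\} \,\Big|\, H_{t-1}\Big),$$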

so $\sqrt{\lambda_t}$ is the $1-\alpha$ two-sided equicoordinate quantile of the distribution $N\big(0, A_t\Omega_{\beta_{I_t}}A_t^{\top}\big)$, which we obtain using the function qmvnorm in the R package mvtnorm; $\lambda_t$ is its square.
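The paper computes this quantile with qmvnorm in R. As a rough cross-check in Python, the same quantity can be approximated by plain Monte Carlo: draw $Z \sim N(0, A_t\Omega_{\beta_{I_t}}A_t^{\top})$, take the $1-\alpha$ quantile of $\max_b |Z_b|$ to estimate $\sqrt{\lambda_t}$, and square it. This swaps in simulation for the authors' qmvnorm call; the function name `mc_critical_value`, the number of draws, and the seed are assumptions.

```python
import numpy as np


def mc_critical_value(sigma_z, alpha=0.05, n_draws=200_000, seed=0):
    """Monte Carlo estimate of lambda_t (hypothetical helper).

    Draws Z ~ N(0, sigma_z), takes the (1 - alpha) quantile of max_b |Z_b|
    (the two-sided equicoordinate quantile, i.e. sqrt(lambda_t)), and squares it.
    """
    rng = np.random.default_rng(seed)
    sigma_z = np.asarray(sigma_z, dtype=float)
    d = sigma_z.shape[0]
    draws = rng.multivariate_normal(np.zeros(d), sigma_z, size=n_draws)
    max_abs = np.abs(draws).max(axis=1)          # max_b |Z_b| for each draw
    root_lambda = np.quantile(max_abs, 1.0 - alpha)
    return float(root_lambda ** 2)


# One coordinate: the result should be close to 1.96**2, about 3.84.
print(mc_critical_value([[1.0]]))
```

With independent coordinates the exact answer is available in closed form, since $(2\Phi(c)-1)^{d} = 1-\alpha$, which makes a convenient sanity check; with correlated $Z_b$ the critical value is smaller than under independence, which is why the correlation matrix matters.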

Supplementary Material

This article contains supplementary material (https://doi.org/10.1515/ijb-2023-0004).

Received: 2023-01-06

Accepted: 2024-09-13

Published Online: 2024-11-04
