Abstract
Many health care professionals and institutions maintain longitudinal databases that record follow-up measurements for different patients over time. Longitudinal data frequently present additional complexities such as high variability, correlated measurements, and missing data. Mixed effects models have been widely used to handle these difficulties. This work proposes linear mixed effects models as a tool for searching for conceptually different types of anomalies in the data simultaneously.
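For orientation, a linear mixed effects model for the measurements of patient i is commonly written in the form below. The notation is the conventional textbook one and is an assumption here, since the abstract does not fix any symbols; the independent, equal-variance error structure is likewise a simplifying assumption.

```latex
% Conventional linear mixed effects model for subject i (assumed notation):
%   y_i            vector of the n_i responses observed for subject i
%   X_i, Z_i       design matrices for the fixed and random effects
%   \beta          fixed effects shared by all subjects
%   b_i            subject-specific random effects, independent of \varepsilon_i
%   \varepsilon_i  within-subject measurement errors
\[
  y_i = X_i \beta + Z_i b_i + \varepsilon_i, \qquad
  b_i \sim \mathcal{N}(0, D), \qquad
  \varepsilon_i \sim \mathcal{N}(0, \sigma^2 I_{n_i}).
\]
```

Under this formulation, one plausible reading of "conceptually different types of anomalies" is that an unusual predicted b_i points to an atypical patient trajectory, while an unusually large element of the residual vector points to an atypical single measurement. This mapping is offered only as an illustration and is not taken from the abstract.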
Funding source: Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
Author contributions: All the authors have accepted responsibility for the entire content of this submitted manuscript and approved submission.
Research funding: This study was funded by Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET).
Conflict of interest statement: The authors declare no conflicts of interest regarding this article.
© 2022 Walter de Gruyter GmbH, Berlin/Boston
Articles in the same Issue
- Frontmatter
- Part-1: SMAC 2021 Webconference
- Statistics, philosophy, and health: the SMAC 2021 webconference
- Part-2: Regular Articles
- “Show me the DAG!”
- Causal inference for oncology: past developments and current challenges
- The EBM+ movement
- Bayesianism from a philosophical perspective and its application to medicine
- Bayesian inference for optimal dynamic treatment regimes in practice
- Agent-based modeling in medical research, virtual baseline generator and change in patients’ profile issue
- Agent based modeling in health care economics: examples in the field of thyroid cancer
- A copula-based set-variant association test for bivariate continuous, binary or mixed phenotypes
- Detection of atypical response trajectories in biomedical longitudinal databases
- Potential application of elastic nets for shared polygenicity detection with adapted threshold selection
- Error analysis of the PacBio sequencing CCS reads
- A SIMEX approach for meta-analysis of diagnostic accuracy studies with attention to ROC curves
- Statistical modelling of COVID-19 and drug data via an INAR(1) process with a recent thinning operator and cosine Poisson innovations
- The balanced discrete triplet Lindley model and its INAR(1) extension: properties and COVID-19 applications