Detection of atypical response trajectories in biomedical longitudinal databases

Lucio José Pantazis; Rafael Antonio García

doi:10.1515/ijb-2020-0076

Published/Copyright: October 24, 2022

From the journal The International Journal of Biostatistics Volume 19 Issue 2

Abstract

Many health care professionals and institutions manage longitudinal databases that record follow-ups of different patients over time. Longitudinal data frequently present additional complexities such as high variability, correlated measurements, and missing data. Mixed effects models have been widely used to overcome these difficulties. This work proposes linear mixed effects models as a tool for searching simultaneously for conceptually different types of anomalies in the data.
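
For orientation, a linear mixed effects model for the measurements of patient i is commonly written in the standard form below. The notation (y_i, X_i, Z_i, beta, b_i, D, R_i) is an assumption made here for illustration and is not necessarily the notation used in the article.

```latex
% Standard linear mixed effects model for subject i (illustrative notation)
% y_i : vector of repeated measurements for subject i
% X_i : design matrix of the fixed effects, \beta : fixed-effect coefficients
% Z_i : design matrix of the random effects, b_i : subject-specific random effects
\[
  y_i = X_i \beta + Z_i b_i + \varepsilon_i,
  \qquad
  b_i \sim \mathcal{N}(0, D),
  \qquad
  \varepsilon_i \sim \mathcal{N}(0, R_i),
\]
% with b_i and \varepsilon_i assumed independent.
```

Under such a formulation, one plausible reading of "conceptually different types of anomalies" is that an atypical patient trajectory would tend to appear as an unusual predicted random effect b_i, while an isolated atypical measurement would tend to appear as an unusual residual. The abstract itself does not spell out this distinction.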

Keywords: longitudinal data; mixed effects models; outlier detection

Corresponding author: Lucio José Pantazis, ITBA, Buenos Aires, Lavardén 315, CP 1437, Argentina, E-mail: lpantazis@itba.edu.ar

Funding source: Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)

Author contributions: All the authors have accepted responsibility for the entire content of this submitted manuscript and approved submission.
Research funding: This study was funded by Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET).
Conflict of interest statement: The authors declare no conflicts of interest regarding this article.

Received: 2020-05-27

Accepted: 2022-10-03

Published Online: 2022-10-24
