Abstract
In this paper, a Markov Regime Switching Model of Conditional Mean with covariates is proposed and investigated for the analysis of incidence rate data. The components of the model are selected by penalized likelihood techniques in conjunction with the Expectation Maximization algorithm, with the goal of achieving a high level of robustness in modeling the dynamic behavior of epidemiological data. In addition to statistical inference, Changepoint Detection Analysis is performed for the selection of the number of regimes, which reduces the complexity associated with Likelihood Ratio Tests. Within this framework, a three-phase procedure for modeling incidence data is proposed and tested on real and simulated data.
Acknowledgment
The authors wish to express their appreciation to the Editor and two anonymous Referees for their comments, suggestions, and recommendations, which helped improve both the quality and the presentation of the manuscript. In addition, the authors would like to thank the Hellenic National Meteorological Service (HNMS) for providing the meteorological data, as well as the Department of Epidemiological Surveillance and Intervention of the National Public Health Organization (NPHO) of Greece for providing the Influenza-Like Illness (ILI) incidence data, collected weekly through the sentinel surveillance system. Finally, this work was carried out at the Lab of Statistics and Data Analysis of the University of the Aegean.
- Author contributions: All the authors have accepted responsibility for the entire content of this submitted manuscript and approved submission.
- Research funding: None declared.
- Conflict of interest statement: The authors declare no conflicts of interest regarding this article.
Supplementary Material
This article contains supplementary material (https://doi.org/10.1515/ijb-2021-0134).
© 2023 Walter de Gruyter GmbH, Berlin/Boston