Abstract
Count outcomes are often modelled using Poisson regression. However, this model imposes a strict mean-variance relationship that is unappealing in many contexts. Several studies in the life sciences result in count outcomes with an excessive number of zeros. The excess zeros introduce extra dispersion in the data that cannot be accounted for by traditional Poisson regression. The zero-inflated Poisson (ZIP) and zero-inflated negative binomial models are popular alternatives. The zero-inflated models comprise two key components: a logistic part that models the excess zeros, and a Poisson component that handles the counts. Both components allow the inclusion of covariates. Civettini and Hines [3] investigated misspecification effects in zero-inflated negative binomial regression models. Long, Preisser, Herring and Golin [10] proposed a marginalized zero-inflated Poisson (MZIP) model that allows direct marginal interpretation of fixed-effect estimates, overcoming the often sub-population-specific interpretation of the traditional zero-inflated models. In this research, the effects of misspecifying components of the MZIP regression model are investigated through a comprehensive Monte Carlo simulation study. Two kinds of incorrect specification of the components of an MZIP model are considered, namely ‘Omission’ and ‘Misspecification’. Bias, standard error (precision) of estimates and mean square error (MSE) are computed while varying the sample size. Type I error rates are also evaluated for the misspecified models. Omissions in both parts of the model lead to biases in the estimated parameters, with the intercept parameters most severely affected. Furthermore, for all types of omission, parameters in the zero-inflated part of the model are more strongly affected than those in the Poisson part, in terms of both bias and MSE. Generally, bias and MSE decrease as the sample size increases for all parameters. Misspecification can increase, preserve or decrease the type I error rate, depending on the sample size.
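To make the two-part structure concrete, the following is a minimal sketch of how data could be generated from an MZIP model with a single covariate. It assumes the usual MZIP parameterisation, in which a logit link governs the excess-zero probability and a log link governs the overall (marginal) mean, so that the Poisson mean is implied by the other two. The function names, the standard normal covariate and the parameter values are illustrative assumptions, not the simulation design of the study.

```python
import math
import random


def expit(t):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-t))


def poisson_draw(rng, mu):
    """Draw one Poisson(mu) variate by Knuth's multiplication method (adequate for small mu)."""
    limit = math.exp(-mu)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def simulate_mzip(n, alpha, gamma, seed=0):
    """Simulate n observations from an MZIP model with one covariate (hypothetical helper).

    logit(psi_i) = gamma[0] + gamma[1] * x_i   excess-zero probability
    log(nu_i)    = alpha[0] + alpha[1] * x_i   marginal mean of the count
    mu_i         = nu_i / (1 - psi_i)          implied mean of the Poisson component
    """
    rng = random.Random(seed)
    xs, ys, nus = [], [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)  # assumed covariate distribution
        psi = expit(gamma[0] + gamma[1] * x)
        nu = math.exp(alpha[0] + alpha[1] * x)
        mu = nu / (1.0 - psi)
        # With probability psi the outcome is a structural zero, otherwise a Poisson draw.
        y = 0 if rng.random() < psi else poisson_draw(rng, mu)
        xs.append(x)
        ys.append(y)
        nus.append(nu)
    return xs, ys, nus


if __name__ == "__main__":
    # Illustrative parameter values only.
    xs, ys, nus = simulate_mzip(20000, alpha=(0.25, 0.4), gamma=(-1.4, 0.6), seed=1)
    print("sample mean of y:      ", sum(ys) / len(ys))
    print("average marginal mean: ", sum(nus) / len(nus))
    print("share of zeros:        ", sum(y == 0 for y in ys) / len(ys))
```

Because the log-linear predictor describes the marginal mean directly, the sample mean of the simulated counts should be close to the average of the `nu` values, which is what gives the `alpha` coefficients their population-wide interpretation.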
Funding statement: Samuel Iddi gratefully acknowledges financial support from the University of Ghana through an ORID Research Grant.
A Supplementary appendix
Results of the correct and misspecified models based on 500 simulations for a sample size of 500.
| Models | Quantity | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|
| CM | Est | 0.2395 | 0.4075 | 0.2496 | 1.39867 | 0.6039 | 0.2530 | | |
| | Std Err | 0.0997 | 0.0793 | 0.0186 | 0.10830 | 0.1739 | 0.1871 | 0.0303 | 0.2497 |
| | Bias | 0.0075 | 0.0039 | 0.0030 | 0.0018 | | | | |
| OMIT1 | Est | 0.1926 | 0.3106 | 0.3432 | 1.31915 | 0.9577 | – | | |
| | Std Err | 0.0977 | 0.0811 | 0.0066 | 0.10580 | 0.1664 | 0.2101 | – | 0.2630 |
| | Bias | 0.0932 | 0.3577 | 0.0606 | – | 0.0895 | | | |
| OMIT2 | Est | 0.4505 | 0.4376 | – | 1.37737 | 0.2318 | 0.5151 | | |
| | Std Err | 0.1060 | 0.0812 | – | 0.12110 | 0.1680 | 0.1430 | 0.0122 | 0.2226 |
| | Bias | 0.2005 | 0.0376 | – | 0.4297 | 0.2651 | 0.4223 | | |
| OMIT3 | Est | 0.6946 | 0.5295 | – | 1.36019 | 0.9309 | – | | |
| | Std Err | 0.0957 | 0.0818 | – | 0.10340 | 0.1634 | 0.2065 | – | 0.2529 |
| | Bias | 0.4446 | 0.1295 | – | 0.3309 | 0.0187 | – | 0.0991 | |
| OMIT4 | Est | 0.7166 | 0.4406 | – | 0.90191 | 0.5046 | – | | |
| | Std Err | 0.0886 | 0.0846 | – | 0.07160 | 0.1215 | 0.1450 | 0.0115 | – |
| | Bias | 0.4666 | 0.0406 | – | 0.5156 | 0.2546 | – | | |
Results of the correct and misspecified models based on 500 simulations for a sample size of 500.
| Models | Quantity | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|
| CMMIS1 | Est | 0.2453 | 0.4034 | – | 1.39386 | 0.6104 | 0.2522 | | |
| | Std Err | 0.1067 | 0.0841 | – | 0.12190 | 0.1894 | 0.2122 | 0.0271 | 0.2878 |
| | Bias | 0.0034 | – | 0.0104 | 0.0022 | | | | |
| MIS1 | Est | 0.2486 | 0.4040 | 0.0028 | 1.39413 | 0.6156 | 0.2533 | | |
| | Std Err | 0.1106 | 0.0842 | 0.0252 | 0.12210 | 0.1955 | 0.2137 | 0.0697 | 0.2888 |
| | Bias | 0.0040 | 0.0028 | 0.0156 | 0.0033 | | | | |
| CMMIS2 | Est | 0.2463 | 0.4030 | 0.2512 | 1.39887 | 0.5969 | 0.2484 | – | |
| | Std Err | 0.1204 | 0.1145 | 0.0273 | 0.07170 | 0.1525 | 0.1894 | 0.0370 | – |
| | Bias | 0.0030 | 0.0012 | – | | | | | |
| MIS2 | Est | 0.2433 | 0.4049 | 0.2506 | 1.40758 | 0.6068 | 0.2484 | 0.0052 | |
| | Std Err | 0.1315 | 0.1155 | 0.0274 | 0.13200 | 0.1935 | 0.1898 | 0.0372 | 0.2334 |
| | Bias | 0.0049 | 0.0006 | 0.00760 | 0.0068 | 0.0052 | | | |
| CMMIS3 | Est | 0.2486 | 0.4022 | 0.2483 | – | 0.5948 | 0.2540 | – | |
| | Std Err | 0.1213 | 0.1264 | 0.0347 | – | 0.1588 | 0.2006 | 0.0466 | – |
| | Bias | 0.0022 | – | 0.0040 | – | | | | |
| MIS3 | Est | 0.2439 | 0.4027 | 0.2488 | 0.5858 | 0.2519 | | | |
| | Std Err | 0.1469 | 0.1266 | 0.0347 | 0.16540 | 0.2102 | 0.2015 | 0.0472 | 0.2766 |
| | Bias | 0.0027 | 0.0019 | | | | | | |
| CMMIS4 | Est | 0.2434 | 0.4049 | 0.2499 | – | 0.6034 | – | | |
| | Std Err | 0.0746 | 0.0824 | 0.0046 | – | 0.1602 | 0.2727 | – | 0.2797 |
| | Bias | 0.0049 | – | 0.0034 | – | 0.0020 | | | |
| MIS4 | Est | 0.2540 | 0.4029 | 0.2491 | 0.6076 | 0.0030 | | | |
| | Std Err | 0.1026 | 0.0850 | 0.0087 | 0.12390 | 0.2015 | 0.2730 | 0.0308 | 0.3445 |
| | Bias | 0.0040 | 0.0029 | 0.0076 | 0.0030 | | | | |
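The Est, Std Err and Bias rows above are per-parameter summaries over simulation replications. The sketch below shows one common way such summaries, together with MSE and a type I error rate, can be computed from the replicate estimates. The exact definitions used in the study are not stated in this appendix, so the choices here (empirical standard deviation as the standard error, bias reported in absolute value, a two-sided Wald test at the 5% level) and the function names are assumptions.

```python
import statistics


def summarize_replications(estimates, true_value):
    """Mean estimate, empirical standard error, absolute bias and MSE over replications."""
    mean_est = statistics.fmean(estimates)
    std_err = statistics.stdev(estimates)  # spread of the estimates across replications
    bias = abs(mean_est - true_value)      # assumption: bias reported in absolute value
    mse = statistics.fmean([(e - true_value) ** 2 for e in estimates])
    return {"est": mean_est, "std_err": std_err, "bias": bias, "mse": mse}


def type_one_error_rate(estimates, std_errors, z_crit=1.959964):
    """Share of replications in which a Wald test rejects H0: parameter = 0.

    Meaningful only for a parameter whose true value is zero, such as a
    covariate that was wrongly added to a model component.
    """
    rejections = sum(abs(e / s) > z_crit for e, s in zip(estimates, std_errors))
    return rejections / len(estimates)


if __name__ == "__main__":
    # Toy inputs for illustration, not values from the study.
    print(summarize_replications([0.20, 0.30, 0.25], true_value=0.25))
    print(type_one_error_rate([0.1, 0.5, -0.3, 0.0], [0.1, 0.1, 0.1, 0.1]))
```

A rejection rate well above or below the nominal 5% for a truly null coefficient would indicate that the specification error distorts inference and not only point estimates.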
References
[1] A. Agresti, Foundations of Linear and Generalized Linear Models, John Wiley & Sons, Hoboken, 2015.
[2] A. Agresti, B. Caffo and P. Ohman-Strickland, Examples in which misspecification of a random effects distribution reduces efficiency, and possible remedies, Comput. Statist. Data Anal. 47 (2004), no. 3, 639–653. doi:10.1016/j.csda.2003.12.009
[3] A. J. Civettini and E. Hines, Misspecification effects in zero-inflated negative binomial regression models: Common cases, Annual Meeting of the Southern Political Science Association, New Orleans, 2005.
[4] P. J. Heagerty, Marginally specified logistic-normal models for longitudinal binary data, Biometrics 55 (1999), 688–698. doi:10.1111/j.0006-341X.1999.00688.x
[5] S. Iddi and K. Doku-Amponsah, Statistical model for overdispersed count outcome with many zeros: An approach for direct marginal inference, South African J. Stat. 50 (2016), 313–330. doi:10.37920/sasj.2016.50.2.9
[6] S. Iddi and G. Molenberghs, A combined overdispersed and marginalized multilevel model, Comput. Statist. Data Anal. 56 (2012), 1944–1951. doi:10.1016/j.csda.2011.11.021
[7] D. Lambert, Zero-inflated Poisson regression, with an application to defects in manufacturing, Technometrics 34 (1992), no. 1, 1–14. doi:10.2307/1269547
[8] S. Litière, A. Alonso and G. Molenberghs, Type I and type II error under random-effects misspecification in generalized linear mixed models, Biometrics 63 (2007), no. 4, 1038–1044. doi:10.1111/j.1541-0420.2007.00782.x
[9] S. Litière, A. Alonso and G. Molenberghs, The impact of a misspecified random-effects distribution on the estimation and the performance of inferential procedures in generalized linear mixed models, Stat. Med. 27 (2008), 3125–3144. doi:10.1002/sim.3157
[10] D. L. Long, J. Preisser, A. Herring and C. Golin, A marginalized zero-inflated regression model with overall exposure effects, Stat. Med. 33 (2014), 5151–5165. doi:10.1002/sim.6293
[11] R. W. M. Wedderburn, Quasi-likelihood functions, generalized linear models, and the Gauss–Newton method, Biometrika 61 (1974), 439–447. doi:10.1093/biomet/61.3.439
[12] W. F. W. Yaacob, M. A. Lazim and Y. B. Wah, A practical approach in modelling count data, Proceedings of the Regional Conference on Statistical Sciences (Malaysia 2010), IEEE Press, Piscataway (2010), 176–183.
© 2017 Walter de Gruyter GmbH, Berlin/Boston
Articles in this issue
- Frontmatter
- Invariant density estimation for a reflected diffusion using an Euler scheme
- Feistel-inspired scrambling improves the quality of linear congruential generators
- Stochastic polynomial chaos expansion method for random Darcy equation
- Effect of covariate misspecifications in the marginalized zero-inflated Poisson model
- Stochastic mesh method for optimal stopping problems
- Computing with bivariate COM-Poisson model under different copulas
- Improved Markov Chain Monte Carlo method for cryptanalysis substitution-transposition cipher