
Exact correction factor for estimating the OR in the presence of sparse data with a zero cell in 2 × 2 tables

Malavika Babu, Thenmozhi Mani, Marimuthu Sappani, Sebastian George, Shrikant I. Bangdiwala and Lakshmanan Jeyaseelan
Published/Copyright: May 10, 2023

Abstract

In case-control studies, odds ratios (ORs) are calculated from 2 × 2 tables, and in some instances one of the cells has a small or zero count. Several corrections for calculating the OR in the presence of empty cells are available in the literature, including the Yates continuity correction and the Agresti and Coull correction. However, these methods apply different corrections, and it is not clear in which situations each should be used. The present study therefore proposes an iterative algorithm for estimating an exact (optimum) correction factor for a given sample size. The algorithm was evaluated by simulating data with varying proportions and sample sizes. The correction factor was chosen on the basis of the bias, the standard error of the odds ratio, the root mean square error and the coverage probability. We also present a linear function that identifies the exact correction factor from the sample size and proportion.
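The kind of simulation the abstract describes can be illustrated with a minimal sketch. It is not the authors' algorithm, whose details are not given here. The sketch assumes that a constant `k` is added to every cell of each simulated table, that the interval is a Wald interval on the log OR, and that the preferred `k` is the grid value with the smallest root mean square error. The function names, the grid and the seed are all hypothetical.

```python
import math
import random


def corrected_log_or(a, b, c, d, k):
    """Log OR and its Wald standard error after adding k (> 0) to every cell."""
    a, b, c, d = a + k, b + k, c + k, d + k
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return log_or, se


def simulate(k, p1, p0, n, reps=2000, seed=1):
    """Bias, RMSE and 95% Wald coverage of the corrected log OR.

    p1 and p0 are the assumed exposure proportions among cases and
    controls; n is the (assumed equal) number of cases and of controls.
    """
    rng = random.Random(seed)
    true_log_or = math.log(p1 * (1 - p0) / ((1 - p1) * p0))
    errors = []
    covered = 0
    for _ in range(reps):
        # Binomial draws for exposed cases and exposed controls.
        a = sum(rng.random() < p1 for _ in range(n))
        c = sum(rng.random() < p0 for _ in range(n))
        log_or, se = corrected_log_or(a, n - a, c, n - c, k)
        errors.append(log_or - true_log_or)
        if abs(log_or - true_log_or) <= 1.96 * se:
            covered += 1
    bias = sum(errors) / reps
    rmse = math.sqrt(sum(e * e for e in errors) / reps)
    return {"bias": bias, "rmse": rmse, "coverage": covered / reps}


def best_k(p1, p0, n, grid=(0.1, 0.25, 0.5, 0.75, 1.0), reps=2000):
    """Grid value of k with the smallest RMSE (an assumed selection rule)."""
    return min(grid, key=lambda k: simulate(k, p1, p0, n, reps)["rmse"])


if __name__ == "__main__":
    # Sparse setting: rare exposure among cases, small groups.
    for k in (0.1, 0.5, 1.0):
        print(k, simulate(k, p1=0.05, p0=0.20, n=20))
    print("selected k:", best_k(p1=0.05, p0=0.20, n=20))
```

Because every candidate `k` is evaluated on the same seeded tables, differences between rows reflect the correction alone rather than simulation noise. A fuller study would repeat this over a range of proportions and sample sizes before relating the selected `k` to them.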


Corresponding author: Lakshmanan Jeyaseelan, College of Medicine, Mohammed Bin Rashid University of Medicine and Health Sciences, Dubai, United Arab Emirates, E-mail:

  1. Author contributions: All the authors have accepted responsibility for the entire content of this submitted manuscript and approved submission.

  2. Research funding: None declared.

  3. Conflict of interest statement: The authors declare no conflicts of interest regarding this article.

References

1. George, J, Thomas, K, Jeyaseelan, L, Peter, JV, Cherian, AM. Hyponatraemia and hiccups. Natl Med J India 1996;9:107–9.

2. Sangeetha, U, Subbiah, M, Srinivasan, MR. Estimation of confidence intervals for Multinomial proportions of sparse contingency tables using Bayesian methods. Int J Sci Eng Res Pub 2013;3:7.

3. Agresti, A. Introduction to categorical data analysis, 2nd ed. Hoboken: John Wiley & Sons, Inc; 2007:394 p. https://doi.org/10.1002/0470114754.

4. Sweeting, MJ, Sutton, AJ, Lambert, PC. What to add to nothing? Use and avoidance of continuity corrections in meta-analysis of sparse data. Stat Med 2004;23:1351–75. https://doi.org/10.1002/sim.1761.

5. Yates, F. Contingency tables involving small numbers and the χ² test. Suppl J Roy Stat Soc 1934;1:217. https://doi.org/10.2307/2983604.

6. Agresti, A, Coull, BA. Approximate is better than “exact” for interval estimation of binomial proportions. Am Stat 1998;52:119–26. https://doi.org/10.1080/00031305.1998.10480550.

7. Haviland, MG. Yates’s correction for continuity and the analysis of 2 × 2 contingency tables. Stat Med 1990;9:363–7. https://doi.org/10.1002/sim.4780090403.

8. Subbiah, M, Srinivasan, MR. Classification of 2 × 2 sparse data sets with zero cells. Stat Probabil Lett 2008;78:3212–5. https://doi.org/10.1016/j.spl.2008.06.023.

9. Lyles, RH, Guo, Y, Greenland, S. Reducing bias and mean squared error associated with regression-based odds ratio estimators. J Stat Plan Inference 2012;142:3235–41. https://doi.org/10.1016/j.jspi.2012.05.005.

10. Agresti, A, Hitchcock, DB. Bayesian inference for categorical data analysis. JISS 2005;14:297–330. https://doi.org/10.1007/s10260-005-0121-y.

11. Greenland, S. Bayesian perspectives for epidemiological research. II. Regression analysis. Int J Epidemiol 2007;36:195–202. https://doi.org/10.1093/ije/dyl289.

12. Galindo-Garre, F, Vermunt, JK, Ato-García, M. Bayesian approaches to the problem of sparse tables in log-linear modeling. In: Proceedings of the fifth International conference on logic and methodology; 2011.

13. Greenland, S, Schwartzbaum, JA, Finkle, WD. Problems due to small samples and sparse data in conditional logistic regression analysis. Am J Epidemiol 2000;151:531–9. https://doi.org/10.1093/oxfordjournals.aje.a010240.

14. Efron, B. Empirical Bayes methods for combining likelihoods. J Am Stat Assoc 1996;91:538–50. https://doi.org/10.1080/01621459.1996.10476919.

15. Xie, M, Singh, K, Strawderman, WE. Confidence distributions and a unifying framework for meta-analysis. J Am Stat Assoc 2011;106:320–33. https://doi.org/10.1198/jasa.2011.tm09803.

16. Walter, SD, Cook, RJ. A Comparison of several point estimators of the odds ratio in a single 2 × 2 contingency table. Biometrics 1991;47:795. https://doi.org/10.2307/2532640.

17. Walter, SD. The distribution of Levin’s measure of attributable risk. Biometrika 1975;62:371–2. https://doi.org/10.1093/biomet/62.2.371.

18. Efron, B, Tibshirani, RJ. An introduction to the bootstrap [Internet]. Boston, MA: Springer US; 1993. Available from: http://link.springer.com/10.1007/978-1-4899-4541-9 [Accessed 19 Apr 2021]. https://doi.org/10.1007/978-1-4899-4541-9.

19. Nair, BR, Rajshekhar, V. Factors predicting the need for prolonged (>24 Months) antituberculous treatment in patients with Brain tuberculomas. World Neurosurg 2019;125:e236–47. https://doi.org/10.1016/j.wneu.2019.01.053.

20. Puhr, R, Heinze, G, Nold, M, Lusa, L, Geroldinger, A. Firth’s logistic regression with rare events: accurate effect estimates and predictions? Stat Med 2017;36:2302–17. https://doi.org/10.1002/sim.7273.


Supplementary Material

This article contains supplementary material (https://doi.org/10.1515/ijb-2022-0040).


Received: 2022-01-05
Accepted: 2023-03-27
Published Online: 2023-05-10

© 2023 Walter de Gruyter GmbH, Berlin/Boston
