Abstract
Data description is the first step in understanding the nature of the problem at hand. It is usually a simple task that requires no particular assumptions. However, the interpretation of the descriptive measures used can be a source of confusion and misunderstanding. The incidence rate is the quotient between the number of observed events and the total time during which the studied population was at risk of having the event (person-time). Despite this apparently simple definition, its interpretation is not free of complexity. In this work, we revisit the incidence rate estimator under right censorship. We analyze the effect that the censoring time distribution can have on the observed results, and its relevance when comparing two or more incidence rates. We propose a solution for limiting the impact that the data collection process can have on the results of hypothesis testing. We explore the finite-sample behavior of the considered estimators through Monte Carlo simulations. Two examples based on synthetic data illustrate the considered problem. The R code and data used are provided as Supplementary Material.
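The definition above, and its sensitivity to censoring, can be illustrated with a minimal sketch. This is not the authors' R code: the function names `incidence_rate` and `observe`, the fixed administrative censoring times, and the event times are all hypothetical and chosen only for illustration.

```python
def incidence_rate(times, events):
    """Observed events divided by total person-time.

    times  : observed follow-up per subject (min of event and censoring time)
    events : 1 if the event was observed, 0 if the subject was right-censored
    """
    person_time = sum(times)
    if person_time <= 0:
        raise ValueError("person-time must be positive")
    return sum(events) / person_time


def observe(event_times, censor_time):
    """Apply a single fixed right-censoring time to the true event times."""
    times = [min(t, censor_time) for t in event_times]
    events = [1 if t <= censor_time else 0 for t in event_times]
    return times, events


# Hypothetical true event times (in years) for one cohort.
true_times = [0.5, 1.0, 1.5, 2.0, 6.0, 8.0, 10.0, 12.0]

# Same cohort, two data collection schemes (assumed follow-up of 2 vs 12 years).
rate_short = incidence_rate(*observe(true_times, 2.0))   # 4 events / 13 person-years
rate_long = incidence_rate(*observe(true_times, 12.0))   # 8 events / 41 person-years

print(round(rate_short, 4), round(rate_long, 4))
```

In this toy cohort the two follow-up schemes give different incidence rates even though the underlying event times are identical, which is the kind of dependence on the censoring distribution that the abstract refers to.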
Funding source: Ministerio de Ciencia e Innovación
Award Identifier / Grant number: PID2020-118101GB-I00
Funding source: Asturies Government
Award Identifier / Grant number: GRUPIN AYUD/2021/50897
- Author contributions: All the authors have accepted responsibility for the entire content of this submitted manuscript and approved submission.
- Research funding: This work was supported by grants GRUPIN AYUD/2021/50897 from the Asturies Government and PID2020-118101GB-I00 from the Ministerio de Ciencia e Innovación (Spanish Government).
- Conflict of interests: The authors declare no conflicts of interest regarding this article.
- Data availability: The authors confirm that no new data were created or analysed in this study. The data represented in this review article are from the journals listed in the references.
Supplementary Material
This article contains supplementary material (https://doi.org/10.1515/ijb-2023-0025).
© 2023 Walter de Gruyter GmbH, Berlin/Boston