Abstract
In epidemiological studies, the capture-recapture (CRC) method is a powerful tool for estimating the number of disease cases, or potentially disease prevalence, from data collected by overlapping surveillance systems. Estimators derived from log-linear models are widely applied by epidemiologists when analyzing CRC data. The popularity of the log-linear framework stems largely from its accessibility and from the fact that interaction terms can accommodate certain types of dependency among data streams. In this work, we shed new light on significant pitfalls associated with the log-linear modeling framework in the context of CRC, using real data examples and simulation studies. First, we demonstrate that the log-linear model paradigm is highly exclusionary: by design, it can exclude many possible estimates that are potentially consistent with the observed data. Second, we clarify the ways in which commonly used model selection metrics (e.g., information criteria) are fundamentally misleading in the effort to select a “best” model in this setting. By focusing attention on these important cautionary points and on the fundamental, untestable dependency assumption made when fitting a log-linear model to CRC data, we hope to improve the quality and transparency of subsequent surveillance-based CRC estimates of case counts.
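As a minimal illustration of the exclusionary point, consider the simplest case of two streams. The Python sketch below is our own illustration, not code from this article, and the counts are hypothetical. With cells n11 (captured by both streams), n10 (stream 1 only), and n01 (stream 2 only), the independence log-linear model log E[n_ij] = u + u1·i + u2·j has three parameters for three observed cells. It therefore fits them exactly and implies a single estimate of the unobserved cell, n00 = n10·n01/n11, which yields the Lincoln-Petersen total n1·n2/n11. Adding the interaction u12 (the log odds ratio of the 2×2 table) reproduces the observed cells for any assumed n00 > 0, so the observed-data Poisson likelihood is identical across all such values; setting u12 = 0 is an untestable assumption rather than a finding. All function names are hypothetical.

```python
import math

# Hypothetical two-stream counts (illustration only, not data from the article):
# n11 = seen by both streams, n10 = stream 1 only, n01 = stream 2 only.
# n00 (seen by neither stream) is unobservable by design.
OBSERVED = {"n11": 60, "n10": 40, "n01": 90}


def independence_model(n11, n10, n01):
    """Exact fit of log E[n_ij] = u + u1*i + u2*j to the three observed cells.

    Three parameters for three cells, so fitted values equal the observed
    counts and the parameters have closed forms.
    """
    u1 = math.log(n11 / n01)  # log(E[n1j] / E[n0j]) for either j
    u2 = math.log(n11 / n10)  # log(E[ni1] / E[ni0]) for either i
    u = math.log(n10 * n01 / n11)  # log E[n00], the unobserved cell
    return {"u": u, "u1": u1, "u2": u2}


def lincoln_petersen(n11, n10, n01):
    """N-hat = n1 * n2 / n11, the total implied by the independence model."""
    return (n11 + n10) * (n11 + n01) / n11


def model_with_interaction(n11, n10, n01, n00):
    """Parameters of log E[n_ij] = u + u1*i + u2*j + u12*i*j that reproduce
    the observed cells exactly for an arbitrary, assumed value of n00 > 0."""
    u = math.log(n00)
    u1 = math.log(n10 / n00)
    u2 = math.log(n01 / n00)
    u12 = math.log(n11 * n00 / (n10 * n01))  # log odds ratio of the 2x2 table
    return {"u": u, "u1": u1, "u2": u2, "u12": u12}


def fitted_cells(p):
    """Expected counts for the three observable cells under parameters p."""
    u12 = p.get("u12", 0.0)  # independence model has no interaction term
    return {
        "n10": math.exp(p["u"] + p["u1"]),
        "n01": math.exp(p["u"] + p["u2"]),
        "n11": math.exp(p["u"] + p["u1"] + p["u2"] + u12),
    }


def poisson_loglik(obs, fit):
    """Poisson log-likelihood of the observed cells only (n00 never enters)."""
    return sum(
        obs[k] * math.log(fit[k]) - fit[k] - math.lgamma(obs[k] + 1) for k in obs
    )


if __name__ == "__main__":
    n11, n10, n01 = OBSERVED["n11"], OBSERVED["n10"], OBSERVED["n01"]
    ind = independence_model(n11, n10, n01)
    print(f"independence n00-hat: {math.exp(ind['u']):.2f}")
    print(f"Lincoln-Petersen N-hat: {lincoln_petersen(n11, n10, n01):.2f}")
    # Every assumed n00 fits the observed cells equally well.
    for n00 in (20, 60, 200, 1000):
        p = model_with_interaction(n11, n10, n01, n00)
        ll = poisson_loglik(OBSERVED, fitted_cells(p))
        print(f"n00={n00:5d}  N-hat={n11 + n10 + n01 + n00:5d}  "
              f"u12={p['u12']:+.3f}  loglik={ll:.6f}")
```

Running the script prints the same log-likelihood for every assumed n00 while the implied total varies widely; within the two-stream log-linear framework, only the value n00 = n10·n01/n11 (u12 = 0) is reachable.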
Funding source: National Center for Advancing Translational Sciences of the National Institutes of Health
Award Identifier / Grant number: UL1TR002378
Funding source: National Institutes of Health
Award Identifier / Grant number: 1R01CA266574-01A1
Funding source: National Institutes of Health
Award Identifier / Grant number: P30AI050409
Acknowledgments
We thank Drs. Howard Chang and Sarita Shah for motivation and helpful discussions.
- Ethical approval: This study uses publicly available data and information and therefore ethical approval is not required.
- Informed consent: Not applicable.
- Author contributions: All authors have accepted responsibility for the entire content of this manuscript and approved its submission.
- Competing interests: Authors state no conflict of interest.
- Research funding: Partial support was provided by the National Institutes of Health-funded Emory Center for AIDS Research (P30AI050409; Del Rio PI), the National Center for Advancing Translational Sciences of the National Institutes of Health (UL1TR002378; Taylor PI), and by the National Institutes of Health (1R01CA266574-01A1; Lyles/Waller MPIs). The content is the sole responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
- Data availability: All data are incorporated into the article.
Supplementary Material
This article contains supplementary material (https://doi.org/10.1515/em-2023-0019).
© 2023 Walter de Gruyter GmbH, Berlin/Boston
Articles in the same Issue
- Research Articles
- Development and application of an evidence-based directed acyclic graph to evaluate the associations between metal mixtures and cardiometabolic outcomes
- Addressing substantial covariate imbalance with propensity score stratification and balancing weights: connections and recommendations
- Tutorial
- On some pitfalls of the log-linear modeling framework for capture-recapture studies in disease surveillance