On some pitfalls of the log-linear modeling framework for capture-recapture studies in disease surveillance

Yuzi Zhang; Lin Ge; Lance A. Waller; Robert H. Lyles

doi:10.1515/em-2023-0019

Abstract

In epidemiological studies, the capture-recapture (CRC) method is a powerful tool for estimating the number of disease cases, and potentially disease prevalence, from overlapping surveillance systems. Estimators derived from log-linear models are widely applied by epidemiologists when analyzing CRC data. The popularity of the log-linear model framework is largely due to its accessibility and to the fact that interaction terms can allow for certain types of dependency among data streams. In this work, we shed new light on significant pitfalls associated with the log-linear model framework in the context of CRC, using real data examples and simulation studies. First, we demonstrate that the log-linear model paradigm is highly exclusionary. That is, it can exclude, by design, many possible estimates that are potentially consistent with the observed data. Second, we clarify the ways in which regularly used model selection metrics (e.g., information criteria) are fundamentally deceptive in the effort to select a “best” model in this setting. By focusing attention on these important cautionary points and on the fundamental untestable dependency assumption made when fitting a log-linear model to CRC data, we hope to improve the quality and transparency of subsequent surveillance-based CRC estimates of case counts.
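The untestable dependency assumption mentioned above can be illustrated with a minimal sketch for the simplest two-stream case. This is not the authors' code or analysis: the function names and the counts are hypothetical, and the between-stream odds ratio is treated as a user-supplied assumption. With two streams, only three cells are observed (cases found by both streams, by stream 1 only, and by stream 2 only), and the count missed by both streams follows from whatever odds ratio is assumed. An odds ratio of 1 corresponds to independence and reproduces the classical Lincoln-Petersen estimate.

```python
def missing_cell_estimate(n11, n10, n01, odds_ratio=1.0):
    """Estimate the count missed by both streams (n00).

    The odds ratio n11 * n00 / (n10 * n01) is an ASSUMED value supplied by
    the analyst; it cannot be estimated from the three observed cells alone.
    """
    if n11 <= 0:
        raise ValueError("n11 must be positive to form the estimate")
    if odds_ratio <= 0:
        raise ValueError("odds_ratio must be positive")
    return odds_ratio * n10 * n01 / n11


def total_estimate(n11, n10, n01, odds_ratio=1.0):
    """Estimated total case count: observed cells plus the estimated n00."""
    return n11 + n10 + n01 + missing_cell_estimate(n11, n10, n01, odds_ratio)


if __name__ == "__main__":
    # Hypothetical counts, chosen only for illustration.
    n11, n10, n01 = 60, 40, 90
    for phi in (0.5, 1.0, 2.0):
        n_hat = total_estimate(n11, n10, n01, odds_ratio=phi)
        print(f"assumed odds ratio {phi}: estimated total {n_hat:.0f}")
```

Each assumed odds ratio reproduces the three observed cells exactly, so the observed data alone cannot favor one of these totals over another. Fixing a particular dependency structure in advance is what rules the other candidate estimates out.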

Keywords: capture-recapture methods; identifiability; log-linear models; model selection

Corresponding author: Yuzi Zhang, Department of Biostatistics and Bioinformatics, The Rollins School of Public Health of Emory University, 1518 Clifton Rd. N.E., Atlanta, GA 30322, USA, E-mail: yuzi.zhang@emory.edu

Funding source: National Center for Advancing Translational Sciences of the National Institutes of Health

Award Identifier / Grant number: UL1TR002378

Funding source: National Institutes of Health

Award Identifier / Grant number: 1R01CA266574-01A1

Funding source: National Institutes of Health

Award Identifier / Grant number: P30AI050409

Acknowledgments

We thank Drs. Howard Chang and Sarita Shah for motivation and helpful discussions.

Ethical approval: This study uses publicly available data and information and therefore ethical approval is not required.
Informed consent: Not applicable.
Author contributions: All authors have accepted responsibility for the entire content of this manuscript and approved its submission.
Competing interests: Authors state no conflict of interest.
Research funding: Partial support was provided by the National Institutes of Health-funded Emory Center for AIDS Research (P30AI050409; Del Rio PI), the National Center for Advancing Translational Sciences of the National Institutes of Health (UL1TR002378; Taylor PI), and by the National Institutes of Health (1R01CA266574-01A1; Lyles/Waller MPIs). The content is the sole responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
Data availability: All data are incorporated into the article.

Supplementary Material

This article contains supplementary material (https://doi.org/10.1515/em-2023-0019).

Received: 2023-06-18

Accepted: 2023-09-05

Published Online: 2023-10-20
