Abstract
Objectives
This study investigates the use of multilevel models in capture-recapture experiments, a common procedure for estimating the size of an elusive population. Random effects are incorporated so that the probabilities of being observed are calculated more accurately.
Methods
We review single data-source capture-recapture estimators. We provide a general framework that accounts for random effects when calculating the probabilities of being observed, and apply it to estimate the number of powder cocaine users, crack cocaine users, heroin users, and people who inject drugs in France in 2019. Using two-level, random-intercepts logistic regression models that account for both individual and center factors, we conduct the calculations twice: once with fixed effects only, and once with fixed effects combined with predicted center-level random effects.
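The contrast between the two sets of calculations can be illustrated with a minimal sketch. It assumes a Horvitz-Thompson-type estimator, in which each observed individual contributes the inverse of their probability of being observed; the abstract does not state which estimator is used, so this choice is an assumption. The linear predictors, center labels, and random-intercept values below are hypothetical and are not taken from the study.

```python
import math


def expit(z):
    """Inverse logit: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def horvitz_thompson(probabilities):
    """Population size estimate: sum of inverse observation probabilities.

    Assumed estimator for illustration only, not necessarily the authors'.
    """
    return sum(1.0 / p for p in probabilities)


# Hypothetical observed individuals:
# (fixed-effects linear predictor x'beta, treatment center id)
observed = [
    (-0.5, "A"),
    (0.2, "A"),
    (-1.0, "B"),
    (0.4, "B"),
    (-0.3, "C"),
]

# Hypothetical predicted center-level random intercepts u_j
center_intercepts = {"A": 0.6, "B": -0.8, "C": 0.0}

# Fixed effects only: p_i = expit(x_i'beta)
p_fixed = [expit(eta) for eta, _ in observed]

# Fixed and random effects combined: p_ij = expit(x_i'beta + u_j)
p_mixed = [expit(eta + center_intercepts[c]) for eta, c in observed]

n_fixed = horvitz_thompson(p_fixed)
n_mixed = horvitz_thompson(p_mixed)

print(f"Fixed effects only:       {n_fixed:.1f}")
print(f"Fixed and random effects: {n_mixed:.1f}")
```

In this toy setup, individuals attending a center with a negative intercept receive a lower probability of being observed and therefore a larger weight, which is one way center-level effects can shift the estimated population size.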
Results
There are substantial differences between estimates based on fixed-effects-only predictions and those based on fixed and random effects combined, reflecting the influence of the treatment-center level. Multilevel modelling in capture-recapture allows the researcher to include individual and contextual information and to provide estimates at a large geographical scale.
Conclusions
Multilevel capture-recapture modelling offers a credible alternative for estimating the size of elusive populations in large geographical settings. The influence of higher-level clusters should be explicitly considered in future capture-recapture studies that use multilevel modelling.
Acknowledgments
The authors wish to thank Christophe Palle and Léo Bouthier (OFDT) for updating the data, and Thomas Seyler (EUDA) for the opportunity to present a first draft of this research at the PDU Expert Meeting held in Lisbon in May 2019.
- Research ethics: The survey was approved by an internal steering committee which acts as the equivalent of an Institutional Review Board, and by the National Data Protection Authority (CNIL, authorization #04-1059).
- Informed consent: Informed consent was obtained from all individuals included in this study, or their legal guardians or wards.
- Author contributions: EJ: conceptualization, data curation and first draft of the manuscript. MV: conceptualization, methodological assessment and major contribution in writing the final version of the manuscript. All authors read and approved the final manuscript.
- Use of Large Language Models, AI and Machine Learning Tools: None to declare.
- Conflict of interest: None to declare.
- Research funding: This study was supported by the French Monitoring Centre for Drugs and Drug Addictions (OFDT), which provided financial support for conducting the survey and writing this paper. The authors received no financial support for the research, authorship, and/or publication of this article.
- Data availability: The datasets generated and/or analysed during the current study are not publicly available. The data contain sensitive information which allows the identification of individuals. It is therefore protected, and access can only be granted with special permission.
Supplementary Material
This article contains supplementary material (https://doi.org/10.1515/em-2025-0011).
© 2025 Walter de Gruyter GmbH, Berlin/Boston
Articles in the same Issue
- Causal mediation analysis for difference-in-difference design and panel data
- Research Articles
- Extending the scope of the capture-recapture experiment: a multilevel approach with random effects to provide reliable estimates at national level
- Discrete-time compartmental models with partially observed data: a comparison among frequentist and Bayesian approaches for addressing likelihood intractability
- Sensitivity analysis for unmeasured confounding for a joint effect with an application to survey data
- Investigating the association between school substance programs and student substance use: accounting for informative cluster size
- The quantiles of extreme differences matrix for evaluating discriminant validity
- Finite-sample improved confidence intervals based on the estimating equation theory for the modified Poisson and least-squares regressions
- What if dependent causes of death were independent?
- Bot invasion: protecting the integrity of online surveys against spamming
- A study of a stochastic model and extinction phenomenon of meningitis epidemic
- Understanding the impact of media and latency in information response on the disease propagation: a mathematical model and analysis
- Time-varying reproductive number estimation for practical application in structured populations
- Perspective
- Should we still use pointwise confidence intervals for the Kaplan–Meier estimator?
- Leveraging data from multiple sources in epidemiologic research: transportability, dynamic borrowing, external controls, and beyond
- Regression calibration for time-to-event outcomes: mitigating bias due to measurement error in real-world endpoints